Hadoop 1.x Architecture and Drawbacks

Hadoop is built on two whitepapers published by Google, i.e,

  • HDFS
  • Map Reduce

HDFS: Hadoop Distributed File System

It is different from the normal file system in a way that the data copied on to HDFS is split into ‘n’ blocks and each block is copied on to a different node in the cluster. To achieve this we use master-slave architecture

  • HDFS Master => Name Node: Takes the client request and responsible for orchestrating the data copy across the cluster
  • HDFS Slave => Data Node: Actually saves the block of data and coordinates with its master

MapReduce: This is the processing engine and is also implemented in master-slave architecture.

  • MR Master => Job Tracker: Takes the incoming jobs, identifies the available resources across the cluster, divides the job into tasks and submits it to the cluster
  • MR Slave => Task Tracker: Actually runs the task and coordinates with its master.

Architecture

hadoop1_arch

Drawbacks

  • Design of JobTracker is done in such a way that its tightly coupled with two important responsibilities “Resource Management” and “MapReduce Task Execution”. Because of this reason the cluster cannot be used for other distributed computing technologies like Spark/Kafka/Storm/… other than Hadoop MapReduce
  • Name Node can maintain metadata of upto 4000-5000 data nodes at maximum. This will limit the cluster scalability to 4k-5k nodes
  • Hard partition of slot into Mapper and Reducer slots
  • JobTracker was a Single Point Of Failure SPOF
  • Iterative applications (Machine Learning) are very slow (10x times slower than YARN)
  • Lack of wire compatible protocols between client and sever in MapReduce applications (like hive and pig where they can support multiple versions on the same cluster)

Addressing these drawbacks hadoop 2.x is released.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Mawazo

Mostly technology with occasional sprinkling of other random thoughts

amintabar

Amir Amintabar's personal page

101 Books

Reading my way through Time Magazine's 100 Greatest Novels since 1923 (plus Ulysses)

Seek, Plunnge and more...

My words, my world...

ARRM Foundation

Do not wait for leaders; do it alone, person to person - Mother Teresa

Executive Management

An unexamined life is not worth living – Socrates

javaproffesionals

A topnotch WordPress.com site

thehandwritinganalyst

Just another WordPress.com site

coding algorithms

"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." -- John Tukey

%d bloggers like this: