Hadoop 1.x Architecture and Drawbacks
February 13, 2017
Hadoop is built on two whitepapers published by Google: the Google File System (GFS) paper, which HDFS implements, and the MapReduce paper.
- HDFS (Hadoop Distributed File System)
- MapReduce
HDFS: Hadoop Distributed File System
It differs from a normal file system in that data copied onto HDFS is split into fixed-size blocks (64 MB by default in Hadoop 1.x), and each block is copied onto a different node in the cluster. To achieve this, HDFS uses a master-slave architecture:
- HDFS Master => Name Node: Takes client requests, holds the file system metadata, and is responsible for orchestrating data placement across the cluster
- HDFS Slave => Data Node: Stores the actual blocks of data and coordinates with its master
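To make the client's view of this concrete, below is a minimal sketch of writing a file to HDFS with the Java FileSystem API (my own example, not from the original post; the NameNode host, port, and file path are hypothetical). The client contacts the Name Node only for metadata, then streams the bytes block by block to the Data Nodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally picked up from core-site.xml.
        // "fs.default.name" is the Hadoop 1.x key (renamed fs.defaultFS in 2.x).
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);

        // create() asks the Name Node where to place each block; the actual
        // bytes are then streamed directly to the chosen Data Nodes.
        Path path = new Path("/user/demo/sample.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("hello hdfs");
        }
        fs.close();
    }
}
```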
MapReduce: This is the processing engine, and it also follows a master-slave architecture.
- MR Master => Job Tracker: Accepts incoming jobs, identifies the available resources across the cluster, divides each job into tasks, and submits them to the cluster
- MR Slave => Task Tracker: Runs the tasks and coordinates with its master.
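As an illustration of how jobs meet this architecture, here is a minimal word-count sketch using the Hadoop 1.x "old" API (an assumed example; the input/output paths are hypothetical). JobClient.runJob() submits the job to the Job Tracker, which splits it into map and reduce tasks and hands them to Task Trackers.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {
    // Runs in a Mapper slot on a Task Tracker: emits (word, 1) pairs.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Runs in a Reducer slot: sums the counts for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path("/user/demo/input"));   // hypothetical
        FileOutputFormat.setOutputPath(conf, new Path("/user/demo/output")); // hypothetical
        JobClient.runJob(conf); // submits the job to the JobTracker
    }
}
```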
Architecture
Drawbacks
- The JobTracker is designed such that it is tightly coupled with two distinct responsibilities, “Resource Management” and “MapReduce Task Execution”. Because of this, the cluster can only run Hadoop MapReduce and cannot be shared with other distributed computing technologies like Spark, Kafka, Storm, etc.
- The Name Node keeps the metadata for the entire namespace in memory, so it can manage at most about 4,000-5,000 data nodes; this caps cluster scalability at roughly 4k-5k nodes
- Each Task Tracker's slots are hard-partitioned into Mapper and Reducer slots, so idle Reducer slots cannot run Map tasks (and vice versa), wasting capacity; see the config sketch after this list
- The JobTracker was a single point of failure (SPOF): if it went down, all running jobs were lost
- Iterative applications (e.g., Machine Learning) run very slowly (up to 10x slower than on YARN)
- Lack of wire-compatible protocols between client and server in MapReduce applications (unlike Hive and Pig, which can support multiple versions on the same cluster)
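To illustrate the slot hard-partitioning mentioned above, here is a sketch of the relevant mapred-site.xml entries on a Task Tracker (the slot counts are example values, not recommendations). Each count is fixed at startup, so a node busy with only map work still leaves its reduce slots idle:

```xml
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value> <!-- map-only slots; cannot run reduce tasks -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value> <!-- reduce-only slots; idle even when maps are queued -->
  </property>
</configuration>
```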
Hadoop 2.x was released to address these drawbacks.