Introduction to YARN in Hadoop

To address the scalability issues in MapReduce 1, a new cluster management system was designed, known as YARN (Yet Another Resource Negotiator). YARN was introduced in Hadoop 2.x and is also known as MapReduce 2. This post gives an introduction to YARN in Hadoop, also talks…

What is Big Data

Big Data means a very large volume of data. The term big data describes data so huge and ever-growing that it has gone beyond the storage and processing capabilities of traditional data management and processing tools. Some examples: Facebook, which stores data about your posts, notification clicks, post…

HDFS High Availability

In this post we’ll see what HDFS high availability is, the high availability architecture, and the configuration needed for HDFS high availability in a Hadoop cluster. Some background on HDFS high availability: prior to Hadoop 2, the NameNode was a single point of failure (SPOF) in an HDFS cluster. In an HDFS cluster there’s…

What is HDFS Federation in Hadoop

This post shows what HDFS federation is in the Hadoop framework and what configuration changes are required to set up HDFS federation. Problem with HDFS architecture: in a Hadoop cluster, namespace management and block management are both done by the NameNode. So, essentially, the NameNode has to perform the following tasks: 1-…
