Namenode in Safemode
This post explains what Safemode is in the Namenode and which configurations control it. You will also see the commands available to enter and leave Safemode explicitly. When the Namenode starts, it first loads the file system state into memory from the fsimage and then applies…
What is HDFS Federation

This post explains what HDFS federation is in the Hadoop framework and what configuration changes are required to set it up. Problem with HDFS architecture: In a Hadoop cluster, both namespace management and block management are done by the Namenode. So, essentially, the Namenode has to perform the following tasks: 1-…
Frequently Used HDFS Commands With Examples
This post compiles some of the frequently used HDFS commands, with examples, for use as a reference. All HDFS commands are invoked by the bin/hdfs script. Running the hdfs script without any arguments prints the description for all commands. 1- HDFS command to create a…
NameNode, Secondary Namenode and Datanode in HDFS

This post explains in detail how the HDFS components Namenode, Datanode and Secondary Namenode work. Namenode in Hadoop: HDFS works on a master/slave architecture. In an HDFS cluster the Namenode is the master and the centerpiece of the HDFS file system. The Namenode manages the file system namespace. It keeps the…
HDFS Replica Placement Policy

As per the replica placement policy in Hadoop, each HDFS block is replicated across different nodes. The default replication factor is 3, which means that by default each HDFS block is replicated on three different nodes to make HDFS reliable and fault tolerant. Considerations for HDFS replica placement policy: When…
Introduction to Hadoop Distributed File System (HDFS)
What is Big Data
Big Data means a very large volume of data. The term big data describes data so huge and ever growing that it has gone beyond the storage and processing capabilities of traditional data management and processing tools. Some Examples of Big Data: Facebook, which stores data about your posts,…
Installing Hadoop in Pseudo-Distributed Mode

In this post we’ll see how to install Hadoop in pseudo-distributed mode (single node cluster). With the installation steps given here you will have Hadoop Common, HDFS, MapReduce and YARN installed. The Hadoop release used for the installation is Hadoop 2.9.0, and it is installed on Ubuntu 16.04. Modes in…