Avro File Format in Hadoop

Apache Avro is a language-independent data serialization system native to Hadoop. The Apache Avro project was created by Doug Cutting, the creator of Hadoop, to increase data interoperability in Hadoop. Avro implementations are available for C, C++, C#, Java, PHP, Python, and Ruby, making it easier to interchange…

Continue reading
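To make the serialization idea above concrete, here is a minimal sketch using Avro's Java GenericRecord API. The User schema, its fields, and the users.avro output file are illustrative assumptions rather than anything from the post itself.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroWriteExample {
    // Hypothetical record schema; in practice this often lives in a .avsc file.
    private static final String USER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(USER_SCHEMA);

        // Build a record that conforms to the schema.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Ann");
        user.put("age", 30);

        // Write the record to an Avro container file; the schema travels in
        // the file header, so any Avro implementation can read it back.
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, new File("users.avro"));
        writer.append(user);
        writer.close();
    }
}
```

Because the schema is stored alongside the data, a Python or Ruby reader can consume users.avro without any Java-specific metadata, which is the interoperability point the teaser makes.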

HDFS Replica Placement Policy

As per the replica placement policy in Hadoop, each HDFS block is replicated across different nodes. The default replication factor is 3, which means that by default each HDFS block is replicated on three different nodes to make HDFS reliable and fault tolerant. Considerations for the HDFS replica placement policy When…

Continue reading
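As a sketch of how the replication factor is controlled in practice, the snippet below uses the Hadoop FileSystem Java API. The dfs.replication property is the standard setting; the /data/sample.txt path is only a placeholder assumption for an existing HDFS file.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // dfs.replication defaults to 3; a client or job can override it.
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);

        // Replication can also be changed per file after it is written.
        Path file = new Path("/data/sample.txt"); // placeholder path
        fs.setReplication(file, (short) 3);

        System.out.println("Replication of " + file + ": "
            + fs.getFileStatus(file).getReplication());
        fs.close();
    }
}
```

Cluster-wide defaults normally come from dfs.replication in hdfs-site.xml; the per-file call shown here is useful when individual data sets need more or fewer replicas than the default.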

What is Hadoop

Apache Hadoop is an open source framework for storing and processing big data sets on a cluster of nodes (commodity hardware) in parallel. The Hadoop framework is designed to scale up from a single server to thousands of machines, with each machine offering both storage and computation. It…

Continue reading
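To illustrate the "each machine offering both storage and computation" point, here is a minimal sketch of the mapper half of the classic word-count MapReduce job; the class name is an assumption, and a reducer plus a driver would still be needed to run it as a complete job.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Each node runs this mapper over the HDFS blocks it stores locally,
// which is how Hadoop keeps storage and computation on the same machines.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (word, 1) for every token in the input line.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}
```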