Avro File Format in Hadoop

Apache Avro is a language-independent data serialization system native to Hadoop. The Apache Avro project was created by Doug Cutting, the creator of Hadoop, to increase data interoperability in Hadoop. Avro implementations are available for C, C++, C#, Java, PHP, Python, and Ruby, making it easier to interchange…


Avro MapReduce Example

This post shows an Avro MapReduce example program written with the Avro MapReduce API. The example is a word count MapReduce program whose output is an Avro data file. Required jars: avro-mapred-1.8.2.jar. Avro word count MapReduce example: since the output is an Avro file, an Avro schema has to…
