How to Read And Write Avro Files in Hadoop

In this post we’ll see how to read and write Avro files in Hadoop using the Java API.

Required Jars

To write Java programs that read and write Avro files, you need the following jars on the classpath. You can add them as Maven dependencies or copy the jars directly.

  • avro-1.8.2.jar
  • avro-tools-1.8.2.jar
  • jackson-mapper-asl-1.9.13.jar
  • jackson-core-asl-1.9.13.jar
  • slf4j-api-1.7.25.jar

Java program to write Avro file

Avro data is always written against a schema, so you first need to define an Avro schema.

schema.avsc
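The schema used in the original example is not reproduced here, so the following is only a minimal sketch with a hypothetical record (EmployeeRecord with the fields id, empName and age). The JSON held in SCHEMA_JSON is what you would save in schema.avsc; the ExampleSchema class is an illustrative helper that parses it with Avro's Schema.Parser to confirm the definition is valid.

```java
import org.apache.avro.Schema;

// Illustrative helper; the record name and fields are assumptions, not the original schema.
class ExampleSchema {

    // The JSON below is the content you would store in schema.avsc.
    static final String SCHEMA_JSON =
          "{\n"
        + "  \"type\": \"record\",\n"
        + "  \"name\": \"EmployeeRecord\",\n"
        + "  \"doc\": \"employee records\",\n"
        + "  \"fields\": [\n"
        + "    {\"name\": \"id\", \"type\": \"int\"},\n"
        + "    {\"name\": \"empName\", \"type\": \"string\"},\n"
        + "    {\"name\": \"age\", \"type\": \"int\"}\n"
        + "  ]\n"
        + "}";

    // Parses the JSON definition; throws SchemaParseException if it is malformed.
    static Schema load() {
        return new Schema.Parser().parse(SCHEMA_JSON);
    }

    public static void main(String[] args) {
        // Pretty-prints the parsed schema.
        System.out.println(load().toString(true));
    }
}
```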

Java code
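A writer along these lines could look as follows. This is a sketch, not the original listing: it assumes schema.avsc sits in the working directory and declares the hypothetical fields id, empName and age, and the output file name employee.avro and the sample values are made up for illustration. It uses Avro's generic API (GenericRecord, GenericDatumWriter, DataFileWriter), so no generated classes are needed.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;

class ExampleAvroWriter {

    // Builds one record; the field names are assumed to exist in the schema.
    static GenericRecord employee(Schema schema, int id, String name, int age) {
        GenericRecord record = new GenericData.Record(schema);
        record.put("id", id);
        record.put("empName", name);
        record.put("age", age);
        return record;
    }

    // Parses the schema file and writes two sample records to a local Avro file.
    static void write(File schemaFile, File avroFile) throws IOException {
        Schema schema = new Schema.Parser().parse(schemaFile);
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        try (DataFileWriter<GenericRecord> fileWriter = new DataFileWriter<>(datumWriter)) {
            // create() writes the file header, which embeds the schema.
            fileWriter.create(schema, avroFile);
            fileWriter.append(employee(schema, 1, "Jack", 35));
            fileWriter.append(employee(schema, 2, "Lisa", 28));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths, relative to the working directory.
        write(new File("schema.avsc"), new File("employee.avro"));
    }
}
```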

Note that this code creates the output Avro file in the local file system. To create the output file in HDFS instead, obtain an OutputStream for the target HDFS path through the Hadoop FileSystem API.

Then pass this OutputStream object to the create method of DataFileWriter.
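The HDFS variant can be sketched as below. The class name HdfsAvroOutput and the HDFS URI are hypothetical; adjust the host, port and path to your cluster. The open method gets an OutputStream from Hadoop's FileSystem, and createWriter hands that stream to DataFileWriter.create(Schema, OutputStream) instead of a File.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsAvroOutput {

    // Opens an output stream for a path such as
    // "hdfs://localhost:9000/user/out/employee.avro" (hypothetical URI).
    static OutputStream open(String hdfsUri) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(hdfsUri), conf);
        return fs.create(new Path(hdfsUri));
    }

    // Creates an Avro file writer on top of any OutputStream.
    static DataFileWriter<GenericRecord> createWriter(Schema schema, OutputStream out)
            throws IOException {
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
        // Same create call as before, but with a stream rather than a File.
        return writer.create(schema, out);
    }
}
```

In the writer program you would then call createWriter(schema, open(uri)), append the records as before, and close the writer, which also closes the underlying stream.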

Executing program in Hadoop environment

Before running this program in a Hadoop environment, put the jars listed above in $HADOOP_INSTALLATION_DIR/share/hadoop/mapreduce/lib.

If there is a version mismatch with the Avro jar already present in Hadoop, also put the Avro-1.x.x jar version you are using in $HADOOP_INSTALLATION_DIR/share/hadoop/common/lib.

To execute the above Java program in a Hadoop environment, add the directory containing the program's .class file to Hadoop’s classpath.

My ExampleAvroWriter.class file is in /huser/eclipse-workspace/knpcode/bin, so I exported that path through the HADOOP_CLASSPATH environment variable.

Then you can run the program with the hadoop command, passing it the fully qualified name of the class.

Java program to read Avro file

To read the Avro file stored in HDFS in the previous example, you can use the following method.
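One way to do this, sketched under assumptions: the class name ExampleAvroReader and the HDFS URI are hypothetical, and the sketch reads through Avro's DataFileStream on top of the InputStream returned by FileSystem.open. No schema is passed in because the reader takes it from the file header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ExampleAvroReader {

    // Reads every record from an Avro container stream.
    static List<GenericRecord> readAll(InputStream in) throws IOException {
        List<GenericRecord> records = new ArrayList<>();
        // The writer schema is read from the file header.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        try (DataFileStream<GenericRecord> stream = new DataFileStream<>(in, datumReader)) {
            for (GenericRecord record : stream) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location; point this at the file written earlier.
        String uri = "hdfs://localhost:9000/user/out/employee.avro";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        try (InputStream in = fs.open(new Path(uri))) {
            for (GenericRecord record : readAll(in)) {
                System.out.println(record);
            }
        }
    }
}
```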


If you want to read the Avro file from the local file system, then you can use the following method.
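For a local file, Avro's DataFileReader can be opened directly on a java.io.File. The sketch below assumes the file is named employee.avro in the working directory; the class name LocalAvroReader is made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

class LocalAvroReader {

    // Reads every record from an Avro file on the local file system.
    static List<GenericRecord> readAll(File avroFile) throws IOException {
        List<GenericRecord> records = new ArrayList<>();
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>())) {
            while (reader.hasNext()) {
                records.add(reader.next());
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name, relative to the working directory.
        for (GenericRecord record : readAll(new File("employee.avro"))) {
            System.out.println(record);
        }
    }
}
```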

That’s all for the topic How to Read And Write Avro Files in Hadoop. If something is missing or you have something to share about the topic please write a comment.

