Java Program to Read a File From HDFS

This post shows a Java program to read a file from HDFS using the Hadoop FileSystem API.

The steps for reading a file in HDFS using Java are as follows:

  • FileSystem is an abstraction of a file system, of which HDFS is one implementation. You first need to get an instance of FileSystem (HDFS in this case) using the get() method.
  • The get() method takes a Configuration object as an argument. The Configuration object holds the configuration-related information read from the configuration files (for example core-site.xml, from which it determines the file system to use).
  • In HDFS, a Path object represents the full file path.
  • To read the file, an input stream is used, which in HDFS is an FSDataInputStream.
  • For the output stream, System.out is used, which prints the data to the console.

Java Program to read a file from HDFS
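
A minimal sketch of such a program, following the steps listed above, is given here. It assumes the Hadoop client libraries are on the classpath and that the HDFS file path is passed as the first command-line argument. The class name HDFSFileRead is taken from the class file mentioned in this post; the default package, the 4096-byte buffer and the messages are illustrative assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: declared in the default package (an assumption).
public class HDFSFileRead {
  public static void main(String[] args) throws IOException {
    if (args.length < 1) {
      System.err.println("Usage: HDFSFileRead <hdfs-file-path>");
      System.exit(1);
    }
    // Configuration picks up core-site.xml (and related files) from the classpath
    Configuration conf = new Configuration();
    // Returns the file system configured there (HDFS in this case)
    FileSystem fs = FileSystem.get(conf);
    // Path object representing the full path of the file to read
    Path inFile = new Path(args[0]);
    if (!fs.exists(inFile)) {
      throw new IOException("Input file not found: " + inFile);
    }
    // System.out is the output stream, so the content goes to the console
    OutputStream out = System.out;
    byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
    // fs.open() returns an FSDataInputStream; try-with-resources closes it
    try (FSDataInputStream in = fs.open(inFile)) {
      int bytesRead;
      while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
      }
      out.flush();
    }
  }
}
```

If fs.defaultFS is not set in the configuration, FileSystem.get() falls back to the local file system, so the same code would read a local file instead.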

Executing program in Hadoop environment

To execute the above program in a Hadoop environment, you need to add the directory containing the program's .class file to Hadoop's classpath.

My HDFSFileRead.class file is in /huser/eclipse-workspace/knpcode/bin, so I have exported that path.
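
Assuming a Bash-like shell, the export could look like this. It uses the HADOOP_CLASSPATH environment variable, which the hadoop launcher script adds to its classpath.

```shell
# Directory that contains HDFSFileRead.class
export HADOOP_CLASSPATH=/huser/eclipse-workspace/knpcode/bin
```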

Then you can run the program, passing the path of the HDFS file to be read as an argument to your Java program.
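
As a sketch, assuming HDFSFileRead was compiled without a package declaration, the command could look like the following. The HDFS path shown is a hypothetical placeholder; if the class is in a package, use its fully qualified name instead.

```shell
# /user/input/test.txt is a placeholder HDFS path, not taken from this post
hadoop HDFSFileRead /user/input/test.txt
```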

Using IOUtils class to read a file in HDFS

The Hadoop framework provides a utility class, IOUtils, that has many convenient methods related to I/O. You can use it to read a file in HDFS and display its content on the console. Using IOUtils reduces the size of the program.

Java program to read HDFS file
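
A minimal sketch using IOUtils.copyBytes() and IOUtils.closeStream() from org.apache.hadoop.io.IOUtils is given here. The class name HDFSFileReadIOUtils and the 4096-byte buffer size are illustrative assumptions, and the HDFS path is expected as the first command-line argument.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sketch only: the class name is hypothetical.
public class HDFSFileReadIOUtils {
  public static void main(String[] args) throws IOException {
    if (args.length < 1) {
      System.err.println("Usage: HDFSFileReadIOUtils <hdfs-file-path>");
      System.exit(1);
    }
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream in = null;
    try {
      in = fs.open(new Path(args[0]));
      // Copies the stream to the console; 'false' leaves both streams open,
      // so System.out is not closed by the copy
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      // Closes the input stream quietly; a null stream is ignored
      IOUtils.closeStream(in);
    }
  }
}
```

Compared with a hand-written read loop, the copy and the stream cleanup are each reduced to a single call.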

That’s all for the topic Java Program to Read a File From HDFS. If something is missing or you have something to share about the topic please write a comment.
