This post shows a Java program to read a file from HDFS using the Hadoop FileSystem API.
The steps for reading a file in HDFS using Java are as follows:
- FileSystem is an abstraction of a file system, of which HDFS is one implementation. So you have to get an instance of FileSystem (HDFS in this case) using the get() method.
- In the program, the get() method takes a Configuration as an argument. The Configuration object holds the settings read from the configuration files (for example core-site.xml, from which it gets the file system).
- In HDFS, a Path object represents the full file path.
- Once you have the file, you read it through an input stream, which in HDFS is FSDataInputStream.
- For the output stream, System.out is used, which prints the data on the console.
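A Hadoop Path is backed by a java.net.URI, and the URI scheme is what tells Hadoop which FileSystem implementation to use. The sketch below uses only the JDK to show how a fully qualified HDFS location and a bare path break down. The host name namenode and port 8020 are made-up values for illustration, and the class and method names are hypothetical.

```java
import java.net.URI;

public class HdfsUriSketch {

  // Summarises the parts of a location string as a URI would see them.
  static String describe(String location) {
    URI uri = URI.create(location);
    return "scheme=" + uri.getScheme()
        + ", authority=" + uri.getAuthority()
        + ", path=" + uri.getPath();
  }

  public static void main(String[] args) {
    // Fully qualified: scheme and authority are part of the string
    // (namenode:8020 is an assumed address, not taken from the post)
    System.out.println(describe("hdfs://namenode:8020/user/input/test/aa.txt"));

    // Bare path: no scheme or authority, so in a Hadoop program the
    // file system configured as the default would be used instead
    System.out.println(describe("/user/input/test/aa.txt"));
  }
}
```

This is why the program can be run with a plain path such as /user/input/test/aa.txt: the missing scheme and authority are filled in from the configuration.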
Java Program to read a file from HDFS
// Package matches the class name used in the run command (org.knpcode.HDFSFileRead)
package org.knpcode;

import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileRead {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    try {
      FileSystem fs = FileSystem.get(conf);
      // Hadoop DFS Path - Input file
      Path inFile = new Path(args[0]);
      // Check if input is valid
      if (!fs.exists(inFile)) {
        System.out.println("Input file not found");
        throw new IOException("Input file not found");
      }
      // open and read from file
      FSDataInputStream in = fs.open(inFile);
      // System.out as output stream to display
      // file content on terminal
      OutputStream out = System.out;
      byte[] buffer = new byte[256];
      try {
        int bytesRead = 0;
        // read() returns -1 at end of file, which ends the loop
        while ((bytesRead = in.read(buffer)) > 0) {
          out.write(buffer, 0, bytesRead);
        }
      } catch (IOException e) {
        System.out.println("Error while copying file");
      } finally {
        // Close the input stream; System.out is only flushed,
        // not closed, so later console output still works
        in.close();
        out.flush();
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
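FSDataInputStream extends java.io.DataInputStream, so the read loop in the program above is ordinary java.io stream copying. The sketch below is a way to try that loop without a cluster: an in-memory ByteArrayInputStream stands in for the stream returned by fs.open(), and the class name StreamCopySketch and the copy() helper are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopySketch {

  // Copies everything from in to out through a fixed-size buffer and
  // returns the number of bytes copied. Neither stream is closed here.
  static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
    byte[] buffer = new byte[bufferSize];
    long total = 0;
    int bytesRead;
    // read() returns -1 once the end of the stream is reached
    while ((bytesRead = in.read(buffer)) != -1) {
      out.write(buffer, 0, bytesRead);
      total += bytesRead;
    }
    out.flush();
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Assumption: these bytes stand in for the content of an HDFS file
    InputStream in = new ByteArrayInputStream(
        "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long copied = copy(in, out, 256);
    System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    System.out.println("bytes copied: " + copied);
  }
}
```

Because the helper only depends on InputStream and OutputStream, the same method would accept an FSDataInputStream and System.out unchanged.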
Executing program in Hadoop environment
To execute the above program in a Hadoop environment, you need to add the directory that holds the compiled classes (the root of the package directory structure) to Hadoop’s classpath.
export HADOOP_CLASSPATH='/huser/eclipse-workspace/knpcode/bin'
My compiled HDFSFileRead.class is under /huser/eclipse-workspace/knpcode/bin, so I have exported that path.
Then you can run the program by passing the HDFS file to be read as an argument to your Java program:
hadoop org.knpcode.HDFSFileRead /user/input/test/aa.txt
Using IOUtils class to read a file in HDFS
The Hadoop framework provides a utility class, IOUtils, that has many convenient methods related to I/O. You can use it to read a file in HDFS and display its content on the console. Using IOUtils reduces the program size.
Java program to read HDFS file
// Package matches the class name used in the run command (org.knpcode.HDFSFileRead)
package org.knpcode;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HDFSFileRead {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    try {
      FileSystem fs = FileSystem.get(conf);
      FSDataInputStream in = null;
      // Hadoop DFS Path - Input file
      Path inFile = new Path(args[0]);
      // Check if input is valid
      if (!fs.exists(inFile)) {
        System.out.println("Input file not found");
        throw new IOException("Input file not found");
      }
      try {
        // open and read from file
        in = fs.open(inFile);
        // copy to System.out with a 512-byte buffer;
        // false means the streams are not closed by copyBytes
        IOUtils.copyBytes(in, System.out, 512, false);
      } finally {
        // close only the input stream
        IOUtils.closeStream(in);
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
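The try/finally with closeStream() in the program above can also be expressed with try-with-resources, since any stream that implements Closeable qualifies. The sketch below shows that pattern with JDK classes only and is not Hadoop's IOUtils implementation. It assumes Java 9 or later for InputStream.transferTo(), and TrackingInputStream and copyThenClose() are hypothetical names that exist only to make the close visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CopyAndCloseSketch {

  // Hypothetical stand-in for an HDFS input stream that records
  // whether close() was called on it.
  static class TrackingInputStream extends FilterInputStream {
    boolean closed = false;

    TrackingInputStream(InputStream in) {
      super(in);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  // Copies source to out and returns the byte count. The source is
  // closed by try-with-resources even if the copy throws; out is
  // only flushed and stays open.
  static long copyThenClose(InputStream source, OutputStream out) throws IOException {
    try (InputStream in = source) {
      long copied = in.transferTo(out); // requires Java 9 or later
      out.flush();
      return copied;
    }
  }

  public static void main(String[] args) throws IOException {
    TrackingInputStream in = new TrackingInputStream(
        new ByteArrayInputStream("sample content\n".getBytes(StandardCharsets.UTF_8)));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long copied = copyThenClose(in, out);
    System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    System.out.println("bytes copied: " + copied + ", input closed: " + in.closed);
  }
}
```

Leaving the output stream open mirrors the false argument passed to copyBytes() in the program above, which matters when the output is System.out.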
That's all for the topic Java Program to Read a File From HDFS. If something is missing or you have something to share about the topic please write a comment.