Word Count MapReduce Program in Hadoop

Once you have installed Hadoop on your system and completed the initial verification, you will want to write your first MapReduce program. Before digging into the intricacies of MapReduce programming, the first step is the word count MapReduce program, which is also known as the “Hello World” of the Hadoop framework.

Here is a simple word count MapReduce program written in Java to get you started with MapReduce programming.

What you need

  1. An IDE such as Eclipse is helpful for writing the Java code.
  2. A text file to use as your input file. It must be copied to HDFS. The Map task processes this file and produces its output as (key, value) pairs, which then become the input for the Reduce task.
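The flow described above, where the Map task emits (key, value) pairs that feed the Reduce task, can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the idea and not the Hadoop API: the class WordCountSketch and its map, shuffle, and reduce methods are hypothetical names, and the in-memory grouping stands in for the shuffle that the framework performs between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the word count idea (no Hadoop classes involved).
public class WordCountSketch {

    // Map step: split one line into words and emit a (word, 1) pair for each word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step (done by the framework in a real job): group values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum all the values collected for one word.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map, shuffle, and reduce over all input lines and returns word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines made up for this sketch; they are not the content of the input file.
        List<String> lines = Arrays.asList("hello hadoop", "hello mapreduce");
        wordCount(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real job the map and reduce steps run as separate tasks, possibly on different nodes, and only the grouping by key connects them.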

Process

These are the steps for running your word count MapReduce program.

  1. Start the daemons by running the start-dfs and start-yarn scripts.
  2. Create an input directory in HDFS where you will keep your text file.
  3. Copy the text file you created to the /user/input directory.

    For this example, I created a text file called count.

    To verify that the file was copied, you can list the input directory with the hdfs dfs -ls command.

Word count MapReduce Java code

To compile your MapReduce code, you need the Hadoop jars on your classpath. You will find them in the share directory of your Hadoop installation.

[Image: Word count MapReduce program jars]

Running the word count MapReduce program

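Once your code compiles successfully, create a jar. If you are using the Eclipse IDE, you can create the jar by right-clicking the project and choosing Export, then Java, then JAR file.

Once the jar is created, run your MapReduce code with the hadoop jar command, which takes the path to the jar, the class to run, the input path, and the output path, in that order:

    hadoop jar /home/knpcode/Documents/knpcode/Hadoop/wordcount.jar org.knpcode.WordCount /user/input /user/output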

In the above command:

/home/knpcode/Documents/knpcode/Hadoop/wordcount.jar is the path to your jar.

org.knpcode.WordCount is the fully qualified name of the Java class that you need to run.

/user/input is the path to the input.

/user/output is the path to the output.

In the main method of the Java program, two lines pass these paths to the job, typically through FileInputFormat.addInputPath() and FileOutputFormat.setOutputPath(). That’s where the input and output directories are set.
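To make the link between the command and the main method concrete, here is a minimal plain-Java sketch of how the two trailing arguments arrive in main as args[0] and args[1]. It is not the code from this post: the class WordCountArgsSketch, the JobPaths holder, and the usage message are hypothetical, and a real driver would hand these values to the Hadoop job instead of printing them.

```java
// Plain-Java illustration of reading the input and output paths from the command line.
public class WordCountArgsSketch {

    // Hypothetical holder for the two paths given after the class name.
    static final class JobPaths {
        final String input;
        final String output;

        JobPaths(String input, String output) {
            this.input = input;
            this.output = output;
        }
    }

    // Expects exactly two arguments: the input path first, then the output path.
    static JobPaths parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: WordCount <input path> <output path>");
        }
        return new JobPaths(args[0], args[1]);
    }

    public static void main(String[] args) {
        // The same values as in the command above, hard-coded so the sketch runs on its own.
        JobPaths paths = parse(new String[] {"/user/input", "/user/output"});
        // Assumption: in a real driver these would be passed to the job's input and output formats.
        System.out.println("input:  " + paths.input);
        System.out.println("output: " + paths.output);
    }
}
```

Checking the argument count up front gives a clear usage message instead of an ArrayIndexOutOfBoundsException when a path is forgotten.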

For a detailed explanation of how the word count MapReduce program works, see the post How MapReduce Works in Hadoop.

After execution, check the output directory for the results.

You can verify the output by displaying the content of the output file created there.

That’s all for the topic Word Count MapReduce Program in Hadoop. If something is missing or you have something to share about the topic, please write a comment.

