Mapper Only Job in Hadoop MapReduce

Generally when we think of MapReduce job in Hadoop we think of both mappers and reducers doing their share of processing. That is true for most of the cases but you can have scenarios where you want to have a mapper only job in Hadoop.

When do you need map only job

You may opt for a map only job when you do need to process the data and get the output as (key, value) pairs but don’t want to aggregate those (key, value) pairs.
For example – If you are converting a text file to sequence file using MapReduce. In this case you just want to read a line from text file and write it to a sequence file so you can opt for a MapReduce with only map method.
Same way if you are converting a text file to parquet file using MapReduce you can opt for a mapper only job in Hadoop.

What you need to do for mapper only job

Only map method is written in the code which will do the processing and the number of reducers is set to zero.
In order to set the number of reducers to zero you can use the setNumReduceTasks() method of the Job class. So you need to add the following in your job configuration in your MapReduce code driver.

Benefits of Mapper only job

As already stated if you just want to process the data with out any aggregation then better go for a mapper only job as you can save on some of the processing done internally by the Hadoop framework.

Since reducer is not there so no need of shuffle and sort phase also transferring of data to the nodes where reducers are running is not required.

Also note that in a MapReduce job the output of map phase is written to the local disk on the node rather than to HDFS. Where as, in the case of Mapper only job, Map output is written to the HDFS.

Mapper only job in Hadoop example

If you have to convert a text file to sequence file you can set the number of reducers to zero.

You can run the MapReduce job using the followng command.

As you can see by listing the output directory that a sequence file is created.

That’s all for the topic Mapper Only Job in Hadoop MapReduce. If something is missing or you have something to share about the topic please write a comment.


You may also like

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.