Predefined Mapper and Reducer Classes in Hadoop

Within the Hadoop framework there are some predefined Mapper and Reducer classes that can be used as is in the scenarios they cover. That way you do not have to write your own mapper or reducer for those scenarios; you can use the ready-made classes instead.

Let’s see some of the predefined Mapper and Reducer classes in Hadoop.

Predefined Mapper classes in Hadoop

InverseMapper – This predefined mapper swaps keys and values: the key of each input (key, value) pair becomes the value of the output pair, and the value becomes the key.
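
As a minimal sketch of the swap, here is a plain-Java illustration that does not use Hadoop at all. The class name InverseSketch and the invert method are made up for this post. A typical use of such a swap is to turn (word, count) pairs into (count, word) pairs so that a follow-up job can sort by count.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java illustration only; it does not use Hadoop's InverseMapper class.
final class InverseSketch {

    // Returns a new pair with the input value as key and the input key as value.
    static <K, V> Map.Entry<V, K> invert(Map.Entry<K, V> pair) {
        return new SimpleEntry<>(pair.getValue(), pair.getKey());
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> input = new SimpleEntry<>("hadoop", 3);
        Map.Entry<Integer, String> output = invert(input);
        // The old value is now the key, the old key is now the value.
        System.out.println(output.getKey() + "\t" + output.getValue());
    }
}
```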

TokenCounterMapper – This mapper tokenizes the input values and emits each word with a count of 1. So the mapper you would write for a word count MapReduce program can be replaced by this built-in mapper. See the example word count program using TokenCounterMapper and IntSumReducer in the examples section.

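MultithreadedMapper – This is the multi-threaded implementation of Mapper. It can be used in place of the default implementation when the map operation is not CPU bound, in order to improve throughput. The actual mapper and the thread count are set through MultithreadedMapper.setMapperClass() and MultithreadedMapper.setNumberOfThreads(), and Mapper implementations run this way must be thread-safe.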

ChainMapper – The ChainMapper class allows you to use multiple Mapper classes within a single map task. The Mapper classes are invoked in a chained fashion: the output of the first mapper becomes the input of the second, and so on until the last Mapper, whose output is written to the task’s output.
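
The chaining idea can be sketched in plain Java without Hadoop. In a real job each stage would be registered with ChainMapper.addMapper(), which takes the job, the mapper class, its input and output key/value classes and a per-mapper Configuration. In the sketch below the stages are simple functions, and the names ChainSketch, SPLIT, LOWER and DROP_SHORT are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Plain-Java illustration of chaining; nothing here uses Hadoop's ChainMapper class.
final class ChainSketch {

    // Stage 1: split a line into whitespace-separated words.
    static final Function<String, List<String>> SPLIT =
            line -> Arrays.asList(line.trim().split("\\s+"));

    // Stage 2: lower-case each word.
    static final Function<String, List<String>> LOWER =
            word -> Collections.singletonList(word.toLowerCase(Locale.ROOT));

    // Stage 3: emit nothing for words of three characters or fewer.
    static final Function<String, List<String>> DROP_SHORT =
            word -> word.length() > 3
                    ? Collections.singletonList(word)
                    : Collections.<String>emptyList();

    // Feeds the output of each stage into the next one; the output of the last
    // stage is the result. (This sketch processes stage by stage rather than
    // record by record; the resulting records are the same.)
    static List<String> runChain(List<String> input,
                                 List<Function<String, List<String>>> stages) {
        List<String> current = input;
        for (Function<String, List<String>> stage : stages) {
            List<String> next = new ArrayList<>();
            for (String record : current) {
                next.addAll(stage.apply(record));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(runChain(
                Arrays.asList("Hello big World"),
                Arrays.asList(SPLIT, LOWER, DROP_SHORT)));
    }
}
```

Because every stage only sees the records emitted by the stage before it, stages can be added, removed or reordered without touching the others, which is the main appeal of chaining.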

FieldSelectionMapper – This class implements a mapper that can be used to perform field selections in a manner similar to the Unix cut command. The input data is treated as fields separated by a user-specified separator. The user can specify a list of fields that form the map output keys, and a list of fields that form the map output values.
An example using FieldSelectionMapper appears later in this post.

RegexMapper – This predefined Mapper class in Hadoop extracts text from the input that matches a regular expression.
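
To make the matching step concrete, here is a plain-Java sketch of the idea, not Hadoop code. It assumes that every match of a chosen capture group is emitted with a count of 1, which is how RegexMapper behaves to my understanding. In a real job the pattern and group come from the job configuration (see the RegexMapper.PATTERN and RegexMapper.GROUP constants); the class name RegexSketch and the sample log line are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; it does not use Hadoop's RegexMapper class.
final class RegexSketch {

    // Emits "match<TAB>1" for every occurrence of the given capture group in one line.
    static List<String> map(String line, Pattern pattern, int group) {
        List<String> output = new ArrayList<>();
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            output.add(matcher.group(group) + "\t1");
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical input: pull three-digit status codes out of a log line.
        Pattern statusCode = Pattern.compile("\\b(\\d{3})\\b");
        for (String record : map("GET /a 200, GET /b 404", statusCode, 1)) {
            System.out.println(record);
        }
    }
}
```

Paired with a summing reducer, output of this shape gives a count per matched string.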

Predefined Reducer classes in Hadoop

IntSumReducer – This predefined Reducer class sums the integer values associated with a specific key.

LongSumReducer – This predefined Reducer class sums the long values associated with a specific key.

FieldSelectionReducer – This class implements a reducer that can be used to perform field selections in a manner similar to the Unix cut command. The input data is treated as fields separated by a user-specified separator. The user can specify a list of fields that form the reduce output keys, and a list of fields that form the reduce output values. The fields are the union of those from the key and those from the value.

ChainReducer – The ChainReducer class allows you to chain multiple Mapper classes after a Reducer within the reducer task. For each record output by the Reducer, the Mapper classes are invoked in a chained fashion: the output of the reducer becomes the input of the first mapper, the output of the first mapper becomes the input of the second, and so on until the last Mapper, whose output is written to the task’s output.
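
The reducer-then-mappers flow can be sketched the same way in plain Java. In a real job the reducer would be registered with ChainReducer.setReducer() and the post-processing mappers with ChainReducer.addMapper(). Everything below, including the names ChainReducerSketch, KEEP_FREQUENT and UPPER_KEY, is a hypothetical stand-in rather than Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Plain-Java illustration only; it does not use Hadoop's ChainReducer class.
final class ChainReducerSketch {

    // Reducer stand-in: sums all values of one key and emits "key<TAB>sum".
    static String sumReduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return key + "\t" + sum;
    }

    // Post-reduce mapper: keeps only records whose sum is at least 2.
    static final Function<String, List<String>> KEEP_FREQUENT = record ->
            Integer.parseInt(record.split("\t")[1]) >= 2
                    ? Collections.singletonList(record)
                    : Collections.<String>emptyList();

    // Post-reduce mapper: upper-cases the key part of the record.
    static final Function<String, List<String>> UPPER_KEY = record -> {
        String[] keyValue = record.split("\t");
        return Collections.singletonList(
                keyValue[0].toUpperCase(Locale.ROOT) + "\t" + keyValue[1]);
    };

    // Runs the reducer for one key, then passes its output through each mapper in order.
    static List<String> reduceThenMap(String key, List<Integer> values,
                                      List<Function<String, List<String>>> mappers) {
        List<String> current = Collections.singletonList(sumReduce(key, values));
        for (Function<String, List<String>> mapper : mappers) {
            List<String> next = new ArrayList<>();
            for (String record : current) {
                next.addAll(mapper.apply(record));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        List<Function<String, List<String>>> mappers = Arrays.asList(KEEP_FREQUENT, UPPER_KEY);
        System.out.println(reduceThenMap("to", Arrays.asList(1, 1), mappers));
        System.out.println(reduceThenMap("or", Arrays.asList(1), mappers));
    }
}
```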

WrappedReducer – A Reducer which wraps a given one to allow for custom Reducer.Context implementations. This Reducer is useful if you want to provide your own implementation of the Context interface.

Examples using predefined Mapper and Reducer classes

Here are some examples using predefined Mapper and Reducer classes.

Using FieldSelectionMapper

In this example the input data is tab-separated and you want to extract field 0 as the key and field 1 as the value. In this scenario you can use FieldSelectionMapper rather than writing your own mapper.
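
With FieldSelectionMapper the selection is passed through the job configuration. To my knowledge the relevant keys are the FieldSelectionHelper constants DATA_FIELD_SEPERATOR (the separator, here a tab) and MAP_OUTPUT_KEY_VALUE_SPEC (here "0:1", meaning key fields before the colon and value fields after it); check these against your Hadoop version. What that selection does to a single line can be sketched in plain Java. The class name FieldSelectionSketch and the sample record are made up, and the sketch handles only one key field and one value field.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.regex.Pattern;

// Plain-Java illustration only; it does not use Hadoop's FieldSelectionMapper class.
final class FieldSelectionSketch {

    // Splits the line on the literal separator and returns
    // (field at keyIndex, field at valueIndex).
    static Map.Entry<String, String> select(String line, String separator,
                                            int keyIndex, int valueIndex) {
        // Pattern.quote treats the separator as plain text, not as a regex;
        // the -1 limit keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(separator), -1);
        return new SimpleEntry<>(fields[keyIndex], fields[valueIndex]);
    }

    public static void main(String[] args) {
        // Hypothetical tab-separated record: item, quantity, store.
        Map.Entry<String, String> pair = select("item1\t25\tstore-7", "\t", 0, 1);
        System.out.println(pair.getKey() + "\t" + pair.getValue());
    }
}
```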

Using TokenCounterMapper and IntSumReducer to write a word count MapReduce program
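
In the driver of such a job no mapper or reducer code is written at all: the two classes are registered with job.setMapperClass(TokenCounterMapper.class) and job.setReducerClass(IntSumReducer.class), with Text as the output key class and IntWritable as the output value class. The data flow these two classes implement can be sketched in plain Java, without Hadoop on the classpath. The class name WordCountSketch and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of the word count data flow; not Hadoop code.
final class WordCountSketch {

    // Map step (TokenCounterMapper's role): split a line on whitespace.
    // Each returned token stands for an emitted (word, 1) pair.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Shuffle and reduce step (IntSumReducer's role): group identical words
    // and add up the 1s emitted for each of them.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            for (String word : tokenize(line)) {
                totals.merge(word, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(count(Arrays.asList("to be or", "not to be")));
    }
}
```

Note that tokenizing on whitespace alone is case-sensitive and keeps punctuation attached to words, so "Hadoop" and "hadoop," would be counted as different words.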

That’s all for the topic Predefined Mapper and Reducer Classes in Hadoop.

