While running a MapReduce job in Hadoop YARN, you may have noticed the following line displayed on the console.
Job job_1520505776000_0002 running in uber mode : false
In this post we’ll see what uber mode in Hadoop is and how you can run a job as an uber task in YARN.
Overview of uber mode in Hadoop
Normally, when a MapReduce job is submitted to run on a Hadoop cluster, the ApplicationMaster gets the number of map and reduce tasks that have to be executed for the job and negotiates with the ResourceManager for that many resource containers to run those tasks.
If a job is small, the ApplicationMaster may decide to run it sequentially in the same JVM in which the ApplicationMaster itself is running. This way of running a job is known as an uber task in YARN.
When to run as uber task
When the ApplicationMaster calculates that the overhead of negotiating resources with the ResourceManager, communicating with NodeManagers on different nodes to launch containers, and running the tasks in those containers outweighs the cost of running the MapReduce job sequentially, it can run the job in uber mode.
The question then is what helps the ApplicationMaster decide when it is more beneficial to run a job sequentially rather than in parallel. There are configuration parameters for exactly that purpose: they define when a submitted job is “sufficiently small”.
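The decision boils down to comparing the job's shape against the configured thresholds. Below is a simplified sketch of that eligibility check (this is illustrative, not the actual Hadoop source; the method and class names are made up for this example, though the property names in the comments are the real ones):

```java
// Simplified sketch of the ApplicationMaster's "sufficiently small" check.
// A job qualifies as an uber task only if uber mode is enabled AND its map
// count, reduce count, and total input size all fall within the thresholds.
public class UberCheckSketch {

    static boolean isUberCandidate(boolean uberEnabled,
                                   int numMaps, int numReduces, long inputBytes,
                                   int maxMaps, int maxReduces, long maxBytes) {
        return uberEnabled
                && numMaps <= maxMaps        // mapreduce.job.ubertask.maxmaps (default 9)
                && numReduces <= maxReduces  // mapreduce.job.ubertask.maxreduces (default 1)
                && inputBytes <= maxBytes;   // mapreduce.job.ubertask.maxbytes (default: block size)
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assume a 128 MB HDFS block size

        // A small job: 5 maps, 1 reduce, ~10 MB of input -> uber candidate
        System.out.println(isUberCandidate(true, 5, 1, 10_000_000L, 9, 1, blockSize));  // prints true

        // Too many maps (20 > 9) -> runs as a normal parallel job
        System.out.println(isUberCandidate(true, 20, 1, 10_000_000L, 9, 1, blockSize)); // prints false
    }
}
```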
Configuration parameters for uber task
The following configuration parameters control uber tasks in YARN. These parameters go in mapred-site.xml.
- mapreduce.job.ubertask.enable – Setting this parameter to true enables the small-jobs “ubertask” optimization, which runs “sufficiently small” jobs sequentially within a single JVM. Default is false.
- mapreduce.job.ubertask.maxmaps – Threshold for the number of maps, beyond which a job is considered too big for the ubertask optimization. Default value is 9. Users may override this value, but only downward.
- mapreduce.job.ubertask.maxreduces – Threshold for the number of reduces, beyond which a job is considered too big for the ubertask optimization. Currently the code cannot support more than one reduce and will ignore larger values. Default value is 1. Users may override this value, but only downward.
- mapreduce.job.ubertask.maxbytes – Threshold for the number of input bytes, beyond which a job is considered too big for the ubertask optimization. If no value is specified, dfs.block.size is used as the default, which means the HDFS block size when the input resides in HDFS.
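Putting the parameters above together, enabling uber mode is a matter of setting these properties in mapred-site.xml. A minimal sketch (the threshold values shown are simply the defaults, included here only for illustration):

```xml
<!-- mapred-site.xml: enable the small-jobs ubertask optimization.
     Threshold values below are the defaults; they may only be lowered. -->
<configuration>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>9</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.maxreduces</name>
    <value>1</value>
  </property>
</configuration>
```

You can also enable it for a single job from the command line with -D mapreduce.job.ubertask.enable=true, provided the job uses GenericOptionsParser/ToolRunner.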
That’s all for the topic Uber Task in YARN. If something is missing or you have something to share about the topic please write a comment.