Uber Task in YARN

While running a MapReduce job in Hadoop YARN you may have noticed the following line displayed on the console.

Job job_1520505776000_0002 running in uber mode : false

In this post we’ll see what uber mode is in Hadoop and how you can run a job as an uber task in YARN.

Overview of uber mode in Hadoop

Normally, when a MapReduce job is submitted to a Hadoop cluster, the ApplicationMaster has to determine the number of map and reduce tasks that have to be executed for the job, and then negotiate with the ResourceManager to get that many resource containers for running the tasks.

If a job is small, the ApplicationMaster may decide to run the job sequentially in the same JVM where the ApplicationMaster itself is running. This way of running a job is known as an uber task in YARN.

When to run as uber task

The ApplicationMaster can run a job as an uber task when the overhead of negotiating resources with the ResourceManager, communicating with NodeManagers on different nodes to launch the containers, and running the tasks in those containers would be much greater than the cost of running the MapReduce job sequentially.

Now the question is how the ApplicationMaster decides when it is more beneficial to run a job sequentially rather than in parallel. A set of configuration parameters defines when a submitted job counts as “sufficiently small”.

Configuration parameters for uber task

The following configuration parameters control uber tasks in YARN. These parameters are set in mapred-site.xml.

  • mapreduce.job.ubertask.enable – Setting this parameter to true enables the small-jobs “ubertask” optimization, which runs “sufficiently small” jobs sequentially within a single JVM. Default is false.
  • mapreduce.job.ubertask.maxmaps – Threshold for the number of maps, beyond which a job is considered too big for the ubertasking optimization. Default value is 9. Users may override this value, but only downward.
  • mapreduce.job.ubertask.maxreduces – Threshold for the number of reduces, beyond which a job is considered too big for the ubertasking optimization. Currently the code cannot support more than one reduce and will ignore larger values. Default value is 1. Users may override this value, but only downward.
  • mapreduce.job.ubertask.maxbytes – Threshold for the number of input bytes, beyond which a job is considered too big for the ubertasking optimization. If no value is specified, dfs.block.size (named dfs.blocksize in current Hadoop releases) is used as the default, which means the HDFS block size when the file system is HDFS.
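
To make the “sufficiently small” check concrete, here is a minimal sketch in plain Java of how the four parameters above could combine into a single yes/no decision. It is a simplified model written for this post, not the actual ApplicationMaster code: the class UberDecisionSketch, the UberSettings holder and the isSufficientlySmall method are hypothetical names, the 128 MB block size is an assumed value, and any further checks the real implementation may perform (for example on memory requirements) are left out.

```java
public class UberDecisionSketch {

    // Hypothetical holder for the four settings described above.
    static final class UberSettings {
        final boolean enabled;  // mapreduce.job.ubertask.enable
        final int maxMaps;      // mapreduce.job.ubertask.maxmaps
        final int maxReduces;   // mapreduce.job.ubertask.maxreduces
        final long maxBytes;    // mapreduce.job.ubertask.maxbytes

        UberSettings(boolean enabled, int maxMaps, int maxReduces, long maxBytes) {
            this.enabled = enabled;
            this.maxMaps = maxMaps;
            // Models the documented limit: more than one reduce is not
            // supported, so larger values are ignored.
            this.maxReduces = Math.min(maxReduces, 1);
            this.maxBytes = maxBytes;
        }
    }

    // A job qualifies only if uber mode is enabled and every threshold holds.
    static boolean isSufficientlySmall(UberSettings s, int numMaps,
                                       int numReduces, long inputBytes) {
        return s.enabled
                && numMaps <= s.maxMaps
                && numReduces <= s.maxReduces
                && inputBytes <= s.maxBytes;
    }

    public static void main(String[] args) {
        long assumedBlockSize = 128L * 1024 * 1024; // assumption: 128 MB block
        UberSettings settings = new UberSettings(true, 9, 1, assumedBlockSize);
        long tenMb = 10L * 1024 * 1024;

        // 2 maps, 1 reduce, 10 MB of input: within all thresholds.
        System.out.println(isSufficientlySmall(settings, 2, 1, tenMb));
        // 20 maps exceeds the maxmaps threshold of 9.
        System.out.println(isSufficientlySmall(settings, 20, 1, tenMb));
    }
}
```

With these assumed values the first job falls within every threshold and the second does not, because 20 maps is more than the maxmaps value of 9. Note that with mapreduce.job.ubertask.enable left at its default of false, no job qualifies, however small it is.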

That’s all for the topic Uber Task in YARN. If something is missing or you have something to share about the topic please write a comment.

