Distributed Cache in Hadoop

In this post we’ll see what the distributed cache in Hadoop is and how to use it.

What is a distributed cache

As the name suggests, the distributed cache in Hadoop is a facility for distributing read-only files (text files, archives, jars, etc.) to the nodes where the mappers and reducers of a MapReduce job are running. Each cached file is copied to the local disk of those nodes, so the running map and reduce tasks can read it locally.

Methods for adding the files in Distributed Cache

There is a DistributedCache class with the relevant methods, but the whole class is deprecated in Hadoop 2. Use the following methods of the Job class instead.

  • public void addCacheFile(URI uri) – Adds a file to be localized.
  • public void addCacheArchive(URI uri) – Adds an archive to be localized.
  • public void addFileToClassPath(Path file) – Adds a file path to the current set of classpath entries and also adds the file to the cache. Files added with this method are not unpacked when they are added to the classpath.
  • public void addArchiveToClassPath(Path archive) – Adds an archive path to the current set of classpath entries and also adds the archive to the cache. Archive files are unpacked and added to the classpath when they are distributed.
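As a minimal sketch of the two archive methods listed above, the driver below registers one archive for localization and another for the classpath. The class name, job name, and HDFS paths are hypothetical placeholders, and the archives are assumed to be in HDFS already.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CacheArchiveDriver {

    // Registers two archives with the job; both paths are hypothetical.
    static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "cache-archive-demo");
        // Archive to be localized and unpacked on the task nodes.
        job.addCacheArchive(new URI("/user/demo/cache/lookup-data.zip"));
        // Archive added to the task classpath; it is also added to the cache.
        job.addArchiveToClassPath(new Path("/user/demo/libs/helpers.zip"));
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = configure(new Configuration());
        // Mapper, reducer, and input/output paths would be set here before submitting.
        System.out.println("Cached archives: " + job.getCacheArchives().length);
    }
}
```

Because addArchiveToClassPath() also places its archive in the cache, both archives show up in getCacheArchives(), while only the second one appears in getArchiveClassPaths().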

How to use distributed cache

To make a file available through the distributed cache in Hadoop, follow these steps:

1- Copy the file you want to make available through the distributed cache to HDFS, if it is not there already.
2- Based on the file type, use the relevant method to add it to the distributed cache.
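Step 1 can be done from the command line with hdfs dfs -put, or programmatically. The sketch below takes the programmatic route with the FileSystem API; the class and method names are hypothetical, and it assumes the cluster's core-site.xml is on the classpath so that fs.defaultFS points at HDFS (otherwise the default is the local file system).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {

    // Copies a local file to dst on the configured file system unless dst already exists.
    // Returns true if a copy was performed.
    static boolean copyIfAbsent(Configuration conf, String localSrc, String dst) throws Exception {
        Path target = new Path(dst);
        FileSystem fs = target.getFileSystem(conf);
        if (fs.exists(target)) {
            return false;
        }
        fs.copyFromLocalFile(new Path(localSrc), target);
        return true;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = local source file, args[1] = destination path in HDFS.
        Configuration conf = new Configuration();
        boolean copied = copyIfAbsent(conf, args[0], args[1]);
        System.out.println(copied ? "Copied" : "Already present");
    }
}
```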

For example, to add a text file to the distributed cache, call addCacheFile() on the Job instance in your driver.
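A minimal sketch of such a driver is shown here, assuming a hypothetical file /user/demo/cache/lookup.txt that is already in HDFS. The optional #lookup fragment on the URI is used as the name of the symlink created for the tasks; the class and job names are placeholders as well.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileDriver {

    // Creates a job and registers one HDFS file with the distributed cache.
    static Job configure(Configuration conf, String cacheUri) throws Exception {
        Job job = Job.getInstance(conf, "cache-file-demo");
        job.addCacheFile(new URI(cacheUri));
        return job;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical HDFS path; "#lookup" names the symlink seen by the tasks.
        Job job = configure(new Configuration(), "/user/demo/cache/lookup.txt#lookup");
        // Mapper, reducer, and input/output paths would be set here before submitting.
        System.out.println("Cached files: " + job.getCacheFiles().length);
    }
}
```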

If you want to add a jar to the classpath, call addFileToClassPath() on the Job instance in your driver.
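A minimal sketch, assuming a hypothetical jar at /user/demo/libs/helper.jar that has already been copied to HDFS; the class and job names are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ClasspathJarDriver {

    // Creates a job and puts one jar on the classpath of its tasks.
    static Job configure(Configuration conf, String jarPath) throws Exception {
        Job job = Job.getInstance(conf, "classpath-jar-demo");
        // Adds the jar to the task classpath and to the distributed cache.
        job.addFileToClassPath(new Path(jarPath));
        return job;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical HDFS path to the jar.
        Job job = configure(new Configuration(), "/user/demo/libs/helper.jar");
        // Mapper, reducer, and input/output paths would be set here before submitting.
        System.out.println("Classpath entries: " + job.getFileClassPaths().length);
    }
}
```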

Distributed cache example MapReduce code

The example is an Avro MapReduce word count program. Its output is an Avro data file, which is written using an Avro schema. The schema file is added to the distributed cache using the addCacheFile() method and is then read by the mappers and reducers.
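The task-side half of that pattern can be sketched as follows. This is not the full Avro program; it only shows a mapper loading the cached schema text in setup(), assuming the driver cached the schema file with a #wcschema fragment so that it appears under that name in the task's working directory. The class name and link name are hypothetical, map() is inherited unchanged, and the Avro parsing step is left as a comment to keep the sketch free of extra dependencies.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SchemaAwareMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    // Hypothetical symlink name; must match the URI fragment used in the driver.
    static final String LINK_NAME = "wcschema";

    private String schemaJson;

    // Reads a localized cache file from the local file system as UTF-8 text.
    static String readLocalized(String localName) throws IOException {
        return new String(Files.readAllBytes(Paths.get(localName)), StandardCharsets.UTF_8);
    }

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Runs once per task, before any call to map().
        schemaJson = readLocalized(LINK_NAME);
        // With Avro on the classpath, the text could be parsed here, e.g.
        // new org.apache.avro.Schema.Parser().parse(schemaJson)
    }
}
```

Loading the file in setup() rather than in map() means it is read once per task instead of once per input record.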

That’s all for the topic Distributed Cache in Hadoop. If something is missing or you have something to share about the topic please write a comment.

