To compress the output of a MapReduce job in Hadoop, you can either set properties in the job configuration, which applies to that job only, or set the same properties in mapred-site.xml, which applies to the whole cluster.
Properties for compressing MapReduce job output
- mapreduce.output.fileoutputformat.compress: Set to true if job outputs should be compressed. Default is false.
- mapreduce.output.fileoutputformat.compress.type: If the job outputs are to be compressed as SequenceFiles, how they should be compressed. Must be one of NONE, RECORD or BLOCK. Default is RECORD.
- mapreduce.output.fileoutputformat.compress.codec: If the job outputs are compressed, which codec class to use. Default is org.apache.hadoop.io.compress.DefaultCodec.
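As a minimal sketch of the per-job route, the three properties listed above can be set directly on a Configuration object before the Job is created. The class name CompressedOutputConf and the job name are made up for illustration, BLOCK and GzipCodec are just example choices, and the snippet assumes the Hadoop client libraries are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver class, used only to illustrate setting the properties.
public class CompressedOutputConf {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Turn on compression of the final job output.
        conf.setBoolean("mapreduce.output.fileoutputformat.compress", true);

        // Only relevant when the output is written as SequenceFiles.
        conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");

        // Codec used for the compressed output (example choice: gzip).
        conf.setClass("mapreduce.output.fileoutputformat.compress.codec",
                GzipCodec.class, CompressionCodec.class);

        // Job.getInstance copies the configuration, so set the properties first.
        Job job = Job.getInstance(conf, "compressed-output-example");

        // Mapper, reducer, input and output paths would be configured here.
        System.out.println(job.getConfiguration()
                .get("mapreduce.output.fileoutputformat.compress"));
    }
}
```

Setting the properties before calling Job.getInstance matters in this sketch, because later changes to conf are not seen by the job's own copy of the configuration.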
Making changes in mapred-site.xml
If you want to compress the output of all the MapReduce jobs running on a cluster, add these properties inside the <configuration> element of mapred-site.xml.
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>RECORD</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
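Before rolling such a change out to a cluster, it can help to confirm that the edited fragment is well-formed and carries the intended values. The following is a rough sketch that uses only the JDK's XML parser. MapredSiteCheck and readProperties are hypothetical names, not part of Hadoop, and the XML string stands in for the contents of a real mapred-site.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a Hadoop class) for inspecting configuration XML.
class MapredSiteCheck {

    // Parses Hadoop-style configuration XML and returns name -> value pairs
    // in document order. Throws if the XML is not well-formed.
    static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = property.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
            String value = property.getElementsByTagName("value")
                    .item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the contents of mapred-site.xml.
        String xml = "<configuration>"
                + "<property><name>mapreduce.output.fileoutputformat.compress</name>"
                + "<value>true</value></property>"
                + "<property><name>mapreduce.output.fileoutputformat.compress.codec</name>"
                + "<value>org.apache.hadoop.io.compress.GzipCodec</value></property>"
                + "</configuration>";
        readProperties(xml).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

This only checks the file's syntax and values. It does not verify that the named codec class is actually available on the cluster.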
Making changes in Job configuration
If you want to compress the output of only a specific MapReduce job, add the following to your job configuration.
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
If you are using the SequenceFile output format, you can also set the compression type (NONE, RECORD or BLOCK).
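One way this might look in a driver is sketched below, using SequenceFileOutputFormat.setOutputCompressionType from the org.apache.hadoop.mapreduce.lib.output package. The class name SequenceFileCompressionDriver and the job name are illustrative, BLOCK is only an example value, and the snippet assumes the Hadoop client libraries are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver class, used only to illustrate the compression settings.
public class SequenceFileCompressionDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "seqfile-compressed-output");

        // Write the job output as SequenceFiles.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // Enable output compression and choose the codec.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        // Compress blocks of records together rather than each record separately.
        SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

        // Mapper, reducer, key/value classes and paths would be configured here
        // before submitting the job.
    }
}
```

BLOCK compression generally gives a better compression ratio than RECORD because many records are compressed together, which is why it is used as the example value here.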