To compress the output of a MapReduce job in Hadoop, you can either set properties in the job configuration on a per-job basis, or set them in mapred-site.xml so that they apply to the whole cluster.
Properties for compressing MapReduce job output
- mapreduce.output.fileoutputformat.compress - Set to true if job outputs should be compressed. Default is false.
- mapreduce.output.fileoutputformat.compress.type - If the job outputs are to be compressed as SequenceFiles, this specifies how they should be compressed. Should be one of NONE, RECORD or BLOCK. Default is RECORD.
- mapreduce.output.fileoutputformat.compress.codec - If the job outputs are compressed, this specifies which codec is used. Default is org.apache.hadoop.io.compress.DefaultCodec.
Making changes in mapred-site.xml
If you want to compress the output of all the MapReduce jobs running on a cluster, add these properties to mapred-site.xml.
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>RECORD</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
Making changes in Job configuration
If you want to compress the output of only a specific MapReduce job, set the compression options in that job's configuration.
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
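To show where these two calls sit in a job, here is a minimal driver sketch. The class name CompressedOutputDriver, the job name, and the use of command-line arguments for the input and output paths are assumptions made for illustration. No mapper or reducer is set, so the job simply passes the input through, which keeps the focus on the compression settings. It needs the Hadoop client libraries on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: the class name, job name and argument layout are illustrative only.
public class CompressedOutputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "compressed-output-example");
        job.setJarByClass(CompressedOutputDriver.class);

        // No mapper or reducer is set, so the default identity Mapper and Reducer run.
        // With the default TextInputFormat, keys are byte offsets and values are lines.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // args[0] = input path, args[1] = output path (assumed argument layout).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Same effect as mapreduce.output.fileoutputformat.compress=true for this job.
        FileOutputFormat.setCompressOutput(job, true);
        // Same effect as setting mapreduce.output.fileoutputformat.compress.codec.
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the default text output format and GzipCodec, the part files in the output directory should carry a .gz extension, which is a quick way to check that the setting took effect.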
If you are using the SequenceFile output format, you can set the compression type too.
SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
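The difference between the RECORD and BLOCK compression types can be illustrated without Hadoop. The sketch below is not how SequenceFile is implemented. It only uses java.util.zip to compare compressing each record on its own (similar in spirit to RECORD) with compressing a batch of records together (similar in spirit to BLOCK). The class name, helper methods and sample data are all made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Illustration only: mimics the idea behind RECORD vs BLOCK, not Hadoop's actual file format.
public class RecordVsBlockSketch {

    // Gzip a byte array and return the compressed size in bytes.
    static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        }
        return buffer.size();
    }

    // RECORD-like: every record is compressed on its own, so each one pays
    // the codec overhead and cannot share redundancy with its neighbours.
    static int perRecordSize(List<String> records) throws IOException {
        int total = 0;
        for (String record : records) {
            total += gzipSize(record.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // BLOCK-like: records are concatenated and compressed together,
    // so repeated patterns across records are encoded once.
    static int perBlockSize(List<String> records) throws IOException {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (String record : records) {
            all.write(record.getBytes(StandardCharsets.UTF_8));
        }
        return gzipSize(all.toByteArray());
    }

    // Made-up sample data: short, similar-looking records.
    static List<String> sampleRecords(int count) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            records.add("user-" + i + "\tstatus=active\n");
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<String> records = sampleRecords(1000);
        System.out.println("Compressed per record: " + perRecordSize(records) + " bytes");
        System.out.println("Compressed as a block: " + perBlockSize(records) + " bytes");
    }
}
```

For short, similar records like these, the batch is much smaller than the sum of the individually compressed records, which is why BLOCK is often chosen for SequenceFile output.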
That's all for the topic How to Compress MapReduce Job Output. If something is missing or you have something to share about the topic please write a comment.