This post shows how to fix missing or corrupt blocks and under replicated blocks in HDFS.
How to find corrupt blocks
To list the corrupt blocks in HDFS you can use the following command (fsck needs a path, here the root of the filesystem).
$ hdfs fsck / -list-corruptfileblocks
This command gives you the list of missing or corrupt blocks and the names of the files those blocks belong to. You can also run hdfs fsck / to get a full report on the file system, including corrupt blocks and under replicated blocks.
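For example, assuming your data sits under a directory such as /user/data (a hypothetical path used here only for illustration), you can narrow the check to that directory instead of scanning the whole filesystem.
# Report on everything under /user/data, including corrupt and under replicated blocks
$ hdfs fsck /user/data
# List only the corrupt blocks (and the files they belong to) under /user/data
$ hdfs fsck /user/data -list-corruptfileblocks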
How to fix corrupt or missing blocks
There is no easy way to repair a corrupt block. If you still have another copy of the file, the best fix is to delete the files with corrupt blocks from HDFS and then copy the file in again.
To delete the files that have corrupt blocks, use the following command.
$ hdfs fsck / -delete
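As a rough sketch of the delete-and-recopy approach, assuming the affected file is /user/data/file.txt and a good copy still exists locally at /data/file.txt (both hypothetical paths used only for illustration):
# Delete just the file that was reported with corrupt blocks
$ hdfs dfs -rm /user/data/file.txt
# Copy the good local copy back into HDFS
$ hdfs dfs -put /data/file.txt /user/data/file.txt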
If you don’t want to delete the file and instead want to inspect the nodes where the blocks are stored, you can get information about those nodes using the following procedure.
Using the file name you got from hdfs fsck / -list-corruptfileblocks, run the following command to get the DataNode information.
$ hdfs fsck /path/to/corrupt/file -locations -blocks -files
Then you can inspect those nodes for any network or hardware-related problems.
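A quick way to get an overview of DataNode health before digging into individual nodes (a general-purpose check, not part of the procedure above) is the dfsadmin report:
# Shows live/dead DataNodes, per-node capacity, and last contact time
$ hdfs dfsadmin -report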
How to fix under replicated blocks
To get the list of under replicated blocks in Hadoop you can run the following command.
$ hdfs fsck /
It will give you the name of the file, the block, and the expected/found replication count. The Hadoop framework should re-replicate under replicated blocks automatically, but you can also write a script to set the replication to the desired number yourself.
The output of hdfs fsck / for an under replicated block is in the following form:
/tmp/hadoop-yarn/staging/knpcode/.staging/job_1520752279140_0001/job.split: Under replicated BP-1309973318-127.0.1.1-1513945999329:blk_1073741921_1097. Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
So you can use the following script to extract the names of the files with under replicated blocks and store them in a temp file, then iterate over that temp file and use the -setrep command to set the replication to the desired number.
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/files

$ for underrepfile in `cat /tmp/files`; do echo "Setting replication for $underrepfile"; hdfs dfs -setrep 3 $underrepfile; done
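As a variation on the script above (my own sketch, not from the original post), the same fix can be done in a single pipeline without the temp file; adding the -w flag makes -setrep wait until the replication actually completes for each file.
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' | while read underrepfile; do echo "Setting replication for $underrepfile"; hdfs dfs -setrep -w 3 "$underrepfile"; done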
That's all for the topic How to Fix Corrupt Blocks And Under Replicated Blocks in HDFS. If something is missing or you have something to share about the topic please write a comment.