How data is stored on HDFS:
A block is the individual storage unit on the Hadoop Distributed File System.
In Hadoop 1.x the default block size is 64 MB.
In Hadoop 2.x the default block size is 128 MB.
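To check which block size a running cluster actually uses, here is a minimal sketch with the Hadoop Java client API, assuming a reachable cluster whose core-site.xml/hdfs-site.xml are on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // loads *-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);
            // Block size (in bytes) that would be used for new files under "/"
            long blockSize = fs.getDefaultBlockSize(new Path("/"));
            System.out.println("Default block size: " + blockSize / (1024 * 1024) + " MB");
        }
    }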
When a file request comes to the Hadoop cluster, these are the steps:
Step 1: Only the Hadoop master node (the NameNode) receives the file request.
Step 2: Based on the block size configured at that time, the data is divided into a number of blocks (a sketch follows below).
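As an illustration of Step 2, a hedged sketch using the Hadoop Java API: FileSystem.create() accepts an explicit block size that is fixed for the file at write time, which is exactly what Step 2 relies on. The path /tmp/A.log and the payload here are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteWithBlockSize {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            long blockSize = 64L * 1024 * 1024; // 64 MB, in bytes
            // create(path, overwrite, bufferSize, replication, blockSize):
            // the block size passed here decides how this file is divided.
            try (FSDataOutputStream out =
                     fs.create(new Path("/tmp/A.log"), true, 4096, (short) 3, blockSize)) {
                out.writeBytes("sample payload");
            }
        }
    }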
How to configure the block size in Hadoop?
Edit /usr/local/hadoop/conf/hdfs-site.xml (the value is interpreted in bytes):

    <configuration>
      <property>
        <name>dfs.block.size</name>
        <value>14323883</value>
      </property>
    </configuration>
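The same setting can also be applied programmatically on a Configuration object. A small sketch; note that dfs.block.size is the Hadoop 1.x property name, deprecated in favor of dfs.blocksize in Hadoop 2.x:

    import org.apache.hadoop.conf.Configuration;

    public class SetBlockSize {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // value in bytes: 128 MB
            System.out.println(conf.get("dfs.blocksize"));     // prints 134217728
        }
    }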
How data is stored in HDFS:
Assume that we have files A.log, B.log, and C.log, and a 64 MB block size:
A.log -> 200 MB -> 200/64 -> 64 MB + 64 MB + 64 MB + 8 MB (the last block holds only the remaining 8 MB)
B.log -> 192 MB -> 192/64 -> 64 MB + 64 MB + 64 MB (three equal blocks, no remainder)
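The arithmetic above is easy to sketch in plain Java; this helper is purely illustrative and not part of Hadoop:

    import java.util.ArrayList;
    import java.util.List;

    public class BlockSplit {
        // Sizes (in MB) of the blocks a file of the given size would occupy.
        static List<Long> blockSizesMb(long fileMb, long blockMb) {
            List<Long> blocks = new ArrayList<>();
            long remaining = fileMb;
            while (remaining > blockMb) {
                blocks.add(blockMb);
                remaining -= blockMb;
            }
            if (remaining > 0) blocks.add(remaining); // last block holds only the remainder
            return blocks;
        }

        public static void main(String[] args) {
            System.out.println("A.log (200 MB): " + blockSizesMb(200, 64)); // [64, 64, 64, 8]
            System.out.println("B.log (192 MB): " + blockSizesMb(192, 64)); // [64, 64, 64]
        }
    }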
Design rules of block size:
1. Irrespective of the file size: every file in Hadoop gets its own dedicated set of blocks; blocks are never shared between files.
2. Except for the last block: all the other blocks of a file hold an equal volume of data, namely the configured block size (see the sketch after these rules).
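Rule 2 can be checked on a stored file with FileSystem.getFileBlockLocations(); a hedged sketch, where the path /tmp/A.log is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlockLengths {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/tmp/A.log")); // hypothetical file
            // One BlockLocation per block; lengths are equal except possibly the last.
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("offset=" + loc.getOffset() + " length=" + loc.getLength());
            }
        }
    }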
The Hadoop master node looks at the block size only at the time of dividing the data into blocks (at write time), not at the time of reading the data, because reads are driven purely by the stored block metadata.