Senior · Hadoop
What is a Hadoop input split, and how does it differ from an HDFS block?
Updated May 16, 2026
Short answer
An input split is a logical chunk of input that MapReduce assigns to a single map task, while an HDFS block is a fixed-size physical unit of storage on the DataNodes.
Deep explanation
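HDFS blocks define how data is physically stored: a file is cut into fixed-size blocks (128 MB by default in Hadoop 2 and later) with no regard for record boundaries. Input splits define how data is logically processed: each split is a byte range (file path, start offset, length) handled by one mapper. A split may span multiple blocks or be smaller than a block. The job's InputFormat determines how splits are created. FileInputFormat, for example, sizes splits as max(minSize, min(maxSize, blockSize)), so by default a split lines up with a block, and the RecordReader reads past the end of its split when it needs to finish a record that crosses the boundary.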
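The split-sizing rule can be sketched in a few lines. This is a plain Python illustration modeled on the logic of FileInputFormat, not Hadoop code: the function names `compute_split_size` and `plan_splits` are hypothetical, and the sketch assumes a single splittable file and ignores block locations.

```python
# Illustrative sketch only: mirrors the split-sizing logic of Hadoop's
# FileInputFormat in plain Python. Function names are hypothetical.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Clamp the block size between the configured min and max split sizes."""
    return max(min_size, min(max_size, block_size))


def plan_splits(file_length, block_size, min_size=1, max_size=2**63 - 1):
    """Return a list of (offset, length) byte ranges, one per map task."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_length
    # Cut full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    # Whatever is left (possibly slightly more than one split) is the tail.
    if remaining != 0:
        splits.append((file_length - remaining, remaining))
    return splits


if __name__ == "__main__":
    for label, kwargs in [
        ("defaults", {}),
        ("max split 64 MB", {"max_size": 64 * MB}),
        ("min split 256 MB", {"min_size": 256 * MB}),
    ]:
        plan = plan_splits(1024 * MB, 128 * MB, **kwargs)
        print(label, "->", len(plan), "splits")
```

With a 256 MB minimum, each split covers two 128 MB blocks, which is one way a split ends up spanning multiple blocks. With a 64 MB maximum, each split is smaller than a block.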
Real-world example
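Processing a 1 GB file: assuming a 128 MB block size and a splittable format, HDFS stores the file as 8 blocks, and with default settings FileInputFormat creates 8 splits, so the job runs 8 map tasks. Lowering the maximum split size to 64 MB would yield 16 map tasks without changing how the file is stored.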
Common mistakes
- Confusing storage layout with processing layout, for example assuming the number of mappers always equals the number of blocks.
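One place the two layouts diverge is at record boundaries: a block or split can end in the middle of a line, yet every line must be processed exactly once. The sketch below imitates the convention used by Hadoop's line-oriented reader in plain Python. The function `read_split` is hypothetical, and it assumes newline-terminated records held in an in-memory byte string. Every split except the first discards its first line, and every split keeps reading until it has passed its own end offset.

```python
# Illustrative sketch only: imitates how a line-oriented record reader
# handles lines that straddle split boundaries. Not Hadoop code.

def read_split(data: bytes, start: int, length: int):
    """Return the lines owned by the split covering [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Discard the first (possibly partial) line: the previous split
        # reads past its own end and therefore already owns it.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # Keep reading while the next line starts at or before the split end,
    # even if that line finishes inside the following split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        next_pos = len(data) if newline == -1 else newline + 1
        records.append(data[pos:next_pos].rstrip(b"\n"))
        pos = next_pos
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"  # 26 bytes
    # Three 10-byte-or-smaller splits that cut through the middle of lines.
    for start, length in [(0, 10), (10, 10), (20, 6)]:
        print((start, length), read_split(data, start, length))
```

The first split ends inside "bravo" but still returns it whole, and the second split skips the tail of "bravo" instead of emitting a fragment. The physical cut points never determine where a record starts or ends.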
Follow-up questions
- What controls the number of mappers?
- Can a split cross block boundaries?