Senior · Hadoop

What is a Hadoop input split, and how does it differ from an HDFS block?

Updated May 16, 2026

Short answer

An input split is a logical division of the input for MapReduce processing, while an HDFS block is a physical unit of storage.

Deep explanation

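HDFS blocks define how data is physically stored and replicated across DataNodes, whereas input splits define how data is logically divided among mappers: each split is processed by exactly one map task. A split only describes a range of the input (for file inputs, a path, a start offset, and a length); it does not copy or move the data. A split may span multiple blocks or be smaller than a block. The job's InputFormat determines how splits are created; for file-based formats the split size defaults to the block size, so one split per block is the common case.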
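As a rough illustration of how a file-based InputFormat might size its splits, the sketch below mirrors the formula used by Hadoop's FileInputFormat, max(minSize, min(maxSize, blockSize)). The function names, the 128 MB block size, and the 1 GB file are illustrative assumptions, and the sketch ignores details such as the small tolerance Hadoop allows on the last split.

```python
MB = 1024 * 1024

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size following max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))

def count_splits(file_size, split_size):
    """Number of splits when the file is cut into split_size pieces (ceiling division)."""
    return -(-file_size // split_size)

block = 128 * MB        # assumed HDFS block size
file_size = 1024 * MB   # assumed 1 GB input file

# Defaults: the split size equals the block size.
default_split = compute_split_size(block)
# Lowering the maximum split size gives splits smaller than a block.
smaller_split = compute_split_size(block, max_size=64 * MB)
# Raising the minimum split size gives splits that span several blocks.
larger_split = compute_split_size(block, min_size=256 * MB)

print(count_splits(file_size, default_split))  # 8
print(count_splits(file_size, smaller_split))  # 16
print(count_splits(file_size, larger_split))   # 4
```

Under these assumptions the block layout on disk never changes; only the minimum and maximum split sizes change how many map tasks the job gets.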

Real-world example

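Processing a 1 GB file: HDFS stores it as a sequence of fixed-size blocks, and the job divides it into input splits, each handled by its own map task. With a 128 MB block size (the default in recent Hadoop versions) and default split settings, that is 8 blocks and 8 map tasks.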
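To make the relationship concrete, here is a minimal sketch that lists which blocks each split of a 1 GB file would touch. It assumes a 128 MB block size and a hypothetical helper, splits_with_blocks; it models only byte ranges, not Hadoop's actual split objects or locality hints.

```python
MB = 1024 * 1024

def splits_with_blocks(file_size, block_size, split_size):
    """Return (split_index, start, length, block_indices) for each split."""
    result = []
    start = 0
    index = 0
    while start < file_size:
        length = min(split_size, file_size - start)
        first_block = start // block_size
        last_block = (start + length - 1) // block_size
        blocks = list(range(first_block, last_block + 1))
        result.append((index, start, length, blocks))
        start += length
        index += 1
    return result

file_size = 1024 * MB   # the 1 GB file from the example
block_size = 128 * MB   # assumed HDFS block size

# Split size equal to the block size: one split (and one mapper) per block.
one_per_block = splits_with_blocks(file_size, block_size, 128 * MB)
# Split size of 256 MB: each split spans two blocks.
spanning = splits_with_blocks(file_size, block_size, 256 * MB)

for index, start, length, blocks in spanning:
    print(f"split {index}: offset {start // MB} MB, "
          f"length {length // MB} MB, blocks {blocks}")
```

With a 256 MB split size the same 8 blocks are read by only 4 map tasks, which is the case where a split crosses block boundaries.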

Common mistakes

  • Confusing the storage layout (blocks) with the processing layout (splits).

Follow-up questions

  • Which one controls the number of mappers?
  • Can a split cross block boundaries?
