Senior · Hadoop

What are Hadoop small-file optimization strategies?

Updated May 16, 2026

Short answer

The small-file problem is mitigated by combining many small files into fewer large ones and by storing the data in container or columnar formats.

Deep explanation

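HDFS handles many small files poorly because the NameNode keeps the metadata for every file, directory, and block in memory, so the number of objects, not the data volume, limits how far the namespace can grow. Small files also slow down processing: by default, MapReduce creates at least one input split, and therefore one map task, per file. Common remedies are packing files into a SequenceFile (for example, file name as key and contents as value), archiving them into HAR files, reading them through CombineFileInputFormat so that one split covers many files (this cuts the number of map tasks but not the NameNode load), and compacting records into modern columnar formats such as Parquet or ORC, which store many records per file.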
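To make the metadata overhead concrete, here is a back-of-envelope sketch in plain Java. It assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file or block); that figure is an approximation rather than a measured value. The class and method names are hypothetical and not part of any Hadoop API.

```java
// Rough NameNode heap estimate (illustrative only, not a Hadoop API).
final class NameNodeEstimate {
    // Assumption: about 150 bytes of heap per namespace object (rule of thumb).
    static final long BYTES_PER_OBJECT = 150L;

    /** Number of blocks a file of fileBytes occupies (ceiling division). */
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Estimated heap for fileCount equally sized files: one file object plus its block objects each. */
    static long estimateHeapBytes(long fileCount, long fileBytes, long blockBytes) {
        long objectsPerFile = 1L + blocksFor(fileBytes, blockBytes);
        return fileCount * objectsPerFile * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024;      // 128 MiB block size (assumed)
        long fileSize = 100L * 1024;          // 100 KiB per small file (assumed)
        long fileCount = 10_000_000L;

        // Same data volume stored as small files vs. compacted into block-sized files.
        long small = estimateHeapBytes(fileCount, fileSize, block);
        long compactedFiles = blocksFor(fileCount * fileSize, block);
        long compacted = estimateHeapBytes(compactedFiles, block, block);

        System.out.println("10M small files: " + small + " bytes of heap");
        System.out.println("compacted files: " + compacted + " bytes of heap");
    }
}
```

Under these assumptions, ten million 100 KiB files cost about 3 GB of NameNode heap, while the same data compacted into block-sized files needs a few megabytes.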

Real-world example

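An IoT platform whose devices generate millions of small sensor log files. Written to HDFS one file at a time, these logs inflate NameNode metadata, so they are typically compacted into larger files.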
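One way to picture compaction for such a workload is a planner that groups small log files into batches near a target size, so that each batch can be rewritten as one large file. The sketch below is a simple greedy grouping in plain Java. It is not how CombineFileInputFormat or any other Hadoop tool is implemented, and the names (`CompactionPlanner`, `plan`) and the sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Greedy compaction planner (illustrative only, not a Hadoop API).
final class CompactionPlanner {
    /**
     * Groups file sizes, in input order, into batches whose totals stay at or
     * below targetBytes. A file larger than the target gets a batch of its own.
     */
    static List<List<Long>> plan(List<Long> fileSizes, long targetBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0L;
        for (long size : fileSizes) {
            // Close the current batch if adding this file would exceed the target.
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0L;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Sizes are in arbitrary units; a real target would be near the HDFS block size.
        List<Long> sizes = List.of(40L, 50L, 30L, 90L, 10L, 120L, 20L);
        System.out.println(plan(sizes, 100L));
    }
}
```

Each resulting batch would then be written out as a single SequenceFile, Parquet, or ORC file by whatever job performs the compaction.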

Common mistakes

  • Storing each event as a separate file in HDFS.

Follow-up questions

  • Why is the NameNode affected?
  • What is the best modern solution?
