seniorHadoop

What is Hadoop compression and how does it improve performance?

Updated May 16, 2026

Short answer

Compression reduces data size to improve storage efficiency and reduce I/O.

Deep explanation

Hadoop supports codecs like Snappy, Gzip, and LZO. Compression reduces disk storage and network transfer but increases CPU usage. It is commonly applied to MapReduce intermediate data and output files.

Real-world example

Compressing log data before storing in HDFS to reduce storage cost.

Common mistakes

  • Using high-compression codecs for CPU-heavy workloads.

Follow-up questions

  • Which codec is fastest?
  • Where should compression be applied?

More Hadoop interview questions

View all →