seniorHadoop
What is Hadoop compression and how does it improve performance?
Updated May 16, 2026
Short answer
Compression reduces data size to improve storage efficiency and reduce I/O.
Deep explanation
Hadoop supports codecs like Snappy, Gzip, and LZO. Compression reduces disk storage and network transfer but increases CPU usage. It is commonly applied to MapReduce intermediate data and output files.
Real-world example
Compressing log data before storing in HDFS to reduce storage cost.
Common mistakes
- Using high-compression codecs for CPU-heavy workloads.
Follow-up questions
- Which codec is fastest?
- Where should compression be applied?