seniorHadoop
What is Hadoop locality-aware scheduling in YARN?
Updated May 16, 2026
Short answer
It schedules tasks on nodes where data resides to minimize network usage.
Deep explanation
YARN prioritizes node-local execution, then rack-local, then off-rack if necessary. This reduces network I/O and improves performance significantly in large clusters.
Real-world example
Processing large logs on servers where they are stored.
Common mistakes
- Ignoring locality leading to network congestion.
Follow-up questions
- What is locality delay?
- Why off-rack execution is slower?