seniorHadoop

What is Hadoop locality-aware scheduling in YARN?

Updated May 16, 2026

Short answer

It schedules tasks on nodes where data resides to minimize network usage.

Deep explanation

YARN prioritizes node-local execution, then rack-local, then off-rack if necessary. This reduces network I/O and improves performance significantly in large clusters.

Real-world example

Processing large logs on servers where they are stored.

Common mistakes

  • Ignoring locality leading to network congestion.

Follow-up questions

  • What is locality delay?
  • Why off-rack execution is slower?

More Hadoop interview questions

View all →