Senior | Apache Spark
Broadcast Hash Join vs Sort-Merge Join.
Updated May 5, 2026
Short answer
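Broadcast Hash Join (BHJ) suits joins where one side is small enough to fit in memory on every executor, so the large side is never shuffled. Sort-Merge Join (SMJ) suits joins where both sides are large, at the cost of a shuffle and a sort on each side.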
Deep explanation
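SMJ runs in three phases: shuffle (both sides are repartitioned by the join key so matching keys land in the same partition), sort (each partition is sorted by that key), and merge (the two sorted sides are scanned together to emit matches). BHJ avoids shuffling the large side by sending a full copy of the small table to every executor, where it is loaded into a hash table and probed locally. Spark picks BHJ automatically when one side's estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default). Otherwise it prefers SMJ for large equi-joins because SMJ is robust: it can spill to disk instead of requiring one side to fit in memory.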
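The mechanics of the two strategies can be sketched in plain Python, with no Spark dependency. This is a toy model under stated assumptions, not Spark's implementation: rows are (key, value) tuples, partitions are Python lists, every function name is hypothetical, and spilling to disk is not modeled.

```python
def broadcast_hash_join(large_partitions, small_rows):
    """Inner join where each large partition probes a full copy of the small side."""
    # Build phase: one hash table over the small side (the "broadcast" copy).
    table = {}
    for key, value in small_rows:
        table.setdefault(key, []).append(value)

    out = []
    # Probe phase: large partitions stay where they are, so no shuffle occurs.
    for partition in large_partitions:
        for key, left_value in partition:
            for right_value in table.get(key, []):
                out.append((key, left_value, right_value))
    return out


def shuffle(rows, num_partitions):
    """Hash-partition rows by key so equal keys land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def merge_sorted(left, right):
    """Merge two key-sorted lists, emitting every pairing for each matching key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left_value, right_value))
            i, j = i_end, j_end
    return out


def sort_merge_join(left_rows, right_rows, num_partitions=4):
    """Inner join in three phases: shuffle both sides, sort each partition, merge."""
    left_parts = shuffle(left_rows, num_partitions)
    right_parts = shuffle(right_rows, num_partitions)
    out = []
    for left_part, right_part in zip(left_parts, right_parts):
        left_part.sort(key=lambda row: row[0])
        right_part.sort(key=lambda row: row[0])
        out.extend(merge_sorted(left_part, right_part))
    return out


# Illustrative data (made up for this sketch).
orders = [(1, "o1"), (2, "o2"), (1, "o3"), (3, "o4")]
countries = [(1, "US"), (2, "DE"), (4, "FR")]

bhj_result = broadcast_hash_join([orders[:2], orders[2:]], countries)
smj_result = sort_merge_join(orders, countries, num_partitions=2)
print(sorted(bhj_result) == sorted(smj_result))
```

Both functions return the same inner-join rows. The difference is in data movement: the broadcast version copies only the small side, while the sort-merge version repartitions and sorts both sides before merging.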
Real-world example
No real-world example available yet.
Common mistakes
No common mistakes listed yet.
Follow-up questions
No follow-up questions available yet.