Broadcast Hash Join vs Sort-Merge Join.

Updated May 5, 2026

Short answer

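Broadcast Hash Join (BHJ) suits joins where one side is small enough to fit in memory on every executor; the large side is not shuffled. Sort-Merge Join (SMJ) suits joins between two large tables; both sides are shuffled by join key and then sorted.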

Deep explanation

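SMJ runs in three phases: Shuffle (both sides are repartitioned by join key so matching keys land in the same partition), Sort (each partition is sorted by join key), and Merge (the two sorted sides are scanned together to emit matches). BHJ avoids shuffling the large table by broadcasting the small table to every executor, where it is built into an in-memory hash table and probed by each partition of the large table. Spark picks BHJ automatically when one side's estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default). For large equi-joins it prefers SMJ because SMJ is robust: it can spill to disk instead of requiring one side to fit in memory.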
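To make the two strategies concrete, here is a minimal single-process sketch in plain Python. It is not Spark's implementation: the function names, the list-of-tuples row format, and the `num_partitions` parameter are assumptions made for illustration, and "broadcast" and "shuffle" are simulated with in-memory lists.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows):
    """Inner join: every large-side partition probes one copy of the small side."""
    # Build phase: hash the small table once. In Spark this copy would be
    # broadcast to every executor.
    hash_table = defaultdict(list)
    for key, value in small_rows:
        hash_table[key].append(value)

    # Probe phase: large-side partitions are scanned where they are (no shuffle).
    joined = []
    for partition in large_partitions:
        for key, left_value in partition:
            for right_value in hash_table.get(key, []):
                joined.append((key, left_value, right_value))
    return joined


def sort_merge_join(left_rows, right_rows, num_partitions=2):
    """Inner join via simulated shuffle, then sort, then merge."""

    def shuffle(rows):
        # Phase 1 (shuffle): rows with the same key go to the same partition.
        parts = [[] for _ in range(num_partitions)]
        for key, value in rows:
            parts[hash(key) % num_partitions].append((key, value))
        return parts

    joined = []
    for left, right in zip(shuffle(left_rows), shuffle(right_rows)):
        # Phase 2 (sort): order each partition by join key.
        left.sort(key=lambda row: row[0])
        right.sort(key=lambda row: row[0])

        # Phase 3 (merge): advance two cursors over the sorted partitions.
        i = j = 0
        while i < len(left) and j < len(right):
            left_key, right_key = left[i][0], right[j][0]
            if left_key < right_key:
                i += 1
            elif left_key > right_key:
                j += 1
            else:
                # Find the run of right rows sharing this key, then pair it
                # with every left row that has the same key.
                j_end = j
                while j_end < len(right) and right[j_end][0] == left_key:
                    j_end += 1
                while i < len(left) and left[i][0] == left_key:
                    for _, right_value in right[j:j_end]:
                        joined.append((left_key, left[i][1], right_value))
                    i += 1
                j = j_end
    return joined
```

Both functions return the same set of matched rows for the same inputs; only the work differs. The hash join holds the whole small side in memory, while the sort-merge join only needs sorted streams, which is why the real SMJ can spill to disk.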




