Apache Spark (mid)
What are Accumulators and Broadcast Variables?
Updated May 5, 2026
Short answer
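Broadcast variables are read-only values the driver shares with every executor. Accumulators are add-only variables: tasks update them and only the driver reads them. They are typically used for counters and sums.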
Deep explanation
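Broadcast variables let the driver ship a read-only value, such as a large lookup table, to each executor once. The executor caches the value, and every task running there reads that same copy. Without broadcasting, the value would be serialized into every task's closure. Accumulators work in the other direction. Tasks can only add to them (e.g., counting bad records), Spark merges each task's updates on the driver, and only the driver reads the result. Spark guarantees that an update made inside an action is applied once per task, even if the task is restarted. An update made inside a transformation may be applied more than once if a task or stage is re-executed.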
Real-world example
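Broadcasting a small lookup table (for example, country codes to country names) lets each task enrich records without a shuffle join. An accumulator can count how many records in a file failed to parse.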
Common mistakes
- Reading an accumulator value inside a task (only the driver can read it).
Follow-up questions
- What happens to an accumulator during a task retry?