Explain 'Speculative Execution' in Spark.

Updated May 5, 2026

Short answer

It is a mechanism to launch redundant copies of slow-running tasks (stragglers) on other nodes.

Deep explanation

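Speculation is off by default and is enabled with spark.speculation=true. Once a configurable fraction of a stage's tasks has finished (spark.speculation.quantile), Spark periodically checks the tasks still running. Any task that has run longer than the median duration of the completed tasks multiplied by spark.speculation.multiplier is treated as a straggler, and Spark launches a second attempt of it on another node. Whichever attempt finishes first wins, and the other is killed.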
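The detection rule above can be sketched in plain Python. This is a simplified illustration, not Spark's scheduler code: the function name find_speculatable is hypothetical, and the quantile and multiplier values are illustrative stand-ins for spark.speculation.quantile and spark.speculation.multiplier, whose defaults depend on the Spark version.

```python
from statistics import median


def find_speculatable(finished_durations, running_durations, total_tasks,
                      quantile=0.75, multiplier=1.5):
    """Return indices of running tasks that look like stragglers.

    finished_durations: runtimes (seconds) of tasks that completed successfully
    running_durations:  elapsed time (seconds) of tasks still running
    total_tasks:        number of tasks in the stage
    quantile, multiplier: illustrative values, not guaranteed Spark defaults
    """
    # Wait until enough of the stage has finished for the median to be meaningful.
    if len(finished_durations) < quantile * total_tasks:
        return []
    # A running task is a straggler if it exceeds multiplier x median runtime.
    threshold = multiplier * median(finished_durations)
    return [i for i, elapsed in enumerate(running_durations) if elapsed > threshold]


# 8 of 10 tasks finished in about 10 s; one of the two remaining tasks is far behind.
finished = [10, 11, 9, 12, 10, 11, 10, 9]
running = [12, 60]
stragglers = find_speculatable(finished, running, total_tasks=10)
print(stragglers)
```

Only the task that has been running for 60 seconds crosses the threshold, so it is the only one that would get a speculative copy in this sketch.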

Real-world example

A job sits at 99% because its last task is running on a node with failing hardware; speculation launches a copy of that task on a healthy node, and the job completes when the copy finishes.

Common mistakes

  • Enabling speculation for jobs whose tasks write to non-idempotent sinks: two attempts of the same task can both write before the slower one is killed, which can produce duplicate data.
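To see why the sink's semantics matter, here is a minimal simulation in plain Python, with no Spark involved. It assumes the worst case, where the original attempt and its speculative copy both reach the sink before either is killed. The helper names (run_attempts, append_write, keyed_write) are hypothetical and exist only for this illustration.

```python
def run_attempts(write, sink, partition_id, rows, attempts=2):
    # Simulate the original attempt and its speculative copy both writing.
    for _ in range(attempts):
        write(sink, partition_id, rows)


def append_write(sink, partition_id, rows):
    # Non-idempotent: every attempt appends the same rows again.
    sink.extend(rows)


def keyed_write(sink, partition_id, rows):
    # Idempotent: output is keyed by partition, so a repeat overwrites itself.
    sink[partition_id] = list(rows)


rows = ["a", "b"]

append_sink = []
run_attempts(append_write, append_sink, 0, rows)

keyed_sink = {}
run_attempts(keyed_write, keyed_sink, 0, rows)

print(append_sink)  # rows appear twice
print(keyed_sink)   # rows appear once
```

The keyed variant stays correct no matter how many attempts run, which is the property to look for in a sink before turning speculation on.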

Follow-up questions

  • Can it help with data skew?
