Explain Fault Tolerance in Spark Streaming.

Updated May 5, 2026

Short answer

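Spark Streaming (the DStream API) achieves fault tolerance through checkpointing and write-ahead logs (WAL): checkpoints let a restarted driver resume where it left off, and the WAL lets it replay data that was received but not yet processed.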

Deep explanation

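Checkpointing periodically saves two kinds of information to fault-tolerant storage such as HDFS or S3: metadata (the configuration, the DStream operations, and batches that were queued but not completed), which lets a failed driver restart, and the state RDDs produced by stateful transformations such as updateStateByKey. The WAL covers receiver-based sources: when spark.streaming.receiver.writeAheadLog.enable is set to true, each received block is also written to a log under the checkpoint directory before it is processed, so it can be replayed after a driver failure instead of being lost with executor memory.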
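The interplay between the two mechanisms can be sketched with a toy model in plain Python. This is not Spark code and not how Spark implements either feature; the names ToyReceiver, save_checkpoint, recover and demo are hypothetical, and local files stand in for HDFS/S3. It only illustrates the idea: log every record before buffering it, checkpoint progress, and after a crash replay whatever the checkpoint does not yet cover.

```python
import json
import os
import tempfile


class ToyReceiver:
    """Toy stand-in for a receiver: log each record before buffering it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.buffer = []  # in-memory data, lost if the process dies

    def receive(self, record):
        # Write ahead: persist and flush the record first, then buffer it.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.buffer.append(record)


def save_checkpoint(path, state):
    # Write to a temp file and rename it, so a crash cannot leave
    # a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def recover(checkpoint_path, log_path):
    """Load the last checkpoint, then replay log entries it does not cover."""
    state = {"processed": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            state = json.load(f)
    records = []
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            records = [json.loads(line) for line in log if line.strip()]
    for record in records[state["processed"]:]:
        state["total"] += record["amount"]
        state["processed"] += 1
    return state


def demo():
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "wal.log")
    ckpt_path = os.path.join(workdir, "checkpoint.json")

    receiver = ToyReceiver(log_path)
    for amount in (10, 20):
        receiver.receive({"amount": amount})
    # The first two records are processed and the running total is checkpointed.
    save_checkpoint(ckpt_path, {"processed": 2, "total": 30})
    # A third record is logged, but the process "crashes" before processing it.
    receiver.receive({"amount": 5})
    del receiver  # the in-memory buffer is gone
    return recover(ckpt_path, log_path)
```

Calling demo() returns {'processed': 3, 'total': 35}: the checkpoint restores the total of 30, and the logged third record is replayed rather than lost.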

Real-world example

A 24/7 financial monitor that must not lose a single event if a server reboots.

Common mistakes

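  • Not enabling the WAL for receiver-based sources (it is off by default), so data that was received but not yet processed is lost when the driver fails. File-based sources such as S3 or HDFS do not need the WAL, because the input files can simply be re-read.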
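Because the WAL is disabled by default, it has to be switched on explicitly. The sketch below collects the relevant Spark configuration keys in a small helper; the helper name wal_settings and its parameter are hypothetical, while the configuration keys themselves are the ones documented by Spark. The two closeFileAfterWrite keys are intended for cases where the log is stored on S3 or another file system that does not support flushing.

```python
def wal_settings(log_store_supports_flush=True):
    """Return (key, value) pairs that enable the receiver write-ahead log.

    Hypothetical helper for illustration; only the keys are Spark's.
    """
    settings = [("spark.streaming.receiver.writeAheadLog.enable", "true")]
    if not log_store_supports_flush:
        # For log storage such as S3 that cannot flush an open file:
        # close the log file after every write instead.
        settings.append(
            ("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
        )
        settings.append(
            ("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
        )
    return settings
```

In a PySpark job the pairs could be applied with SparkConf().setAll(wal_settings(log_store_supports_flush=False)). The checkpoint directory still has to be set separately on the StreamingContext with checkpoint(directory), since the log is written under it.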

Follow-up questions

  • What is Structured Streaming?
