What is the Spark Driver and what are its responsibilities?

Updated May 5, 2026

Short answer

The Driver is the process that runs the application's main() function, creates the SparkContext (or SparkSession), and coordinates the execution of the application's jobs across the cluster.

Deep explanation

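The Driver turns the transformations and actions in user code into a DAG of stages, splits each stage into tasks (one per partition), and schedules those tasks on executors. It hosts the SparkContext/SparkSession and requests executor resources through the cluster manager. It also tracks the lineage of RDDs, so lost partitions can be recomputed, and receives the results that actions return.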
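The division of labor described above can be sketched in plain Python. This is a toy model, not Spark's API: ToyDataset, ToyDriver, and run_task are hypothetical names, and a thread pool stands in for the executors. It only illustrates that transformations extend the lineage lazily, while an action makes the driver create one task per partition, schedule the tasks, and gather the results.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ToyDataset:
    """A lazy dataset: input partitions plus the lineage of steps to apply."""
    partitions: Tuple[Tuple[int, ...], ...]
    lineage: Tuple[Tuple[str, Callable], ...] = ()

    def map(self, fn):
        # Transformation: only extends the lineage, nothing runs yet.
        return ToyDataset(self.partitions, self.lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy.
        return ToyDataset(self.partitions, self.lineage + (("filter", pred),))


def run_task(rows, lineage):
    """Executor side: apply the recorded lineage to a single partition."""
    out = list(rows)
    for kind, fn in lineage:
        if kind == "map":
            out = [fn(r) for r in out]
        else:
            out = [r for r in out if fn(r)]
    return out


class ToyDriver:
    """Driver side: turns an action into tasks and schedules them."""

    def __init__(self, num_executors=2):
        self.num_executors = num_executors

    def collect(self, ds):
        # Action: one task per partition, submitted to the executor pool.
        with ThreadPoolExecutor(max_workers=self.num_executors) as pool:
            futures = [pool.submit(run_task, part, ds.lineage)
                       for part in ds.partitions]
            # Results return to the driver in partition order.
            return [row for f in futures for row in f.result()]


driver = ToyDriver(num_executors=2)
data = ToyDataset(partitions=((1, 2, 3), (4, 5, 6)))

# Two transformations are recorded; no task has run at this point.
evens_squared = data.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

# The action triggers scheduling and brings the results to the driver.
result = driver.collect(evens_squared)
print(result)  # [4, 16, 36]
```

In real Spark the driver additionally groups work into stages at shuffle boundaries and can use the lineage to recompute lost partitions; the toy leaves both out to keep the control flow visible.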

Real-world example

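In a Jupyter notebook, the kernel process that holds the SparkSession acts as the driver (in client mode). Each cell that triggers an action submits a job from that driver to the executors.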

Common mistakes

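  • Running collect() on a massive dataset, which pulls every row into the Driver's memory and causes an OutOfMemoryError (OOM). Prefer take(n) or limit() for previews, aggregate on the executors, or write the result to storage instead of returning it to the Driver.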
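To make the cost of that mistake concrete, here is a small plain-Python model of how much data reaches the driver under three strategies. It does not use Spark: PARTITIONS, collect_all, take, and total are hypothetical stand-ins, and the row counts are made up for illustration.

```python
from itertools import islice

# Four hypothetical partitions of 250,000 rows each, produced lazily.
PARTITIONS = [range(i * 250_000, (i + 1) * 250_000) for i in range(4)]


def collect_all(partitions):
    """Like collect(): every row is materialized in driver memory."""
    return [row for part in partitions for row in part]


def take(partitions, n):
    """Like take(n): stop pulling rows once n have arrived."""
    rows = (row for part in partitions for row in part)
    return list(islice(rows, n))


def total(partitions):
    """Aggregate per partition; one number per partition reaches the driver."""
    partials = [sum(part) for part in partitions]
    return sum(partials), len(partials)


everything = collect_all(PARTITIONS)
preview = take(PARTITIONS, 5)
grand_total, values_received = total(PARTITIONS)

print(len(everything))   # 1000000 rows held by the driver
print(preview)           # [0, 1, 2, 3, 4]
print(values_received)   # 4 partial sums held by the driver
```

The full collect keeps a million rows on the driver, while the preview keeps five and the aggregation keeps four numbers. The same reasoning applies at cluster scale, where the collected data can exceed the driver's heap.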

Follow-up questions

  • What is an Executor?
  • What is the Cluster Manager?
