junior | Apache Spark
What is the Catalyst Optimizer?
Updated May 5, 2026
Short answer
Catalyst is Spark SQL's query optimizer. It plans every query written with SQL, DataFrames, or Datasets.
Deep explanation
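Catalyst represents a query as a tree (a logical plan) and rewrites that tree before anything executes. Most rewrites are rule-based, such as predicate pushdown, constant folding, and column pruning. It also supports cost-based optimization, which relies on table statistics to choose between alternative plans.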
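To make the idea of a rule-based rewrite concrete, here is a minimal sketch of constant folding on a toy expression tree. It is a plain-Python illustration, not Catalyst's actual code: the classes Lit, Col, and Add and the function fold_constants are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy expression nodes (not Spark classes).
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]


def fold_constants(expr: Expr) -> Expr:
    """Rewrite the tree bottom-up: Add(Lit, Lit) collapses into one Lit."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            # Both sides are known before execution, so compute them once here.
            return Lit(left.value + right.value)
        return Add(left, right)
    # Literals and column references are already as simple as they can be.
    return expr


# price + (1 + 2) is rewritten to price + 3
expr = Add(Col("price"), Add(Lit(1), Lit(2)))
folded = fold_constants(expr)
print(folded)
```

In Spark itself, calling explain(True) on a DataFrame prints the logical plan before and after optimization, which is the usual way to see which rewrites were applied.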
Real-world example
When a SQL query runs against Parquet files, Catalyst prunes the columns the query never references, so only the required columns are read from disk.
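The pruning step can be pictured with a toy plan of two nodes. This is a rough sketch under stated assumptions, not Spark's implementation: Scan, Project, prune_columns, the file name events.parquet, and the column names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical toy plan nodes (not Spark classes).
@dataclass(frozen=True)
class Scan:
    path: str
    columns: Tuple[str, ...]  # columns the scan will read from the file


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]  # columns the query asks for
    child: Scan


def prune_columns(plan: Project) -> Project:
    """Narrow the scan so it reads only the columns the projection needs."""
    needed = tuple(c for c in plan.child.columns if c in plan.columns)
    return Project(plan.columns, Scan(plan.child.path, needed))


# Roughly: SELECT user_id, amount FROM events (a table with four columns)
plan = Project(
    ("user_id", "amount"),
    Scan("events.parquet", ("user_id", "ts", "amount", "country")),
)
pruned = prune_columns(plan)
print(pruned.child.columns)
```

Because Parquet stores data column by column, a narrower scan translates directly into less I/O. In a real Spark physical plan, the ReadSchema entry of the Parquet scan lists the columns that will actually be read.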
Common mistakes
- Thinking RDDs use Catalyst; only DataFrames/Datasets do.
Follow-up questions
- What is Tungsten?