What is the SARSA algorithm?

Updated May 17, 2026

Short answer

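SARSA (State-Action-Reward-State-Action) is an on-policy temporal-difference control algorithm, often described as the on-policy counterpart of Q-learning.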

Deep explanation

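SARSA updates Q(s, a) toward the reward plus the discounted value of the next action a' that the current policy (for example, ε-greedy) actually selects in the next state s'. Q-learning instead uses the value of the best (greedy) action in s'. Because exploratory actions feed into its targets, SARSA learns the value of the policy it is actually following, exploration included.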
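Written out, the update uses the quintuple $(S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})$ that gives SARSA its name, with step size $\alpha$ and discount factor $\gamma$:

$$
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]
$$

Here $A_{t+1}$ is chosen by the same policy that chose $A_t$. Q-learning's update has the same form except that $Q(S_{t+1}, A_{t+1})$ is replaced by $\max_a Q(S_{t+1}, a)$.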

Real-world example

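SARSA is often preferred where exploratory mistakes are costly, such as robotics: because its value estimates account for the agent's own exploration, it tends to learn more conservative behavior. The classic illustration is the cliff-walking gridworld in Sutton and Barto's Reinforcement Learning: An Introduction. There, SARSA with ε-greedy exploration learns a longer path away from the cliff, while Q-learning learns the shorter optimal path along the edge and falls off more often during training.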
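A minimal sketch of tabular SARSA on a cliff-walking gridworld. It assumes a 4x12 grid with start and goal at the bottom corners, a reward of -1 per step, and -100 plus a reset to the start for stepping into the cliff, modeled on the Sutton and Barto example. The helper names (`step`, `epsilon_greedy`, `sarsa`) and the hyperparameter defaults are hypothetical choices for illustration. The on-policy detail to notice is that A' is picked by the same ε-greedy policy before the update, and that same A' is then executed on the next step.

```python
import numpy as np

# Hypothetical 4x12 cliff-walking grid: start bottom-left, goal bottom-right,
# cliff cells are the bottom row between them.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell (clipped to the grid); the cliff costs -100 and resets to START."""
    r, c = state
    dr, dc = ACTIONS[action]
    r = min(max(r + dr, 0), ROWS - 1)
    c = min(max(c + dc, 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:  # fell into the cliff
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else greedy with random tie-breaking."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    q = Q[state]
    return int(rng.choice(np.flatnonzero(q == q.max())))


def sarsa(n_episodes=300, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular SARSA; returns the Q-table and the total reward of each episode."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((ROWS, COLS, len(ACTIONS)))
    returns = []
    for _ in range(n_episodes):
        s = START
        a = epsilon_greedy(Q, s, epsilon, rng)  # choose A from S
        total, done = 0.0, False
        while not done:
            s2, reward, done = step(s, a)  # observe R and S'
            a2 = epsilon_greedy(Q, s2, epsilon, rng)  # choose A' with the SAME policy
            # On-policy target: bootstrap from Q(S', A') for the action actually chosen.
            target = reward if done else reward + gamma * Q[s2 + (a2,)]
            Q[s + (a,)] += alpha * (target - Q[s + (a,)])
            s, a = s2, a2  # A' is the action executed on the next step
            total += reward
        returns.append(total)
    return Q, returns


Q, returns = sarsa()
print("mean return over last 50 episodes:", np.mean(returns[-50:]))
```

Following the greedy action from `START` after training traces the learned route. With ε-greedy exploration SARSA tends to keep some distance from the cliff, though the exact route and returns depend on the seed and hyperparameters.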

Common mistakes

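Confusing SARSA with Q-learning. Q-learning is off-policy: it bootstraps from max over a' of Q(s', a') regardless of which action the agent takes next. SARSA is on-policy: it bootstraps from Q(s', a') for the action a' the agent actually takes.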
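A small sketch that makes the difference concrete by computing both TD targets for the same transition. The function name `td_targets` and the toy Q-table are hypothetical; the only thing that changes between the two targets is which next-state action value is used for bootstrapping.

```python
import numpy as np


def td_targets(Q, reward, next_state, next_action, gamma=0.99, done=False):
    """Return (sarsa_target, q_learning_target) for one transition (s, a, r, s', a')."""
    if done:
        # Terminal transition: no bootstrapping in either algorithm.
        return reward, reward
    # SARSA: value of the action the behavior policy actually picked in s'.
    sarsa_target = reward + gamma * Q[next_state, next_action]
    # Q-learning: value of the greedy action in s', whatever the agent does next.
    q_learning_target = reward + gamma * Q[next_state].max()
    return sarsa_target, q_learning_target


# Toy table: in state 1, action 1 is greedy (5.0) but the agent explored action 0 (1.0).
Q = np.array([[0.0, 0.0], [1.0, 5.0]])
print(td_targets(Q, reward=-1.0, next_state=1, next_action=0, gamma=0.9))
```

The two targets coincide whenever the action taken in s' happens to be the greedy one; they diverge only on exploratory actions, which is exactly where SARSA "pays" for exploration and Q-learning ignores it.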

Follow-up questions

Why is SARSA safer?
What is on-policy learning?

More Q-Learning interview questions

How does Q-Learning handle exploration-exploitation under uncertainty in large state spaces? (senior)
What is the relationship between Q-Learning and fixed-point convergence? (senior)
How does Q-Learning behave when reward signals are delayed and noisy simultaneously? (senior)
What is the impact of state representation quality on Q-Learning convergence? (senior)
How does Q-Learning handle catastrophic bootstrapping errors? (senior)
What is the role of reward normalization in stabilizing deep Q-networks? (senior)
How does Q-Learning behave under function approximation + off-policy mismatch? (senior)
How does Q-Learning interact with non-convex function approximation landscapes? (senior)