mid | Q-Learning
What is the SARSA algorithm?
Updated May 17, 2026
Short answer
SARSA (State-Action-Reward-State-Action) is an on-policy temporal-difference control algorithm, closely related to Q-learning.
Deep explanation
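It updates Q-values using the action the current policy actually takes in the next state, rather than the greedy (highest-valued) action that Q-learning assumes. Because the policy being evaluated is the same one choosing actions, including its exploratory moves, the learned values reflect how the agent really behaves.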
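A minimal sketch of one SARSA update may make this concrete. It assumes a tabular Q-function stored in a dictionary keyed by (state, action) and an epsilon-greedy policy. The function names, the default values for alpha and gamma, and the integer state encoding are illustrative assumptions, not part of any particular library.

```python
import random
from collections import defaultdict

def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """Apply one SARSA step and return the new Q(s, a).

    The target bootstraps from Q(s_next, a_next), where a_next is the
    action the policy actually selected, not the maximum over actions.
    """
    target = r if done else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]

# Hypothetical usage: two actions, hand-filled values for the next state.
q = defaultdict(float)
q[(1, 0)] = 2.0
q[(1, 1)] = 10.0
rng = random.Random(0)
a_next = epsilon_greedy(q, 1, n_actions=2, epsilon=0.1, rng=rng)
new_value = sarsa_update(q, s=0, a=0, r=1.0, s_next=1, a_next=a_next)
```

In a full training loop, a_next would then be executed in the environment and reused as the action for the following update, which is where the State-Action-Reward-State-Action name comes from.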
Real-world example
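SARSA is often preferred where exploratory mistakes are costly, such as robotics. Because its value estimates account for the exploration the policy actually performs, it tends to learn more conservative behavior, for example keeping a margin from a hazard rather than following the shortest path along its edge.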
Common mistakes
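- Confusing SARSA with off-policy Q-learning: SARSA bootstraps from the next action the policy actually chose, while Q-learning bootstraps from the maximum value over all next actions.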
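The difference between the two algorithms comes down to a single term in the update target. The sketch below computes both targets for the same transition. The Q-table layout (a dictionary mapping each state to a list of action values) and all numbers are assumptions chosen for illustration.

```python
def sarsa_target(q, r, s_next, a_next, gamma):
    """On-policy target: uses the value of the action actually chosen next."""
    return r + gamma * q[s_next][a_next]

def q_learning_target(q, r, s_next, gamma):
    """Off-policy target: uses the best action value, whatever is chosen next."""
    return r + gamma * max(q[s_next])

# Hypothetical next state with two actions; the policy explored and chose action 0.
q = {"s1": [2.0, 10.0]}
sarsa_t = sarsa_target(q, r=1.0, s_next="s1", a_next=0, gamma=0.9)
q_learning_t = q_learning_target(q, r=1.0, s_next="s1", gamma=0.9)
```

Here the SARSA target is lower because it accounts for the exploratory choice of the weaker action, while the Q-learning target ignores it. When the next action happens to be the greedy one, the two targets are equal.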
Follow-up questions
- Why is SARSA safer?
- What is on-policy learning?