mid | Q-Learning
What is overestimation bias in Q-Learning?
Updated May 17, 2026
Short answer
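Overestimation bias is the tendency of Q-learning to systematically estimate action values higher than their true values, because its update target takes a maximum over noisy estimates.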
Deep explanation
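The Q-learning target is r + gamma * max over a' of Q(s', a'). The Q-values are noisy estimates, and the max operator tends to pick whichever action's noise happens to be positive, so the expected maximum of the estimates is at least the maximum of the true values. The target is therefore biased upward, and bootstrapping carries the inflated values back to earlier states, which can mislead learning.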
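The effect can be checked with a small simulation. The sketch below is illustrative only: the function names, the five equally good actions, and the Gaussian noise model are assumptions, not part of any particular Q-learning implementation. It compares the average max over noisy estimates with the true max.

```python
import random
import statistics


def max_of_noisy_estimates(true_values, noise_std, rng):
    """Return the max over actions of (true value + zero-mean Gaussian noise)."""
    return max(q + rng.gauss(0.0, noise_std) for q in true_values)


def estimate_bias(true_values, noise_std=1.0, trials=20000, seed=0):
    """Average max of the noisy estimates minus the true max (assumed noise model)."""
    rng = random.Random(seed)
    samples = [
        max_of_noisy_estimates(true_values, noise_std, rng)
        for _ in range(trials)
    ]
    return statistics.fmean(samples) - max(true_values)


if __name__ == "__main__":
    true_q = [0.0] * 5  # hypothetical state: five actions, all truly worth 0
    print(f"bias with noise:    {estimate_bias(true_q):.3f}")
    print(f"bias without noise: {estimate_bias(true_q, noise_std=0.0):.3f}")
```

Each individual estimate is unbiased here, yet the max over them comes out clearly positive on average. With the noise set to zero the gap disappears, which suggests the bias comes from combining max with noise rather than from either one alone.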
Real-world example
In robot navigation, inflated Q-values can make a poor action look like the best one, so the agent may settle on a suboptimal policy.
Common mistakes
- Ignoring the bias introduced by taking the max over noisy action-value estimates.
Follow-up questions
- How can the bias be reduced?
- Why does the max operator cause bias?