What is overestimation bias in Q-Learning?

Updated May 17, 2026

Short answer

Overestimation bias is the tendency of Q-learning to produce Q-values that are systematically higher than the true action values.

Deep explanation

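The Q-learning target takes a max over the estimated Q-values of the next state. Those estimates are noisy, and the max tends to pick whichever action's noise happens to be most positive, so in expectation E[max_a Q(s', a)] >= max_a E[Q(s', a)]. The inflated targets are then bootstrapped into earlier states, which can mislead learning.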
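The effect of taking a max over noisy estimates can be illustrated with a small simulation. This is a minimal sketch, not a Q-learning implementation: it assumes every action has a true value of 0 and that each estimate is corrupted by independent zero-mean Gaussian noise. The function name mean_max_estimate and its parameters are hypothetical, chosen only for illustration.

```python
import random
import statistics


def mean_max_estimate(n_actions=5, noise_std=1.0, n_trials=10_000, seed=0):
    """Average of the max over noisy estimates when every true Q-value is 0.

    Assumption for illustration: estimate = true value + zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_trials):
        # Each estimate is unbiased on its own (its mean is the true value, 0).
        estimates = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # The max over the estimates is what a Q-learning style target would use.
        maxima.append(max(estimates))
    return statistics.fmean(maxima)


if __name__ == "__main__":
    bias = mean_max_estimate()
    print(f"true max Q = 0.0, average max of noisy estimates = {bias:.3f}")
```

Although each individual estimate is unbiased, the average of the max comes out well above the true value of 0, and under these assumptions it grows as n_actions or noise_std increases. With n_actions=1 there is nothing to maximize over and the average stays close to 0.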

Real-world example

In robotics navigation, an agent may overvalue a route whose estimated return was inflated by noise and settle on a suboptimal policy.

Common mistakes

  • Ignoring the bias introduced when the same noisy estimates are used both to select the max action and to evaluate it.

Follow-up questions

  • How can the bias be reduced?
  • Why does max cause bias?
