What is the Bellman equation in Q-Learning?

Updated May 17, 2026

Short answer

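It expresses the value of a state-action pair as the immediate reward plus the discounted value of the best action available in the next state. Q-learning uses this relation as the target when it updates its Q-values.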

Deep explanation

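The Bellman optimality equation for action values is Q*(s, a) = E[r + gamma * max_a' Q*(s', a')], where r is the immediate reward, s' is the next state, and gamma (between 0 and 1) discounts future reward. The definition is recursive: the optimal value of (s, a) depends on the optimal values at s'. Q-learning turns it into an incremental update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), where alpha is the learning rate.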
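As a minimal sketch, one Bellman-based update step for a tabular agent could look like the following. The function name q_update, the list-of-lists table layout, the learning rate alpha, and the done flag are illustrative assumptions, not part of any particular library.

```python
def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new Q(s, a).

    q is assumed to be a list of rows, one row of action values per state.
    """
    # Bootstrap from the best next action; a terminal step has no future value.
    best_next = 0.0 if done else max(q[next_state])
    # Bellman target: immediate reward plus discounted best future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Two states, two actions; state 1 already has some learned values.
q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_update(q, state=0, action=1, reward=1.0, next_state=1,
                     alpha=0.5, gamma=0.9)
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and with alpha = 0.5 the estimate for (state 0, action 1) moves halfway from 0.0 toward it.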

Real-world example

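In robot path planning, states can be positions on a map, actions are movement commands, and the reward favors reaching the goal. The learned Q-values then indicate which move to take from each position.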
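To make the path-planning idea concrete, here is a toy sketch rather than a real robotics setup: a hypothetical five-cell corridor where the agent starts in cell 0 and is rewarded only for reaching cell 4. The environment, the constants, and the purely random exploration policy are all assumptions chosen for brevity.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (assumed layout)
ACTIONS = (-1, +1)    # action index 0 = step left, index 1 = step right


def step(state, action_index):
    """Move along the corridor, staying inside its walls."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q-values from random exploration (Q-learning is off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.randrange(2)  # explore uniformly at random
            next_state, reward, done = step(state, action)
            # Bellman target: reward plus discounted best next value.
            best_next = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


q = train()
# Greedy action for every non-goal cell; 1 means "step right".
greedy_policy = [row.index(max(row)) for row in q[:-1]]
```

Because each step away from the goal multiplies the value by gamma, cells closer to the goal end up with higher Q-values, and the greedy policy follows that gradient toward the goal.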

Common mistakes

  • Ignoring the discount factor gamma in updates.

Follow-up questions

  • Why do we use max Q(s',a')?
