What is the Bellman equation in Q-Learning?

Updated May 17, 2026

Short answer

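The Bellman equation expresses the value of taking action a in state s as the immediate reward plus the discounted value of the best action available in the next state. Q-learning uses this quantity as the target that each Q-value update moves toward.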

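Q*(s, a) = E[ r + gamma * max over a' of Q*(s', a') ], where r is the immediate reward, s' is the next state, a' ranges over the actions available in s', and gamma (between 0 and 1) is the discount factor. In a deterministic environment the expectation drops out, leaving Q*(s, a) = r + gamma * max over a' of Q*(s', a'). The definition is recursive: the optimal value of one state-action pair is written in terms of the optimal values of its successors. Q-learning approximates this fixed point from sampled transitions with the update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)), where alpha is the learning rate.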
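
As a rough illustration, a single tabular update built on this equation might look like the sketch below. The function name q_update, the list-of-lists table, the learning rate alpha, and the done flag for terminal transitions are assumptions made for this example and do not come from any particular library.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the target."""
    # Bellman target: immediate reward plus the discounted value of the
    # best action in the next state. A terminal transition has no future
    # value to bootstrap from, so its target is just the reward.
    target = r if done else r + gamma * max(Q[s_next])
    # Move the current estimate a fraction alpha toward the target.
    Q[s][a] += alpha * (target - Q[s][a])
    return target


# Hypothetical table: 3 states, 2 actions each.
Q = [[0.0, 0.0], [0.5, 2.0], [0.0, 0.0]]
target = q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and because Q[0][1] starts at 0 and alpha is 0.5, the entry moves halfway to the target, to about 1.4.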

One application is robot path planning, where the learned Q-values indicate which movement to make from each position.
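
To make the path-planning use concrete, here is a toy sketch in which a robot learns the shortest route across a grid. Everything about the setup is an assumption chosen for illustration: the 4x4 grid, the goal in the bottom-right corner, the cost of -1 per move, the purely random exploration, and the hyperparameter values.

```python
import random

SIZE = 4                                      # hypothetical 4x4 grid
GOAL = (SIZE - 1, SIZE - 1)                   # assumed goal cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; bumping into a wall leaves the robot in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    next_state = (row, col)
    return next_state, -1.0, next_state == GOAL  # every move costs 1


def train(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q-values from random exploration (Q-learning is off-policy)."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):                  # cap on episode length
            a = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, ACTIONS[a])
            # Bellman target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][a] += alpha * (target - Q[state][a])
            state = next_state
            if done:
                break
    return Q


def greedy_path(Q, start=(0, 0), max_steps=50):
    """Follow the highest-valued action from each cell until the goal."""
    path, state = [start], start
    while state != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
        state, _, _ = step(state, ACTIONS[a])
        path.append(state)
    return path


Q = train()
path = greedy_path(Q)
```

Because each move costs the same, the greedy path read off the learned table should be a shortest route from the top-left corner to the goal. A real robot would need a richer state (obstacles, orientation, sensor noise), which this sketch leaves out.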