junior | Q-Learning
What is the Bellman equation in Q-Learning?
Updated May 17, 2026
Short answer
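The Bellman equation expresses a Q-value as the immediate reward plus the discounted value of the best action available in the next state. Q-learning uses this relationship as the target when it updates its Q-values.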
Deep explanation
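The Bellman optimality equation for action values is Q*(s,a) = E[r + gamma * max_a' Q*(s',a')], where r is the immediate reward, s' is the next state, a' ranges over the actions available in s', and gamma (between 0 and 1) is the discount factor. In a deterministic environment the expectation drops out, giving Q*(s,a) = r + gamma * max_a' Q*(s',a'). The definition is recursive: the optimal value of (s,a) is written in terms of the optimal values of the state that follows it. Q-learning turns the equation into an incremental update with learning rate alpha: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).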
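A minimal sketch of one tabular Q-learning step may make the role of the Bellman target concrete. The function name q_update, the dictionary-based table, the state and action labels, and the default values alpha=0.1 and gamma=0.9 are all assumptions chosen for illustration, not part of any particular library.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Best value reachable from the next state; zero if the episode has ended.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    # Bellman target: immediate reward plus discounted best future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical two-state example: unseen entries default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
Q[("s1", "right")] = 2.0

new_value = q_update(Q, "s0", "right", reward=1.0,
                     next_state="s1", actions=actions)
print(new_value)
```

Here the target is 1 + 0.9 * 2 = 2.8, so the estimate for ("s0", "right") moves from 0 to roughly 0.28, one tenth of the way toward the target.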
Real-world example
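In robot path planning, a state can be the robot's position, an action a movement command, and the reward a small penalty per step plus a bonus for reaching the goal. The Q-values learned through the Bellman update then indicate which move to prefer from each position.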
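As a toy stand-in for path planning, the sketch below trains a Q-table on a five-cell corridor where the goal is the rightmost cell. The corridor layout, the reward values (-1 per step, +10 at the goal), the hyperparameters, and the function names step and train are assumptions made for this example; a real robot would need a far richer state and action space.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # step left or step right


def step(state, action):
    """Move within the corridor; walls clamp the position."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 10.0 if next_state == GOAL else -1.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so an episode cannot run forever
            # Epsilon-greedy: explore occasionally, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            # Bellman target uses the best action value in the next state.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
            if done:
                break
    return Q


Q = train()
# Greedy action for every non-goal cell.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy moves right from every non-goal cell, and the learned values shrink with distance from the goal because each extra step is discounted by gamma and costs -1.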
Common mistakes
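- Ignoring the discount factor gamma in updates (effectively setting gamma = 1), which weights distant rewards as heavily as immediate ones and can keep values from converging in long or non-terminating tasks.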
Follow-up questions
- Why do we use max Q(s',a')?