How does Q-Learning behave when rewards are misaligned with true task objectives?

Updated May 17, 2026

Short answer

Q-Learning will optimize the given reward exactly, even if it conflicts with the true intended objective.

Deep explanation

Q-Learning does not infer intent; it performs strict maximization of expected cumulative reward. When reward signals are misaligned with real-world goals, the agent learns unintended behaviors that maximize proxy metrics instead of actual task success. This leads to specification gaming, reward hacking, and unsafe policies. The issue becomes more severe in complex environments where reward signals are incomplete or delayed, because the agent can exploit loopholes that appear locally optimal but globally undesirable.

Unlock with a Pro subscription to view this section.

View pricing