What is the role of reward normalization in stabilizing deep Q-networks?

Updated May 17, 2026

Short answer

Reward normalization stabilizes training by controlling the scale of Q-value targets and gradients.

Deep explanation

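In deep Q-learning, reward scale directly sets Q-value magnitude. The bootstrapped target is r + γ max_a' Q(s', a'), so multiplying every reward by a constant multiplies the optimal Q-values by the same constant, and if rewards are bounded by R_max, Q-values are bounded by R_max / (1 - γ). Large or inconsistent rewards can therefore produce very large targets, large TD errors, and unstable gradient updates, while very small rewards yield tiny TD errors and slow learning. Normalizing rewards, for example by clipping them to a fixed range or dividing by a running estimate of their standard deviation, keeps target magnitudes consistent across environments. This improves optimization stability and reduces sensitivity to hyperparameters such as the learning rate and discount factor.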
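
As a rough illustration of the idea above, the following is a minimal sketch of one common approach: dividing each reward by a running estimate of the reward standard deviation, with optional clipping. The names RunningRewardScaler and td_target, and the defaults epsilon=1e-8 and gamma=0.99, are hypothetical choices made for this example and do not come from any particular library. The sketch assumes scalar rewards and leaves out the Q-network itself.

```python
import math


class RunningRewardScaler:
    """Hypothetical helper: scales rewards by a running standard deviation."""

    def __init__(self, epsilon=1e-8, clip=None):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)
        self.epsilon = epsilon  # avoids division by zero
        self.clip = clip  # optional symmetric bound on the scaled reward

    def update(self, reward):
        """Fold one raw reward into the running statistics."""
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def std(self):
        """Population standard deviation of the rewards seen so far."""
        if self.count < 2:
            return 1.0  # assumption: leave rewards unscaled until 2 samples
        return math.sqrt(self.m2 / self.count)

    def normalize(self, reward):
        """Divide by the running std; the mean is deliberately not subtracted."""
        scaled = reward / (self.std + self.epsilon)
        if self.clip is not None:
            scaled = max(-self.clip, min(self.clip, scaled))
        return scaled


def td_target(reward, next_q_max, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward + (0.0 if done else gamma * next_q_max)


# Usage: update the statistics with raw rewards, then build the target
# from the normalized reward instead of the raw one.
scaler = RunningRewardScaler(clip=5.0)
for raw_reward in [100.0, 200.0, 300.0, 400.0]:
    scaler.update(raw_reward)

target = td_target(scaler.normalize(200.0), next_q_max=0.5, done=False)
```

Two design choices in this sketch are worth noting. Rewards are divided by the standard deviation but not mean-centered, because shifting rewards by a constant can change which policy is optimal in episodic tasks, whereas rescaling by a positive constant does not. The running estimate also drifts during training, so the targets are not perfectly stationary. Clipping, when enabled, bounds target magnitude more aggressively but discards information about how much larger one reward is than another.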
