What is bootstrapping instability in Q-Learning?

Updated May 17, 2026

Short answer

Bootstrapping instability occurs because each Q-update builds its target from other Q-value estimates; when those estimates are incorrect, the update inherits their error.

Deep explanation

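Q-Learning builds each update target from its own current Q-values: the target for a state-action pair is the observed reward plus the discounted maximum Q-value of the next state. If those estimates are inaccurate, as they typically are early in training, the error is copied into the target and propagates recursively from one estimate to the next. This feedback loop can cause divergence, especially when bootstrapping is combined with function approximation and off-policy learning (the combination known as the deadly triad). Stabilization techniques like target networks and delayed updates mitigate this by changing the values used in the target less often.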
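
The update described above can be sketched in a few lines. This is a minimal tabular illustration, not a reference implementation: the function names (td_target, q_update, sync_target), the NumPy table layout, and the default alpha and gamma values are all assumptions made for the example. A separate copy of the table, target_q, stands in for a target network.

```python
import numpy as np


def td_target(reward, next_q_values, gamma, done):
    """Bootstrapped target: r + gamma * max_a' Q(s', a').

    The max term is itself an estimate, so any error in it
    flows straight into the target.
    """
    if done:
        return float(reward)  # terminal step: nothing to bootstrap from
    return float(reward) + gamma * float(np.max(next_q_values))


def q_update(q, target_q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One Q-learning step on table q (hypothetical helper).

    The target is read from target_q, a frozen copy of q, so the
    value being chased does not move on every single update.
    """
    target = td_target(r, target_q[s_next], gamma, done)
    td_error = target - q[s, a]
    q[s, a] += alpha * td_error
    return td_error


def sync_target(q, target_q):
    """Refresh the frozen copy; meant to be called only every N steps."""
    np.copyto(target_q, q)


# Tiny demonstration: an incorrect estimate for state 1 leaks into state 0.
q = np.zeros((2, 2))
target_q = q.copy()
target_q[1, 1] = 10.0  # assumed-wrong estimate for (state 1, action 1)

q_update(q, target_q, s=0, a=1, r=1.0, s_next=1, done=False,
         alpha=0.5, gamma=0.9)
print(q[0, 1])  # half of the inflated target has been absorbed
```

Passing the same array for q and target_q gives plain Q-learning, where every update immediately changes the values that later targets are built from. Keeping them separate and calling sync_target only occasionally is the tabular analogue of a target network.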
