What is the effect of large discount factors in long-horizon unstable environments?

Updated May 17, 2026

Short answer

Large discount factors (γ close to 1) let an agent plan over longer horizons, but they raise the variance of value targets and can make learning unstable.

Deep explanation

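A discount factor γ close to 1 weights distant rewards almost as heavily as immediate ones, which is essential for long-horizon tasks. The effective horizon scales roughly as 1/(1-γ), so each Q-target sums over more uncertain future rewards and its variance grows. Bootstrapped updates also become less stable: the Bellman backup shrinks value errors by a factor of γ per update, so errors decay more slowly as γ approaches 1, which can slow convergence. In the extreme case of γ=1 in a continuing task, the return can be unbounded and the backup is no longer a contraction, so convergence is not guaranteed.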
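To put rough numbers on these effects, here is a minimal Python sketch. It rests on simplifying assumptions that are not part of the answer above: rewards are treated as i.i.d. with a fixed variance, and the Bellman backup is treated as an exact γ-contraction (as in the tabular case). The function names are hypothetical and chosen only for illustration.

```python
import math


def effective_horizon(gamma: float) -> float:
    """Rough number of future steps that matter: 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def return_variance(gamma: float, reward_var: float = 1.0) -> float:
    """Variance of sum_t gamma^t * r_t, ASSUMING i.i.d. rewards.

    Each term contributes gamma^(2t) * reward_var, and the geometric
    series sums to reward_var / (1 - gamma^2).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return reward_var / (1.0 - gamma ** 2)


def backups_to_shrink_error(gamma: float, tol: float = 1e-3) -> int:
    """Smallest k with gamma^k <= tol, ASSUMING an exact gamma-contraction."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be in (0, 1)")
    if not 0.0 < tol < 1.0:
        raise ValueError("tol must be in (0, 1)")
    return math.ceil(math.log(tol) / math.log(gamma))


if __name__ == "__main__":
    for gamma in (0.9, 0.99, 0.999):
        print(
            f"gamma={gamma}: "
            f"horizon~{effective_horizon(gamma):.0f}, "
            f"return variance~{return_variance(gamma):.1f}, "
            f"backups for 1000x error reduction={backups_to_shrink_error(gamma)}"
        )
```

All three quantities diverge as γ approaches 1, which is why the functions reject γ=1 outright. In real environments rewards are correlated and function approximation breaks the exact contraction, so treat these figures as order-of-magnitude intuition rather than predictions.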
