What is the effect of large discount factors in long-horizon unstable environments?
Updated May 17, 2026
Short answer
A discount factor close to 1 lets the agent plan over long horizons, but it raises the variance of value targets and makes training less stable.
Deep explanation
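A discount factor γ close to 1 weights distant rewards almost as heavily as immediate ones, which long-horizon tasks require: the effective horizon is roughly 1/(1−γ) steps. The cost is that the return, and any Q-target that estimates it, sums noise over more steps, so target variance grows as γ approaches 1. Bootstrapped updates also carry estimation errors further, because each Bellman backup shrinks an error only by a factor of γ. This makes training less stable and can slow convergence. In the extreme case of γ=1 in a continuing task, the undiscounted return can be unbounded and the backup is no longer a contraction, so convergence is not guaranteed.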
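To put rough numbers on this, the sketch below computes three standard quantities for a few values of γ: the effective horizon 1/(1−γ), the largest possible return r_max/(1−γ), and the variance of the discounted return σ²/(1−γ²). The variance formula assumes independent per-step reward noise with constant variance σ², which is a simplification and not something real environments guarantee. The function names and the example values are illustrative only.

```python
import math


def effective_horizon(gamma: float) -> float:
    """Rough number of future steps that matter: 1 / (1 - gamma)."""
    if gamma >= 1.0:
        return math.inf  # no discounting: every future step counts equally
    return 1.0 / (1.0 - gamma)


def return_bound(gamma: float, r_max: float) -> float:
    """Upper bound on the discounted return when |reward| <= r_max."""
    if gamma >= 1.0:
        return math.inf  # undiscounted continuing return can be unbounded
    return r_max / (1.0 - gamma)


def return_variance(gamma: float, reward_var: float) -> float:
    """Variance of the discounted return over an infinite horizon.

    Assumption: reward noise is independent across steps with constant
    variance reward_var, so the variance is the geometric sum
    reward_var * (1 + gamma^2 + gamma^4 + ...) = reward_var / (1 - gamma^2).
    """
    if gamma >= 1.0:
        return math.inf
    return reward_var / (1.0 - gamma ** 2)


if __name__ == "__main__":
    # Hypothetical settings: rewards bounded by 1, unit reward-noise variance.
    for gamma in (0.9, 0.99, 0.999, 1.0):
        print(
            f"gamma={gamma}: "
            f"horizon={effective_horizon(gamma):.1f}, "
            f"max return={return_bound(gamma, 1.0):.1f}, "
            f"return variance={return_variance(gamma, 1.0):.1f}"
        )
```

All three quantities blow up as γ approaches 1. That is the trade-off in concrete terms: a longer horizon comes with larger and noisier targets.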