What is the effect of large discount factors in long-horizon unstable environments?

Updated May 17, 2026

Short answer

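A discount factor close to 1 lets the agent account for rewards far in the future, but it also raises the variance of the value targets and can make learning unstable, especially when the environment itself is noisy.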

Deep explanation

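A discount factor γ close to 1 weights distant rewards almost as heavily as immediate ones, which is essential for long-horizon tasks where the payoff arrives many steps after the decisions that earn it. The effective planning horizon scales roughly as 1/(1-γ), so γ=0.99 looks about 100 steps ahead while γ=0.9 looks about 10. The cost is that the return being estimated sums noise over more steps, so its variance grows, and in an unstable environment that noise is already large. Bootstrapping compounds the problem: the Q-learning target r + γ max_a' Q(s', a') passes errors in the next-state estimate back almost undamped when γ is near 1, so updates become less stable and convergence slows. In the extreme case of γ=1 in a continuing (non-episodic) task, the return can be unbounded and the Bellman backup is no longer a contraction, so convergence is not guaranteed.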
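The link between γ, horizon, and variance can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not a model of any particular environment: rewards are taken to be i.i.d. Gaussian noise with zero mean and unit variance, in which case the variance of the infinite discounted return is σ²/(1-γ²). The function names (effective_horizon, return_variance, sampled_return_variance) are hypothetical and chosen only for this example.

```python
import random
import statistics


def effective_horizon(gamma):
    """Rule-of-thumb horizon 1 / (1 - gamma); only defined for gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def return_variance(gamma, reward_std=1.0):
    """Closed-form variance of the infinite discounted sum of i.i.d. rewards.

    Assumes independent rewards with standard deviation reward_std,
    giving sigma^2 / (1 - gamma^2).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return reward_std ** 2 / (1.0 - gamma ** 2)


def sampled_return_variance(gamma, steps=500, episodes=300, seed=0):
    """Monte Carlo estimate of the same variance from truncated rollouts."""
    rng = random.Random(seed)
    returns = []
    for _ in range(episodes):
        total, discount = 0.0, 1.0
        for _ in range(steps):
            total += discount * rng.gauss(0.0, 1.0)  # assumed noise model
            discount *= gamma
        returns.append(total)
    return statistics.variance(returns)


if __name__ == "__main__":
    for gamma in (0.9, 0.99):
        print(
            f"gamma={gamma}: horizon~{effective_horizon(gamma):.0f}, "
            f"closed-form var={return_variance(gamma):.2f}, "
            f"sampled var={sampled_return_variance(gamma):.2f}"
        )
```

Under these assumptions, moving γ from 0.9 to 0.99 multiplies the horizon by about ten and the return variance by roughly the same factor. Both helper functions reject γ=1 because the closed-form expressions diverge there, which mirrors the convergence problem in continuing tasks.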




