What is gradient accumulation and how does it affect cost computation?

Updated May 15, 2026

Short answer

Gradient accumulation simulates large-batch training by summing gradients over several smaller micro-batches and applying a single weight update once the sum is complete.

Deep explanation

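When memory constraints prevent large batch sizes, gradients are accumulated over multiple forward-backward passes on smaller micro-batches, and the weights are updated only after the last pass. Provided each micro-batch gradient is scaled by one over the number of accumulation steps, the accumulated result approximates the gradient of the cost function over the larger effective batch (and equals it when the cost is a mean over independent examples and the micro-batches are the same size). Peak memory stays at roughly the level needed for a single micro-batch. Compared with updating after every micro-batch, the weights change less often and each update uses a less noisy gradient estimate, so optimization dynamics shift slightly.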
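To make the effect on the cost gradient concrete, here is a minimal NumPy sketch. It assumes a linear model with a mean-squared-error cost and equal-sized micro-batches. The function names `mse_grad` and `accumulated_grad` and the parameter `accum_steps` are illustrative and not taken from any particular library.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of the cost 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def accumulated_grad(w, X, y, accum_steps):
    """Accumulate gradients over equal-sized micro-batches, one at a time."""
    grad = np.zeros_like(w)
    micro_batches = zip(np.split(X, accum_steps), np.split(y, accum_steps))
    for X_mb, y_mb in micro_batches:
        # Scale by 1 / accum_steps so the running sum becomes the mean
        # over the full effective batch rather than a sum of means.
        grad += mse_grad(w, X_mb, y_mb) / accum_steps
    return grad

# Hypothetical data: 32 examples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = rng.normal(size=32)
w = rng.normal(size=4)

full = mse_grad(w, X, y)                           # one pass over all 32 examples
accum = accumulated_grad(w, X, y, accum_steps=4)   # four passes of 8 examples each
print(np.allclose(full, accum))                    # True
```

Without the `1 / accum_steps` scaling, the accumulated gradient would be `accum_steps` times larger than the full-batch gradient, which acts like an unintended increase in the learning rate. The equality shown here relies on the cost being a plain mean over examples. Layers whose output depends on batch statistics, such as batch normalization, would still see only the micro-batch.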
