senior · NLP
What is activation checkpointing vs gradient checkpointing?
Updated May 17, 2026
Short answer
Activation checkpointing and gradient checkpointing are two names for the same technique: store only a subset of forward activations (the checkpoints) and recompute the rest during the backward pass.
Deep explanation
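Both names describe a single technique that reduces memory usage by trading compute for storage. A standard forward pass keeps every intermediate activation because backpropagation needs them, so activation memory grows with model depth. With checkpointing, only the activations at selected layer boundaries are stored; during the backward pass, each segment's forward computation is rerun from its checkpoint to rebuild the activations its gradients require. The cost is roughly one extra forward pass over the checkpointed segments. "Activation checkpointing" names what is saved, while "gradient checkpointing" names the phase that benefits; PyTorch exposes the technique as torch.utils.checkpoint, and Hugging Face Transformers as gradient_checkpointing_enable(). It is widely used for training large transformers on limited GPU memory.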
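The store-then-recompute idea can be sketched without any framework. The example below is a minimal illustration, not how PyTorch or any library implements it: the toy layer tanh(w * x), the function names, and the "stored activations" tally are all assumptions made for this sketch. It backpropagates through a chain of scalar layers twice, once keeping every activation and once keeping only every `segment`-th one.

```python
import math

def layer(w, x):
    # Toy "layer" (hypothetical): y = tanh(w * x)
    return math.tanh(w * x)

def layer_grads(w, x, upstream):
    # Chain rule for y = tanh(w * x): returns (dL/dx, dL/dw)
    local = 1.0 - math.tanh(w * x) ** 2
    return upstream * w * local, upstream * x * local

def backward_full(weights, x0):
    # Baseline: keep every activation from the forward pass.
    acts = [x0]
    for w in weights:
        acts.append(layer(w, acts[-1]))
    grad_x, grad_w = 1.0, [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad_x, grad_w[i] = layer_grads(weights[i], acts[i], grad_x)
    return acts[-1], grad_w, len(acts)  # len(acts) = activations held

def backward_checkpointed(weights, x0, segment):
    # Forward: keep only the input of every `segment`-th layer.
    n = len(weights)
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = layer(w, x)
        if (i + 1) % segment == 0 and (i + 1) < n:
            checkpoints[i + 1] = x
    out = x

    grad_x, grad_w = 1.0, [0.0] * n
    peak = len(checkpoints)
    # Backward: walk segments last to first, recomputing each one.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        seg = [checkpoints[start]]           # inputs to layers start..end-1
        for i in range(start, end - 1):
            seg.append(layer(weights[i], seg[-1]))
        # seg[0] is the checkpoint itself, so count it only once.
        peak = max(peak, len(checkpoints) + len(seg) - 1)
        for i in reversed(range(start, end)):
            grad_x, grad_w[i] = layer_grads(weights[i], seg[i - start], grad_x)
    return out, grad_w, peak

if __name__ == "__main__":
    ws = [0.5 + 0.1 * i for i in range(16)]
    _, g_full, held_full = backward_full(ws, 0.3)
    _, g_ckpt, held_ckpt = backward_checkpointed(ws, 0.3, segment=4)
    same = all(math.isclose(a, b, abs_tol=1e-15) for a, b in zip(g_full, g_ckpt))
    print("gradients match:", same)
    print("activations held:", held_full, "vs", held_ckpt)
```

For 16 layers and a segment length of 4, the tally in this sketch is 17 stored activations for the baseline against a peak of 7 with checkpointing, and the gradients agree. The price is that every non-checkpointed activation is computed twice.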
Real-world example
No real-world example available yet.
Common mistakes
No common mistakes listed yet.
Follow-up questions
No follow-up questions available yet.