senior | Gradient Descent
What is the neural tangent kernel (NTK) in relation to Gradient Descent?
Updated May 16, 2026
Short answer
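The NTK is the kernel formed by inner products of a network's parameter gradients. In the infinite-width limit it stays fixed during training, so a network trained with Gradient Descent behaves like a kernel method using that kernel.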
Deep explanation
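The Neural Tangent Kernel (NTK) result shows that as a neural network's width goes to infinity (under a suitable parameterization and random initialization), its training dynamics under Gradient Descent become linear in the parameters around initialization: the weights move so little that a first-order Taylor expansion of the network output stays accurate throughout training. In this regime, the network behaves like a kernel machine with a fixed kernel determined by the architecture and the initialization distribution. For squared loss, training then reduces to kernel regression, which explains why very wide networks train predictably and, when the kernel is positive definite on the training data, converge smoothly toward zero training error.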
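The linearization can be written out explicitly. This is a sketch under stated assumptions: a scalar-output network f(x; θ), squared loss over training inputs X with targets y, and gradient flow (the continuous-time limit of Gradient Descent). The symbols Θ, θ_0, and f_lin are notation introduced here for illustration.

```latex
\begin{aligned}
f_{\mathrm{lin}}(x;\theta) &= f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0), \\
\Theta(x, x') &= \nabla_\theta f(x;\theta_0)^\top \, \nabla_\theta f(x';\theta_0), \\
\frac{d f_t(X)}{dt} &= -\,\Theta(X, X)\,\bigl(f_t(X) - y\bigr), \\
f_t(X) - y &= e^{-\Theta(X, X)\,t}\,\bigl(f_0(X) - y\bigr).
\end{aligned}
```

The last line holds only while Θ stays constant, which is what the infinite-width limit provides. If Θ(X, X) is positive definite, the training residual decays exponentially along each eigendirection of the kernel.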
Real-world example
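Offering a partial explanation of why very large transformer models train stably at scale. The explanation is only partial because practical finite-width models also learn features, which the fixed-kernel picture does not capture.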
Common mistakes
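- Assuming NTK applies to all neural networks regardless of width. The fixed-kernel description holds only in the very wide limit; at practical widths the kernel depends on the random initialization and changes during training.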
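The width dependence can be checked numerically. The following is a minimal sketch, assuming a two-layer tanh network f(x) = (1/sqrt(m)) Σ_j a_j tanh(w_j · x) with standard normal weights; the helper names `empirical_ntk` and `ntk_spread` are hypothetical and not taken from any library. It evaluates the empirical kernel at initialization for several random seeds and compares how much the value varies at a small and a large width.

```python
import math
import random
import statistics


def empirical_ntk(W, a, x1, x2):
    """Empirical NTK of f(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j . x).

    Sums the products of parameter gradients at x1 and x2 over all
    hidden units (gradients with respect to both a_j and w_j).
    """
    m = len(a)
    x_dot = sum(u * v for u, v in zip(x1, x2))
    total = 0.0
    for w_j, a_j in zip(W, a):
        h1 = math.tanh(sum(w * u for w, u in zip(w_j, x1)))
        h2 = math.tanh(sum(w * v for w, v in zip(w_j, x2)))
        # Gradient w.r.t. a_j contributes h1 * h2.
        # Gradient w.r.t. w_j contributes a_j^2 * tanh'(.) * tanh'(.) * (x1 . x2).
        total += h1 * h2 + a_j ** 2 * (1 - h1 ** 2) * (1 - h2 ** 2) * x_dot
    return total / m


def ntk_spread(width, x1, x2, n_seeds=20):
    """Standard deviation of the kernel value across random initializations."""
    values = []
    for seed in range(n_seeds):
        rng = random.Random(seed)
        # Assumed initialization: independent standard normal weights.
        W = [[rng.gauss(0.0, 1.0) for _ in x1] for _ in range(width)]
        a = [rng.gauss(0.0, 1.0) for _ in range(width)]
        values.append(empirical_ntk(W, a, x1, x2))
    return statistics.stdev(values)


x1, x2 = [1.0, 0.5, -0.3], [0.2, -0.7, 0.9]
narrow = ntk_spread(16, x1, x2)
wide = ntk_spread(4096, x1, x2)
print(f"spread at width 16:   {narrow:.4f}")
print(f"spread at width 4096: {wide:.4f}")
```

The spread across seeds should shrink as the width grows, which illustrates why the "fixed kernel" description is a statement about very wide networks. This sketch only looks at initialization; it does not measure how much the kernel drifts during training.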
Follow-up questions
- What does infinite width mean?
- Why is NTK useful?