What is the neural tangent kernel (NTK) in relation to Gradient Descent?

Updated May 16, 2026

Short answer

The NTK describes how, in the infinite-width limit, a neural network trained with Gradient Descent behaves like a kernel method: its predictions evolve by kernel gradient descent with a fixed kernel.

Deep explanation

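The Neural Tangent Kernel (NTK) is the inner product of the network's parameter gradients at two inputs: Θ(x, x') = ∇θ f(x; θ) · ∇θ f(x'; θ). As the width goes to infinity (under a suitable parameterization and random initialization), this kernel converges to a deterministic limit at initialization and stays essentially constant during training, so the dynamics under Gradient Descent are those of the network linearized around its initial parameters. In this regime, the network behaves like a kernel machine whose fixed kernel is determined by the architecture and the initialization distribution. For squared loss this makes training predictable: the training error decays along the kernel's eigendirections at rates set by its eigenvalues, and it converges to zero when the kernel matrix on the training set is positive definite.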
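To make the definition concrete, here is a minimal sketch that computes the empirical NTK of a small network and compares it with its infinite-width limit. It assumes a two-layer ReLU network f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x) with standard Gaussian initialization and unit-norm inputs; the function names (init_params, empirical_ntk, limiting_ntk) are illustrative and do not come from any library.

```python
import math
import random


def init_params(width, dim, rng):
    """Sample hidden weights w_j ~ N(0, I) and output weights a_j ~ N(0, 1)."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return W, a


def empirical_ntk(W, a, x1, x2):
    """Inner product of the parameter gradients of f at x1 and x2.

    Assumed model: f(x) = (1 / sqrt(m)) * sum_j a_j * relu(w_j . x)
    """
    m = len(a)
    x_dot = sum(p * q for p, q in zip(x1, x2))
    total = 0.0
    for w_j, a_j in zip(W, a):
        u = sum(p * q for p, q in zip(w_j, x1))
        v = sum(p * q for p, q in zip(w_j, x2))
        # Gradients w.r.t. a_j contribute relu(u) * relu(v).
        total += max(u, 0.0) * max(v, 0.0)
        # Gradients w.r.t. w_j contribute a_j^2 * 1[u > 0] * 1[v > 0] * (x1 . x2).
        if u > 0.0 and v > 0.0:
            total += a_j * a_j * x_dot
    return total / m


def limiting_ntk(x1, x2):
    """Closed-form infinite-width kernel of the assumed model, unit-norm inputs."""
    cos_t = max(-1.0, min(1.0, sum(p * q for p, q in zip(x1, x2))))
    theta = math.acos(cos_t)
    return (math.sin(theta) + 2.0 * (math.pi - theta) * cos_t) / (2.0 * math.pi)


if __name__ == "__main__":
    x1 = [1.0, 0.0]
    x2 = [0.5, math.sqrt(3.0) / 2.0]  # 60 degrees away from x1
    print("limit:", round(limiting_ntk(x1, x2), 4))
    for width in (10, 100, 10000):
        W, a = init_params(width, 2, random.Random(0))
        print(width, round(empirical_ntk(W, a, x1, x2), 4))
```

For these two inputs the limiting value is about 0.471. The empirical value at small width varies noticeably with the random seed and settles toward the limit as the width grows, which is the sense in which the kernel becomes deterministic at initialization.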

Real-world example

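Providing a theoretical lens on why very wide, heavily overparameterized models (including large transformers) can train stably with gradient-based methods. Practical models have finite width and learn features, so NTK serves as an approximation here rather than an exact description.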

Common mistakes

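Assuming NTK applies to all neural networks regardless of width. The fixed-kernel picture is exact only in the infinite-width limit; at finite width the kernel changes during training, and the NTK description is at best an approximation.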
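The width dependence can be checked numerically. The sketch below trains a two-layer ReLU network on a single example with plain Gradient Descent and measures how much the empirical kernel value Θ(x, x) moves between initialization and the end of training. The model, the 1/sqrt(m) scaling, the learning rate, the step count, and all function names are assumptions chosen for illustration.

```python
import math
import random


def make_net(width, dim, rng):
    """Gaussian init for hidden weights W and output weights a."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return W, a


def forward_and_kernel(W, a, x):
    """Return f(x) and the empirical kernel value Theta(x, x)."""
    m = len(a)
    x_sq = sum(p * p for p in x)
    f, theta = 0.0, 0.0
    for w_j, a_j in zip(W, a):
        u = sum(p * q for p, q in zip(w_j, x))
        if u > 0.0:  # only active ReLU units contribute
            f += a_j * u
            theta += u * u + a_j * a_j * x_sq
    return f / math.sqrt(m), theta / m


def kernel_drift(width, seed, steps=50, lr=0.1):
    """Train on one example with squared loss; return |Theta_final - Theta_init|."""
    rng = random.Random(seed)
    x, y = [1.0, 0.0], 1.0
    W, a = make_net(width, len(x), rng)
    _, theta_init = forward_and_kernel(W, a, x)
    scale = 1.0 / math.sqrt(width)
    for _ in range(steps):
        f, _ = forward_and_kernel(W, a, x)
        err = f - y  # derivative of 0.5 * (f - y)^2 w.r.t. f
        for j in range(width):
            u = sum(p * q for p, q in zip(W[j], x))
            if u > 0.0:
                # Both gradients use the parameters from before this step.
                grad_a = err * scale * u
                grad_w = [err * scale * a[j] * p for p in x]
                a[j] -= lr * grad_a
                W[j] = [w - lr * g for w, g in zip(W[j], grad_w)]
    _, theta_final = forward_and_kernel(W, a, x)
    return abs(theta_final - theta_init)


def mean_drift(width, seeds=range(5)):
    """Average the kernel drift over a few random initializations."""
    seeds = list(seeds)
    return sum(kernel_drift(width, s) for s in seeds) / len(seeds)


if __name__ == "__main__":
    for width in (8, 64, 1024):
        print(width, mean_drift(width))
```

In this setup the drift shrinks as the width grows: a narrow network visibly changes its kernel while fitting the example, whereas a wide one fits it with the kernel nearly unchanged. That is why conclusions drawn from the fixed-kernel analysis should not be carried over to narrow networks without checking.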

Follow-up questions

What does infinite width mean?
Why is NTK useful?

