What is contrastive feature learning collapse and how is it prevented?

Updated May 15, 2026

Short answer

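Collapse occurs when the encoder maps every input to the same (or nearly the same) embedding, so the representation carries no information. Contrastive methods prevent it with negative samples; non-contrastive methods such as BYOL rely on a stop-gradient and asymmetric networks instead.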

Deep explanation

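In self-supervised learning, a loss that only pulls together two views of the same input can be minimized trivially by mapping every input to the same vector. Contrastive learning avoids this by also pushing apart the embeddings of different inputs (negative samples). SimCLR draws its negatives from a large batch, MoCo draws them from a queue of embeddings produced by a momentum-updated encoder, and BYOL uses no negatives at all: it relies on a stop-gradient, a momentum-updated target network, and a predictor head on the online branch only.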
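To make the role of negatives concrete, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss with in-batch negatives, plus a simple collapse check. It is an illustration under stated assumptions, not the exact loss of SimCLR or MoCo: the function names (info_nce_loss, embedding_std), the temperature of 0.1, and the toy data are all chosen here for the example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce_loss(z_a, z_b, temperature=0.1):
    # z_a[i] and z_b[i] are embeddings of two views of the same input.
    # Every other row in the batch acts as a negative (illustrative setup).
    z_a = l2_normalize(z_a)
    z_b = l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature                    # shape (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_probs)))

def embedding_std(z):
    # Mean per-dimension standard deviation of the normalized embeddings.
    # A value near zero suggests the embeddings have collapsed.
    return float(l2_normalize(z).std(axis=0).mean())

rng = np.random.default_rng(0)
n, d = 8, 16
collapsed = np.ones((n, d))        # every input mapped to the same vector
spread = rng.normal(size=(n, d))   # embeddings that differ across inputs

loss_collapsed = info_nce_loss(collapsed, collapsed)
loss_spread = info_nce_loss(spread, spread)

print("collapsed loss:", loss_collapsed, "log(N):", np.log(n))
print("spread loss:", loss_spread)
print("std collapsed:", embedding_std(collapsed), "std spread:", embedding_std(spread))
```

With collapsed embeddings every positive looks exactly like every negative, so the loss sits at log(N) and cannot go lower; the optimizer is only rewarded when embeddings of different inputs move apart. Methods without negatives, such as BYOL, have no such term in the loss, which is why monitoring a statistic like the per-dimension standard deviation is a common way to watch for collapse during training.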
