Senior · Deep Learning
What is Reinforcement Learning from Human Feedback (RLHF)?
Updated May 16, 2026
Short answer
RLHF aligns AI model behavior with human preferences by combining reinforcement learning with human-generated feedback signals.
Deep explanation
Large language models trained only through next-token prediction often produce outputs that are unsafe, irrelevant, misleading, or misaligned with human expectations.
RLHF improves alignment by fine-tuning the model against a learned signal of which outputs humans prefer.
Pipeline:
1. Pretraining:
   - Train a language model using self-supervised learning.
2. Supervised Fine-Tuning (SFT):
   - Train on curated instruction-response datasets.
3. Human Preference Collection:
   - Humans rank multiple model outputs for the same prompt.
4. Reward Model Training:
   - Train a reward model to predict human preferences.
5.…
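The preference-collection and reward-model steps above can be sketched in a few lines. This is a minimal illustration, not a production recipe: the source does not say which loss is used, so the sketch assumes a Bradley-Terry style pairwise loss, a common choice for reward models. The linear model, the two-dimensional feature vectors, and every function name here (`ranking_to_pairs`, `reward`, `pairwise_loss`, `train`) are hypothetical; real reward models are usually neural networks built on top of a language model.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Turn one human ranking (best first) into (preferred, rejected) pairs."""
    # combinations keeps input order, so the first item of each pair ranks higher.
    return list(combinations(ranked, 2))


def reward(w, x):
    """Toy linear reward model (an assumption): score = w . x"""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_loss(w, pairs):
    """Mean of -log sigmoid(reward(preferred) - reward(rejected))."""
    total = 0.0
    for good, bad in pairs:
        margin = reward(w, good) - reward(w, bad)
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)


def train(pairs, dim, lr=0.5, steps=200):
    """Plain gradient descent on the pairwise loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for good, bad in pairs:
            margin = reward(w, good) - reward(w, bad)
            # Derivative of -log sigmoid(margin) with respect to margin.
            coeff = -1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                grad[i] += coeff * (good[i] - bad[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Hypothetical features per response: (helpfulness, toxicity), ranked best first.
ranked_outputs = [(0.9, 0.1), (0.6, 0.3), (0.2, 0.8)]
pairs = ranking_to_pairs(ranked_outputs)
weights = train(pairs, dim=2)
print("loss before:", pairwise_loss([0.0, 0.0], pairs))
print("loss after: ", pairwise_loss(weights, pairs))
```

The loss only depends on the difference between two scores, which is why rankings are enough: annotators never have to assign an absolute quality number to a response.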