We Compare AI

Constitutional AI

Safety
Simple Definition

Anthropic's technique for training Claude to be helpful, harmless, and honest by having AI models critique and revise their own outputs based on a set of principles.

Full Explanation

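Constitutional AI (CAI) trains a model in two phases. 1) Supervised learning phase: the model drafts responses, critiques them against a 'constitution' of written principles, revises them, and is then fine-tuned on the revised responses. 2) RL phase: a preference model is trained on AI-generated comparisons of response pairs (rather than human labels) and supplies the reward signal for reinforcement learning, an approach often called RLAIF. Because this feedback comes from a model, alignment supervision can scale with less human labeling. Anthropic uses this approach to train Claude, with the aim of making it less harmful than a model trained with RLHF alone.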
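The two phases above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: the constitution text, the prompt wording, the function names (`critique_and_revise`, `ai_preference_label`), and the `toy_model` stub standing in for a real language model are all assumptions made up for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical principles for illustration only; not Anthropic's actual constitution.
CONSTITUTION: List[str] = [
    "Avoid giving instructions that could cause harm.",
    "Be honest about uncertainty.",
]

# A "model" here is any function from a text prompt to a text completion.
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Phase 1 sketch: draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    for principle in principles:
        critique = model(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        response = model(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    # In CAI, revised responses like this become supervised fine-tuning targets.
    return response


def ai_preference_label(
    judge: Model, prompt: str, a: str, b: str, principle: str
) -> Tuple[str, str]:
    """Phase 2 sketch: ask a model which response better follows a principle.

    Returns (chosen, rejected); such pairs would train the preference model.
    """
    verdict = judge(
        f"Principle: {principle}\nPrompt: {prompt}\n(A) {a}\n(B) {b}\n"
        "Which response is better? Answer A or B."
    )
    return (a, b) if verdict.strip().upper().startswith("A") else (b, a)


def toy_model(text: str) -> str:
    """Deterministic stand-in for a language model, so the sketch runs offline."""
    if text.startswith("Critique"):
        return "The response could be safer."
    if text.startswith("Rewrite"):
        # Echo the previous response with a marker showing one revision pass.
        return "[revised] " + text.rsplit("Response: ", 1)[1]
    if text.startswith("Principle"):
        return "B"
    return "draft answer"


revised = critique_and_revise(toy_model, "How do I stay safe online?", CONSTITUTION)
chosen, rejected = ai_preference_label(
    toy_model, "How do I stay safe online?", "answer one", "answer two", CONSTITUTION[0]
)
```

In a real pipeline, `toy_model` would be replaced by calls to an actual language model, the revised responses would be collected into a fine-tuning dataset, and the `(chosen, rejected)` pairs would train the preference model that provides the RL reward.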

Last verified: 2026-03-30