We Compare AI

Alignment

Safety
Simple Definition

The challenge of ensuring AI systems behave in accordance with human values, intentions, and societal well-being.

Full Explanation

AI alignment is one of the central problems in AI safety. A misaligned AI pursues goals that differ from its creators' intentions. This can arise through specification errors (the system was given the wrong objective), capability overhang (the system is more capable than its developers realized, so its behavior is harder to anticipate), or mesa-optimization (the system develops its own internal sub-goals during training). Current alignment techniques include RLHF (reinforcement learning from human feedback), Constitutional AI, and scalable oversight.
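A specification error can be illustrated with a toy sketch. The scenario, functions, and candidate answers below are all hypothetical, invented for illustration: the designers want correct answers, but the reward they actually specify measures only length, so an optimizer "games" the proxy.

```python
def true_utility(answer: str) -> float:
    """What the designers actually want: a correct answer (here, one containing '42')."""
    return 1.0 if "42" in answer else 0.0

def proxy_reward(answer: str) -> int:
    """The mis-specified reward the system actually optimizes: sheer length."""
    return len(answer)

candidates = [
    "42",                                                     # correct and concise
    "The answer is 42.",                                      # correct, longer
    "I am not sure, but here is a very long digression...",   # wrong, longest
]

# A misaligned optimizer picks whatever scores highest on the proxy,
# not what scores highest on the true objective.
chosen = max(candidates, key=proxy_reward)
best_for_humans = max(candidates, key=true_utility)

print(chosen == best_for_humans)  # False: the proxy selects the wrong answer
```

The gap between `proxy_reward` and `true_utility` is the specification error; techniques like RLHF try to shrink that gap by learning the reward from human judgments instead of a hand-written proxy.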

Last verified: 2026-03-30