We Compare AI

Latency

Performance
Simple Definition

The time delay between sending a request to an AI API and receiving the first token of response.

Full Explanation

Latency matters enormously for user experience, and the acceptable budget depends on the use case. For chat interfaces, time-to-first-token (TTFT) should be under 1 second. For real-time voice AI, latency must be under 300ms. For batch processing, throughput matters more than latency. Different models also have very different latency profiles: Claude Haiku is much faster than Claude Opus, trading capability for speed.
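As a minimal sketch of how TTFT is typically measured, the snippet below times how long it takes to receive the first token from any streaming iterator. The `fake_stream` generator is a stand-in for a real AI API's streaming response, and `measure_ttft` is a hypothetical helper name, not part of any vendor SDK.

```python
import time

def measure_ttft(stream):
    """Return (first_token, ttft_seconds) for a token stream.

    The clock starts when we begin consuming the iterator,
    approximating the moment the request is dispatched.
    """
    start = time.perf_counter()
    first = next(stream)  # blocks until the first token arrives
    return first, time.perf_counter() - start

def fake_stream(delay=0.05):
    """Simulated API response: network + queueing, then tokens."""
    time.sleep(delay)
    yield "Hello"
    yield ", world"

token, ttft = measure_ttft(fake_stream())
print(f"first token {token!r} after {ttft * 1000:.0f} ms")
```

The same helper works unchanged on a real streaming client, since it only needs an iterator of tokens; total-response latency would instead exhaust the stream before stopping the clock.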

Last verified: 2026-03-30