## Latency Heatmap by Region
Median time to first token (TTFT, in ms) for each model across US East, US West, EU West, and Asia Pacific; lower is better.
| Model | Provider | US East | US West | EU West | Asia Pacific |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | 320 ms | 410 ms | 520 ms | 780 ms |
| GPT-4o mini | OpenAI | 180 ms | 230 ms | 310 ms | 490 ms |
| Claude 3.7 Sonnet | Anthropic | 290 ms | 380 ms | 460 ms | 710 ms |
| Claude 3.5 Haiku | Anthropic | 160 ms | 210 ms | 290 ms | 440 ms |
| Gemini 2.0 Flash | Google | 210 ms | 190 ms | 380 ms | 320 ms |
| Gemini 1.5 Pro | Google | 380 ms | 360 ms | 490 ms | 410 ms |
| Llama 3.3 70B | Meta/Together | 260 ms | 290 ms | 440 ms | 620 ms |
| Mistral Large | Mistral | 340 ms | 420 ms | 280 ms | 590 ms |
Legend:
- ≤ 250 ms: Fast
- 251–450 ms: Moderate
- > 450 ms: Slow
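
The thresholds above translate directly into a small classifier if you want to reproduce the heatmap's banding programmatically. A minimal Python sketch; the function name is illustrative:

```python
def latency_bucket(ttft_ms: float) -> str:
    """Classify a median TTFT (in ms) into the legend's three bands."""
    if ttft_ms <= 250:
        return "Fast"
    if ttft_ms <= 450:  # the 251-450 ms band
        return "Moderate"
    return "Slow"       # anything above 450 ms

# Example: Claude 3.5 Haiku in EU West (290 ms) lands in the Moderate band.
assert latency_bucket(290) == "Moderate"
```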
### Measurement Notes
- TTFT (Time to First Token) — All values represent median TTFT in milliseconds, measured across 50 samples per region per model. TTFT is the elapsed time from request dispatch to receipt of the first streamed token byte (see the measurement sketch after these notes).
- Infrastructure — Tests were run from AWS EC2 instances in us-east-1, us-west-2, eu-west-1, and ap-southeast-1 to simulate real cloud-hosted application latency. No VPN or proxy was used.
- Period — Data collected Q1 2026 during business hours (09:00–17:00 local time) under typical API load. Results may vary under peak traffic or with provider infrastructure changes.
- Llama 3.3 70B values reflect Together AI's hosted inference endpoints. Self-hosted deployments may differ significantly depending on hardware and geography.
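
For readers reproducing these numbers, the TTFT definition above maps to a short measurement harness. The sketch below is illustrative, not the benchmark code used here: `ENDPOINT`, `HEADERS`, and `PAYLOAD` are placeholders for a streaming chat-completions endpoint, and it treats the first streamed body byte as the first token.

```python
import statistics
import time

import requests

# Placeholders: substitute a real streaming endpoint, auth header, and request body.
ENDPOINT = "https://api.example.com/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <API_KEY>"}
PAYLOAD = {
    "model": "<model-id>",
    "stream": True,
    "messages": [{"role": "user", "content": "ping"}],
}

def measure_ttft_ms() -> float:
    """Elapsed ms from request dispatch to the first streamed body byte."""
    start = time.perf_counter()
    with requests.post(ENDPOINT, headers=HEADERS, json=PAYLOAD,
                       stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=1):
            if chunk:  # first non-empty byte of the streamed response
                return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream closed before any bytes arrived")

# Median over 50 sequential samples, mirroring the methodology above.
median_ms = statistics.median(measure_ttft_ms() for _ in range(50))
print(f"median TTFT: {median_ms:.0f} ms")
```

Running this one region at a time from an EC2 instance in the corresponding AWS region reproduces the vantage points listed under Infrastructure.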