## Latency Heatmap by Region
Median time to first token (TTFT, in ms) for each model across US East, US West, EU West, and Asia Pacific; lower is better.
| Model | Provider | US East | US West | EU West | Asia Pacific |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | 320 ms | 410 ms | 520 ms | 780 ms |
| GPT-4o mini | OpenAI | 180 ms | 230 ms | 310 ms | 490 ms |
| Claude 3.7 Sonnet | Anthropic | 290 ms | 380 ms | 460 ms | 710 ms |
| Claude 3.5 Haiku | Anthropic | 160 ms | 210 ms | 290 ms | 440 ms |
| Gemini 2.0 Flash | Google | 210 ms | 190 ms | 380 ms | 320 ms |
| Gemini 1.5 Pro | Google | 380 ms | 360 ms | 490 ms | 410 ms |
| Llama 3.3 70B | Meta/Together | 260 ms | 290 ms | 440 ms | 620 ms |
| Mistral Large | Mistral | 340 ms | 420 ms | 280 ms | 590 ms |
Legend:
- ≤ 250 ms: Fast
- 251–450 ms: Moderate
- > 450 ms: Slow
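
The thresholds above translate directly into a small classifier if you want to reproduce the heatmap's banding programmatically. A minimal Python sketch; the function name is illustrative:

```python
def latency_bucket(ttft_ms: float) -> str:
    """Classify a median TTFT (in ms) into the legend's three bands."""
    if ttft_ms <= 250:
        return "Fast"
    if ttft_ms <= 450:  # the 251-450 ms band
        return "Moderate"
    return "Slow"       # anything above 450 ms

# Example: Claude 3.5 Haiku in EU West (290 ms) lands in the Moderate band.
assert latency_bucket(290) == "Moderate"
```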
### Measurement Notes
- TTFT (Time to First Token) — All values represent median TTFT in milliseconds, measured across 50 samples per region per model. TTFT is the elapsed time from request dispatch to receipt of the first streamed token byte (see the measurement sketch after these notes).
- Infrastructure — Tests were run from AWS EC2 instances in us-east-1, us-west-2, eu-west-1, and ap-southeast-1 to simulate real cloud-hosted application latency. No VPN or proxy was used.
- Period — Data collected Q1 2026 during business hours (09:00–17:00 local time) under typical API load. Results may vary under peak traffic or with provider infrastructure changes.
- Llama 3.3 70B values reflect Together AI's hosted inference endpoints. Self-hosted deployments may differ significantly depending on hardware and geography.
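
For readers reproducing these numbers, the TTFT definition above maps to a short measurement harness. The sketch below is illustrative, not the benchmark code used here: `ENDPOINT`, `HEADERS`, and `PAYLOAD` are placeholders for a streaming chat-completions endpoint, and it treats the first streamed body byte as the first token.

```python
import statistics
import time

import requests

# Placeholders: substitute a real streaming endpoint, auth header, and request body.
ENDPOINT = "https://api.example.com/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <API_KEY>"}
PAYLOAD = {
    "model": "<model-id>",
    "stream": True,
    "messages": [{"role": "user", "content": "ping"}],
}

def measure_ttft_ms() -> float:
    """Elapsed ms from request dispatch to the first streamed body byte."""
    start = time.perf_counter()
    with requests.post(ENDPOINT, headers=HEADERS, json=PAYLOAD,
                       stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=1):
            if chunk:  # first non-empty byte of the streamed response
                return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream closed before any bytes arrived")

# Median over 50 sequential samples, mirroring the methodology above.
median_ms = statistics.median(measure_ttft_ms() for _ in range(50))
print(f"median TTFT: {median_ms:.0f} ms")
```

Running this one region at a time from an EC2 instance in the corresponding AWS region reproduces the vantage points listed under Infrastructure.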