# tacho - LLM Speed Test

Measure and compare AI model inference speed from the command line.
```
$ tacho gpt-4.1 gemini/gemini-2.5-pro vertex_ai/claude-sonnet-4@20250514
✓ gpt-4.1
✓ vertex_ai/claude-sonnet-4@20250514
✓ gemini/gemini-2.5-pro
┌────────────────────────────────────┬───────────┬───────────┬───────────┬──────────┐
│ Model                              │ Avg tok/s │ Min tok/s │ Max tok/s │ Avg Time │
├────────────────────────────────────┼───────────┼───────────┼───────────┼──────────┤
│ gemini/gemini-2.5-pro              │      84.6 │      50.3 │     133.8 │   13.44s │
│ gpt-4.1                            │      49.7 │      35.1 │      66.6 │   10.75s │
│ vertex_ai/claude-sonnet-4@20250514 │      48.7 │      47.3 │      50.9 │   10.27s │
└────────────────────────────────────┴───────────┴───────────┴───────────┴──────────┘
```
## Quick Start

Run tacho with `uv` without installing it:

```bash
uvx tacho gpt-4.1-nano gemini/gemini-2.0-flash
```

Or install it globally:

```bash
uv tool install tacho
```
## Features

- ⚡ **Parallel Testing**: concurrent calls for faster results (see the sketch after this list)
- 📊 **Token Metrics**: measures actual tok/s, not just response time
- 🔌 **Multi-Provider**: works with all providers supported by LiteLLM
- 🎯 **Zero Config**: just set your API keys and run
- 🔒 **100% Private**: no telemetry, no data sent to our servers
- 🧠 **Reasoning Support**: accurately accounts for thinking tokens
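
The idea behind the parallel testing and token metrics can be sketched in a few lines of Python. This is only an illustration of the technique, not tacho's actual code; it assumes the `litellm` package is installed and API keys for the listed models are set:

```python
"""Minimal sketch: time models concurrently and derive tok/s."""
import asyncio
import time

from litellm import acompletion


async def measure(model: str, max_tokens: int = 500) -> tuple[str, float]:
    """Run one timed completion and return (model, tokens per second)."""
    start = time.perf_counter()
    response = await acompletion(
        model=model,
        messages=[{"role": "user", "content": "Explain TCP slow start."}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    # completion_tokens counts what the model generated; for reasoning
    # models this typically includes the hidden thinking tokens as well.
    tokens = response.usage.completion_tokens
    return model, tokens / elapsed


async def main() -> None:
    models = ["gpt-4.1", "gemini/gemini-2.5-pro"]
    # gather() fires all requests at once instead of waiting on each in turn
    results = await asyncio.gather(*(measure(m) for m in models))
    for model, tps in sorted(results, key=lambda r: -r[1]):
        print(f"{model}: {tps:.1f} tok/s")


if __name__ == "__main__":
    asyncio.run(main())
```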
## Usage

Set your API keys:

```bash
export OPENAI_API_KEY=<your-key>
export GEMINI_API_KEY=<your-key>
```
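
Other providers pick up their usual LiteLLM environment variables; for example, benchmarking Anthropic models would require:

```bash
export ANTHROPIC_API_KEY=<your-key>
```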
Run benchmarks with custom settings, such as the number of runs per model and a token limit:

```bash
tacho gpt-4.1-nano claude-3.5-haiku --runs 3 --tokens 1000
```
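
The table columns above are per-run aggregates. A sketch of how they could be computed (the numbers here are made up for illustration):

```python
from statistics import mean

# Hypothetical (tokens generated, seconds elapsed) pairs, one per run
runs = [(1000, 9.8), (1000, 11.2), (1000, 10.3)]

rates = [tok / sec for tok, sec in runs]
print(f"Avg tok/s: {mean(rates):.1f}")  # mean generation speed across runs
print(f"Min tok/s: {min(rates):.1f}")   # slowest run
print(f"Max tok/s: {max(rates):.1f}")   # fastest run
print(f"Avg Time:  {mean(s for _, s in runs):.2f}s")  # mean wall-clock time
```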