tacho - LLM Speed Test
Measure and compare LLM inference speed from the command line
$ tacho gpt-4.1-nano gemini/gemini-2.0-flash
✓ gemini/gemini-2.0-flash
✓ gpt-4.1-nano
┌─────────────────────────┬───────────┬───────────┬───────────┬──────────┐
│ Model                   │ Avg tok/s │ Min tok/s │ Max tok/s │ Avg Time │
├─────────────────────────┼───────────┼───────────┼───────────┼──────────┤
│ gemini/gemini-2.0-flash │     124.0 │     110.5 │     136.6 │     4.0s │
│ gpt-4.1-nano            │     116.9 │     105.4 │     129.5 │     4.3s │
└─────────────────────────┴───────────┴───────────┴───────────┴──────────┘
Quick Start
Run tacho with "uv" without installation:
uvx tacho gpt-4.1-nano gemini/gemini-2.0-flash
Or install globally:
uv tool install tacho
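Both routes fetch tacho from PyPI: uvx runs it in a temporary, throwaway environment, while uv tool install puts a persistent tacho executable on your PATH.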
Features
⚡ Parallel Testing
Concurrent calls for faster results
📊 Token Metrics
Measures actual tok/s, not just response time
🔌 Multi-Provider
Works with all providers supported by LiteLLM (see the mixed-provider example after this list)
🎯 Zero Config
Just set your API keys and run
🔒 100% Private
No telemetry or data sent to our servers
🏓 Ping Models
Quickly verify API keys and model access (see the ping sketch after this list)
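Because routing goes through LiteLLM, you can mix providers in a single run by prefixing model names the way LiteLLM expects, as with gemini/ above. The Anthropic model id below is illustrative; substitute any model your keys can reach:

tacho gpt-4.1-nano gemini/gemini-2.0-flash anthropic/claude-3-5-haiku-latest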
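Assuming the ping feature is exposed as a ping subcommand (the exact invocation may differ), a quick connectivity check could look like:

tacho ping gpt-4.1-nano gemini/gemini-2.0-flash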
Usage
Set API keys for the providers you want to benchmark (the Anthropic key is needed for the Claude example below):
export OPENAI_API_KEY=<your-key>
export GEMINI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
Run benchmarks with custom settings:
tacho gpt-4.1-nano claude-3.5-haiku --runs 3 --lim 1000
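Judging by their names, --runs sets how many timed generations each model gets (more runs smooth out variance in the averages) and --lim caps the tokens generated per run; tacho --help should list the authoritative options.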