tacho - LLM Speed Test

Measure and compare LLM inference speed from the command line

$ tacho gpt-4.1 gemini/gemini-2.5-pro vertex_ai/claude-sonnet-4@20250514
┌────────────────────────────────────┬───────────┬───────────┬───────────┬──────────┐
│ Model                              │ Avg tok/s │ Min tok/s │ Max tok/s │ Avg Time │
├────────────────────────────────────┼───────────┼───────────┼───────────┼──────────┤
│ gemini/gemini-2.5-pro              │      84.6 │      50.3 │     133.8 │   13.44s │
│ gpt-4.1                            │      49.7 │      35.1 │      66.6 │   10.75s │
│ vertex_ai/claude-sonnet-4@20250514 │      48.7 │      47.3 │      50.9 │   10.27s │
└────────────────────────────────────┴───────────┴───────────┴───────────┴──────────┘

Quick Start

Run tacho with uv without installing it:

uvx tacho gpt-4.1-nano gemini/gemini-2.0-flash

Or install globally:

uv tool install tacho

Features

⚡ Parallel Testing

Concurrent calls for faster results (see the sketch after this list)

📊 Token Metrics

Measures actual tok/s, not just response time

🔌 Multi-Provider

Works with all providers supported by LiteLLM

🎯 Zero Config

Just set your API keys and run

🔒 100% Private

No telemetry or data sent to our servers

🧠 Reasoning Support

Accounts for thinking tokens, so reasoning models are measured accurately
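
The parallel-testing, token-metrics, and reasoning points above boil down to a small loop. The following is an illustrative sketch built on LiteLLM's acompletion, not tacho's actual implementation; the model list, prompt, and 500-token cap are placeholder assumptions.

import asyncio
import time

from litellm import acompletion

async def timed_completion(model: str, prompt: str, max_tokens: int = 500) -> float:
    # Time a single completion and return its tokens-per-second rate.
    start = time.perf_counter()
    response = await acompletion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    # usage.completion_tokens counts all generated tokens, including
    # thinking tokens where the provider reports them, so the rate
    # reflects real generation speed rather than visible text alone.
    return response.usage.completion_tokens / elapsed

async def main():
    models = ["gpt-4.1-nano", "gemini/gemini-2.0-flash"]  # placeholders
    prompt = "Explain the theory of relativity in one paragraph."
    # asyncio.gather issues all calls concurrently, so total wall time
    # is close to the slowest single call rather than the sum of all.
    rates = await asyncio.gather(*(timed_completion(m, prompt) for m in models))
    for model, tps in zip(models, rates):
        print(f"{model}: {tps:.1f} tok/s")

asyncio.run(main())

Note that timing the full request folds network latency and time-to-first-token into the rate; a per-call rate like this is a simplification of whatever tacho reports.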

Usage

Set your API keys:

export OPENAI_API_KEY=<your-key>
export GEMINI_API_KEY=<your-key>

Run benchmarks with custom settings:

tacho gpt-4.1-nano claude-3.5-haiku --runs 3 --tokens 1000
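
Here --runs sets the number of timed calls per model and --tokens caps the generated length per call. Below is a minimal sketch of how per-run speeds would roll up into the Avg/Min/Max columns shown in the table above; it is illustrative rather than tacho's actual code, and the prompt and defaults are assumptions.

import asyncio
import time

from litellm import acompletion

async def run_once(model: str, max_tokens: int) -> float:
    # One timed call; returns tokens per second for this run.
    start = time.perf_counter()
    response = await acompletion(
        model=model,
        messages=[{"role": "user", "content": "Write a short story."}],
        max_tokens=max_tokens,  # mirrors --tokens
    )
    return response.usage.completion_tokens / (time.perf_counter() - start)

async def benchmark(model: str, runs: int = 3, max_tokens: int = 1000):
    # Launch all runs concurrently (mirrors --runs) and summarize.
    speeds = await asyncio.gather(*(run_once(model, max_tokens) for _ in range(runs)))
    print(
        f"{model}: avg {sum(speeds) / len(speeds):.1f}, "
        f"min {min(speeds):.1f}, max {max(speeds):.1f} tok/s"
    )

asyncio.run(benchmark("gpt-4.1-nano"))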