tacho - LLM Speed Test

Measure and compare LLM inference speed from the CLI

$ tacho gpt-4.1-nano gemini/gemini-2.0-flash
┌─────────────────────────┬───────────┬───────────┬───────────┬──────────┐
│ Model                   │ Avg tok/s │ Min tok/s │ Max tok/s │ Avg Time │
├─────────────────────────┼───────────┼───────────┼───────────┼──────────┤
│ gemini/gemini-2.0-flash │     124.0 │     110.5 │     136.6 │     4.0s │
│ gpt-4.1-nano            │     116.9 │     105.4 │     129.5 │     4.3s │
└─────────────────────────┴───────────┴───────────┴───────────┴──────────┘

Quick Start

Run tacho with uv without installing it:

uvx tacho gpt-4.1-nano gemini/gemini-2.0-flash

Or install globally:

uv tool install tacho

Features

⚡ Parallel Testing

Concurrent calls for faster results (a sketch follows the feature list)

📊 Token Metrics

Measures actual tok/s, not just response time

🔌 Multi-Provider

Works with all providers supported by LiteLLM

🎯 Zero Config

Just set your API keys and run

🔒 100% Private

No telemetry or data sent to our servers

🏓 Ping Models

Quickly verify API keys and model access
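The first two features above come down to timing concurrent completion calls and dividing the generated tokens by elapsed time. Below is a minimal sketch of that idea using LiteLLM's async API; it is an illustration, not tacho's actual code, and the model name, prompt, run count, and token cap are placeholders:

import asyncio
import time

from litellm import acompletion  # LiteLLM's async chat-completion call


async def timed_run(model: str) -> float:
    """Run one completion and return generated tokens per second."""
    start = time.perf_counter()
    response = await acompletion(
        model=model,
        messages=[{"role": "user", "content": "Explain TCP in one paragraph."}],
        max_tokens=500,  # cap the output length, in the spirit of --lim
    )
    elapsed = time.perf_counter() - start
    # usage.completion_tokens counts only generated tokens, not the prompt
    return response.usage.completion_tokens / elapsed


async def main() -> None:
    # asyncio.gather fires all runs concurrently instead of one after another
    rates = await asyncio.gather(*(timed_run("gpt-4.1-nano") for _ in range(3)))
    print(f"avg {sum(rates) / len(rates):.1f} tok/s, "
          f"min {min(rates):.1f}, max {max(rates):.1f}")


asyncio.run(main())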

Usage

Set your API keys:

export OPENAI_API_KEY=<your-key>
export GEMINI_API_KEY=<your-key>
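Model names follow LiteLLM's conventions: bare OpenAI names like gpt-4.1-nano use OPENAI_API_KEY, while provider-prefixed names like gemini/gemini-2.0-flash route to the matching provider's key.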

Run benchmarks with custom settings:

tacho gpt-4.1-nano claude-3.5-haiku --runs 3 --lim 1000
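Here --runs sets the number of timed runs per model and --lim caps the tokens generated per run. To verify API keys and model access without running a full benchmark, use the ping feature (assuming the subcommand is named after the Ping Models feature above):

tacho ping gpt-4.1-nano gemini/gemini-2.0-flash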