MIT licensed · Apple Silicon
One harness, five backends — Ollama, LM Studio, MLX, llama.cpp, and OpenRouter. Real filesystem and shell tools, approval gates on every action, and parsing that catches tool calls even when a 7B forgets the format.
# profile: mac-mini-16gb · backend: ollama · model: qwen2.5-coder:7b small-harness › the auth tests are failing on main, can you find and fix it? › tool · shell cargo test auth approved › tool · read_file src/auth.rs approved › tool · apply_patch src/auth.rs (lines 41-48) approved › tool · shell cargo test auth approved Fixed. validate_token() was returning Ok(()) for expired tokens because the expiry check ran after the signature check short-circuited. Moved the expiry check first. All 14 auth tests pass. small-harness › /compare openai/gpt-4o-mini qwen2.5-coder:7b ✓ 4 tools · 11.2s · 52 tok/s gpt-4o-mini ✓ 4 tools · 5.8s · cloud diff: same fix; gpt-4o-mini also added a doc comment.
OpenAI-compatible chat completions against Ollama, LM Studio, MLX, or llama.cpp running on your machine.
/compare any prompt against an OpenRouter model. See if 7B is enough before paying for 400B.
mac-mini-16gb and mac-studio-32gb ship with model and context defaults that just work.
Read, write, edit, grep, glob, list-dir, shell, apply-patch — each with a per-tool approval gate and diff preview.
An inline JSON detector catches tool calls even when small models forget the prescribed format.
Populates the prompt-eval cache before your first message so the first reply doesn't feel cold.
Auto-selects tool schemas to fit the context budget and shows you exactly where the tokens went.
Tokens stream as they arrive, with grouped tool-call display so the transcript stays readable.
Append-only JSONL logs. List, resume, or export any past conversation from the prompt.
| Backend | Default URL | Best for |
|---|---|---|
| Ollama | localhost:11434/v1 | Easiest setup; mature tool-call templates. |
| LM Studio | localhost:1234/v1 | GUI model browser; explicit load and unload. |
| MLX | localhost:8080/v1 | Fastest inference on Apple Silicon. |
| llama.cpp | localhost:8080/v1 | Direct GGUF serving for full control. |
| OpenRouter | openrouter.ai/api/v1 | Cloud A/B comparison and frontier models. |
rustup)OPENROUTER_API_KEY if you want /compareSlash commands