v0.1 · MIT licensed · Apple Silicon

A terminal agent for small models running on your Mac.

Small Harness is a TUI harness for open-weight LLMs. One interface across Ollama, LM Studio, MLX, llama.cpp, and OpenRouter — with real filesystem and shell tools, sensible approval gates, and parsing built for small models.


~/proj — small-harness
# profile: mac-mini-16gb · backend: ollama · model: qwen2.5-coder:7b
small-harness › read src/main.rs and tell me what /compare does
› tool · read_file  src/main.rs        approved
› tool · grep       "/compare" src/      approved

/compare runs the same prompt against your local model and a chosen
OpenRouter model side-by-side, then shows tokens/sec, latency, and a diff
of the two responses. Useful for checking whether a 7B model is good
enough before reaching for a frontier one.

small-harness › /compare openai/gpt-4o-mini

Built for small models, not against them.

Features

Local-first

OpenAI-compatible chat completions against Ollama, LM Studio, MLX, or llama.cpp running on your machine.

Cloud A/B in one keystroke

/compare any prompt against an OpenRouter model. See if 7B is enough before paying for 400B.
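
To make the side-by-side numbers concrete, here is a minimal sketch of how tokens/sec could be derived for each side of a comparison. The `RunStats` type and its field names are hypothetical, not Small Harness's internal types; it assumes each backend reports a completion token count and that the elapsed time is measured by the caller.

```rust
use std::time::Duration;

/// Token count and wall-clock time for one side of a comparison.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct RunStats {
    pub completion_tokens: u32,
    pub elapsed: Duration,
}

impl RunStats {
    /// Tokens per second over the whole response; 0.0 if no time elapsed.
    pub fn tokens_per_sec(&self) -> f64 {
        let secs = self.elapsed.as_secs_f64();
        if secs > 0.0 {
            f64::from(self.completion_tokens) / secs
        } else {
            0.0
        }
    }
}
```

Computing the same figure for the local and the OpenRouter run gives a like-for-like throughput number to show next to the response diff.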

Hardware profiles

The mac-mini-16gb and mac-studio-32gb profiles ship with preset model and context defaults, so there is nothing to tune before the first run.

Real tools, real approvals

Read, write, edit, grep, glob, list-dir, shell, apply-patch — each with a per-tool approval gate and diff preview.
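
A per-tool gate can be pictured as a small policy lookup. The sketch below is an assumption about how such a policy might look, not the shipped defaults: the `Approval` variants and `default_approval` function are hypothetical, and the tool names are taken from the feature list above (actual identifiers may differ; the transcript shows `read_file`, for instance).

```rust
/// How a tool call is gated. Variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Approval {
    /// Run without asking.
    Auto,
    /// Show a preview and wait for confirmation.
    Ask,
    /// Never run.
    Deny,
}

/// An assumed default policy: read-only tools run freely, anything that
/// changes files or runs commands asks first, unknown tools are denied.
pub fn default_approval(tool: &str) -> Approval {
    match tool {
        "read" | "grep" | "glob" | "list-dir" => Approval::Auto,
        "write" | "edit" | "apply-patch" | "shell" => Approval::Ask,
        _ => Approval::Deny,
    }
}
```

Denying unknown tool names by default keeps a malformed or invented tool call from a small model from running unreviewed.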

Robust tool-call parsing

An inline JSON detector catches tool calls even when small models forget the prescribed format.
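
One way such a detector could work is to scan the model's reply for balanced `{ ... }` spans, ignoring braces inside string literals, and then hand each candidate to a JSON parser. The sketch below illustrates that idea only; `find_json_objects` is a hypothetical name, and this is not a description of Small Harness's actual parser. It finds candidate spans but does not validate them.

```rust
/// Returns every top-level `{ ... }` span in `text` whose braces balance.
/// Braces inside JSON string literals are ignored. Hypothetical helper:
/// it finds candidates only; a real parser must still validate them.
pub fn find_json_objects(text: &str) -> Vec<&str> {
    let mut spans = Vec::new();
    let mut depth = 0usize;
    let mut start = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (i, ch) in text.char_indices() {
        if depth == 0 {
            // Outside any object: only an opening brace matters.
            if ch == '{' {
                depth = 1;
                start = i;
            }
            continue;
        }
        if in_string {
            // Inside a string: track escapes so `\"` does not end it.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    spans.push(&text[start..=i]);
                }
            }
            _ => {}
        }
    }
    spans
}
```

Given a reply such as `Sure, reading it now: {"tool": "read_file", "args": {"path": "src/main.rs"}}` (the `tool`/`args` shape is assumed for illustration), the function returns the single embedded object. A stray unclosed `{` earlier in the text would swallow later objects, so a production detector would need a recovery strategy.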

Pre-warmed startup

Populates the prompt-eval cache before your first message so the first reply doesn't feel cold.

Efficiency mode

Auto-selects tool schemas to fit the context budget and shows you exactly where the tokens went.
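
Fitting schemas to a budget can be sketched as a greedy pass over tools in priority order. This is an illustration under stated assumptions, not the project's algorithm: `ToolSchema`, `select_schemas`, and the token costs used with them are all hypothetical.

```rust
/// A tool schema with an estimated prompt cost in tokens.
/// Names and costs are illustrative, not measured values.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolSchema {
    pub name: &'static str,
    pub tokens: usize,
}

/// Greedily keeps schemas, in the order given, while they fit `budget`.
/// Returns the selected schemas and the total tokens they consume.
pub fn select_schemas(candidates: &[ToolSchema], budget: usize) -> (Vec<ToolSchema>, usize) {
    let mut used = 0;
    let mut chosen = Vec::new();
    for schema in candidates {
        if used + schema.tokens <= budget {
            used += schema.tokens;
            chosen.push(schema.clone());
        }
    }
    (chosen, used)
}
```

Returning the token total alongside the selection is what would let a harness report where the context budget went.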

Streaming output

Tokens stream as they arrive, with grouped tool-call display so the transcript stays readable.

Sessions you can resume

Append-only JSONL logs. List, resume, or export any past conversation from the prompt.
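
An append-only JSONL log writes one JSON object per line and never rewrites earlier lines. The sketch below shows the idea using only the standard library; the `role`/`content` record shape and the `append_message` name are assumptions, not the project's on-disk format.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Appends one message as a single JSON line; existing lines are untouched.
/// The record shape is hypothetical.
pub fn append_message(path: &Path, role: &str, content: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(
        file,
        "{{\"role\":\"{}\",\"content\":\"{}\"}}",
        json_escape(role),
        json_escape(content)
    )
}
```

Because newlines inside a message are escaped, each record stays on one line, so resuming a session amounts to reading the file line by line.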

Five backends, one interface.

Backends

Backend	Default URL	Best for
Ollama	`localhost:11434/v1`	Easiest setup; mature tool-call templates.
LM Studio	`localhost:1234/v1`	GUI model browser; explicit load and unload.
MLX	`localhost:8080/v1`	Fastest inference on Apple Silicon.
llama.cpp	`localhost:8080/v1`	Direct GGUF serving for full control.
OpenRouter	`openrouter.ai/api/v1`	Cloud A/B comparison and frontier models.
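
The table maps naturally onto a small enum. The sketch below is illustrative rather than the project's actual types: the `Backend` enum and its method names are hypothetical, and the `http://` and `https://` schemes are assumptions, since the table lists only host, port, and path.

```rust
/// The five backends from the table above. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Ollama,
    LmStudio,
    Mlx,
    LlamaCpp,
    OpenRouter,
}

impl Backend {
    /// Default base URL. The scheme is an assumption; the table
    /// gives only host, port, and path.
    pub fn default_base_url(self) -> &'static str {
        match self {
            Backend::Ollama => "http://localhost:11434/v1",
            Backend::LmStudio => "http://localhost:1234/v1",
            Backend::Mlx => "http://localhost:8080/v1",
            Backend::LlamaCpp => "http://localhost:8080/v1",
            Backend::OpenRouter => "https://openrouter.ai/api/v1",
        }
    }

    /// Chat completions endpoint, using the OpenAI-compatible path.
    pub fn chat_completions_url(self) -> String {
        format!("{}/chat/completions", self.default_base_url())
    }
}
```

Note that MLX and llama.cpp share the same default port, so running both at once means moving one of them to a different port.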

Install in a minute.

Quick start

Requirements

Rust 1.75 or newer (rustup)
A local backend — Ollama is the gentlest start
An OPENROUTER_API_KEY if you want /compare

# clone, configure, run
$ git clone https://github.com/GetSmallAI/SmallHarness.git
$ cd SmallHarness
$ cp .env.example .env
$ cargo run --release

Slash commands

/backend /profile /model /tools /compare /session /sessions /resume /export /doctor /bench /eval /new /help