v0.1 · MIT licensed · Apple Silicon

A terminal agent for small models running on your Mac.

Small Harness is a TUI harness for open-weight LLMs. One interface across Ollama, LM Studio, MLX, llama.cpp, and OpenRouter — with real filesystem and shell tools, sensible approval gates, and parsing built for small models.

~/proj — small-harness# profile: mac-mini-16gb · backend: ollama · model: qwen2.5-coder:7b
small-harness › read src/main.rs and tell me what /compare does
› tool · read_file  src/main.rs        approved
› tool · grep       "/compare" src/      approved

/compare runs the same prompt against your local model and a chosen
OpenRouter model side-by-side, then shows tokens/sec, latency, and a diff
of the two responses. Useful for checking whether a 7B model is good
enough before reaching for a frontier one.

small-harness › /compare openai/gpt-4o-mini

Built for small models, not against them.

Features
01 · Local-first

OpenAI-compatible chat completions against Ollama, LM Studio, MLX, or llama.cpp running on your machine.

02 · Cloud A/B in one keystroke

/compare any prompt against an OpenRouter model. See if 7B is enough before paying for 400B.

03 · Hardware profiles

mac-mini-16gb and mac-studio-32gb ship with model and context defaults that just work.

04 · Real tools, real approvals

Read, write, edit, grep, glob, list-dir, shell, apply-patch — each with a per-tool approval gate and diff preview.

05 · Robust tool-call parsing

An inline JSON detector catches tool calls even when small models forget the prescribed format; see the detector sketch after this feature list.

06 · Pre-warmed startup

Populates the prompt-eval cache before your first message so the first reply doesn't feel cold.

07 · Efficiency mode

Auto-selects tool schemas to fit the context budget and shows you exactly where the tokens went; see the selection sketch after this feature list.

08 · Streaming output

Tokens stream as they arrive, with grouped tool-call display so the transcript stays readable.

09 · Sessions you can resume

Append-only JSONL logs. List, resume, or export any past conversation from the prompt; see the logging sketch after this feature list.
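
How the inline detector (feature 05) might work, as a minimal sketch: scan the reply for balanced-brace spans, try to parse each one as JSON, and accept any object carrying "name" and "arguments" fields. The function name and key convention are assumptions for illustration, not the harness's actual code.

use serde_json::Value;

/// Scan free-form model output for balanced-brace spans that parse as a
/// JSON object carrying "name" and "arguments" fields. Sketch only: braces
/// inside JSON string values are not handled.
fn detect_tool_calls(text: &str) -> Vec<Value> {
    let bytes = text.as_bytes();
    let mut calls = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'{' {
            let mut depth = 0usize;
            for (j, &b) in bytes[i..].iter().enumerate() {
                match b {
                    b'{' => depth += 1,
                    b'}' => {
                        depth -= 1;
                        if depth == 0 {
                            // Candidate span found: accept it only if it
                            // parses and looks like a tool call.
                            if let Ok(v) = serde_json::from_str::<Value>(&text[i..=i + j]) {
                                if v.get("name").is_some() && v.get("arguments").is_some() {
                                    calls.push(v);
                                    i += j; // skip past the accepted call
                                }
                            }
                            break;
                        }
                    }
                    _ => {}
                }
            }
        }
        i += 1;
    }
    calls
}

fn main() {
    let reply = r#"Let me check: {"name": "read_file", "arguments": {"path": "src/main.rs"}} now."#;
    for call in detect_tool_calls(reply) {
        println!("caught tool call: {call}");
    }
}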
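
Efficiency mode's budget accounting (feature 07) could look like the sketch below: estimate each schema's token cost with a rough four-characters-per-token heuristic, keep schemas in priority order until the budget is spent, and report where every token went. The names and the heuristic are assumptions, not the shipped implementation.

struct ToolSchema {
    name: &'static str,
    json: &'static str, // the schema text sent to the model
}

/// Estimate a schema's token cost; ~4 characters per token is a common rule of thumb.
fn est_tokens(s: &str) -> usize {
    s.len() / 4 + 1
}

/// Keep schemas in priority order until the context budget is spent,
/// returning the chosen set and a per-tool cost breakdown.
fn select_schemas(schemas: &[ToolSchema], budget: usize) -> (Vec<&ToolSchema>, Vec<(String, usize)>) {
    let mut chosen = Vec::new();
    let mut report = Vec::new();
    let mut spent = 0;
    for s in schemas {
        let cost = est_tokens(s.json);
        if spent + cost <= budget {
            spent += cost;
            chosen.push(s);
            report.push((s.name.to_string(), cost));
        }
    }
    (chosen, report)
}

fn main() {
    let schemas = [
        ToolSchema { name: "read_file", json: r#"{"name":"read_file","parameters":{"path":"string"}}"# },
        ToolSchema { name: "shell", json: r#"{"name":"shell","parameters":{"command":"string"}}"# },
    ];
    let (chosen, report) = select_schemas(&schemas, 40);
    for (name, cost) in &report {
        println!("{name}: ~{cost} tokens");
    }
    println!("{} of {} schemas fit the budget", chosen.len(), schemas.len());
}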
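
Append-only JSONL logging (feature 09), as a minimal sketch with serde_json: one JSON object per line, appended on write and replayed on resume. The record shape and file path are assumptions, not the harness's on-disk format.

use std::fs::OpenOptions;
use std::io::{BufRead, BufReader, Write};
use serde_json::{json, Value};

/// Append one message to the session log; one JSON object per line.
fn append_message(path: &str, role: &str, content: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    let record = json!({ "role": role, "content": content });
    writeln!(file, "{record}")
}

/// Rebuild a conversation by replaying the log line by line.
fn resume(path: &str) -> std::io::Result<Vec<Value>> {
    let file = std::fs::File::open(path)?;
    BufReader::new(file)
        .lines()
        .map(|line| Ok(serde_json::from_str(&line?).expect("corrupt log line")))
        .collect()
}

fn main() -> std::io::Result<()> {
    let path = "session.jsonl";
    append_message(path, "user", "read src/main.rs")?;
    append_message(path, "assistant", "Here is what /compare does…")?;
    for msg in resume(path)? {
        println!("{msg}");
    }
    Ok(())
}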

Five backends, one interface.

Backends
Backend      Default URL             Best for
Ollama       localhost:11434/v1      Easiest setup; mature tool-call templates.
LM Studio    localhost:1234/v1       GUI model browser; explicit load and unload.
MLX          localhost:8080/v1       Fastest inference on Apple Silicon.
llama.cpp    localhost:8080/v1       Direct GGUF serving for full control.
OpenRouter   openrouter.ai/api/v1    Cloud A/B comparison and frontier models.
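
All five speak the same OpenAI-compatible chat-completions API, so one client covers every row of the table; switching backends is just a different base URL. A minimal sketch of that shared surface, assuming reqwest (with the "blocking" and "json" features), serde_json, and whatever model the backend has loaded:

use serde_json::{json, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Swap for localhost:1234/v1 (LM Studio), localhost:8080/v1 (MLX or
    // llama.cpp), or https://openrouter.ai/api/v1 plus an Authorization header.
    let base = "http://localhost:11434/v1";
    let body = json!({
        "model": "qwen2.5-coder:7b",
        "messages": [{ "role": "user", "content": "Say hello in five words." }],
        "stream": false
    });
    let resp: Value = reqwest::blocking::Client::new()
        .post(format!("{base}/chat/completions"))
        .json(&body)
        .send()?
        .json()?;
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}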

Install in a minute.

Quick start

Requirements

  1. Rust 1.75 or newer (rustup)
  2. A local backend — Ollama is the gentlest start
  3. An OPENROUTER_API_KEY if you want /compare
# clone, configure, run
$ git clone https://github.com/GetSmallAI/SmallHarness.git
$ cd SmallHarness
$ cp .env.example .env
$ cargo run --release

Slash commands

/backend /profile /model /tools /compare /session /sessions /resume /export /doctor /bench /eval /new /help