Contribute Compute | champollion

One command

Paste this into a terminal. It explains itself, asks before doing anything, installs the harness if you don’t have it, helps you add an OpenRouter key if you don’t have one, then runs the highest-value open benchmarks up to your budget — showing you the exact runs and estimated cost before a single token is spent. Change the budget and the command updates.

Budget: $

curl -fsSL champollion.dev/give | bash

Your API key stays on your machine — the harness talks straight to OpenRouter, we never see it, and you shouldn’t share it with anyone (including us). Running needs no account with us; publishing your results asks for one OAuth sign-in so the run card carries your name. Nothing system-level is touched (no sudo), and pipx uninstall mt-eval-harness removes it completely. The script is plain bash you can read first: champollion.dev/give. New to terminals entirely? The step-by-step walkthrough is coming to this page — every command above is also explained in the contributor guide.

The fast path: hand it to your agent

If you use Claude Code or another coding agent, this is a paste-one-prompt contribution. The agent installs the harness, picks a queue item, runs it with your key, and publishes the report (you’ll approve an OAuth sign-in for attribution).

Paste into Claude Code / your agent

Install the Champollion mt-eval harness (curl -fsSL champollion.dev/harness | bash).
Fetch https://champollion.dev/queue.json and show me the top 3 open items.
Using my OpenRouter key (OPENROUTER_API_KEY), execute the run_command of the
item I pick, then run `mt-eval publish` on the generated report JSON and
show me the published run card.

Prefer to drive it yourself? Two commands replace the agent: curl -fsSL champollion.dev/harness | bash to install, curl -fsSL champollion.dev/queue | bash to see the queue with ready-to-paste run commands. Both are plain bash you can read first (the installer, the queue viewer); the queue viewer only displays — it never spends your tokens.

The queue, right now

Prioritized open (corpus, model, condition) combinations — ranked by expected chain value: how much each run strengthens the whole language mesh per estimated dollar (the formula is public and every rank is re-derivable by hand). Two people running the same item is harmless: run-card fingerprints deduplicate identical runs, and independent replications are useful data, so there’s no sign-up and no claim-locking.

Loading the queue…

The contribution ladder

Tier 1

Run a benchmark

~10 minutes · most items under $0.55, median ≈ $0.09ⓘ

Install the harness, pick any open queue item, paste its command, and publish the report. That’s a real, fingerprinted data point on a language pair nobody has measured yet. No MT background needed.

Contributing Compute guide ↗

Tier 2

Craft coached prompts

an afternoon · same per-run cost as a baseline

Write a coaching file — grammar rules, a small glossary, style notes for the target language — and pass it with --coaching-file. The harness injects it as the system prompt and records the full text in the run card, so your prompt craft is reproducible. Beating the naive baseline on a low-resource pair is a genuine finding.

Cookbook: Coached LLM Prompting ↗

Tier 3

Build a method

days to weeks · you set the budget

Implement translate(entries, config) and the harness will benchmark anything inside it: FST-gated generation, dictionary lookup, retrieval, chained models. Declared dependency classes (S/O/A1/A2) keep methods comparable and auditable.

Method interface & dependency classes ↗Cookbook: FST-Gated Pipeline ↗Cookbook: Dictionary-Augmented LLM ↗

Which API key do I need?

The harness makes its calls through OpenRouter — set OPENROUTER_API_KEY (environment variable or a local .env file) and one key reaches every model in the queue lineup: Claude, GPT, and Gemini alike. If your tokens live with Anthropic, OpenAI, or Google directly, an OpenRouter account is the bridge — the harness does not yet accept direct provider keys (the run-card schema reserves an api_provider field for when it does, but today every run is an OpenRouter run). Cost tracking, model validation, and pricing snapshots all come from the same OpenRouter metadata, so what the leaderboard reports as run cost is what your key was billed.

What your run counts as

Self-benchmarked is the trust model working

Community submissions publish at the self-benchmarked tier — plainly labeled as “submitted by the person who ran it.” That’s not a caveat; it’s the design. Every run card carries the dataset hash, model, condition, full system prompt, and cost, so anyone can re-run your exact configuration and check the result. Elevated tiers (verification) are granted by review, not by self-assertion.

Attribution is the reward

Your submitter name appears on the leaderboard row. That is the recognition on offer today — we won’t promise badges, bounties, or programs that don’t exist yet.

Duplicates can’t pollute the board

Each run card is fingerprinted (SHA-256 over dataset hash, model, condition, and system prompt). Identical re-runs deduplicate on publish; near-duplicates with different prompts are separate, comparable experiments.

Eval sets, not training data

Every queued corpus is marked do_not_train and carries its license (CC-BY family, Tatoeba-derived) in the run card. Non-commercially-licensed corpora are excluded from the open queue entirely.

Trust tiers, dataset rules, and scoring are specified on mtevalarena.org. See your result on the leaderboard after publishing.