Specifications
What constitutes a valid run: datasets, fingerprints, run cards.
Composite score construction, metric definitions, quality tiers.
Bootstrap confidence intervals and paired comparison methodology.
How evaluation corpora are built, versioned, and contamination-checked.
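As an illustration of the bootstrap and paired-comparison item above, here is a minimal sketch of a percentile bootstrap over per-segment score differences between two systems. It is not taken from the spec itself: the function name `paired_bootstrap_ci`, the defaults for `n_resamples` and `alpha`, and the choice of a percentile interval are all assumptions made for this example.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (A minus B).

    Hypothetical helper: the name, defaults, and interval type are
    assumptions for illustration, not the project's actual methodology.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired comparison needs scores for the same segments")
    if not scores_a:
        raise ValueError("need at least one paired score")

    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)

    # Resample segments with replacement, keeping each A/B pair together.
    resampled_means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(mean(sample))
    resampled_means.sort()

    # Read the interval off the sorted resampled means.
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(diffs), resampled_means[lo_idx], resampled_means[hi_idx]
```

Calling `paired_bootstrap_ci(system_a_scores, system_b_scores)` returns the observed mean difference with the lower and upper interval bounds. An interval that excludes zero suggests the difference is unlikely to come from segment sampling alone.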
Corpora
The dataset registry currently tracks 48 development corpora across low-resource language pairs, each with a license, provenance notes, and a do-not-train flag where the source requires it. Held-out test sets stay sealed; dev sets are open for iteration. Contamination findings are published, not buried — see the corpus design spec for the audit trail.
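To make the registry description concrete, here is a hedged sketch of what one registry entry and a do-not-train check might look like. The class `CorpusEntry`, its field names, the `split` values, and the `trainable` helper are hypothetical and chosen for this example; the real registry schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    # All field names are assumptions for illustration, not the real schema.
    corpus_id: str
    license: str
    provenance: str
    do_not_train: bool = False  # set where the source requires it
    split: str = "dev"          # "dev" (open) or "test" (sealed)


def trainable(entries):
    """Return entries that are neither flagged do-not-train nor sealed test sets.

    Treating sealed test sets as off-limits for training is an assumption
    of this sketch.
    """
    return [e for e in entries if not e.do_not_train and e.split != "test"]
```

Keeping the flag on the entry itself means any data-loading step can filter on it before any text is read.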
Citation & licensing
The language-card layer draws on 332 registered upstream sources — Glottolog, WALS, Grambank, PHOIBLE, Lexibank, and friends — each tracked with its license and attribution requirements. Cards record per-field provenance (_fieldSources), so any fact can be traced, challenged, and corrected.
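As a rough illustration of per-field provenance, the sketch below stores a `_fieldSources` map alongside a card's fields and looks up where a given fact came from. Only the `_fieldSources` key comes from the text above. The card fields, their values, the shape of each source record, and the helper names `trace_field` and `untraced_fields` are assumptions for this example.

```python
# Hypothetical card: field names and values are placeholders, not real data.
example_card = {
    "name": "Example language",
    "wordOrder": "SOV",
    "_fieldSources": {
        # Assumed record shape: which upstream source supplied the field.
        "wordOrder": {"source": "WALS", "license": "license-placeholder"},
    },
}


def trace_field(card, field_name):
    """Return the provenance record for a field, or None if none is recorded."""
    return card.get("_fieldSources", {}).get(field_name)


def untraced_fields(card):
    """List data fields (keys not starting with '_') that lack provenance."""
    sources = card.get("_fieldSources", {})
    return sorted(
        key for key in card
        if not key.startswith("_") and key not in sources
    )
```

A check like `untraced_fields(example_card)` could run in review to flag facts that cannot yet be traced to an upstream source.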
Get in touch
Collaboration, corpus partnerships, corrections, or skepticism — all welcome. Open an issue on GitHub or start with the corpus partnership spec.