Skip to main content

Supported Languages

champollion ships with Language Cards — structured configuration files for 50 languages. Each card contains register presets, formality system metadata, method support flags, typography rules, and script information. Any language your LLM knows can be added with a single config line — these are the ones with curated, production-ready registers.


Translation Methods

Each language can use one or more of these translation methods:

IconMethodHow It WorksCost
🟢Google TranslateNeural MT baseline. 130+ languages. Key-value strings only — cannot safely translate Markdown content.~$20/1M chars
🔵LLM (OpenRouter)Any language the model knows. Register-steered prompts. Handles key-value + Markdown content.Varies by model
🟣LLM-CoachedLLM + grammar dictionaries + coaching data injected into prompts. Best for morphologically complex languages.Varies by model
🟠API (Plugin)Community-hosted translation pipelines served over HTTP. OCAP-compatible.Varies by provider

Set GOOGLE_TRANSLATE_API_KEY for Google Translate, or OPENROUTER_API_KEY for LLM methods. See Translation Methods for full details.


Priority Languages

These are the most commonly requested locales for web and mobile applications, listed in champollion's recommended accessibility-first order.

FlagLanguageCodeGoogleLLMCoachedScriptNotes
🇸🇦ArabicarRTL. Modern Standard Arabic (فصحى).
🇵🇭Filipino (Taglish)tl / filUse fil in Docusaurus configs. champollion resolves both.
🇫🇷FrenchfrVous-form. Gender-inclusive (Connecté·e).
🇪🇸SpanishesNeutral Latin American.
🇩🇪GermandeSie-form. Gender-inclusive (Benutzer:innen).
🇯🇵Japanesejaです/ます for body text, する for UI labels.
🇨🇳Chinese (Simplified)zh简体中文.
🇮🇹ItalianitLei-form.
🇧🇷Portuguese (BR)ptBrazilian Portuguese.
🇰🇷Koreanko해요체 polite register.

Major World Languages

FlagLanguageCodeGoogleLLMCoachedScriptNotes
🇧🇩Bengalibnশুদ্ধ ভাষা preference.
🇧🇬Bulgarianbg
🇨🇿CzechcsVykání (vy-form).
🇩🇰Danishda
🇬🇷GreekelModern Δημοτική.
🇮🇷PersianfaRTL.
🇫🇮FinnishfiNo grammatical gender.
🇮🇱HebrewheRTL.
🇮🇳Hindihiशुद्ध हिन्दी. Minimal English loanwords.
🇭🇺HungarianhuÖn-form.
🇮🇩Indonesianid
🇲🇾Malayms
🇳🇱DutchnlU-form.
🇳🇴NorwegiannbBokmål.
🇵🇱PolishplPan/Pani form.
🇵🇹Portuguese (EU)pt-PTEuropean Portuguese.
🇷🇴Romanianro
🇷🇺RussianruВы-form.
🇸🇰SlovakskVykanie (vy-form).
🇷🇸Serbiansr🔤 Latin→CyrillicDeterministic script converter.
🇸🇪Swedishsv
🇰🇪Swahilisw
🇹🇭Thaithครับ/ค่ะ politeness particles.
🇹🇷TurkishtrSiz-form.
🇺🇦UkrainianukВи-form.
🇵🇰UrduurRTL. آپ form.
🇻🇳Vietnamesevi
🇹🇼Chinese (Traditional)zh-TW繁體中文.
🇬🇪Georgiankaქართული. Kartvelian family.
🇳🇬YorubayoÈdè Yorùbá. Tonal (3 tones).

Regional Variants

FlagLanguageCodeGoogleLLMCoachedScriptNotes
🇲🇽Mexican Spanishes-MXTú-form. Warm register.
🇨🇦Canadian Frenchfr-CAQuébécois idioms.

Indigenous & Low-Resource Languages

These languages are not supported by commercial MT services. champollion provides the tooling for language communities to build their own methods under OCAP principles.

LanguageCodeGoogleLLMCoachedScriptStatus
🪶Plains Creecrk🔤 SRO→Syllabics🚧 Under development
🌄QuechuaquRunasimi. Evidential suffixes.

:::info Plains Cree is under active development The register, coaching infrastructure, script converter, and evaluation harness for Plains Cree are all functional, but the translation pipeline has not yet been released. We are working with language communities under OCAP principles to ensure quality before release. See Support a Low-Resource Language for the full story — and how you can contribute. :::

:::tip Adding more low-resource languages champollion's method plugin system is designed for this. A language community can build a custom translation method, host it under their own control, and serve it via the API method. The Method Leaderboard tracks scores for any language pair — build a method, run the harness, and claim the top score. :::


Constructed Languages

Conlangs are supported via LLM registers and optional script converters. They use the same infrastructure as real languages — the quality gate, coaching system, and script conversion pipeline work identically.

LanguageCodeGoogleLLMScriptNotes
🖖Klingontlh🔤 Romanization→pIqaDPUA font required. Marc Okrand vocabulary.
🧝Sindarin (Tolkien Elvish)x-elvish-s🔤 Latin→TengwarCSUR PUA font required.
🏴‍☠️Pirate Englishx-pirateRegister only. Nautical metaphors.
🦸Kryptonianx-kryptonian🔤 Latin→KryptonianPUA font required.
🎭Shakespearean Englishx-shakespeareRegister only. Thee/thou, -eth/-est forms.
🐸Yoda-speakx-yodaRegister only. OSV word order.

See Conlangs, Scripts & Orthography for PUA font requirements, Unicode limitations, and how to add your own.


Language Presets

The init wizard supports preset names for quick setup. You can mix presets with individual codes.

PresetExpands To
europeanfr, de, es, it, pt, nl
asianja, zh, ko
globalfr, es, de, ja, zh, ko, pt, ar
nordicda, fi, nb, sv
# Mix presets with individual codes
champollion init
# → Target languages: european, ja
# → Resolves to: fr, de, es, it, pt, nl, ja

Adding Any Language

champollion can translate to any language your LLM knows — the table above just lists languages with built-in register presets. To add an unlisted language, include its BCP-47 code in your config:

{
"languages": {
"sw": {},
"am": {
"register": "Formal Amharic. Professional register with Geʽez script."
}
}
}

The LLM will translate using its training knowledge of the language. Setting a register gives you control over tone, formality, and orthographic conventions. See Configuration for details.


Language Cards

Each built-in language has a Language Card — a unified JSON file in shared/language-cards/ containing all metadata: registers, formality, method support, typography rules, genealogical classification, linguistic challenges, and NLP resources.

Unified Card Architecture

Each card is loaded eagerly at import. There is no separate reference tier — all data lives in a single file per language. Cards are enriched from authoritative sources:

SourceData
GlottologFamily classification, ancestry chain, Glottocode
WALSGenus classification, typological features
CLDRScript, direction, plural rules, typography
ISO 15924Script codes

Key Card Fields

FieldWhat It Contains
nativeNameEndonym — the language's name for itself, in its own script (e.g., ქართული, Runasimi)
classificationGenealogical anchor: family, genus, full ancestry chain from Glottolog
contactInfluencesUniversal contact history — borrowing layers, superstrates, substrates
Formality systemT-V distinction, speech levels, keigo, particles, etc.
Register presetsNamed LLM prompt presets specific to the language's character
Method supportWhich translation APIs support this language
Gender guidanceGrammatical gender rules and inclusive writing tips
Script/directionISO 15924 script code and RTL/LTR
RulesTypography (quotes, spacing), capitalization, plural categories
glottocodeCanonical Glottolog identifier for cross-referencing
dataSourcesProvenance tracking (e.g., ["glottolog-5.3", "cldr-48"])

Scaffolding a New Language Card

Use the generator to scaffold a card from authoritative data sources (IANA, CLDR, Glottolog):

# Preview what would be generated
node scripts/generate-language-card.mjs sw --dry-run

# Generate a unified card
node scripts/generate-language-card.mjs sw

The generator auto-populates metadata (codes, script, direction, plurals, quotes, method support, classification) and marks linguistic judgment fields as TODO for human curation.

Using Preset Keys

Instead of writing full register text, you can use a preset key name:

{
"languages": {
"fr": "casual-tu",
"ko": "formal-hapsyo",
"ja": "polite"
}
}

Champollion resolves the key to the full register prompt. Run npx champollion init to see available presets for each language.

Example Presets

LanguagePresetsDefault
Frenchformal-vous, casual-tuformal-vous
Koreanpolite-haeyo, formal-hapsyo, casual-haepolite-haeyo
Japanesepolite, formal-keigo, casualpolite
Germanformal-Sie, casual-duformal-Sie
Thaineutral-professional, polite-male, polite-femaleneutral-professional
Spanishneutral-professional, formal-usted, casual-tuteoneutral-professional

See Contributing a Language Card for the full spec, including field validation and PR checklist.


See Also