
Serving a Custom Method as an API

champollion's api method lets you point any translation pair at an external HTTP endpoint. This is how you integrate pipelines that are too complex for a single LLM prompt: morphological analyzers, finite-state transducers (FSTs), multi-step LLM chains, or any custom research method you have built.

Why an API Service?

Some translation pipelines cannot run inside a simple prompt-response cycle:

| Pipeline step | Example |
| --- | --- |
| Morphological decomposition | Split polysynthetic words into morphemes before translating |
| FST validation | Reject outputs that violate phonological or morphological rules |
| Multi-step LLM chains | Generate → verify → correct cycles using different models |
| Dictionary lookup | Cross-reference a curated bilingual dictionary mid-pipeline |
| Human-in-the-loop | Queue uncertain translations for expert review |

The api method treats your pipeline as a black box: champollion sends source strings, and your service returns translations. What happens inside is entirely up to you.

Architecture

Setting Up Your Service

Your API service must implement a single endpoint that accepts and returns JSON.

Request Format

champollion sends exactly this JSON body (see api.js):

POST /translate
Content-Type: application/json
Authorization: Bearer <CHAMPOLLION_API_KEY>

{
  "source_locale": "en",
  "target_locale": "crk",
  "method": "crk-coached-v1",
  "keys": {
    "greeting": "Hello, welcome to our app",
    "farewell": "Goodbye and thanks"
  }
}

| Field | Type | Description |
| --- | --- | --- |
| source_locale | string | BCP 47 source language code |
| target_locale | string | BCP 47 target language code |
| method | string | Plugin name or "default" |
| keys | object | Map of key → source string to translate |
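
Before running a pipeline it can help to reject malformed requests early. The sketch below checks a parsed body against the four fields in the table above. `validateRequest` is a hypothetical helper name, not part of champollion, and the error messages are illustrative.

```javascript
// Sketch only: the checks mirror the request fields documented above.
// `validateRequest` is a hypothetical helper, not a champollion API.
function validateRequest(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return ['request body must be a JSON object'];
  }
  const problems = [];
  for (const field of ['source_locale', 'target_locale', 'method']) {
    if (typeof body[field] !== 'string' || body[field].length === 0) {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  const keys = body.keys;
  if (keys === null || typeof keys !== 'object' || Array.isArray(keys)) {
    problems.push('keys must be an object mapping key to source string');
  } else {
    for (const [key, source] of Object.entries(keys)) {
      if (typeof source !== 'string') {
        problems.push(`keys.${key} must be a string`);
      }
    }
  }
  // An empty array means the body has the documented shape.
  return problems;
}
```

A handler could answer with HTTP 400 and the returned list whenever it is non-empty. Whether champollion surfaces that list to the user is not stated in this page.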

Response Format

Your service must return a `translations` object. An optional `meta` object can include cost and diagnostic info:

```json
{
  "translations": {
    "greeting": "tânisi, pê-kîwêw ôta",
    "farewell": "ekosi mâka, kinanâskomitin"
  },
  "meta": {
    "model": "my-custom-pipeline/v1",
    "cost_usd": 0.0042,
    "method": "decompose-translate-validate"
  }
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| translations | object | Yes | Map of key → translated string |
| meta | object | No | Optional metadata |
| meta.cost_usd | number | No | If present, displayed in champollion's output |
| errors | object | No | For partial success (HTTP 207): map of key → { message } |
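
The partial-success case can be sketched as a small helper that sorts per-key results into `translations` and `errors` and picks the status code. `buildResponse` and the `{ ok, text, message }` result shape are hypothetical. The sketch also assumes that failed keys are left out of `translations`, which this page does not specify.

```javascript
// Sketch only. `buildResponse` and the shape of `results` are hypothetical;
// the response fields (translations, meta, errors) come from the table above.
function buildResponse(results, meta = {}) {
  const translations = {};
  const errors = {};
  for (const [key, result] of Object.entries(results)) {
    if (result.ok) {
      translations[key] = result.text;
    } else {
      // Assumption: a failed key is reported in `errors` and omitted from `translations`.
      errors[key] = { message: result.message };
    }
  }
  const failedCount = Object.keys(errors).length;
  const body = { translations, meta };
  if (failedCount > 0) {
    body.errors = errors;
  }
  // 207 marks partial success; 200 means every key was translated.
  // What to send when every key fails is not covered here; this sketch still uses 207.
  return { status: failedCount > 0 ? 207 : 200, body };
}
```

In an Express handler the result would typically be sent with `res.status(status).json(body)`.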

Minimal Express Server

import express from 'express';

const app = express();
app.use(express.json());

/**
 * champollion API contract:
 *
 * Request:  { source_locale, target_locale, method, keys: { "key": "source" } }
 * Response: { translations: { "key": "translated" }, meta: { ... } }
 *
 * decompose, llmTranslate, fstValidate, and postProcess are placeholders
 * for your own pipeline stages; they are not defined in this snippet.
 */
app.post('/translate', async (req, res) => {
  const { source_locale, target_locale, method, keys } = req.body;

  const translations = {};

  for (const [key, source] of Object.entries(keys)) {
    // --- Your pipeline goes here ---
    // Step 1: Morphological decomposition
    const morphemes = await decompose(source, source_locale);

    // Step 2: LLM translation with context
    const draft = await llmTranslate(morphemes, target_locale);

    // Step 3: FST validation
    const validated = await fstValidate(draft, target_locale);

    // Step 4: Post-processing (orthography normalization, etc.)
    translations[key] = await postProcess(validated);
  }

  res.json({
    translations,
    meta: {
      model: 'my-custom-pipeline/v1',
      method: 'decompose-translate-validate',
    },
  });
});

app.listen(3001, () => {
  console.log('Translation API running on http://localhost:3001');
});

Configuring champollion

Point a translation pair at your running service in champollion.config.json:

{
  "inputLocale": "en",
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "http://localhost:3001/translate",
      "register": "Formal Plains Cree. Use SRO orthography."
    }
  }
}

Then run sync as usual:

npx champollion sync

champollion will POST your source strings to the endpoint and write the returned translations to crk.json.
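
To check an endpoint by hand before running a sync, a small client can send the same kind of request. This is a sketch, not champollion's own client code: `postTranslate` is a hypothetical helper, it assumes Node 18+ for the global `fetch`, and the `fetchImpl` parameter exists only so a stub can be passed in tests.

```javascript
// Sketch only: `postTranslate` is a hypothetical helper, not a champollion API.
// Assumes Node 18+ (global fetch) unless a replacement is passed as `fetchImpl`.
async function postTranslate(endpoint, payload, apiKey, fetchImpl = fetch) {
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) {
    headers.Authorization = `Bearer ${apiKey}`;
  }
  const response = await fetchImpl(endpoint, {
    method: 'POST',
    headers,
    body: JSON.stringify(payload),
  });
  // response.ok covers 200-299, so a 207 partial success passes this check.
  if (!response.ok) {
    throw new Error(`translate endpoint returned HTTP ${response.status}`);
  }
  const data = await response.json();
  if (!data || typeof data.translations !== 'object' || data.translations === null) {
    throw new Error('response is missing a translations object');
  }
  return data;
}
```

Called with `http://localhost:3001/translate` and a body shaped like the request example, it resolves to the parsed response or throws with the failing status code.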

Case Study: Plains Cree Pipeline

:::info Under Development
The Plains Cree pipeline described below is under active development and is not yet running in production. Details here reflect the current design direction and may change as the project evolves.
:::

The arena project demonstrates this pattern. Its Plains Cree pipeline uses:

  1. Morphological decomposition — Break polysynthetic Cree words into translatable morpheme chains
  2. LLM translation — Context-enriched GPT-4o translation with coaching data (SRO orthography rules, register instructions)
  3. FST validation — Finite-state transducer checks that outputs conform to Cree phonological rules
  4. Confidence scoring — Each translation gets a confidence score based on FST pass rate and dictionary coverage

The entire pipeline runs as a single HTTP endpoint that champollion calls via the api method.
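
The confidence-scoring step is described only in outline, so the following is a purely hypothetical illustration of combining an FST pass rate with dictionary coverage. The weighted average, the default weight, and every name in it are assumptions; it is not the arena project's formula.

```javascript
// Hypothetical illustration only: NOT the arena project's scoring formula.
// Combines two ratios in [0, 1] with an assumed weight.
function confidenceScore(counts, fstWeight = 0.5) {
  const { fstPassed, fstChecked, wordsInDictionary, wordsTotal } = counts;
  // Treat an empty denominator as zero evidence rather than dividing by zero.
  const ratio = (numerator, denominator) => (denominator > 0 ? numerator / denominator : 0);
  const fstPassRate = ratio(fstPassed, fstChecked);
  const dictionaryCoverage = ratio(wordsInDictionary, wordsTotal);
  return fstWeight * fstPassRate + (1 - fstWeight) * dictionaryCoverage;
}
```

A score like this could travel back to champollion inside the optional `meta` object, under a field name of your choosing.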

Running Evaluations

After translating, you can evaluate output quality by running the arena evaluation harness directly:

# Clone the harness
git clone https://github.com/gamedaysuits/arena.git
cd arena
pip install -e .

# Run the evaluation against your method's output
mt-eval run --corpus data/edtekla-dev-v1.json --submit

This produces structured evaluation records with chrF++, BLEU, and exact match scores that can be used as regression baselines.
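
Of the three metrics, exact match is simple enough to sketch for a quick local check. The function below is an illustration under assumptions: `exactMatchRate` is a hypothetical name, and the Unicode NFC normalization and whitespace trimming are choices made here, so the harness's own numbers may differ.

```javascript
// Sketch only: the harness may normalize text differently.
// Returns the fraction of hypotheses identical to their references.
function exactMatchRate(hypotheses, references) {
  if (hypotheses.length !== references.length) {
    throw new Error('hypotheses and references must have the same length');
  }
  if (references.length === 0) {
    return 0;
  }
  // NFC makes "a + combining circumflex" compare equal to the precomposed "â".
  const canonical = (text) => text.normalize('NFC').trim();
  let matches = 0;
  for (let i = 0; i < references.length; i++) {
    if (canonical(hypotheses[i]) === canonical(references[i])) {
      matches += 1;
    }
  }
  return matches / references.length;
}
```

The normalization matters for orthographies with diacritics, where two visually identical strings can differ at the code-point level.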

Authentication

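If your service requires authentication, set the apiKey field on the translation pair, either to the key itself or to an environment variable reference that keeps the secret out of the config file. The request example above shows the key arriving as a Bearer token in the Authorization header.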

{
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "https://my-mt-service.example.com/translate",
      "apiKey": "${CRK_API_KEY}"
    }
  }
}
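
On the service side, the matching check is to read the Authorization header and compare the presented token with the expected key. The sketch below assumes the Bearer scheme shown in the request example; `isAuthorized` is a hypothetical helper name.

```javascript
// Sketch only: `isAuthorized` is a hypothetical helper, not a champollion API.
function isAuthorized(authorizationHeader, expectedKey) {
  if (typeof authorizationHeader !== 'string' || !expectedKey) {
    return false;
  }
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
  if (!match) {
    return false;
  }
  const presented = match[1];
  // Walk the whole expected key so the loop length does not depend on
  // where the first mismatch occurs.
  let diff = presented.length ^ expectedKey.length;
  for (let i = 0; i < expectedKey.length; i++) {
    diff |= (presented.charCodeAt(i) || 0) ^ expectedKey.charCodeAt(i);
  }
  return diff === 0;
}
```

In an Express handler this could be called as `isAuthorized(req.headers.authorization, process.env.CRK_API_KEY)`, responding with 401 when it returns false. For production use, Node's `crypto.timingSafeEqual` is a sturdier comparison than the hand-written loop.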

Data Sovereignty & OCAP Principles

The api method is particularly important for Indigenous language communities. By self-hosting the translation pipeline, a community keeps full control over:

  • Proprietary coaching data — register instructions, orthography rules, and domain glossaries never leave community infrastructure.
  • Linguistic resources — curated dictionaries, FST grammars, and elder-verified translations remain under community ownership.
  • Access policies — the community decides who can call the endpoint and under what terms.

This aligns with OCAP® principles (Ownership, Control, Access, Possession), ensuring that sensitive language data is governed by the community rather than a third-party platform.

:::tip
Combine the api method with a private deployment (e.g., a community-hosted VM or on-prem server) for the strongest data-sovereignty posture. See Support a Low-Resource Language for a full walkthrough.
:::

Cost Estimation

champollion cannot price an external pipeline, so cost estimation for the api method is null by default: your service controls pricing. To provide cost transparency, have your API return cost_usd in the meta object, as described under Response Format:

{
  "translations": { "...": "..." },
  "meta": {
    "cost_usd": 0.0042
  }
}
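
If the pipeline calls paid models, the reported figure can be summed per request. The sketch below produces a `meta` object with the `cost_usd` field from the response-format table. `buildCostMeta`, the call-record shape, and the per-million-token prices are all hypothetical placeholders for your own accounting.

```javascript
// Sketch only: the record shape and prices are placeholders, not real rates.
// Each call: { inputTokens, outputTokens, usdPerMillionInput, usdPerMillionOutput }
function buildCostMeta(calls, baseMeta = {}) {
  let totalUsd = 0;
  for (const call of calls) {
    totalUsd +=
      (call.inputTokens * call.usdPerMillionInput +
        call.outputTokens * call.usdPerMillionOutput) / 1e6;
  }
  // Round to six decimal places so float noise does not leak into the output.
  return { ...baseMeta, cost_usd: Number(totalUsd.toFixed(6)) };
}
```

The returned object can be passed straight through as the `meta` value of the response.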

Best Practices

  1. Return empty strings for failures: Do not return the source string as the "translation." Return "" and champollion's quality gate will catch it. The key is skipped and retried on the next sync.
  2. Include confidence scores: If your pipeline can estimate quality, return the estimate in the response's meta object. This helps with quality auditing.
  3. Implement health checks: Add a GET /health endpoint so champollion can verify connectivity before starting a large sync.
  4. Rate limit gracefully: If your pipeline has throughput limits, return 429 status codes. champollion's batch system will back off.
  5. Log everything: Multi-step pipelines can fail silently. Log each step's input and output for debugging.
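
Practices 1, 4, and 5 can be sketched without any framework. Both helpers below are hypothetical: `translateOrEmpty` runs named stages, logs each step, and returns "" on any failure, while `createRateLimiter` is a fixed-window counter that tells the handler when to answer 429. The fixed-window policy and the retry hint are assumptions, not champollion requirements.

```javascript
// Sketch only: helper names, the stage format, and the limiter policy are assumptions.

// Practices 1 and 5: log every stage and never echo the source on failure.
// `stages` is an array of [name, asyncFunction] pairs.
async function translateOrEmpty(key, source, stages, log = console.error) {
  let value = source;
  try {
    for (const [name, stage] of stages) {
      const input = value;
      value = await stage(input);
      log(`[${key}] ${name}: ${JSON.stringify(input)} -> ${JSON.stringify(value)}`);
    }
    return typeof value === 'string' ? value : '';
  } catch (err) {
    log(`[${key}] failed: ${err.message}`);
    return '';
  }
}

// Practice 4: fixed-window limiter. `now` is injectable so it can be tested.
function createRateLimiter(limit, windowMs, now = Date.now) {
  let windowStart = now();
  let count = 0;
  return function check() {
    const current = now();
    if (current - windowStart >= windowMs) {
      windowStart = current;
      count = 0;
    }
    if (count < limit) {
      count += 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    const waitMs = windowStart + windowMs - current;
    return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
  };
}
```

When `check()` reports `allowed: false`, the handler would respond with status 429. Sending `retryAfterSeconds` in a standard Retry-After header is optional, and this page does not say whether champollion reads it.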

Licensing

The api method pattern is fully open: there are no licensing restrictions on wrapping your own translation pipeline as an HTTP service. arena is available under the MIT license for reference implementations.

See Also