
# Serve a Custom Method as an API

The `api` method in champollion lets you point any translation pair at an external HTTP endpoint. This is how you integrate pipelines that are too complex for a simple LLM request-response loop: morphological analyzers, finite-state transducers (FSTs), multi-step LLM chains, or any custom research method you have developed.

## Why an API Service?

Some translation pipelines cannot run in a simple request-response cycle:

| Pipeline stage | Example |
|---|---|
| Morphological decomposition | Split polysynthetic words into morphemes before translation |
| FST validation | Reject outputs that violate phonological or morphological rules |
| Multi-step LLM chains | Generate → verify → correct cycles using different models |
| Dictionary lookup | Consult a curated bilingual dictionary mid-pipeline |
| Human in the loop | Queue uncertain translations for expert review |

The `api` method treats your pipeline as a black box: champollion sends source strings, and your service returns translations. What happens inside is entirely up to you.

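## Architecture

During `npx champollion sync`, champollion POSTs the source strings for an `api` pair to your endpoint. Your service runs whatever pipeline it implements, and champollion writes the returned translations to the target locale file (for example, `crk.json`).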

## Setting Up Your Service

Your API service must implement a single endpoint that accepts and returns JSON.

### Request Format

champollion sends exactly this JSON body (see `api.js`):

```http
POST /translate
Content-Type: application/json
Authorization: Bearer <CHAMPOLLION_API_KEY>

{
  "source_locale": "en",
  "target_locale": "crk",
  "method": "crk-coached-v1",
  "keys": {
    "greeting": "Hello, welcome to our app",
    "farewell": "Goodbye and thanks"
  }
}
```

| Field | Type | Description |
|---|---|---|
| `source_locale` | string | BCP 47 source language code |
| `target_locale` | string | BCP 47 target language code |
| `method` | string | Plugin name or `"default"` |
| `keys` | object | Map of key → source string to translate |
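
You may want to reject malformed requests before they reach your pipeline. A small validator can check the fields listed above. The sketch below is a hypothetical helper: `parseTranslateRequest` is not part of champollion. It uses the built-in `Intl.getCanonicalLocales`, which throws a `RangeError` on tags that are not well-formed BCP 47. It returns a list of errors you could send back with HTTP 400.

```javascript
// Hypothetical validator for the champollion request body.
// Returns { ok: true, ...fields } or { ok: false, errors: [...] }.
function parseTranslateRequest(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return { ok: false, errors: ['body must be a JSON object'] };
  }
  const errors = [];

  // Locale fields must be non-empty, well-formed BCP 47 tags.
  for (const field of ['source_locale', 'target_locale']) {
    const value = body[field];
    if (typeof value !== 'string' || value.length === 0) {
      errors.push(`${field} must be a non-empty string`);
      continue;
    }
    try {
      Intl.getCanonicalLocales(value); // throws RangeError on malformed tags
    } catch {
      errors.push(`${field} is not a well-formed BCP 47 tag: ${value}`);
    }
  }

  if (typeof body.method !== 'string' || body.method.length === 0) {
    errors.push('method must be a non-empty string (plugin name or "default")');
  }

  // keys must be a plain object whose values are source strings.
  const { keys } = body;
  if (keys === null || typeof keys !== 'object' || Array.isArray(keys)) {
    errors.push('keys must be an object mapping key -> source string');
  } else {
    for (const [key, source] of Object.entries(keys)) {
      if (typeof source !== 'string') errors.push(`keys.${key} must be a string`);
    }
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    sourceLocale: body.source_locale,
    targetLocale: body.target_locale,
    method: body.method,
    keys,
  };
}
```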



### Response Format

Your service must return a `translations` object. An optional `meta` object can include cost and diagnostic info:

```json
{
  "translations": {
    "greeting": "tânisi, pê-kîwêw ôta",
    "farewell": "ekosi mâka, kinanâskomitin"
  },
  "meta": {
    "model": "my-custom-pipeline/v1",
    "cost_usd": 0.0042,
    "method": "decompose-translate-validate"
  }
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `translations` | object | Yes | Map of key → translated string |
| `meta` | object | No | Optional metadata |
| `meta.cost_usd` | number | No | If present, displayed in champollion's output |
| `errors` | object | No | For partial success (HTTP 207): map of key → `{ message }` |
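
A hypothetical helper can assemble this shape from per-key results. It returns HTTP 200 when every key succeeds and HTTP 207 when any key fails. `buildTranslateResponse` is an illustration, not a champollion API. It assumes a failed key should still appear in `translations` as an empty string, matching the best-practice advice to return `""` on failure, and should also be listed in `errors`.

```javascript
// Hypothetical helper: builds a champollion response from per-key results.
// Each result is either { text } on success or { error } on failure.
function buildTranslateResponse(results, meta = {}) {
  const translations = {};
  const errors = {};
  for (const [key, result] of Object.entries(results)) {
    if (typeof result.error === 'string') {
      // Failed keys get an empty translation plus an error entry.
      translations[key] = '';
      errors[key] = { message: result.error };
    } else {
      translations[key] = result.text;
    }
  }
  const hasErrors = Object.keys(errors).length > 0;
  return {
    // 207 signals partial success; 200 means every key translated.
    status: hasErrors ? 207 : 200,
    body: hasErrors ? { translations, errors, meta } : { translations, meta },
  };
}
```

In an Express handler, you would use it as `const { status, body } = buildTranslateResponse(results, meta); res.status(status).json(body);`.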

### Minimal Express Server

Here is an Express.js service that implements the champollion contract:


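```js
import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical stand-ins for your pipeline stages. Replace each stub
// with your own decomposer, LLM client, FST validator, and normalizer.
const decompose = async (text, locale) => text.split(/\s+/);
const llmTranslate = async (morphemes, locale) => morphemes.join(' ');
const fstValidate = async (draft, locale) => draft;
const postProcess = async (text) => text.normalize('NFC');

/**
 * champollion API contract:
 *
 * Request:  { source_locale, target_locale, method, keys: { "key": "source" } }
 * Response: { translations: { "key": "translated" }, meta: { ... } }
 */
app.post('/translate', async (req, res) => {
  const { source_locale, target_locale, method, keys } = req.body ?? {};
  if (!keys || typeof keys !== 'object') {
    return res.status(400).json({ error: '`keys` must be an object' });
  }

  const translations = {};

  for (const [key, source] of Object.entries(keys)) {
    try {
      // --- Your pipeline goes here ---
      // Step 1: Morphological decomposition
      const morphemes = await decompose(source, source_locale);

      // Step 2: LLM translation with context
      const draft = await llmTranslate(morphemes, target_locale);

      // Step 3: FST validation
      const validated = await fstValidate(draft, target_locale);

      // Step 4: Post-processing (orthography normalization, etc.)
      translations[key] = await postProcess(validated);
    } catch (err) {
      // Return "" rather than the source text so champollion's quality
      // gate skips this key and retries it on the next sync.
      console.error(`Pipeline failed for key "${key}":`, err);
      translations[key] = '';
    }
  }

  res.json({
    translations,
    meta: {
      model: 'my-custom-pipeline/v1',
      method: 'decompose-translate-validate',
    },
  });
});

app.listen(3001, () => {
  console.log('Translation API running on http://localhost:3001');
});
```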

### Configuring champollion

Point a translation pair at your running service in `champollion.config.json`:



```json
{
  "inputLocale": "en",
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "http://localhost:3001/translate",
      "register": "Formal Plains Cree. Use SRO orthography."
    }
  }
}
```

Then run sync as usual:



```bash
npx champollion sync
```

champollion will POST your source strings to the endpoint and write the returned translations to `crk.json`.
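
Before a full sync, it can help to send your service one request shaped like champollion's and check that every key comes back. The sketch below is a hypothetical smoke test, not champollion's own client. It reuses the request body from the Request Format section and assumes the API key travels as `Authorization: Bearer <key>`. It relies on the global `fetch` available in Node 18+, and you can pass a stub as `fetchImpl` to test offline.

```javascript
// Hypothetical smoke test: POST a champollion-shaped request and report
// which keys are missing from the response. Needs Node 18+ for global fetch.
async function smokeTest(endpoint, { apiKey, fetchImpl = fetch } = {}) {
  const keys = {
    greeting: 'Hello, welcome to our app',
    farewell: 'Goodbye and thanks',
  };
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;

  const res = await fetchImpl(endpoint, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      source_locale: 'en',
      target_locale: 'crk',
      method: 'default',
      keys,
    }),
  });
  // 207 is accepted too: it signals partial success.
  if (res.status !== 200 && res.status !== 207) {
    throw new Error(`Unexpected HTTP status ${res.status}`);
  }
  const data = await res.json();
  const translations = data.translations ?? {};
  const missing = Object.keys(keys).filter((k) => typeof translations[k] !== 'string');
  return { status: res.status, missing, translations };
}
```

Against the local server, run `smokeTest('http://localhost:3001/translate').then(console.log)`.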

## Case Study: Plains Cree Pipeline

:::info Under Development
The Plains Cree pipeline described below is under active development and is not yet running in production. Details here reflect the current design direction and may change as the project evolves.
:::

The arena project demonstrates this pattern. Its Plains Cree pipeline uses:

  1. Morphological decomposition — Break polysynthetic Cree words into translatable morpheme chains
  2. LLM translation — Context-enriched GPT-4o translation with coaching data (SRO orthography rules, register instructions)
  3. FST validation — Finite-state transducer checks that outputs conform to Cree phonological rules
  4. Confidence scoring — Each translation gets a confidence score based on FST pass rate and dictionary coverage

The entire pipeline runs as a single HTTP endpoint that champollion calls via the api method.
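
This page does not describe how the confidence-scoring step combines its two inputs. As an illustration only, the sketch below assumes a weighted average of the FST pass rate and dictionary coverage. The weights, the input field names, and `confidenceScore` itself are hypothetical placeholders, not the arena pipeline's formula.

```javascript
// Illustrative only: combine FST pass rate and dictionary coverage into a
// 0..1 confidence score. Field names and default weights are hypothetical.
function confidenceScore(
  { fstPassed, fstChecked, dictHits, tokenCount },
  weights = { fst: 0.6, dict: 0.4 },
) {
  // Share of analyzed words the FST accepted (0 if nothing was checked).
  const fstRate = fstChecked > 0 ? fstPassed / fstChecked : 0;
  // Share of tokens found in the bilingual dictionary.
  const coverage = tokenCount > 0 ? dictHits / tokenCount : 0;
  const totalWeight = weights.fst + weights.dict;
  return (weights.fst * fstRate + weights.dict * coverage) / totalWeight;
}
```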

## Running Evaluations

After translating, you can evaluate output quality with the Arena evaluation harness and compare your pipeline against other methods:



```bash
# Clone the harness
git clone https://github.com/gamedaysuits/arena.git
cd arena
pip install -e .

# Run the evaluation against your method's output
mt-eval run --corpus data/edtekla-dev-v1.json --submit
```

This produces structured evaluation records with chrF++, BLEU, and exact match scores that can be used as regression baselines.
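
Exact match is the simplest of the three metrics to reproduce for a quick local check between harness runs. The sketch below is an illustrative `exactMatchRate` helper, not the harness's implementation. It assumes translations and references are key → string maps. It applies Unicode NFC normalization and whitespace trimming, so `â` typed as `a` plus a combining circumflex still matches the precomposed character. The harness may normalize differently, and chrF++ and BLEU scores should still come from `mt-eval`.

```javascript
// Illustrative exact-match rate over key -> string maps.
// Normalizes to Unicode NFC and trims whitespace before comparing;
// the arena harness may normalize differently.
function exactMatchRate(hypotheses, references) {
  const keys = Object.keys(references);
  if (keys.length === 0) return 0;
  const norm = (s) => (s ?? '').normalize('NFC').trim();
  const hits = keys.filter((k) => norm(hypotheses[k]) === norm(references[k])).length;
  return hits / keys.length;
}
```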

## Authentication

If your API requires authentication, set the `apiKey` field. For production services, reference an environment variable instead of hard-coding the key:



```json
{
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "https://my-mt-service.example.com/translate",
      "apiKey": "${CRK_API_KEY}"
    }
  }
}
```
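
On the service side, you can reject unauthenticated calls before they reach the pipeline. The sketch below assumes champollion sends the configured `apiKey` as `Authorization: Bearer <key>`, as in the request example under Request Format. `requireBearer` is a hypothetical Express-style middleware. It avoids imports so it drops into the server shown earlier; in production you may prefer Node's `crypto.timingSafeEqual` for the comparison.

```javascript
// Compare two strings without exiting early on the first mismatched
// character. (A length difference still returns immediately.)
function constantTimeEqual(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical Express-style middleware: responds 401 unless the request
// carries `Authorization: Bearer <expectedKey>`.
function requireBearer(expectedKey) {
  if (!expectedKey) throw new Error('requireBearer: expectedKey is not set');
  return (req, res, next) => {
    const header = req.headers.authorization ?? '';
    const match = /^Bearer\s+(\S+)$/i.exec(header);
    if (!match || !constantTimeEqual(match[1], expectedKey)) {
      return res.status(401).json({ error: 'invalid or missing API key' });
    }
    return next();
  };
}
```

Register it before the route, for example `app.use('/translate', requireBearer(process.env.CRK_API_KEY));`, and give `CRK_API_KEY` the same value in the service's environment and in champollion's.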

## Data Sovereignty & OCAP Principles

The api method is particularly important for Indigenous language communities. By self-hosting the translation pipeline, a community keeps full control over:

  • Proprietary coaching data — register instructions, orthography rules, and domain glossaries never leave community infrastructure.
  • Linguistic resources — curated dictionaries, FST grammars, and elder-verified translations remain under community ownership.
  • Access policies — the community decides who can call the endpoint and under what terms.

This aligns with OCAP® principles (Ownership, Control, Access, Possession), ensuring that sensitive language data is governed by the community rather than a third-party platform.

:::tip
Combine the `api` method with a private deployment (e.g., a community-hosted VM or on-prem server) for the strongest data-sovereignty posture. See [Support a Low-Resource Language](https://mtevalarena.org/docs/community/low-resource-languages) for a full walkthrough.
:::

## Cost Estimation

By default, the `api` method returns `null` for cost estimation, because your service controls pricing. If your service bills per request and you want to provide cost transparency, have your API return a cost field in the metadata:



```json
{
  "translations": { "...": "..." },
  "metadata": {
    "cost": {
      "estimatedCost": 0.0042,
      "currency": "USD",
      "source": "my-service-pricing"
    }
  }
}
```



## Best Practices

1. **Return empty strings on failure.** Don't return the source string as a "translation". Return `""` instead, and champollion's quality gate will catch it; the key is skipped and retried on the next sync.
2. **Include confidence scores.** If your pipeline can estimate quality, return that estimate in the metadata. This helps with quality auditing.
3. **Implement health checks.** Add a `GET /health` endpoint so champollion can verify connectivity before starting a large sync.
4. **Rate-limit gracefully.** If your pipeline has rate limits, return `429` status codes; champollion's batching system will back off.
5. **Log everything.** Multi-step pipelines can fail silently. Log the input and output of every stage for debugging.
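
Practices 3 and 4 can be sketched as two small Express-style handlers. Everything here is hypothetical: the `{ status: 'ok' }` health payload, the fixed-window limit, and keying clients by `req.ip` are placeholder choices. `Retry-After` is a standard HTTP header, but this page doesn't say whether champollion reads it.

```javascript
// Hypothetical health-check handler for `GET /health`.
function healthHandler(req, res) {
  res.json({ status: 'ok' });
}

// Hypothetical fixed-window rate limiter: at most `limit` requests per
// `windowMs` per client, answering 429 once a client exceeds the limit.
function createRateLimiter({ limit = 10, windowMs = 60_000, now = Date.now } = {}) {
  // client id -> { start, count }. Entries are never evicted here;
  // a production limiter should prune stale windows.
  const windows = new Map();
  return (req, res, next) => {
    const id = req.ip ?? 'anonymous';
    const t = now();
    let w = windows.get(id);
    // Start a fresh window on first contact or once the old one expires.
    if (!w || t - w.start >= windowMs) {
      w = { start: t, count: 0 };
      windows.set(id, w);
    }
    w.count += 1;
    if (w.count > limit) {
      const retryAfterSec = Math.ceil((w.start + windowMs - t) / 1000);
      res.set('Retry-After', String(retryAfterSec));
      return res.status(429).json({ error: 'rate limit exceeded' });
    }
    return next();
  };
}
```

Wire them in before the translate route, for example `app.get('/health', healthHandler); app.use('/translate', createRateLimiter({ limit: 10, windowMs: 60_000 }));`.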

## Licensing

The `api` method pattern is fully open: there are no licensing restrictions on wrapping your own translation pipeline as an HTTP service. `arena` is available under the MIT license for reference implementations.

## See Also

- [Translation Methods](/docs/guides/translation-methods): overview of every built-in method (`openai`, `google`, `api`, etc.)
- [Plugin Specification](/docs/reference/plugin-spec): full schema for `champollion.config.json`, including the `api` method fields
- [Support a Low-Resource Language](https://mtevalarena.org/docs/community/low-resource-languages): end-to-end guide for under-resourced languages, including OCAP principles
- [Architecture](/docs/concepts/architecture): how champollion's sync loop, batching, and method dispatch work
- [MT Evaluation](https://mtevalarena.org/docs/leaderboard/rules): evaluation methodology, metrics, and the leaderboard submission process
- [Method Leaderboard](/leaderboard): live quality rankings across methods and language pairs