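# Serve a Custom Method as an API
champollion's `api` method lets you point any translation pair at an external HTTP endpoint. This is how you integrate pipelines that are too complex to run in a single LLM prompt: morphological analyzers, finite-state transducers (FSTs), multi-step LLM chains, or any custom research method you have built.
## Why an API Service?
Some translation pipelines cannot run in a simple request-response loop:
| Pipeline step | Example |
|---|---|
| Morphological decomposition | Break polysynthetic words into morphemes before translating |
| FST validation | Reject output that violates phonological or morphological rules |
| Multi-step LLM chain | Run a generate → validate → correct loop using different models |
| Dictionary lookup | Cross-reference a curated bilingual dictionary mid-pipeline |
| Human in the loop | Queue uncertain translations for expert review |
The `api` method treats your pipeline as a black box: champollion sends source strings, and your service returns translations. Everything that happens inside is entirely up to you.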
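## Architecture
## Setting Up Your Service
Your API service must implement a single endpoint that accepts and returns JSON.
### Request Format
champollion sends this exact JSON body (see `api.js`):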
```http
POST /translate
Content-Type: application/json
Authorization: Bearer <CHAMPOLLION_API_KEY>

{
  "source_locale": "en",
  "target_locale": "crk",
  "method": "crk-coached-v1",
  "keys": {
    "greeting": "Hello, welcome to our app",
    "farewell": "Goodbye and thanks"
  }
}
```
| Field | Type | Description |
|---|---|---|
| `source_locale` | string | BCP 47 source language code |
| `target_locale` | string | BCP 47 target language code |
| `method` | string | Plugin name, or `"default"` |
| `keys` | object | Map of key → source string to translate |
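Because the service receives this body over the network, it can help to check it against the fields above before running an expensive pipeline. The following is a minimal sketch, not part of champollion: `validateTranslateRequest` is a hypothetical helper name, and treating all four fields as required is an assumption drawn from the table.

```javascript
// Hypothetical helper (not part of champollion): checks an incoming body
// against the request fields documented above.
// Returns a list of problems; an empty list means the body looks well-formed.
function validateTranslateRequest(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return ['body must be a JSON object'];
  }

  const problems = [];

  // Assumption: all three string fields are always present in the request.
  for (const field of ['source_locale', 'target_locale', 'method']) {
    if (typeof body[field] !== 'string' || body[field].length === 0) {
      problems.push(`${field} must be a non-empty string`);
    }
  }

  const keys = body.keys;
  if (keys === null || typeof keys !== 'object' || Array.isArray(keys)) {
    problems.push('keys must be an object mapping key to source string');
  } else {
    for (const [key, source] of Object.entries(keys)) {
      if (typeof source !== 'string') {
        problems.push(`keys.${key} must be a string`);
      }
    }
  }

  return problems;
}
```

In a request handler you could call this first and respond with a `400` status when the returned list is non-empty; how champollion reports such a response is not covered here.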
### Response Format
Your service must return a `translations` object. An optional `meta` object can include cost and diagnostic info:
```json
{
  "translations": {
    "greeting": "tânisi, pê-kîwêw ôta",
    "farewell": "ekosi mâka, kinanâskomitin"
  },
  "meta": {
    "model": "my-custom-pipeline/v1",
    "cost_usd": 0.0042,
    "method": "decompose-translate-validate"
  }
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `translations` | object | ✅ | Map of key → translated string |
| `meta` | object | No | Optional metadata |
| `meta.cost_usd` | number | No | If present, displayed in champollion's output |
| `errors` | object | No | For partial success (HTTP 207): map of key → `{ message }` |
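The partial-success case in the table can be assembled from per-key results. This is a sketch under stated assumptions: `buildTranslateResponse` and the `{ ok, text, message }` result shape are hypothetical names for illustration, and using `207` whenever at least one key fails (including when every key fails) is an assumption, since only partial success is specified above.

```javascript
// Hypothetical helper: turns per-key pipeline results into the response
// shape documented above.
// `results` maps key -> { ok: true, text } or { ok: false, message }.
function buildTranslateResponse(results, meta = {}) {
  const translations = {};
  const errors = {};

  for (const [key, result] of Object.entries(results)) {
    if (result.ok) {
      translations[key] = result.text;
    } else {
      errors[key] = { message: result.message };
    }
  }

  const hasErrors = Object.keys(errors).length > 0;
  const body = { translations, meta };
  if (hasErrors) {
    body.errors = errors; // only included when something failed
  }

  // Assumption: 207 is used whenever any key fails, 200 otherwise.
  return { status: hasErrors ? 207 : 200, body };
}
```

In an Express handler, the result could be sent with `res.status(status).json(body)`.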
### Minimal Express Server
```js
import express from 'express';

const app = express();
app.use(express.json());

/**
 * champollion API contract:
 *
 * Request:  { source_locale, target_locale, method, keys: { "key": "source" } }
 * Response: { translations: { "key": "translated" }, meta: { ... } }
 *
 * decompose, llmTranslate, fstValidate, and postProcess are placeholders
 * for your own pipeline stages; champollion does not provide them.
 */
app.post('/translate', async (req, res) => {
  const { source_locale, target_locale, method, keys } = req.body;
  const translations = {};

  for (const [key, source] of Object.entries(keys)) {
    // --- Your pipeline goes here ---
    // Step 1: morphological decomposition
    const morphemes = await decompose(source, source_locale);
    // Step 2: LLM translation with context
    const draft = await llmTranslate(morphemes, target_locale);
    // Step 3: FST validation
    const validated = await fstValidate(draft, target_locale);
    // Step 4: post-processing (orthography normalization, etc.)
    translations[key] = await postProcess(validated);
  }

  res.json({
    translations,
    meta: {
      model: 'my-custom-pipeline/v1',
      method: 'decompose-translate-validate',
    },
  });
});

app.listen(3001, () => {
  console.log('Translation API running on http://localhost:3001');
});
```
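Before pointing champollion at the server, you can send it the documented payload yourself. The sketch below assumes Node 18 or later for the global `fetch`; `buildTranslateRequest`, `findMissingKeys`, and `smokeTest` are hypothetical helper names, and the request is modeled on the request format above rather than taken from champollion's own client code.

```javascript
// Hypothetical helpers for smoke-testing a translation endpoint.

// Builds fetch() options that mirror the documented request format.
function buildTranslateRequest({ sourceLocale, targetLocale, method = 'default', keys, apiKey }) {
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) {
    headers.Authorization = `Bearer ${apiKey}`;
  }
  return {
    method: 'POST',
    headers,
    body: JSON.stringify({
      source_locale: sourceLocale,
      target_locale: targetLocale,
      method,
      keys,
    }),
  };
}

// Lists requested keys that did not come back as strings in `translations`.
function findMissingKeys(requestedKeys, responseBody) {
  const translations = (responseBody && responseBody.translations) || {};
  return Object.keys(requestedKeys).filter(
    (key) => typeof translations[key] !== 'string'
  );
}

// Sends one request and reports the HTTP status plus any missing keys.
async function smokeTest(endpoint, params) {
  const response = await fetch(endpoint, buildTranslateRequest(params));
  const body = await response.json();
  return { status: response.status, missing: findMissingKeys(params.keys, body) };
}
```

For example, with the server above running locally, `smokeTest('http://localhost:3001/translate', { sourceLocale: 'en', targetLocale: 'crk', keys: { greeting: 'Hello' } }).then(console.log)` reports the status and any keys the service failed to return.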
### Configuring champollion
Point a translation pair at your running service in `champollion.config.json`:
```json
{
  "inputLocale": "en",
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "http://localhost:3001/translate",
      "register": "Formal Plains Cree. Use SRO orthography."
    }
  }
}
```
Then run sync as usual:
```bash
npx champollion sync
```
champollion will POST your source strings to the endpoint and write the returned translations to crk.json.
## Case Study: Plains Cree Pipeline
:::info Under Development
The Plains Cree pipeline described below is under active development and is not yet running in production. Details here reflect the current design direction and may change as the project evolves.
:::
The arena project demonstrates this pattern. Its Plains Cree pipeline uses:
- Morphological decomposition — Break polysynthetic Cree words into translatable morpheme chains
- LLM translation — Context-enriched GPT-4o translation with coaching data (SRO orthography rules, register instructions)
- FST validation — Finite-state transducer checks that outputs conform to Cree phonological rules
- Confidence scoring — Each translation gets a confidence score based on FST pass rate and dictionary coverage
The entire pipeline runs as a single HTTP endpoint that champollion calls via the api method.
## Running Evaluations
After translating, you can benchmark your output with the [MT Evaluation Arena](https://mtevalarena.org) by running its evaluation harness directly:
```bash
# Clone the evaluation harness
git clone https://github.com/gamedaysuits/arena.git
cd arena
pip install -e .
# Run the evaluation against your method's output
mt-eval run --corpus data/edtekla-dev-v1.json --submit
```
This produces structured evaluation records with chrF++, BLEU, and exact match scores that can be used as regression baselines.
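To illustrate what a regression baseline check might look like, here is a small sketch of the simplest of those metrics, exact match, together with a threshold comparison. It is not the harness's implementation: `exactMatchRate` and `hasRegressed` are hypothetical names, and the NFC normalization, whitespace trimming, and `0.01` tolerance are assumptions that may differ from how `mt-eval` scores output.

```javascript
// Hypothetical sketch of an exact-match score and a regression check.
// Not the arena harness's implementation; normalization is an assumption.

// Fraction of reference keys whose hypothesis matches exactly after
// Unicode NFC normalization and whitespace trimming.
function exactMatchRate(hypotheses, references) {
  const keys = Object.keys(references);
  if (keys.length === 0) {
    return 0;
  }
  const normalize = (text) => text.normalize('NFC').trim();

  let matches = 0;
  for (const key of keys) {
    const hypothesis = hypotheses[key];
    if (
      typeof hypothesis === 'string' &&
      normalize(hypothesis) === normalize(references[key])
    ) {
      matches += 1;
    }
  }
  return matches / keys.length;
}

// True when the current score falls below the baseline by more than
// the allowed tolerance.
function hasRegressed(currentScore, baselineScore, tolerance = 0.01) {
  return currentScore < baselineScore - tolerance;
}
```

A CI job could fail the build when `hasRegressed` returns `true` for a stored baseline score.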
## Authentication
If your API requires authentication, set the `apiKey` field. In production, reference an environment variable rather than hard-coding the secret:
```json
{
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "https://my-mt-service.example.com/translate",
      "apiKey": "${CRK_API_KEY}"
    }
  }
}
```
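On the service side, the key arrives in the `Authorization: Bearer ...` header shown in the request format. The following sketch shows one way a service might check it; `extractBearerToken`, `constantTimeEqual`, and `isAuthorized` are hypothetical helpers, and the comparison loop only approximates constant-time behavior. In production, Node's built-in `crypto.timingSafeEqual` is the more robust choice.

```javascript
// Hypothetical helpers for checking the Bearer token on incoming requests.

// Pulls the token out of an "Authorization: Bearer <token>" header value.
function extractBearerToken(headerValue) {
  if (typeof headerValue !== 'string') {
    return null;
  }
  const match = /^Bearer\s+(\S+)$/i.exec(headerValue.trim());
  return match ? match[1] : null;
}

// Compares two strings without returning early on the first mismatch.
// This only approximates constant-time behavior in a JavaScript engine.
function constantTimeEqual(a, b) {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i += 1) {
    // charCodeAt returns NaN past the end of a string; `| 0` coerces it to 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// True only when a token is present and matches the expected key.
function isAuthorized(headerValue, expectedKey) {
  const token = extractBearerToken(headerValue);
  return (
    token !== null &&
    expectedKey.length > 0 &&
    constantTimeEqual(token, expectedKey)
  );
}
```

In the Express handler, a call such as `isAuthorized(req.get('Authorization'), process.env.CRK_API_KEY ?? '')` could gate the pipeline, responding with `401` when it returns `false`.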
## Data Sovereignty & OCAP Principles
The api method is particularly important for Indigenous language communities. By self-hosting the translation pipeline, a community keeps full control over:
- Proprietary coaching data — register instructions, orthography rules, and domain glossaries never leave community infrastructure.
- Linguistic resources — curated dictionaries, FST grammars, and elder-verified translations remain under community ownership.
- Access policies — the community decides who can call the endpoint and under what terms.
This aligns with OCAP® principles (Ownership, Control, Access, Possession), ensuring that sensitive language data is governed by the community rather than a third-party platform.
Combine the api method with a private deployment (e.g., a community-hosted VM or on-prem server) for the strongest data-sovereignty posture. See Support a Low-Resource Language for a full walkthrough.
## Cost Estimation
The `api` method returns `null` for cost estimation by default, because your service controls pricing. If your service incurs costs (for example, LLM API calls) and you want to provide cost transparency, have your API return a cost field in the response metadata:
```json
{
  "translations": { "...": "..." },
  "metadata": {
    "cost": {
      "estimatedCost": 0.0042,
      "currency": "USD",
      "source": "my-service-pricing"
    }
  }
}
```
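## Best Practices
1. **Return an empty string on failure**: do not return the source string as the "translation". Return `""` and champollion's quality gate will catch it. The key is skipped and retried on the next sync.
2. **Include confidence scores**: if your pipeline can estimate quality, return that estimate in the metadata. This helps with quality audits.
3. **Implement a health check**: add a `GET /health` endpoint so that champollion can verify connectivity before starting a large sync.
4. **Handle rate limits gracefully**: if your pipeline has throughput limits, return a `429` status code. champollion's batching system will back off.
5. **Log everything**: multi-step pipelines can fail silently. Log each step's input and output for debugging.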
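The first and last practices above can be combined in the loop that drives the pipeline. This is a minimal sketch, assuming each pipeline stage is an async function from string to string; `translateBatch` and the `[name, step]` pair format are hypothetical, not a champollion API.

```javascript
// Hypothetical batch runner: isolates failures per key and logs each step.
// `steps` is an ordered list of [name, asyncFunction] pairs.
async function translateBatch(keys, steps, log = console.error) {
  const translations = {};

  for (const [key, source] of Object.entries(keys)) {
    let value = source;
    let currentStep = '(start)';
    try {
      for (const [name, step] of steps) {
        currentStep = name;
        const input = value;
        value = await step(input);
        // Log every step's input and output so silent failures are traceable.
        log(`[${key}] ${name}: ${JSON.stringify(input)} -> ${JSON.stringify(value)}`);
      }
      // Anything other than a string counts as a failed translation.
      translations[key] = typeof value === 'string' ? value : '';
    } catch (error) {
      log(`[${key}] ${currentStep} failed: ${error.message}`);
      // Return an empty string rather than the source text on failure.
      translations[key] = '';
    }
  }

  return translations;
}
```

One failing key then no longer aborts the whole request: the other keys are still translated, and the failed key comes back as `""`.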
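## License
The `api` method pattern is fully open: there are no licensing restrictions on wrapping your own translation pipeline as an HTTP service. `arena` is available under the MIT license as a reference implementation.
## See Also
- [Translation Methods](/docs/guides/translation-methods): overview of each built-in method (`openai`, `google`, `api`, and others)
- [Plugin Spec](/docs/reference/plugin-spec): full schema for `champollion.config.json`, including the `api` method fields
- [Support a Low-Resource Language](https://mtevalarena.org/docs/community/low-resource-languages): end-to-end guide for low-resource languages, including OCAP principles
- [Architecture](/docs/concepts/architecture): how champollion's sync loop, batching, and method dispatch work
- [Machine Translation Evaluation](https://mtevalarena.org/docs/leaderboard/rules): evaluation methodology, metrics, and the leaderboard submission process
- [Method Leaderboard](/leaderboard): live quality rankings across methods and language pairs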