Language Card Specification

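Single source of truth. This document defines the canonical shape of every language card. Every card must contain every top-level field listed here, even when the value is null or []. A card with missing fields does not conform. This uniformity lets automation tools, linters, enrichment scripts, and human reviewers trust the card structure.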

Design Principles

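Uniform shape. All 8,000+ cards have the same top-level fields. An unknown value is null, an empty array is [], and an empty object is null (not {}). Code therefore never needs to ask "does this field exist?", only "does it have a value?"
Source everything. Every factual claim traces back to a named, versioned, primary source. An unsourced claim is an unverifiable claim. The dataSources field (together with the per-field source annotations inside sub-objects) makes provenance explicit.
Preserve disagreement. When authorities disagree (Wikidata says 50,000 speakers, Ethnologue says 20,000), we store both and attribute each to its source. We do not average, resolve, or pick a side. Users can handle nuance.
Null means unknown, not inapplicable. A null field means "we have not found data on this yet". If a field genuinely does not apply (for example, grammatical gender for a sign language), the value should say so: { "grammatical": false, "inclusiveGuidance": "Not applicable: American Sign Language has no grammatical gender." }
Merge only. Enrichment scripts add data and never overwrite it. Human-curated values take precedence over automated data.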
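
The "uniform shape" and "merge only" principles can be sketched in a few lines. This is a minimal illustration, not the project's enrichment code: hasValue and enrich are hypothetical names, and the sketch assumes that any populated field is off-limits to automation (the spec does not say whether an enrichment script may append to a non-empty array).

```javascript
// Hypothetical helpers; these names are not defined by the spec.

// Uniform shape: every field exists, so the only question is whether
// it holds a value. null and [] both mean "nothing here yet".
function hasValue(value) {
  if (value === null || value === undefined) return false;
  if (Array.isArray(value)) return value.length > 0;
  return true; // false and 0 are real values
}

// Merge only: fill fields that have no value yet, never overwrite.
// Returns a new card and leaves the input untouched.
function enrich(card, additions) {
  const result = { ...card };
  for (const [field, value] of Object.entries(additions)) {
    if (!hasValue(result[field]) && hasValue(value)) {
      result[field] = value;
    }
  }
  return result;
}

const card = {
  code: "crk",
  glottocode: null,
  countries: ["CA"],
  nativeName: "nêhiyawêwin",
};

const enriched = enrich(card, {
  glottocode: "plai1258",    // fills a null
  countries: ["CA", "US"],   // skipped: the card already has a value
  nativeName: "Plains Cree", // skipped: the existing value wins
});
```

Because enrich never touches a populated field, running the same script twice gives the same card, and a curated value cannot be lost to a later automated pass.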

Three-Tier Architecture

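Tier	Location	Purpose
Language card	`shared/language-cards/<code>.json`	Per-language configuration: identity, classification, resources, everything
Genus card	`shared/language-cards/genera/<genus>.json`	Shared runtime properties for related languages (curated, not auto-generated)
Language tree	`shared/language-cards/language-tree.json`	The full Glottolog hierarchy; reference data for the Lab UI and language discovery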

Inheritance Model

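When a card sets "extends": "family-dravidian", the runtime uses _deepMerge() (in lib/registers.js) to merge the parent card into the child card. This lets a genus card define shared registers, formality systems, and gender guidance that flow to every member language, without repeating the data across hundreds of individual cards.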

Merge Semantics

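Child value	Behavior	Rationale
`null`	Inherit from parent	`null` means "I do not define this", so the parent's value flows down
Non-null	Override parent	The child's data is more specific and takes precedence
Nested object	Merge recursively	Child fields override, parent fields are kept
Array	Replace entirely	Arrays are not merged item by item; the child's array wins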
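
The four rules in the table can be written out as a short recursive function. This is a sketch of the stated semantics only, not the actual _deepMerge() in lib/registers.js, whose source is not shown here. The names deepMergeSketch and isPlainObject are hypothetical, and the sketch ignores the identity-field exception entirely.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Merge a parent card into a child card following the table above.
function deepMergeSketch(parent, child) {
  const merged = { ...parent };
  for (const [key, childValue] of Object.entries(child)) {
    if (childValue === null) {
      // null: inherit. Keep the parent's value, or null if it has none.
      if (!(key in merged)) merged[key] = null;
    } else if (isPlainObject(childValue) && isPlainObject(merged[key])) {
      // Nested object: recurse, so parent fields survive.
      merged[key] = deepMergeSketch(merged[key], childValue);
    } else if (Array.isArray(childValue)) {
      // Array: the child's array replaces the parent's wholesale.
      merged[key] = [...childValue];
    } else {
      // Any other non-null value: the child overrides.
      merged[key] = childValue;
    }
  }
  return merged;
}

// Abbreviated, made-up cards for illustration.
const parentCard = {
  formality: { system: "T-V", default: "formal", source: "parent-source" },
  registers: { formal: { label: "Formal" } },
  countries: ["CA", "US"],
  script: "Latn",
};
const childCard = {
  formality: { default: "informal", source: null },
  registers: null,
  countries: ["CA"],
  script: "Cans",
};

const mergedCard = deepMergeSketch(parentCard, childCard);
```

In mergedCard, formality keeps the parent's system and source but takes the child's default, registers comes from the parent, and countries and script are the child's own.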

Identity Fields (Never Inherited)

Some fields belong to the card itself and must never be inherited from a parent:

code, extends, _migration, aliases, iso639_1, iso639_3

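Even if the parent card defines aliases: ["macro-code"], the child card does not inherit those aliases. These fields always carry the child card's own value (including null when unset).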

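Rationale: Without this rule, every Cree language would inherit aliases: ["cre"] from its macrolanguage parent, making every variety an alias of the macrolanguage.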
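
One way to picture the rule is as a final step that puts the child's own identity values back after any merge. The sketch below assumes that approach; the real runtime may instead skip these keys during the merge. restoreIdentity is a hypothetical name, and the example cards are abbreviated.

```javascript
const IDENTITY_FIELDS = [
  "code", "extends", "_migration", "aliases", "iso639_1", "iso639_3",
];

// Overwrite every identity field in a merged card with the child's
// own value. A field the child leaves unset becomes null rather than
// falling back to the parent.
function restoreIdentity(merged, child) {
  const result = { ...merged };
  for (const field of IDENTITY_FIELDS) {
    result[field] = child[field] === undefined ? null : child[field];
  }
  return result;
}

// The child leaves aliases and iso639_1 as null.
const child = {
  code: "crk",
  extends: "macrolanguage-cre", // hypothetical parent key
  aliases: null,
  iso639_1: null,
  iso639_3: "crk",
  registers: null,
};

// What a merge WITHOUT the identity rule would produce: the nulls
// have been filled from the macrolanguage parent.
const naiveMerge = {
  code: "crk",
  extends: "macrolanguage-cre",
  aliases: ["cre"],
  iso639_1: "cr",
  iso639_3: "crk",
  registers: { formal: { label: "Formal" } },
};

const resolved = restoreIdentity(naiveMerge, child);
```

After the call, resolved.aliases and resolved.iso639_1 are null again, while the inherited registers are untouched.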

Example: How a Cree Card Resolves

┌───────────────────────┐
│  family-algic.json    │  formality: null, registers: null
│  (no registers)       │
└──────────┬────────────┘
           │ extends
┌──────────┴────────────┐
│  genus-cree.json      │  formality: { system: "obviative-animate", ... }
│  (sourced registers)  │  registers: { formal: {...}, informal: {...} }
└──────────┬────────────┘
           │ extends
┌──────────┴────────────┐
│  crk.json             │  code: "crk", extends: "genus-cree"
│  (Plains Cree)        │  formality: null → inherits from genus-cree
│                       │  registers: null → inherits from genus-cree
│                       │  script: "Cans"  → own value, no inheritance
│                       │  code: "crk"     → identity field, never inherited
└───────────────────────┘

At runtime, getLanguageCard("crk") returns a merged object containing genus-cree's registers, family-algic's properties (if any), and crk's own identity and metadata.
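
The chain in the diagram can be walked with a small loop. The sketch below is an illustration under stated assumptions, not getLanguageCard() itself: the cards live in an in-memory object instead of JSON files, inheritanceChain and resolveTopLevelField are hypothetical names, and only whole top-level fields are resolved (no recursive merging of nested objects, no identity-field handling).

```javascript
// In-memory stand-ins for the three cards in the diagram, abbreviated.
const cards = {
  "family-algic": {
    code: "family-algic", extends: null,
    formality: null, registers: null, script: null,
  },
  "genus-cree": {
    code: "genus-cree", extends: "family-algic",
    formality: { system: "obviative-animate" },
    registers: {
      formal: { label: "Formal (Proximate)" },
      informal: { label: "Informal" },
    },
    script: null,
  },
  "crk": {
    code: "crk", extends: "genus-cree",
    formality: null, registers: null, script: "Cans",
  },
};

// Follow `extends` upward; returns [child, parent, grandparent, ...].
function inheritanceChain(code, store) {
  const chain = [];
  const seen = new Set();
  let current = code;
  while (current !== null && current !== undefined) {
    if (seen.has(current)) {
      throw new Error(`Cycle in extends chain at "${current}"`);
    }
    if (!Object.prototype.hasOwnProperty.call(store, current)) {
      throw new Error(`Unknown card "${current}"`);
    }
    seen.add(current);
    chain.push(store[current]);
    current = store[current].extends;
  }
  return chain;
}

// The nearest non-null value wins, searching from the child upward.
function resolveTopLevelField(chain, field) {
  for (const card of chain) {
    if (card[field] !== null && card[field] !== undefined) {
      return card[field];
    }
  }
  return null;
}

const chain = inheritanceChain("crk", cards);
const formality = resolveTopLevelField(chain, "formality");
const script = resolveTopLevelField(chain, "script");
```

Here formality comes from genus-cree because crk leaves it null, while script is crk's own "Cans". The cycle check guards against a card that, directly or indirectly, extends itself.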

Genus Card Template

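Genus cards live in shared/language-cards/genera/ and define shared properties for a group of languages. They follow the same schema as regular cards, with different conventions: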

{
  // Identity — genus cards use a prefixed code, NOT an ISO 639-3 code
  "code": "genus-cree",           // "genus-", "family-", or "macrolanguage-" prefix
  "name": "Cree Languages",      // Human-readable group name
  "extends": "family-algic",     // Genus cards can extend family cards (chaining)

  // Formality — shared across the group, sourced from typological databases
  "formality": {
    "system": "obviative-animate",
    "description": "Cree languages use an obviative/proximate system...",
    "default": "formal",
    "source": "WALS 37A, 38A + Wolfart 1973"
  },

  // Registers — shared presets, if the group shares a formality system
  "registers": {
    "formal": {
      "label": "Formal (Proximate)",
      "description": "...",
      "prompt": "...",
      "isDefault": true
    },
    "informal": {
      "label": "Informal",
      "description": "...",
      "prompt": "..."
    }
  },

  // Gender — shared grammatical gender behavior
  "gender": {
    "grammatical": false,       // Cree doesn't have grammatical gender
    "inclusiveGuidance": null   //   so no inclusive guidance needed
  },

  // Everything else is null — individual cards provide their own
  // classification, geography, resources, etc.
  "classification": null,
  "methodSupport": null,
  // ...
}

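Key rule: A genus card must contain only data that is genuinely shared across the whole group and drawn from an authoritative reference. If the formality system varies between members, it belongs on the individual cards, not on the genus card.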
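
Part of the genus card conventions can be checked mechanically. The sketch below is a hypothetical lint, not an existing tool. It checks the code prefix from the template and assumes that "drawn from an authoritative reference" means a populated formality object carries a non-empty source string. Whether data is genuinely shared across the group still needs human review.

```javascript
const GROUP_PREFIXES = ["genus-", "family-", "macrolanguage-"];

// Returns a list of problems; an empty list means the checks passed.
function lintGroupCard(card) {
  const problems = [];

  const hasPrefix =
    typeof card.code === "string" &&
    GROUP_PREFIXES.some((prefix) => card.code.startsWith(prefix));
  if (!hasPrefix) {
    problems.push("code must start with genus-, family-, or macrolanguage-");
  }

  // Assumption: a populated formality object must name its source.
  const formality = card.formality;
  if (formality !== null && formality !== undefined && !formality.source) {
    problems.push("formality is populated but has no source");
  }

  return problems;
}

const okProblems = lintGroupCard({
  code: "genus-cree",
  formality: {
    system: "obviative-animate",
    source: "WALS 37A, 38A + Wolfart 1973",
  },
});

const badProblems = lintGroupCard({
  code: "crk", // an ISO 639-3 code, not a prefixed group code
  formality: { system: "obviative-animate" }, // no source
});
```

okProblems is empty, and badProblems reports both the missing prefix and the unsourced formality block.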

Canonical Template

Every card must have this exact top-level shape. Sub-object schemas are documented in the Field Reference below.

{
  // ═══════════════════════════════════════════════════════════════════════
  //  § 1. IDENTITY
  //  Who is this language? What codes identify it?
  //  Sources: ISO 639-3 registry, ISO 639-1, BCP 47/IANA.
  // ═══════════════════════════════════════════════════════════════════════

  "code":          "xxx",       // REQUIRED. ISO 639-3 code. This IS the card ID and filename.
  "name":          "English Name",  // REQUIRED. English reference name from ISO 639-3 registry.
  "nativeName":    null,        // Endonym (name in the language itself). Source: Wikidata P1705.
                                // Examples: "nêhiyawêwin / ᓀᐦᐃᔭᐍᐏᐣ", "日本語", "Esperanto".
  "alternateNames": [],         // Other names this language is known by. Source: Glottolog, Ethnologue.
                                // Not aliases (those are code-level). These are name-level variants.
                                // Example: ["Qafar af", "Afaraf", "'Afar Af"] for Afar (aar).
  "iso639_3":      "xxx",      // REQUIRED. Three-letter ISO 639-3 code. Same as `code`.
  "iso639_1":      null,        // Two-letter ISO 639-1 code (e.g., "en", "fr"). null if none.
  "bcp47":         null,        // IETF BCP 47 tag. Often same as iso639_1. Can include subtags
                                // (e.g., "iu-Cans-CA"). null if unknown.
  "aliases":       [],          // Alternative code-level identifiers that resolve to this card.
                                // Example: ["fil"] for tl (Tagalog), ["iu"] for iku (Inuktitut).
                                // Used by code resolution: user types "fil", system loads tl.json.
  "isoScope":      "I",        // REQUIRED. ISO 639-3 scope:
                                //   "I" = Individual language
                                //   "M" = Macrolanguage (e.g., Chinese, Arabic, Cree)
                                //   "S" = Special (e.g., mis, mul, zxx)
  "isoType":       "L",        // REQUIRED. ISO 639-3 type:
                                //   "L" = Living    "E" = Extinct    "A" = Ancient
                                //   "H" = Historical    "C" = Constructed
  "macrolanguage": null,        // If this language is part of a macrolanguage, the macrolanguage
                                // ISO 639-3 code (e.g., "cre" for Plains Cree, "ara" for Arabic
                                // varieties). Source: ISO 639-3 macrolanguages.tab.
  "extends":       null,        // Genus card key if shared properties are inherited from a genus
                                // card (e.g., "genus-cree", "genus-eskimo-aleut").
                                // null for most languages.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 2. CLASSIFICATION
  //  Where does this language sit in the family tree?
  //  Source: Glottolog. NEVER hand-build classifications.
  // ═══════════════════════════════════════════════════════════════════════

  "glottocode":      null,      // Glottolog identifier (e.g., "plai1258", "stan1293").
                                // null if the language is not in Glottolog.
  "classification":  null,      // Genealogical classification from Glottolog. When populated:
                                // {
                                //   "family": "Algic",              // Top-level family. null for isolates.
                                //   "familyGlottocode": "algi1248", // Glottocode of the family.
                                //   "genus": "Plains Creeic",       // WALS-style genus.
                                //   "genusGlottocode": "plai1264",  // Glottocode of the genus.
                                //   "ancestry": ["Algic", "Algonquian-Blackfoot", "Algonquian",
                                //                "Cree-Montagnais-Naskapi", "Cree", "Plains Creeic"]
                                // }
                                // For isolates: family = language name, genus = language name,
                                // ancestry = [language name].
  "isIsolate":       false,     // true if a language isolate (no known genetic relatives).
                                // Source: Glottolog CLDF.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 3. GEOGRAPHY
  //  Where is this language spoken?
  //  Sources: Glottolog (coordinates, countries), census data, Ethnologue.
  // ═══════════════════════════════════════════════════════════════════════

  "macroarea":     null,        // Glottolog macroarea. One of: "Africa", "Australia",
                                // "Eurasia", "North America", "Papunesia", "South America".
                                // null if unknown. Source: Glottolog CLDF.
  "coordinates":   null,        // Representative geographic point. When populated:
                                // { "lat": 52.1, "lng": -106.6, "source": "glottolog-5.3" }
                                // This is a representative point, not a boundary.
  "countries":     [],          // ISO 3166-1 alpha-2 country codes where this language is spoken.
                                // Example: ["CA", "US"]. Source: Glottolog.
  "regions":       [],          // Detailed regional breakdown with admin codes & speaker estimates.
                                // Each entry:
                                // {
                                //   "country": "Canada",
                                //   "countryCode": "CA",
                                //   "officialStatus": "recognized",  // official, co-official,
                                //                                    // recognized, none
                                //   "region": "Saskatchewan, Alberta, Manitoba",
                                //   "speakerEstimate": "~20,000",
                                //   "coordinates": [-106.6, 52.1],   // [lng, lat]
                                //   "admin1Codes": ["CA-SK", "CA-AB", "CA-MB"]
                                // }

  "arealContext":  null,         // Linguistic area / Sprachbund membership. DISTINCT from
                                // contactInfluences (which is language-specific contact history).
                                // This field captures zone-level typological convergence patterns
                                // — i.e., what linguistic area the language exists within and
                                // what features are common across that area.
                                // {
                                //   "zone": "Mainland Southeast Asian Sprachbund",
                                //   "arealFeatures": "Tonal convergence, classifier systems,
                                //     topic-prominence, monosyllabicity trend.",
                                //   "typicalContacts": ["Classical Chinese", "Sanskrit/Pali"],
                                //   "source": "areal-linguistics (Enfield 2005)"
                                // }
                                // NOT the same as contactInfluences. A language can exist within
                                // a convergence area without having specific contact history with
                                // any particular language in that area.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 4. WRITING SYSTEMS
  //  How is this language written?
  //  Sources: Wikidata P282, ISO 15924, manual research.
  //  Note: Some languages have NO standardized orthography. Some have
  //  competing orthographies. Some use multiple scripts routinely (e.g.,
  //  Serbian: Cyrillic + Latin; Japanese: Kanji + Hiragana + Katakana).
  //  Sign languages may use notation systems (SignWriting, HamNoSys) or
  //  none at all.
  // ═══════════════════════════════════════════════════════════════════════

  "script":        null,        // Primary ISO 15924 script code (e.g., "Latn", "Cyrl", "Cans",
                                // "Jpan"). null if no written form or unknown.
  "scriptUnicodeName": null,    // Unicode script block name derived from the script field.
                                // e.g., "Latin", "Cyrillic", "Canadian_Aboriginal", "CJK".
                                // Used by code_switching metric plugin. Auto-populated by
                                // enrich-script-unicode-names.mjs. null if script is null.
  "scripts":       [],          // All writing systems with detail. Array of:
                                // {
                                //   "code": "Cans",
                                //   "name": "Unified Canadian Aboriginal Syllabics",
                                //   "primary": true
                                // }
                                // A language with multiple scripts has multiple entries.
                                // A language with no written form has [].
  "dir":           null,        // Writing direction: "ltr" (left-to-right) or "rtl" (right-to-left).
                                // null if no written form or unknown.
  "scriptConverter": null,      // Script converter key if we have a converter for this language
                                // (e.g., "crk" for SRO↔Syllabics). null for most languages.
  "orthographicStatus": null,   // Writing system standardization status. When populated:
                                // {
                                //   "status": "standardized",
                                //       // "standardized" — official/agreed orthography exists
                                //       // "competing"    — multiple orthographies in active use
                                //       // "emerging"     — orthography under development
                                //       // "none"         — primarily oral, no standard writing
                                //   "notes": "Uses SIL-developed Latin orthography since 1960s.",
                                //   "source": "ethnologue" // or "manual-curation"
                                // }
                                // Crucial for LRLs where orthographic variation directly impacts
                                // MT training data quality and evaluation consistency.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 5. DEMOGRAPHICS & VITALITY
  //  How many people speak this language? Is it endangered?
  //  Sources: Census, Ethnologue, UNESCO Atlas, Wikidata, Glottolog AES.
  //
  //  CRITICAL: Store ALL estimates separately with source attribution.
  //  Never average or "resolve" conflicting data. Speaker counts are
  //  politically contested for many languages. Present the evidence,
  //  let the reader assess.
  // ═══════════════════════════════════════════════════════════════════════

  "speakerEstimates": [],       // Array of speaker count estimates from different authorities.
                                // Each entry:
                                // {
                                //   "source": "wikidata",              // or "ethnologue-28",
                                //                                      // "census-ph-2020", etc.
                                //   "count": 20000,                    // Point estimate. null if range-only.
                                //   "date": "2026-06-07",              // When this data was retrieved.
                                //   "countRange": { "min": 15000, "max": 25000 },  // Optional range.
                                //   "note": "Wikidata has 2 estimates: 15,000 and 25,000"
                                // }
                                // Empty array means we have not yet found speaker count data.

  "vitality":      null,        // Endangerment / vitality assessment. When populated:
                                // {
                                //   "unescoStatus": "severely-endangered",
                                //       // Enum: "safe", "vulnerable", "definitely-endangered",
                                //       //       "severely-endangered", "critically-endangered",
                                //       //       "extinct"
                                //   "aesStatus": "shifting",
                                //       // Glottolog AES label (free text from AES data).
                                //   "egids": "6b",
                                //       // Ethnologue Expanded Graded Intergenerational Disruption
                                //       // Scale. Levels: 0 (international) to 10 (extinct).
                                //   "trend": "declining",
                                //       // Qualitative trend: "stable", "growing", "declining",
                                //       //                     "shifting", "moribund", "awakening"
                                //   "source": "glottolog-aes-5.3",
                                //   "notes": "Intergenerational transmission breaking down."
                                // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 5.5. DOCUMENTATION & DIGITAL PRESENCE
  //  How well-documented is this language? What digital footprint does it
  //  have? These fields answer the practical question: "What can I
  //  actually DO with this language?"
  //  Sources: Glottolog (references), Wikipedia, Common Voice, Tatoeba.
  // ═══════════════════════════════════════════════════════════════════════

  "documentationDepth": null,    // How well-documented is this language in the literature?
                                 // {
                                 //   "referenceCount": 42,
                                 //       // Number of published references in Glottolog.
                                 //   "med": "grammar",
                                 //       // Most Extensive Description type. One of:
                                 //       // "long_grammar", "grammar", "grammar_sketch",
                                 //       // "dictionary", "phonology", "text", "wordlist",
                                 //       // "comparative", "minimal", "unknown"
                                 //   "source": "glottolog-5.3"
                                 // }

  "digitalPresence":  null,      // Digital footprint across web platforms. When populated:
                                 // {
                                 //   "wikipedia": {
                                 //     "edition": true,      // Has its own Wikipedia edition?
                                 //     "articleCount": 75000, // Number of articles.
                                 //     "editionCode": "crk",  // Wikipedia subdomain code.
                                 //     "source": "wikimedia-api-2026"
                                 //   },
                                 //   "commonVoice": {
                                 //     "validatedHours": 12.5,
                                 //     "totalHours": 25.0,
                                 //     "speakers": 45,
                                 //     "sentences": 1200,
                                 //     "source": "common-voice-20.0"
                                 //   },
                                 //   "tatoeba": {
                                 //     "sentenceCount": 342,
                                 //     "source": "tatoeba-2026"
                                 //   }
                                 // }

  "dialectCount":     null,      // Number of recognized dialects in Glottolog.
                                 // Derived from child_dialect_count in languoid.csv.
                                 // Simple integer. null if 0 or unknown.
                                 // Source: glottolog-5.3.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 6. FORMALITY, REGISTERS & GENDER
  //  How does politeness work in this language? What translation registers
  //  do we offer? How should gender be handled?
  //
  //  This section drives Champollion's register-preset system — the
  //  mechanism by which users select formal/informal/professional tone.
  //  These fields require genuine linguistic research, not automation.
  // ═══════════════════════════════════════════════════════════════════════

  "formality":     null,        // Formality system description. When populated:
                                // {
                                //   "system": "T-V",
                                //       // One of: "T-V", "speech-levels", "keigo", "particles",
                                //       //         "register-levels", "register-and-code-switching",
                                //       //         "code-switching", "none"
                                //   "description": "French uses a vous/tu distinction...",
                                //   "default": "formal-vous"   // Key into the `registers` object.
                                // }

  "registers":     null,        // Translation register presets. When populated, keyed by preset ID:
                                // {
                                //   "formal-vous": {
                                //     "label": "Formal (vouvoiement)",
                                //     "description": "One sentence: when to use this preset.",
                                //     "prompt": "The actual LLM system prompt instruction that
                                //               steers translation tone. Must name specific
                                //               linguistic features (pronouns, verb forms, particles).",
                                //     "deeplFormality": "prefer_more"
                                //       // Only if methodSupport.deepl.formality is true.
                                //       // One of: "prefer_more", "prefer_less", "default".
                                //   }
                                // }

  "gender":        null,        // Grammatical gender and inclusive guidance. When populated:
                                // {
                                //   "grammatical": true,         // Does the language have gram. gender?
                                //   "inclusiveGuidance": "Use gender-neutral forms when possible.
                                //                        Prefer 'iel' (neologism) or rephrase to
                                //                        avoid gendered agreement."
                                // }
                                // For languages without grammatical gender (Turkish, Finnish):
                                // { "grammatical": false, "inclusiveGuidance": null }

  "codeSwitching":  null,       // Code-switching behavior (for languages where mixing with another
                                // language is the norm, not an error). When populated:
                                // {
                                //   "contactLanguage": "Spanish",
                                //   "contactIso639_3": "spa",
                                //   "mixedVarietyName": "Jopará",   // null if no named mixed variety
                                //   "prevalence": "dominant",       // "rare", "common", "dominant"
                                //   "morphologicalIntegration": true,
                                //   "pipelineStrategy": "hybrid-fst",
                                //   "notes": "Jopará IS the everyday language of most Paraguayans..."
                                // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 7. LINGUISTIC PROFILE
  //  What makes this language what it is? What are the specific challenges
  //  for machine translation? What rules govern its typography?
  //  What languages have shaped it through contact?
  //
  //  These fields require genuine linguistic expertise. For many languages
  //  (especially low-resource), this section will remain null until a
  //  qualified researcher or community member contributes.
  // ═══════════════════════════════════════════════════════════════════════

  "linguisticChallenges": null,  // MT-relevant challenges, keyed by challenge ID.
                                 // When populated:
                                 // {
                                 //   "polysynthesis": "Cree is highly polysynthetic. A single verb
                                 //                    can incorporate subject, object, tense...",
                                 //   "animacy": "Verb conjugation changes based on whether the
                                 //              subject/object is animate or inanimate...",
                                 //   "neologisms": "Avoid literal translations of modern software
                                 //                 concepts. Maintain Cree metaphorical logic..."
                                 // }
                                 // Aim for 3–6 challenges per language when researched.

  "contactInfluences": [],       // How other languages have shaped this one. Array of:
                                 // {
                                 //   "source": "English",
                                 //   "sourceIso639_3": "eng",       // null if proto-language/unknown
                                 //   "type": "superstrate",
                                 //       // Enum: "superstrate", "substrate", "adstrate",
                                 //       //       "learned_borrowing", "lexical_borrowing",
                                 //       //       "relexification"
                                 //   "domains": ["education", "government", "technology"],
                                 //   "depth": "heavy",
                                 //       // Enum: "light", "moderate", "heavy", "structural",
                                 //       //       "defining"
                                 //   "period": "1870–present",
                                 //   "notes": "Residential school era and ongoing...",
                                 //   "citation_needed": false
                                 //       // true if no published academic source found.
                                 //       // See language-card-citation-procedure.md.
                                 // }

  "rules":          null,        // Typography, plural, and capitalization rules. When populated:
                                 // {
                                 //   "typography": {
                                 //     "quoteStart": "\u201c",
                                 //     "quoteEnd": "\u201d",
                                 //     "usesSpaces": true,        // false for CJK, Thai, Lao, Khmer
                                 //     "punctuationSpacing": {
                                 //       "doublePunctuation": "none"  // "thin-nbsp" for French
                                 //     }
                                 //   },
                                 //   "plurals": {
                                 //     "categories": ["one", "other"]
                                 //       // From CLDR. Possible values:
                                 //       // "zero", "one", "two", "few", "many", "other"
                                 //   },
                                 //   "capitalization": {
                                 //     "hasCase": true
                                 //       // true for Latin, Cyrillic, Greek, Armenian scripts.
                                 //       // false for CJK, Arabic, Devanagari, etc.
                                 //   }
                                 // }
                                 // Source: CLDR + ISO 15924 derivation.

  "typologicalProfile": null,   // Grambank typological features. When populated:
                                // {
                                //   "featuresDocumented": 195,
                                //   "featuresCoverage": 1,     // 0.0–1.0 fraction of features
                                //   "wordOrderDominant": "SVO",
                                //   "hasDefiniteArticle": true,
                                //   "hasIndefiniteArticle": true,
                                //   "hasGenderSystem": true,
                                //   "hasCaseMorphology": true,
                                //   "hasEvidentiality": false,
                                //   "hasToneSystem": false,
                                //   "source": "grambank-1.0.3"
                                // }
                                // Auto-populated by enrich-grambank-typology.mjs.

  "phonologicalInventory": null, // PHOIBLE phoneme inventory. When populated:
                                // {
                                //   "consonants": 24,
                                //   "vowels": 16,
                                //   "tones": 0,
                                //   "totalPhonemes": 40,
                                //   "isTonal": false,
                                //   "inventorySize": "moderately-large",
                                //       // Enum: "small", "moderately-small", "average",
                                //       //       "moderately-large", "large"
                                //   "source": "phoible-2.0"
                                // }
                                // Auto-populated by enrich-phoible-phonemes.mjs.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 8. ENCYCLOPEDIC
  //  General knowledge about the language for human context. History,
  //  dialect situation, institutional resources, representative sayings.
  //  This section is for understanding, not computation.
  // ═══════════════════════════════════════════════════════════════════════

  "encyclopedic":    null,       // General knowledge. When populated:
                                 // {
                                 //   "family": "Algic",             // Redundant with classification
                                 //                                  // but useful for human readers.
                                 //   "dialects": {
                                 //     "split": true,               // Is there significant variation?
                                 //     "classification": "Plains Cree (y-dialect)",
                                 //     "variants": ["crk", "cwd", "csw"]  // ISO codes of variants
                                 //   },
                                 //   "demographics": {
                                 //     "speakers": "Approx. 20,000 active speakers",
                                 //     "regions": ["Saskatchewan", "Alberta", "Manitoba"]
                                 //   },
                                 //   "history": "Plains Cree is the most widely spoken Algonquian
                                 //              language in western Canada...",
                                 //   "resources": {
                                 //     "wikipedia": "https://en.wikipedia.org/wiki/Plains_Cree",
                                 //     "foundations": [{ "name": "ALTLab", "url": "https://..." }],
                                 //     "dictionaries": [{ "name": "itwêwina", "url": "https://..." }]
                                 //   }
                                 // }

  "culturalAphorism": null,      // A representative saying, proverb, or teaching in the language.
                                 // When populated:
                                 // {
                                 //   "text": "ê-wîcêhtonaniwahk kâ-kî-isi-wâpahtamâhk ôma pimâtisiwin",
                                 //   "transliteration": null,       // Romanized form if non-Latin script.
                                 //   "translation": "Through helping each other we come to understand
                                 //                   this life",
                                 //   "literal": "By-helping-one-another we-have-come-to-see this life",
                                 //   "source": "Cree teaching, documented in nêhiyawêwin educational
                                 //              resources"
                                 // }
                                 // Choose sayings that reveal something about the language's
                                 // worldview or structure. Must be sourced.

  "varieties":      [],          // For macrolanguages or languages with significant dialectal
                                 // variation, the individual varieties with their own tool coverage.
                                 // Each entry:
                                 // {
                                 //   "name": "Cusco Quechua",
                                 //   "iso639_3": "quz",
                                 //   "region": "Cusco, Peru",
                                 //   "fstCoverage": true,
                                 //   "corpusCoverage": true,
                                 //   "nllbCoverage": false,
                                 //   "mutualIntelligibility": "Primary variety for this card",
                                 //   "notes": "SQUOIA FST was built for this variety."
                                 // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 9. DIGITAL RESOURCES & TOOLING
  //  What NLP tools, corpora, models, and datasets exist for this language?
  //  What translation APIs support it? What eval benchmarks are available?
  //
  //  This is Champollion's operational core — these fields determine what
  //  we can actually DO with this language.
  // ═══════════════════════════════════════════════════════════════════════

  "resources":      null,        // NLP resources available for this language. When populated:
                                 // {
                                 //   "fsts": [{                     // Finite-state transducers
                                 //     "name": "GiellaLT Plains Cree FST (lang-crk)",
                                 //     "url": "https://github.com/giellalt/lang-crk/releases",
                                 //     "type": "morphological-analyzer"
                                 //   }],
                                 //   "corpora": [{                  // Text corpora
                                 //     "name": "EDTeKLA Cree Language Textbook Corpus",
                                 //     "type": "parallel",          // "parallel", "monolingual"
                                 //     "pairs": ["en-crk"],
                                 //     "url": "https://...",
                                 //     "exposure": "open-web"       // "open-web", "restricted",
                                 //                                  // "holdout"
                                 //   }],
                                 //   "models": [{                   // Pre-trained models
                                 //     "name": "NLLB-200 (crk_Cans)",
                                 //     "url": "https://...",
                                 //     "type": "nmt"
                                 //   }],
                                 //   "tools": [],                   // Other NLP tools
                                 //   "wordlists": [{                // Standardized wordlists
                                 //     "name": "Lexibank",
                                 //     "conceptCount": 200,
                                 //     "source": "lexibank"
                                 //   }],
                                 //   "treebanks": [{                // Syntactic treebanks
                                 //     "name": "UD_Korean-GSD",
                                 //     "tokens": 80000,
                                 //     "source": "universal-dependencies-2.14"
                                 //   }]
                                 // }
                                 // IMPORTANT: Only actual NLP/digital resources belong here.
                                 // "This language has a WALS entry" is NOT a resource — that
                                 // goes in databaseCoverage.

  "databaseCoverage": null,      // Which typological/reference databases cover this language.
                                 // Separated from resources to avoid conflating "has a database
                                 // entry" with "has usable NLP tooling."
                                 // {
                                 //   "wals": true,
                                 //   "grambank": true,
                                 //   "phoible": true,
                                 //   "cldr": true,
                                 //   "lexibank": true,
                                 //   "commonVoice": true,
                                 //   "source": "derived"
                                 // }

  "corpusAvailability": null,    // What text/parallel corpora exist for NLP use?
                                 // {
                                 //   "bibleTranslation": {
                                 //     "textAvailable": true,
                                 //     "audioAvailable": true,
                                 //     "source": "bible-brain-api"
                                 //   },
                                 //   "opusCorpora": ["wikimedia", "ubuntu", "gnome"],
                                 //   "source": "multi-source"
                                 // }

  "keyboardSupport":  null,      // Input method / keyboard availability. When populated:
                                 // {
                                 //   "keymanKeyboards": 3,
                                 //       // Number of Keyman keyboards available.
                                 //   "cldrKeyboard": true,
                                 //       // CLDR has keyboard layout data.
                                 //   "source": "keyman-api + cldr"
                                 // }

  "methodSupport":  {            // REQUIRED. Which Champollion translation methods support this
                                 // language. Each method is an object with at minimum
                                 // { "supported": boolean }.
    "googleTranslate":     { "supported": false },
    "deepl":               { "supported": false },
    "microsoftTranslator": { "supported": false },
    "libreTranslate":      { "supported": false },
    "nllb":                { "supported": false },
                                 // When NLLB is supported, include the code:
                                 // { "supported": true, "code": "crk_Cans" }
    "llm":                 { "supported": true }
                                 // LLM is always true (quality varies by language).
                                 // Optional: "verifiedDate": "2026-06-07" for audit trail.
  },

  "metricModelSupport": null,   // Which MT evaluation models produce reliable scores.
                                // When populated:
                                // {
                                //   "xlmr": "high",          // "high", "medium", or "low"
                                //                            // XLM-R training representation tier.
                                //   "africomet": false        // true if AfriCOMET covers this language.
                                // }
                                // Drives automatic COMET model selection in metrics_comet.py.
                                // Auto-populated by enrich-metric-model-support.mjs.

  "metricPlugins":   null,      // Which per-language metric plugin packs are available.
                                // When populated:
                                // {
                                //   "formalityMarkers": true  // Formality marker resource file exists
                                //                             // at plugins/resources/formality/{code}.json
                                // }
                                // Each key corresponds to a resource pack in
                                // arena/mt_eval_harness/plugins/resources/{packName}/.
                                // To add a new metric pack for a language, create the resource
                                // file and set the flag here. No code changes required.

  "evalPack":       null,        // Evaluation dependency pack for language-specific metrics.
                                 // When populated, declares the Python dependencies and
                                 // post-install steps required by this language's eval standards.
                                 // The harness uses this for dependency gating: if deps are
                                 // missing, the harness warns the user and skips LYSS metrics
                                 // (rather than crashing).
                                 // When populated:
                                 // {
                                 //   "pythonDeps": {
                                 //     "pyhfst": "pyhfst>=1.4",    // PyPI package specs
                                 //     "requests": "requests>=2.28",
                                 //     "spacy": "spacy>=3.7"
                                 //   },
                                 //   "postInstall": [               // Commands to run after pip
                                 //     {
                                 //       "command": "spacy download en_core_web_md",
                                 //       "label": "spaCy English model (for LYSS-sem)"
                                 //     }
                                 //   ],
                                 //   "requiresFst": true,           // true if GiellaLT FST needed
                                 //   "description": "LYSS equivalence linter + FST validation"
                                 // }

  "evalMetrics":    null,        // Language-specific evaluation metrics (LYSS standards).
                                 // When populated, the harness dynamically imports these
                                 // MetricPlugin classes from eval_standards/<lang>/ and applies
                                 // them to every run targeting this language — regardless of
                                 // which method (contestant) is being evaluated.
                                 // Keyed by metric ID:
                                 // {
                                 //   "lyss-eq": {
                                 //     "module": "eval_standards.crk.metrics",
                                 //     "class": "CrkLinterMetric",
                                 //     "description": "LYSS deterministic variant-class linter"
                                 //   },
                                 //   "lyss-sem": {
                                 //     "module": "eval_standards.crk.metrics",
                                 //     "class": "CrkSemanticMetric",
                                 //     "description": "LYSS FST-based semantic validator",
                                 //     "dependencies": ["spacy>=3.7"],
                                 //     "spacy_models": ["en_core_web_md"]
                                 //   }
                                 // }
                                 // Architecture: eval standards are referees, not contestants.
                                 // They live in the harness (eval_standards/), not in method
                                 // plugins. This ensures all methods are scored equally.
                                 // Discovery: plugin_discovery.py reads this field via
                                 // language_cards.get_eval_metrics() and instantiates metrics
                                 // using importlib. Dependencies are checked against evalPack.

  "omt1600":        null,        // Meta's OMT-1600 (One Model for Translation) coverage assessment.
                                 // When populated:
                                 // {
                                 //   "covered": true,
                                 //   "tier": "R1",                  // Meta's resource tier
                                 //   "evalMetrics": ["chrF++", "BLASER-3"],
                                 //   "notes": "Plains Cree: no web-crawled bitext..."
                                 // }

  "evalDatasets":   [],          // Evaluation dataset IDs available for this language.
                                 // Example: ["flores-plus-devtest", "edtekla-dev-v1"].
                                 // Empty means no standardized eval set exists.

  "pipelineReadiness": null,     // Assessment of readiness for Champollion's translation pipeline.
                                 // When populated:
                                 // {
                                 //   "tier": "tier-2-feasible",
                                 //       // "watch-list"       — cataloged but no path to translation
                                 //       // "tier-3-cataloged" — basic metadata present
                                 //       // "tier-2-feasible"  — tools exist, pipeline possible
                                 //       // "tier-1-ready"     — pipeline operational
                                 //   "hasFST": true,
                                 //   "hasParallelCorpus": true,
                                 //   "hasEvalBenchmark": true,
                                 //   "blockers": ["Syllabics post-processing validation"],
                                 //   "notes": "FST-gated pipeline operational. EDTeKLA corpus..."
                                 // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 10. PROVENANCE & METADATA
  //  Where does this data come from? Who reviewed it? When was it
  //  generated? What's its overall quality level?
  //
  //  This section exists to make the card auditable. Every automated
  //  enrichment, every human review, every source consulted should
  //  leave a trace here.
  // ═══════════════════════════════════════════════════════════════════════

  "dataSources":   [],           // REQUIRED. Sources consulted for this card's data.
                                 // Can be a flat array (backwards-compatible):
                                 //   ["iso639-3-2024", "glottolog-5.3", "wikidata"]
                                 //
                                 // Or a structured per-field object (preferred for new cards):
                                 //   {
                                 //     "classification": ["glottolog-5.3"],
                                 //     "vitality": ["glottolog-aes-5.3", "unesco-atlas-2024"],
                                 //     "speakerEstimates": ["wikidata", "census-ca-2021"],
                                 //     "rules": ["cldr-48"],
                                 //     "methodSupport": ["google-translate-2026-06"]
                                 //   }

  "supportTier":   "cataloged",  // Auto-derived tier summarizing the card's depth:
                                 //   "cataloged"   — identity + classification only
                                 //   "emerging"    — + vitality + speakerEstimates
                                 //   "developing"  — + resources + methodSupport
                                 //   "supported"   — full research: registers, challenges, etc.

  "humanReviewed": null,         // null until a qualified human reviews the card. When populated:
                                 // {
                                 //   "reviewer": "Prof. Kenneth Jamandre",
                                 //   "affiliation": "University of the Philippines Diliman",
                                 //   "date": "2026-06-08",
                                 //   "scope": "full",             // "full", "partial", "vitality-only"
                                 //   "notes": "Verified speaker count, vitality assessment,
                                 //             and contact influences for Tagalog."
                                 // }

  "notes":         null,         // Free-text notes about this language or this card's data quality.
                                 // Example: "Low-resource language under active development.
                                 //           Translation pipeline uses FST-gated approach."

  "firstDocumented": null,       // Year of first known documentation. Negative for BCE.
                                 // Example: -1500 (Sanskrit, ~1500 BCE), 1787 (some languages).
                                 // Source: Glottolog CLDF.

  "lastDocumented":  null,       // Year of last known documentation (relevant for extinct languages).
                                 // Source: Glottolog CLDF.

  "_generated":    null          // Auto-populated by enrichment scripts. When populated:
                                 // {
                                 //   "by": "generate-all-cards.mjs",
                                 //   "at": "2026-06-07T12:34:56Z",
                                 //   "sources": ["iso639-3", "glottolog-5.3", "wikidata"],
                                 //   "completeness": "partial",
                                 //       // "partial"     — has identity + classification + coords
                                 //       // "substantial" — + vitality + speakerEstimates + script
                                 //       // "complete"    — all automatable fields populated
                                 //   "lastEnriched": "2026-06-07"
                                 // }
}
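
The dependency gating described for evalPack can be sketched in Python as follows. This is a minimal sketch, not the harness's actual code: the function names missing_eval_deps and should_run_lyss are hypothetical, and the sketch assumes each pythonDeps key doubles as the importable module name (which holds for the three packages in the example above but is not guaranteed in general).

```python
import importlib.util
import warnings


def missing_eval_deps(eval_pack):
    """Return the pythonDeps keys whose modules cannot be found.

    Hypothetical helper. Assumes each key in "pythonDeps" is also the
    importable module name. A null evalPack has no requirements.
    """
    if not eval_pack:
        return []
    deps = eval_pack.get("pythonDeps") or {}
    return [name for name in deps if importlib.util.find_spec(name) is None]


def should_run_lyss(eval_pack):
    """Warn and return False (rather than raising) when deps are missing."""
    missing = missing_eval_deps(eval_pack)
    if missing:
        specs = [eval_pack["pythonDeps"][name] for name in missing]
        warnings.warn(f"Skipping LYSS metrics; missing: {', '.join(specs)}")
        return False
    return True
```

A caller would check should_run_lyss(card.get("evalPack")) before importing any class listed in evalMetrics, so a missing package degrades to a warning instead of a crash.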

Field Reference

§ 1. Identity Fields

Field	Type	Required	Automatable	Source
`code`	`string`	✅	✅	ISO 639-3 registry
`name`	`string`	✅	✅	ISO 639-3 registry
`nativeName`	`string \| null`	—	✅	Wikidata P1705
`alternateNames`	`string[]`	—	✅	Glottolog, Ethnologue
`iso639_3`	`string`	✅	✅	ISO 639-3 registry
`iso639_1`	`string \| null`	—	✅	ISO 639-1
`bcp47`	`string \| null`	—	Partial	IANA subtag registry
`aliases`	`string[]`	—	❌	Manually curated
`isoScope`	`string`	✅	✅	ISO 639-3 registry
`isoType`	`string`	✅	✅	ISO 639-3 registry
`macrolanguage`	`string \| null`	—	✅	ISO 639-3 macrolanguages.tab
`extends`	`string \| null`	—	❌	Manually curated

§ 2. Classification Fields

Field	Type	Required	Automatable	Source
`glottocode`	`string \| null`	—	✅	Glottolog
`classification`	`object \| null`	—	✅	Glottolog
`isIsolate`	`boolean`	—	✅	Glottolog CLDF

§ 3. Geography Fields

Field	Type	Required	Automatable	Source
`macroarea`	`string \| null`	—	✅	Glottolog CLDF
`coordinates`	`object \| null`	—	✅	Glottolog
`countries`	`string[]`	—	✅	Glottolog
`regions`	`object[]`	—	❌	Census, Ethnologue, manual
`arealContext`	`object \| null`	—	✅	Coordinates + linguistic area regions

§ 4. Writing System Fields

Field	Type	Required	Automatable	Source
`script`	`string \| null`	—	✅	Wikidata P282
`scriptUnicodeName`	`string \| null`	—	✅	Derived from `script` via the ISO 15924 → Unicode mapping
`scripts`	`object[]`	—	Partial	Wikidata, manual
`dir`	`string \| null`	—	✅	Derived from the script
`scriptConverter`	`string \| null`	—	❌	Manual
`orthographicStatus`	`object \| null`	—	Partial	Ethnologue, manual

§ 5. Demographics & Vitality Fields

Field	Type	Required	Automatable	Source
`speakerEstimates`	`object[]`	—	✅	Wikidata, Ethnologue, census
`vitality`	`object \| null`	—	✅	Glottolog AES, UNESCO

§ 5.5 Documentation & Digital Presence Fields

Field	Type	Required	Automatable	Source
`documentationDepth`	`object \| null`	—	✅	Glottolog references
`digitalPresence`	`object \| null`	—	✅	Wikipedia, Common Voice, Tatoeba
`dialectCount`	`number \| null`	—	✅	Glottolog

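§ 6. Formality, Register & Gender Fields

Field	Type	Required	Automatable	Source
`formality`	`object \| null`	—	❌	Linguistic research
`registers`	`object \| null`	—	❌	Linguistic research
`gender`	`object \| null`	—	❌	Linguistic research
`codeSwitching`	`object \| null`	—	❌	Linguistic research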

§ 7. Linguistic Profile Fields

Field	Type	Required	Automatable	Source
`linguisticChallenges`	`object \| null`	—	❌	Linguistic research
`contactInfluences`	`object[]`	—	❌	Published linguistics
`rules`	`object \| null`	—	✅	CLDR
`typologicalProfile`	`object \| null`	—	✅	Grambank 1.0.3 (auto-populated by `enrich-grambank-typology.mjs`)
`phonologicalInventory`	`object \| null`	—	✅	PHOIBLE 2.0 (auto-populated by `enrich-phoible-phonemes.mjs`)

§ 8. Encyclopedic Fields

Field	Type	Required	Automatable	Source
`encyclopedic`	`object \| null`	—	❌	Manual research
`culturalAphorism`	`object \| null`	—	❌	Community contribution
`varieties`	`object[]`	—	❌	Manual research

§ 9. Digital Resource Fields

Field	Type	Required	Automatable	Source
`resources`	`object \| null`	—	Partial	Manual + automated
`databaseCoverage`	`object \| null`	—	✅	Derived from enrichment
`corpusAvailability`	`object \| null`	—	✅	Bible Brain, OPUS, Lexibank
`keyboardSupport`	`object \| null`	—	✅	Keyman API, CLDR
`methodSupport`	`object`	✅	Partial	API verification
`metricModelSupport`	`object \| null`	—	✅	XLM-R paper, AfriCOMET paper
`metricPlugins`	`object \| null`	—	✅	Card enrichment; declares which metric plugin packs apply (e.g. `{ formalityMarkers: true }`)
`omt1600`	`object \| null`	—	✅	Meta's assessment
`evalDatasets`	`string[]`	—	✅	Dataset registry
`pipelineReadiness`	`object \| null`	—	Partial	Derived + manual

resources.fsts[].install: an FST entry in the resources object may include an install sub-object with the fields repo, releaseTag, assetPattern, format, maturity, and optionally bundlePattern. This replaces the former hard-coded GIELLALT_FST_REGISTRY dictionary. See get_fst_install_info() in language_cards.py.
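
As an illustration of how a consumer might read the install sub-object, here is a minimal Python sketch. It is not the real get_fst_install_info(), whose signature is not shown in this document. The helper name fst_install_entries is hypothetical, and every install value in the sample card is an invented placeholder. Only the field names come from the text above.

```python
# Required install fields named in the spec; bundlePattern is optional.
INSTALL_FIELDS = ("repo", "releaseTag", "assetPattern", "format", "maturity")


def fst_install_entries(card):
    """Return (fst name, install dict) pairs for FSTs with a complete install.

    Hypothetical helper: tolerates resources being null and skips FST
    entries that have no install sub-object or are missing a field.
    """
    resources = card.get("resources") or {}
    entries = []
    for fst in resources.get("fsts") or []:
        install = fst.get("install")
        if install and all(install.get(f) is not None for f in INSTALL_FIELDS):
            entries.append((fst["name"], install))
    return entries


# Sample card fragment. All install values are placeholders, not real data.
card = {
    "resources": {
        "fsts": [
            {
                "name": "GiellaLT Plains Cree FST (lang-crk)",
                "type": "morphological-analyzer",
                "install": {
                    "repo": "example-org/example-repo",
                    "releaseTag": "example-tag",
                    "assetPattern": "example-*.zip",
                    "format": "example-format",
                    "maturity": "example-maturity",
                },
            },
            {"name": "FST without install info", "type": "morphological-analyzer"},
        ]
    }
}

for name, install in fst_install_entries(card):
    print(name, "->", install["repo"], install["releaseTag"])
```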

§ 10. Provenance Fields

Field	Type	Required	Automatable	Source
`dataSources`	`array \| object`	✅	✅	Automated + manual
`supportTier`	`string`	—	✅	Derived from card completeness
`humanReviewed`	`object \| null`	—	❌	Human reviewer
`notes`	`string \| null`	—	❌	Manual
`firstDocumented`	`number \| null`	—	✅	Glottolog CLDF
`lastDocumented`	`number \| null`	—	✅	Glottolog CLDF
`_generated`	`object \| null`	—	✅	Enrichment scripts

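Language Code Policy

Champollion uses ISO 639-3 as the canonical identifier. Codes from other standards are registered as aliases and resolved to ISO 639-3 codes at runtime.

Priority	Standard	Example	Field	Usage
1 (canonical)	ISO 639-3	`crk`	`code`	Card filename, config key, API parameter
2 (alias)	ISO 639-1	`iu`	`aliases[]`	Accepted in the CLI, resolved to ISO 639-3
3 (alias)	BCP 47	`fil`	`aliases[]`	Accepted in the CLI, resolved to ISO 639-3
Reference	Glottocode	`plai1258`	`glottocode`	Classification only, not used at runtime

Resolution order: when a user supplies a code, the checks run in this sequence:

Direct match on card.code → found
Match in card.aliases[] → found, return the canonical card
Match on card.iso639_1 → found (fallback)
No match → error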
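
The resolution order can be sketched in Python as below. This is an illustrative sketch only: the document names resolveCode() but does not show it, so the function resolve_code, the in-memory cards mapping, and the three sample cards are assumptions made for the example.

```python
def resolve_code(user_code, cards):
    """Resolve a user-supplied code to a canonical ISO 639-3 card code.

    `cards` maps canonical code -> card dict and stands in for the
    card files on disk (an assumption of this sketch).
    """
    # 1. Direct match on card.code
    if user_code in cards:
        return user_code
    # 2. Match in card.aliases[]
    for code, card in cards.items():
        if user_code in (card.get("aliases") or []):
            return code
    # 3. Fallback: match on card.iso639_1
    for code, card in cards.items():
        if card.get("iso639_1") == user_code:
            return code
    # 4. Nothing matched
    raise KeyError(f"Unknown language code: {user_code!r}")


# Minimal sample cards; only the fields used for resolution are shown.
cards = {
    "fra": {"code": "fra", "iso639_1": "fr", "aliases": ["fr"]},
    "deu": {"code": "deu", "iso639_1": "de", "aliases": []},
    "crk": {"code": "crk", "iso639_1": None, "aliases": []},
}

print(resolve_code("crk", cards))  # direct match
print(resolve_code("fr", cards))   # alias match
print(resolve_code("de", cards))   # iso639_1 fallback
```

Checking aliases across all cards before falling back to iso639_1 keeps the stated priority intact even when one card's alias equals another card's ISO 639-1 code.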

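Migration History: ISO 639-1 → ISO 639-3

Before v8, card filenames used ISO 639-1 codes where available (fr.json, de.json, ja.json). In the 639-3 migration, every card was renamed to its ISO 639-3 equivalent:

Before	After	Reason
`fr.json`	`fra.json`	639-3 is canonical
`de.json`	`deu.json`	639-3 is canonical
`zh.json`	`cmn.json`	Macrolanguage → default individual language
`ar.json`	`arb.json`	Macrolanguage → Modern Standard Arabic
`ms.json`	`zsm.json`	Macrolanguage → Standard Malay

What happened to the old codes?

The old 639-1 code is stored in card.iso639_1
The old 639-1 code is also listed in card.aliases[]
resolveCode("fr") returns "fra" at runtime, so existing usage stays backwards compatible
Users can still write "fr" in config; it resolves transparently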

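What changed architecturally:

_deepMerge() now skips null child values (the parent's value is inherited)
_deepMerge() now has an identity-field set (code, extends, and aliases are never inherited)
formality.default is now derived from the register's isDefault: true flag
205 Grambank-derived cards received the structural formality.default fix
38 genus/family/macrolanguage cards serve as inheritance targets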
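
To make the null-skipping and identity-field behavior concrete, here is a Python port of the merge semantics, written for illustration only. The real implementation is _deepMerge() in lib/registers.js and may differ in detail. The function name deep_merge and the sample cards are assumptions, and the identity-field list is the one given in the inheritance model section.

```python
import copy

# Fields that always keep the child's own value, even when it is None.
IDENTITY_FIELDS = frozenset(
    {"code", "extends", "_migration", "aliases", "iso639_1", "iso639_3"}
)


def deep_merge(parent, child, top_level=True):
    """Merge a parent card into a child card without mutating either."""
    merged = {}
    keys = [*parent, *(k for k in child if k not in parent)]
    for key in keys:
        child_value = child.get(key)
        parent_value = parent.get(key)
        if top_level and key in IDENTITY_FIELDS:
            merged[key] = copy.deepcopy(child_value)   # never inherited
        elif child_value is None:
            merged[key] = copy.deepcopy(parent_value)  # null inherits
        elif isinstance(child_value, dict) and isinstance(parent_value, dict):
            merged[key] = deep_merge(parent_value, child_value, top_level=False)
        else:
            merged[key] = copy.deepcopy(child_value)   # arrays replace wholesale
    return merged


genus = {
    "code": "genus-cree",
    "aliases": ["cre"],
    "formality": {"system": "obviative-animate"},
    "registers": {"formal": {"prompt": "parent formal"}},
    "countries": ["CA"],
}
crk = {
    "code": "crk",
    "extends": "genus-cree",
    "aliases": None,
    "formality": None,
    "registers": {"informal": {"prompt": "child informal"}},
    "script": "Cans",
    "countries": ["CA", "US"],
}

resolved = deep_merge(genus, crk)
print(resolved["aliases"])    # None: the parent's ["cre"] is not inherited
print(resolved["formality"])  # inherited from the genus card
```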

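Edge Cases

Sign Languages

Sign languages (e.g. ase, American Sign Language) are legitimate languages with ISO 639-3 codes. They have geography and speaker counts, but:

script is usually null (no standard written form)
scripts may include "Sgnw" (SignWriting) if a notation system is in use
dir is null
linguisticChallenges should address spatial grammar, classifiers, and similar features
gender.grammatical is usually false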

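Ancient and Historical Languages

Languages such as Latin (lat, isoType H) and Sanskrit (san, isoType H) are still used in specific contexts (liturgical, scholarly) but have no native speakers:

vitality may note "no native speakers" with "trend": "stable" (not declining, because the community that uses the language is stable, just small)
speakerEstimates should note that these are L2 speakers, not L1
firstDocumented / lastDocumented place them in time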

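Constructed Languages

Esperanto (epo, isoType C), Lojban, and others:

classification may point to a "constructed" family or be null
contactInfluences reflects the source material (e.g. Esperanto draws on Romance, Germanic, and Slavic languages)
vitality is unusual: the speaker community is growing but has no native homeland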

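Macrolanguages

Arabic (ara), Chinese (zho), Cree (cre), and Quechua (que) are macrolanguages that contain multiple individual languages:

isoScope: "M"
varieties should list the individual languages with their ISO codes
methodSupport should reflect what the macrolanguage card supports (usually the standardized variety)
Individual varieties should also have their own cards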

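Languages Without a Standardized Orthography

Many languages (particularly oral-tradition languages) have no standardized writing system, or have competing orthographies:

script is null
scripts is []
dir is null
notes should explain the orthographic situation
linguisticChallenges should note how this affects MT (e.g. no training data)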

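Diglossia

Languages such as Arabic (MSA vs. dialects) or Guaraní (Jopará vs. pure Guaraní):

codeSwitching captures the mixed-variety situation
registers can provide presets for the different levels
varieties can list the diglossic pair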

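Contact Influence Types

Type	Meaning	Example
`superstrate`	Dominant language imposed on a community	French → English (post-1066)
`substrate`	Local language influences the imposed language	Celtic → English
`adstrate`	Neighboring languages influence each other	Norse → English
`learned_borrowing`	Borrowing through education or scholarship	Latin → English
`lexical_borrowing`	Direct lexical borrowing through contact	Spanish → Filipino
`relexification`	Wholesale vocabulary replacement	Portuguese → Papiamento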

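Contact Influence Depth

Depth	Meaning
`light`	A handful of loanwords, minimal structural influence
`moderate`	Significant vocabulary in specific domains
`heavy`	Pervasive vocabulary and some structural features
`structural`	Grammar, syntax, and phonology affected
`defining`	Core identity shaped by contact (creoles, mixed languages)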
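
The two vocabularies above lend themselves to a simple check on contactInfluences entries. The following Python sketch is hedged accordingly: the entry keys "type", "depth", and "source" are assumptions (the entry shape is defined elsewhere in the template), citation_needed is the flag named in the PR checklist, and the helper name contact_influence_problems is hypothetical.

```python
CONTACT_TYPES = {
    "superstrate", "substrate", "adstrate",
    "learned_borrowing", "lexical_borrowing", "relexification",
}
CONTACT_DEPTHS = {"light", "moderate", "heavy", "structural", "defining"}


def contact_influence_problems(entry):
    """Return a list of human-readable problems for one entry.

    Assumes keys "type", "depth", "source", and "citation_needed";
    an empty list means the entry passes this sketch's checks.
    """
    problems = []
    if entry.get("type") not in CONTACT_TYPES:
        problems.append(f"unknown type: {entry.get('type')!r}")
    if entry.get("depth") not in CONTACT_DEPTHS:
        problems.append(f"unknown depth: {entry.get('depth')!r}")
    if not entry.get("source") and entry.get("citation_needed") is not True:
        problems.append("needs a published source or citation_needed: true")
    return problems


# Illustrative entries only; not taken from a real card.
ok = {"type": "lexical_borrowing", "depth": "heavy", "citation_needed": True}
bad = {"type": "borrowing", "depth": "deep"}

print(contact_influence_problems(ok))
print(contact_influence_problems(bad))
```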

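Writing Good Register Presets

A good preset prompt:

Names the formality feature explicitly (e.g. "해요체", "vous-form", "siz-form")
Explains the specific pronouns or verb forms to use
Gives context for when this register is appropriate
Mentions script considerations, if applicable

Do not put gender-inclusive guidance in preset prompts. Gender guidance belongs in card.gender.inclusiveGuidance, which is injected separately.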

❌ Bad:  "Standard Thai. Professional register."
✔ Good: "Professional Thai. Use คุณ (khun) for second person, เรา (rao)
         for first person when needed. Clear, concise phrasing
         appropriate for digital interfaces."

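Preset Naming Conventions

Preset keys should be descriptive, lowercase, and hyphen-separated:

T-V languages: formal-vous, informal-tu, formal-Sie, casual-du
Speech levels: polite-haeyo, formal-hapsyo, casual-hae
Neutral: professional, neutral-professional
Code-switching: taglish-professional, pure-filipino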

Enrichment Procedure

Per-Card Processing Order

When enriching a card, consult sources in this order. Record every source consulted, even if it returned no data.

ISO 639-3 registry → code, name, isoScope, isoType
ISO 639-3 macrolanguages.tab → macrolanguage
Glottolog languoid.csv → glottocode, classification, coordinates, countries
Glottolog CLDF → macroarea, isIsolate, firstDocumented, lastDocumented
Glottolog AES → vitality (endangerment status)
Wikidata SPARQL → nativeName, speakerEstimates, script, scripts, dir
CLDR → rules (typography, plurals, casing)
NLLB-200 / FLORES+ → methodSupport.nllb, evalDatasets
API verification → remaining methodSupport entries
ML model papers → metricModelSupport (XLM-R training data, AfriCOMET coverage). Script: node scripts/enrich-metric-model-support.mjs
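
The ordering rule, combined with the merge-only principle, can be sketched as a small Python loop. This is a sketch under stated assumptions, not the actual enrichment scripts (which are .mjs files): enrich_card and the fetcher callables are hypothetical, the returned values are illustrative, and dataSources is assumed to use the flat-array form.

```python
def enrich_card(card, sources):
    """Apply (source_id, fetch) pairs in order and return the card.

    fetch(card) returns a dict of field -> value, or {} when the source
    has nothing for this language. Existing values are never
    overwritten, and every source is recorded even if it returned
    nothing. The card is updated in place.
    """
    consulted = list(card.get("dataSources") or [])
    for source_id, fetch in sources:
        for field, value in fetch(card).items():
            # Merge-only: fill a field only if it is still null or empty.
            if card.get(field) in (None, []) and value is not None:
                card[field] = value
        if source_id not in consulted:
            consulted.append(source_id)
    card["dataSources"] = consulted
    return card


# Hypothetical fetchers standing in for real registry lookups.
sources = [
    ("iso639-3-2024", lambda c: {"name": "Plains Cree"}),
    ("glottolog-5.3", lambda c: {"glottocode": "plai1258", "name": "Other name"}),
    ("wikidata", lambda c: {}),  # consulted, returned no data
]

card = enrich_card({"code": "crk", "name": None, "dataSources": []}, sources)
print(card["name"], card["glottocode"], card["dataSources"])
```

Because earlier sources win, the order of the list is what encodes source priority; a later source can only fill gaps.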

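Conflict Handling

When sources disagree:

Store both values, each labeled with its source
Do not average them or pick a side
Note the disagreement in the relevant note field
Prefer the most recent primary source only when a single value is needed for a computation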
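
A short Python sketch of these rules follows. The entry keys "count", "source", "year", and "primary" are assumptions, as is the helper name value_for_computation. The two counts echo the example in the design principles, while the years and primary flags are invented placeholders.

```python
def value_for_computation(estimates):
    """Pick one estimate only when a calculation needs a single value.

    Prefers the most recent entry flagged as a primary source and falls
    back to the most recent entry overall. The stored list is left
    untouched, so both conflicting values remain on the card.
    """
    if not estimates:
        return None
    primary = [e for e in estimates if e.get("primary")]
    pool = primary or estimates
    return max(pool, key=lambda e: e["year"])


# Both values are stored with their sources; nothing is averaged.
speaker_estimates = [
    {"count": 50000, "source": "wikidata", "year": 2020, "primary": False},
    {"count": 20000, "source": "ethnologue", "year": 2023, "primary": True},
]

chosen = value_for_computation(speaker_estimates)
print(chosen["count"], chosen["source"])
```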

Validation

Run the linter after any enrichment or manual edit:

node scripts/lint-language-cards.mjs              # all cards
node scripts/lint-language-cards.mjs --lang crk    # single card

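PR Checklist

When submitting a new or modified language card:

The file is named <code>.json and lives in shared/language-cards/
All top-level fields from the canonical template are present
classification is populated from Glottolog (not hand-built)
dataSources lists every source consulted
methodSupport entries are verified against the actual API language lists
contactInfluences entries have a published source or citation_needed: true
linguisticChallenges has 3–6 MT-relevant challenges (if researched)
rules is populated from CLDR (if locale data exists)
The linter passes with no errors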
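
The "all top-level fields present" item can be checked mechanically. The Python sketch below is illustrative and is not the project linter (scripts/lint-language-cards.mjs): both helper names are hypothetical, and the template shown is a heavily trimmed stand-in for the canonical one.

```python
def missing_top_level_fields(card, template):
    """Return template fields that are absent from the card.

    A field whose value is None or [] still counts as present; the spec
    requires the key to exist, not to have a value.
    """
    return [field for field in template if field not in card]


def empty_object_fields(card):
    """Return fields set to {}, which the spec says should be null."""
    return [field for field, value in card.items() if value == {}]


# Trimmed stand-in for the canonical template (assumption of this sketch).
template = {"code": None, "name": None, "aliases": [], "resources": None,
            "methodSupport": {}, "dataSources": []}

card = {"code": "crk", "name": "Plains Cree", "aliases": [],
        "resources": {}, "dataSources": ["iso639-3-2024"]}

print(missing_top_level_fields(card, template))
print(empty_object_fields(card))
```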

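Professional References

Standard	Maintainer	Our Usage
ISO 639-3	SIL International	Canonical language codes, macrolanguage relationships
Glottolog	Max Planck Institute	Classification, coordinates, AES endangerment
WALS	Max Planck Institute	Genus definitions, typological features
ISO 15924	Unicode/ISO	Script codes
CLDR	Unicode Consortium	Locale data, plural rules, typography
Wikidata	Wikimedia Foundation	Speaker counts, endonyms, script data
Ethnologue	SIL International	EGIDS, speaker estimates, DLS
UNESCO Atlas	UNESCO	Endangerment classification
Katig Collective	UP Diliman	Philippine language capsules

See also: the Language Card Citation Procedure for detailed source-by-source guidance.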
