Language Card Specification

Single source of truth. This document defines the canonical shape of every language card. Every card MUST contain every top-level field listed here, even when the value is null or []. A card with a missing field is non-conformant. This uniformity is what lets automated tools, linters, enrichment scripts, and human reviewers trust the card structure.
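The conformance rule above can be checked mechanically. The following is a minimal sketch, not the project's actual linter: the function name findMissingFields is hypothetical, and the field list is only a subset of the Identity fields from the Canonical Template, chosen for brevity.

```javascript
// Illustrative subset of the top-level fields (assumption: a real linter
// would list every field from the Canonical Template).
const REQUIRED_TOP_LEVEL_FIELDS = [
  "code", "name", "nativeName", "alternateNames", "iso639_3", "iso639_1",
  "bcp47", "aliases", "isoScope", "isoType", "macrolanguage", "extends",
];

// Returns the names of fields that are absent from the card.
// A field set to null or [] counts as present; only a missing key is an error.
function findMissingFields(card, requiredFields = REQUIRED_TOP_LEVEL_FIELDS) {
  return requiredFields.filter(
    (field) => !Object.prototype.hasOwnProperty.call(card, field)
  );
}

const sampleCard = {
  code: "crk", name: "Plains Cree", nativeName: null, alternateNames: [],
  iso639_3: "crk", iso639_1: null, bcp47: null, aliases: [],
  isoScope: "I", isoType: "L", macrolanguage: "cre",
  // "extends" is deliberately left out to trigger the check
};

console.log(findMissingFields(sampleCard)); // reports "extends" as missing
```

Note that the check looks at key presence only. Whether a present field has content is a separate question.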

Design Principles

Consistent shape. All 8,000+ cards have the same top-level fields. Unknown values are null, empty arrays are [], and empty objects are null (not {}). This means code never has to ask "does this field exist?", only "does it have content?"
Source everything. Every factual claim must be traceable to a named, versioned, primary source. Unsourced claims are unverifiable claims. The dataSources field (and the per-field source annotations in sub-objects) makes provenance explicit.
Preserve disagreement. When authorities disagree (Wikidata says 50,000 speakers, Ethnologue says 20,000), we store both with source attribution. We do not average, resolve, or pick a side. Users can navigate the nuance.
null means unknown, not inapplicable. If a field is null, it means "we have not found data for this yet." If a field genuinely does not apply (e.g., grammatical gender for a sign language), the value should explain that: { "grammatical": false, "inclusiveGuidance": "Not applicable: ASL has no grammatical gender." }
Merge only. Enrichment scripts add data and never overwrite. Human-curated values take priority over automated data.
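Three of these principles translate directly into small helpers. The sketch below is illustrative only: isPopulated, enrichField, and addSpeakerEstimate are hypothetical names, not functions from the project's enrichment scripts, and deduplicating estimates by source is an assumption.

```javascript
// Consistent shape: null and [] both mean "no content yet".
// Note that false and 0 are real values and count as populated.
function isPopulated(value) {
  if (value === null || value === undefined) return false;
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Merge only: fill a field only when the card has nothing there yet.
// Returns a new card; existing (possibly human-curated) values are kept.
function enrichField(card, field, incoming) {
  if (isPopulated(card[field]) || !isPopulated(incoming)) return card;
  return { ...card, [field]: incoming };
}

// Preserve disagreement: append an estimate instead of replacing one.
// Assumption: an estimate from an already-recorded source is skipped.
function addSpeakerEstimate(card, estimate) {
  const existing = card.speakerEstimates ?? [];
  if (existing.some((e) => e.source === estimate.source)) return card;
  return { ...card, speakerEstimates: [...existing, estimate] };
}

let demoCard = { code: "xxx", script: "Latn", macroarea: null, speakerEstimates: [] };
demoCard = enrichField(demoCard, "script", "Cyrl");       // unchanged: script already set
demoCard = enrichField(demoCard, "macroarea", "Eurasia"); // filled: was null
demoCard = addSpeakerEstimate(demoCard, { source: "wikidata", count: 50000 });
demoCard = addSpeakerEstimate(demoCard, { source: "ethnologue", count: 20000 });

console.log(demoCard.script, demoCard.macroarea, demoCard.speakerEstimates.length);
// prints: Latn Eurasia 2
```

Both conflicting speaker counts survive side by side, each with its own source, which is the behavior the "preserve disagreement" principle asks for.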

Three-Layer Architecture

Layer	Location	Purpose
Language cards	`shared/language-cards/<code>.json`	Per-language configuration: identity, classification, resources, and everything else
Genus cards	`shared/language-cards/genera/<genus>.json`	Shared runtime properties for related languages (curated, not auto-generated)
Language tree	`shared/language-cards/language-tree.json`	Full Glottolog hierarchy: reference data for the Lab UI and language discovery
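As a rough illustration of how a card key could map onto this layout, here is a minimal sketch. It assumes that every card whose code carries a "genus-", "family-", or "macrolanguage-" prefix lives in genera/ and that the filename equals the card code; cardPath is a hypothetical helper, not a function from the codebase.

```javascript
const CARD_ROOT = "shared/language-cards";
const GROUP_PREFIXES = ["genus-", "family-", "macrolanguage-"];

// Maps a card key to its JSON file path.
// Assumption: prefixed group cards live in genera/, all others at the root.
function cardPath(code) {
  const isGroupCard = GROUP_PREFIXES.some((prefix) => code.startsWith(prefix));
  return isGroupCard
    ? `${CARD_ROOT}/genera/${code}.json`
    : `${CARD_ROOT}/${code}.json`;
}

console.log(cardPath("crk"));        // shared/language-cards/crk.json
console.log(cardPath("genus-cree")); // shared/language-cards/genera/genus-cree.json
```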

Inheritance Model

When a card sets "extends": "family-dravidian", the runtime merges the parent card into the child using _deepMerge() (in lib/registers.js). This lets genus cards define shared registers, formality systems, and gender guidance that flow down to all member languages, without repeating the data across hundreds of individual cards.

Merge Semantics

Child value	Behavior	Why
`null`	Inherit from parent	`null` means "I am not defining this", so the parent's value flows through
Non-null	Override parent	The child's data is more specific, so it takes priority
Nested object	Recursive merge	Child fields override; parent fields are preserved
Array	Replace entirely	Arrays are not merged item by item; the child array wins
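The four rows of this table can be expressed as one short recursive function. This is a hypothetical stand-in written from the table alone, not the actual _deepMerge() in lib/registers.js, which may differ in its details.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Sketch of the merge semantics table (hypothetical, not lib/registers.js).
function deepMergeSketch(parent, child) {
  const merged = { ...parent };
  for (const [key, childValue] of Object.entries(child)) {
    const parentValue = parent[key];
    if (childValue === null || childValue === undefined) {
      // null: inherit from the parent (stays null if the parent has nothing)
      merged[key] = parentValue ?? null;
    } else if (isPlainObject(childValue) && isPlainObject(parentValue)) {
      // nested object: recursive merge
      merged[key] = deepMergeSketch(parentValue, childValue);
    } else {
      // non-null scalar or array: the child's value replaces the parent's
      merged[key] = childValue;
    }
  }
  return merged;
}

const parentCard = {
  formality: { system: "T-V", default: "formal" },
  countries: ["CA"],
  script: "Latn",
};
const childCard = {
  formality: { default: "informal" },
  countries: ["US"],
  script: null,
};

const mergedCard = deepMergeSketch(parentCard, childCard);
console.log(mergedCard.formality); // system kept from parent, default overridden
console.log(mergedCard.countries); // child array replaces the parent array
console.log(mergedCard.script);    // null in the child, so "Latn" is inherited
```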

Identity Fields (Never Inherited)

Some fields belong to the card itself and MUST NOT ever be inherited from a parent:

code, extends, _migration, aliases, iso639_1, iso639_3

Even if a parent card defines aliases: ["macro-code"], the child card will NOT inherit those aliases. These fields are always the child's own values (including null if not set).

Why: Without this rule, every Cree language would inherit aliases: ["cre"] from the macrolanguage parent, which would make every variety an alias of the macrolanguage.
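One way to enforce this rule is to reset the identity fields after the merge, whatever the merge produced. The sketch below assumes that approach; restoreIdentityFields is a hypothetical helper and the runtime may implement the rule differently.

```javascript
const IDENTITY_FIELDS = ["code", "extends", "_migration", "aliases", "iso639_1", "iso639_3"];

// Overwrites every identity field on an already-merged card with the child's
// own value, or null when the child does not set it.
function restoreIdentityFields(merged, child) {
  const result = { ...merged };
  for (const field of IDENTITY_FIELDS) {
    result[field] = child[field] ?? null;
  }
  return result;
}

// Pretend result of merging a parent that defines aliases: ["macro-code"].
const mergedFromParent = {
  aliases: ["macro-code"],
  registers: { formal: { label: "Formal" } },
};
const rawChild = { code: "crk", extends: "genus-cree", aliases: [], iso639_1: null, iso639_3: "crk" };

const finalCard = restoreIdentityFields(mergedFromParent, rawChild);
console.log(finalCard.aliases, finalCard._migration); // prints: [] null
```

Non-identity fields such as registers pass through untouched, so inheritance still works for everything else.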

Example: How a Cree Card Is Resolved

┌───────────────────────┐
│  family-algic.json    │  formality: null, registers: null
│  (no registers)       │
└──────────┬────────────┘
           │ extends
┌──────────┴────────────┐
│  genus-cree.json      │  formality: { system: "obviative-animate", ... }
│  (sourced registers)  │  registers: { formal: {...}, informal: {...} }
└──────────┬────────────┘
           │ extends
┌──────────┴────────────┐
│  crk.json             │  code: "crk", extends: "genus-cree"
│  (Plains Cree)        │  formality: null → inherits from genus-cree
│                       │  registers: null → inherits from genus-cree
│                       │  script: "Cans"  → own value, no inheritance
│                       │  code: "crk"     → identity field, never inherited
└───────────────────────┘

At runtime, getLanguageCard("crk") returns a merged object containing genus-cree's registers + family-algic's properties (if any) + crk's own identity and metadata.
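The chain in the diagram can be walked with a small recursive resolver. This is a sketch under stated assumptions: resolveCard and mergeValues are hypothetical names, the cards come from an in-memory map instead of JSON files, the cycle guard is an addition, and none of this is the real getLanguageCard() implementation.

```javascript
const IDENTITY_FIELDS = ["code", "extends", "_migration", "aliases", "iso639_1", "iso639_3"];
const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

// null inherits, nested objects merge recursively, everything else overrides.
function mergeValues(parent, child) {
  if (child === null || child === undefined) return parent ?? null;
  if (isPlainObject(parent) && isPlainObject(child)) {
    const out = { ...parent };
    for (const key of Object.keys(child)) out[key] = mergeValues(parent[key], child[key]);
    return out;
  }
  return child;
}

// cards: hypothetical map of card key -> raw card (stands in for file reads).
function resolveCard(key, cards, seen = new Set()) {
  if (seen.has(key)) throw new Error(`Circular extends chain at "${key}"`);
  seen.add(key);
  const card = cards[key];
  if (!card) throw new Error(`Unknown card "${key}"`);
  if (!card.extends) return { ...card };
  const parent = resolveCard(card.extends, cards, seen);
  const merged = mergeValues(parent, card);
  // Identity fields are never inherited: always the child's own value or null.
  for (const field of IDENTITY_FIELDS) merged[field] = card[field] ?? null;
  return merged;
}

const cards = {
  "family-algic": { code: "family-algic", extends: null, formality: null, registers: null },
  "genus-cree": {
    code: "genus-cree", extends: "family-algic",
    formality: { system: "obviative-animate", default: "formal" },
    registers: { formal: { label: "Formal (Proximate)" }, informal: { label: "Informal" } },
  },
  crk: { code: "crk", extends: "genus-cree", formality: null, registers: null, script: "Cans" },
};

const resolved = resolveCard("crk", cards);
console.log(resolved.code, resolved.script, resolved.formality.system);
// prints: crk Cans obviative-animate
```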

Genus Card Template

Genus cards live in shared/language-cards/genera/ and define shared properties for a language group. They follow the same schema as regular cards but with different conventions:

{
  // Identity — genus cards use a prefixed code, NOT an ISO 639-3 code
  "code": "genus-cree",           // "genus-", "family-", or "macrolanguage-" prefix
  "name": "Cree Languages",      // Human-readable group name
  "extends": "family-algic",     // Genus cards can extend family cards (chaining)

  // Formality — shared across the group, sourced from typological databases
  "formality": {
    "system": "obviative-animate",
    "description": "Cree languages use an obviative/proximate system...",
    "default": "formal",
    "source": "WALS 37A, 38A + Wolfart 1973"
  },

  // Registers — shared presets, if the group shares a formality system
  "registers": {
    "formal": {
      "label": "Formal (Proximate)",
      "description": "...",
      "prompt": "...",
      "isDefault": true
    },
    "informal": {
      "label": "Informal",
      "description": "...",
      "prompt": "..."
    }
  },

  // Gender — shared grammatical gender behavior
  "gender": {
    "grammatical": false,       // Cree doesn't have grammatical gender
    "inclusiveGuidance": null   //   so no inclusive guidance needed
  },

  // Everything else is null — individual cards provide their own
  // classification, geography, resources, etc.
  "classification": null,
  "methodSupport": null,
  // ...
}

Key rule: Genus cards MUST contain ONLY data that is genuinely shared across the whole group and sourced from authoritative references. If a formality system varies between members, it belongs in the individual cards, not in the genus.
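Whether data is genuinely shared needs human judgment, but a few of the genus card conventions can be linted. The sketch below checks three things taken from the template above: the prefixed code, a source on a populated formality block, and formality.default being a key in registers. lintGenusCard is a hypothetical name and these three checks are an assumed minimum, not the project's lint rules.

```javascript
const GROUP_PREFIXES = ["genus-", "family-", "macrolanguage-"];

// Returns a list of human-readable problems; an empty list means no findings.
function lintGenusCard(card) {
  const problems = [];
  if (!GROUP_PREFIXES.some((prefix) => String(card.code).startsWith(prefix))) {
    problems.push(`code "${card.code}" lacks a genus-/family-/macrolanguage- prefix`);
  }
  if (card.formality && !card.formality.source) {
    problems.push("formality is populated but has no source");
  }
  if (card.formality && card.registers && !(card.formality.default in card.registers)) {
    problems.push(`formality.default "${card.formality.default}" is not a key in registers`);
  }
  return problems;
}

const goodGenus = {
  code: "genus-cree",
  formality: { system: "obviative-animate", default: "formal", source: "WALS 37A, 38A + Wolfart 1973" },
  registers: { formal: { label: "Formal (Proximate)" }, informal: { label: "Informal" } },
};
const badGenus = {
  code: "cree",
  formality: { system: "obviative-animate", default: "formal" },
  registers: { informal: { label: "Informal" } },
};

console.log(lintGenusCard(goodGenus).length); // 0
console.log(lintGenusCard(badGenus).length);  // 3
```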

Canonical Template

Every card MUST have exactly this top-level shape. The sub-object schemas are documented in the Field Reference below.

{
  // ═══════════════════════════════════════════════════════════════════════
  //  § 1. IDENTITY
  //  Who is this language? What codes identify it?
  //  Sources: ISO 639-3 registry, ISO 639-1, BCP 47/IANA.
  // ═══════════════════════════════════════════════════════════════════════

  "code":          "xxx",       // REQUIRED. ISO 639-3 code. This IS the card ID and filename.
  "name":          "English Name",  // REQUIRED. English reference name from ISO 639-3 registry.
  "nativeName":    null,        // Endonym (name in the language itself). Source: Wikidata P1705.
                                // Examples: "nêhiyawêwin / ᓀᐦᐃᔭᐍᐏᐣ", "日本語", "Esperanto".
  "alternateNames": [],         // Other names this language is known by. Source: Glottolog, Ethnologue.
                                // Not aliases (those are code-level). These are name-level variants.
                                // Example: ["Qafar af", "Afaraf", "'Afar Af"] for Afar (aar).
  "iso639_3":      "xxx",      // REQUIRED. Three-letter ISO 639-3 code. Same as `code`.
  "iso639_1":      null,        // Two-letter ISO 639-1 code (e.g., "en", "fr"). null if none.
  "bcp47":         null,        // IETF BCP 47 tag. Often same as iso639_1. Can include subtags
                                // (e.g., "iu-Cans-CA"). null if unknown.
  "aliases":       [],          // Alternative code-level identifiers that resolve to this card.
                                // Example: ["fil"] for tgl (Tagalog), ["iu"] for iku (Inuktitut).
                                // Used by code resolution: user types "fil", system loads tgl.json.
  "isoScope":      "I",        // REQUIRED. ISO 639-3 scope:
                                //   "I" = Individual language
                                //   "M" = Macrolanguage (e.g., Chinese, Arabic, Cree)
                                //   "S" = Special (e.g., mis, mul, zxx)
  "isoType":       "L",        // REQUIRED. ISO 639-3 type:
                                //   "L" = Living    "E" = Extinct    "A" = Ancient
                                //   "H" = Historical    "C" = Constructed
  "macrolanguage": null,        // If this language is part of a macrolanguage, the macrolanguage
                                // ISO 639-3 code (e.g., "cre" for Plains Cree, "ara" for Arabic
                                // varieties). Source: ISO 639-3 macrolanguages.tab.
  "extends":       null,        // Genus card key if shared properties are inherited from a genus
                                // card (e.g., "genus-cree", "genus-eskimo-aleut").
                                // null for most languages.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 2. CLASSIFICATION
  //  Where does this language sit in the family tree?
  //  Source: Glottolog. NEVER hand-build classifications.
  // ═══════════════════════════════════════════════════════════════════════

  "glottocode":      null,      // Glottolog identifier (e.g., "plai1258", "stan1293").
                                // null if the language is not in Glottolog.
  "classification":  null,      // Genealogical classification from Glottolog. When populated:
                                // {
                                //   "family": "Algic",              // Top-level family. null for isolates.
                                //   "familyGlottocode": "algi1248", // Glottocode of the family.
                                //   "genus": "Plains Creeic",       // WALS-style genus.
                                //   "genusGlottocode": "plai1264",  // Glottocode of the genus.
                                //   "ancestry": ["Algic", "Algonquian-Blackfoot", "Algonquian",
                                //                "Cree-Montagnais-Naskapi", "Cree", "Plains Creeic"]
                                // }
                                // For isolates: family = language name, genus = language name,
                                // ancestry = [language name].
  "isIsolate":       false,     // true if a language isolate (no known genetic relatives).
                                // Source: Glottolog CLDF.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 3. GEOGRAPHY
  //  Where is this language spoken?
  //  Sources: Glottolog (coordinates, countries), census data, Ethnologue.
  // ═══════════════════════════════════════════════════════════════════════

  "macroarea":     null,        // Glottolog macroarea. One of: "Africa", "Australia",
                                // "Eurasia", "North America", "Papunesia", "South America".
                                // null if unknown. Source: Glottolog CLDF.
  "coordinates":   null,        // Representative geographic point. When populated:
                                // { "lat": 52.1, "lng": -106.6, "source": "glottolog-5.3" }
                                // This is a representative point, not a boundary.
  "countries":     [],          // ISO 3166-1 alpha-2 country codes where this language is spoken.
                                // Example: ["CA", "US"]. Source: Glottolog.
  "regions":       [],          // Detailed regional breakdown with admin codes & speaker estimates.
                                // Each entry:
                                // {
                                //   "country": "Canada",
                                //   "countryCode": "CA",
                                //   "officialStatus": "recognized",  // official, co-official,
                                //                                    // recognized, none
                                //   "region": "Saskatchewan, Alberta, Manitoba",
                                //   "speakerEstimate": "~20,000",
                                //   "coordinates": [-106.6, 52.1],   // [lng, lat]
                                //   "admin1Codes": ["CA-SK", "CA-AB", "CA-MB"]
                                // }

  "arealContext":  null,         // Linguistic area / Sprachbund membership. DISTINCT from
                                // contactInfluences (which is language-specific contact history).
                                // This field captures zone-level typological convergence patterns
                                // — i.e., what linguistic area the language exists within and
                                // what features are common across that area.
                                // {
                                //   "zone": "Mainland Southeast Asian Sprachbund",
                                //   "arealFeatures": "Tonal convergence, classifier systems,
                                //     topic-prominence, monosyllabicity trend.",
                                //   "typicalContacts": ["Classical Chinese", "Sanskrit/Pali"],
                                //   "source": "areal-linguistics (Enfield 2005)"
                                // }
                                // NOT the same as contactInfluences. A language can exist within
                                // a convergence area without having specific contact history with
                                // any particular language in that area.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 4. WRITING SYSTEMS
  //  How is this language written?
  //  Sources: Wikidata P282, ISO 15924, manual research.
  //  Note: Some languages have NO standardized orthography. Some have
  //  competing orthographies. Some use multiple scripts routinely (e.g.,
  //  Serbian: Cyrillic + Latin; Japanese: Kanji + Hiragana + Katakana).
  //  Sign languages may use notation systems (SignWriting, HamNoSys) or
  //  none at all.
  // ═══════════════════════════════════════════════════════════════════════

  "script":        null,        // Primary ISO 15924 script code (e.g., "Latn", "Cyrl", "Cans",
                                // "Jpan"). null if no written form or unknown.
  "scriptUnicodeName": null,    // Unicode script block name derived from the script field.
                                // e.g., "Latin", "Cyrillic", "Canadian_Aboriginal", "CJK".
                                // Used by code_switching metric plugin. Auto-populated by
                                // enrich-script-unicode-names.mjs. null if script is null.
  "scripts":       [],          // All writing systems with detail. Array of:
                                // {
                                //   "code": "Cans",
                                //   "name": "Unified Canadian Aboriginal Syllabics",
                                //   "primary": true
                                // }
                                // A language with multiple scripts has multiple entries.
                                // A language with no written form has [].
  "dir":           null,        // Writing direction: "ltr" (left-to-right) or "rtl" (right-to-left).
                                // null if no written form or unknown.
  "scriptConverter": null,      // Script converter key if we have a converter for this language
                                // (e.g., "crk" for SRO↔Syllabics). null for most languages.
  "orthographicStatus": null,   // Writing system standardization status. When populated:
                                // {
                                //   "status": "standardized",
                                //       // "standardized" — official/agreed orthography exists
                                //       // "competing"    — multiple orthographies in active use
                                //       // "emerging"     — orthography under development
                                //       // "none"         — primarily oral, no standard writing
                                //   "notes": "Uses SIL-developed Latin orthography since 1960s.",
                                //   "source": "ethnologue" // or "manual-curation"
                                // }
                                // Crucial for LRLs where orthographic variation directly impacts
                                // MT training data quality and evaluation consistency.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 5. DEMOGRAPHICS & VITALITY
  //  How many people speak this language? Is it endangered?
  //  Sources: Census, Ethnologue, UNESCO Atlas, Wikidata, Glottolog AES.
  //
  //  CRITICAL: Store ALL estimates separately with source attribution.
  //  Never average or "resolve" conflicting data. Speaker counts are
  //  politically contested for many languages. Present the evidence,
  //  let the reader assess.
  // ═══════════════════════════════════════════════════════════════════════

  "speakerEstimates": [],       // Array of speaker count estimates from different authorities.
                                // Each entry:
                                // {
                                //   "source": "wikidata",              // or "ethnologue-28",
                                //                                      // "census-ph-2020", etc.
                                //   "count": 20000,                    // Point estimate. null if range-only.
                                //   "date": "2026-06-07",              // When this data was retrieved.
                                //   "countRange": { "min": 15000, "max": 25000 },  // Optional range.
                                //   "note": "Wikidata has 2 estimates: 15,000 and 25,000"
                                // }
                                // Empty array means we have not yet found speaker count data.

  "vitality":      null,        // Endangerment / vitality assessment. When populated:
                                // {
                                //   "unescoStatus": "severely-endangered",
                                //       // Enum: "safe", "vulnerable", "definitely-endangered",
                                //       //       "severely-endangered", "critically-endangered",
                                //       //       "extinct"
                                //   "aesStatus": "shifting",
                                //       // Glottolog AES label (free text from AES data).
                                //   "egids": "6b",
                                //       // Ethnologue Expanded Graded Intergenerational Disruption
                                //       // Scale. Levels: 0 (international) to 10 (extinct).
                                //   "trend": "declining",
                                //       // Qualitative trend: "stable", "growing", "declining",
                                //       //                     "shifting", "moribund", "awakening"
                                //   "source": "glottolog-aes-5.3",
                                //   "notes": "Intergenerational transmission breaking down."
                                // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 5.5. DOCUMENTATION & DIGITAL PRESENCE
  //  How well-documented is this language? What digital footprint does it
  //  have? These fields answer the practical question: "What can I
  //  actually DO with this language?"
  //  Sources: Glottolog (references), Wikipedia, Common Voice, Tatoeba.
  // ═══════════════════════════════════════════════════════════════════════

  "documentationDepth": null,    // How well-documented is this language in the literature?
                                 // {
                                 //   "referenceCount": 42,
                                 //       // Number of published references in Glottolog.
                                 //   "med": "grammar",
                                 //       // Most Extensive Description type. One of:
                                 //       // "long_grammar", "grammar", "grammar_sketch",
                                 //       // "dictionary", "phonology", "text", "wordlist",
                                 //       // "comparative", "minimal", "unknown"
                                 //   "source": "glottolog-5.3"
                                 // }

  "digitalPresence":  null,      // Digital footprint across web platforms. When populated:
                                 // {
                                 //   "wikipedia": {
                                 //     "edition": true,      // Has its own Wikipedia edition?
                                 //     "articleCount": 75000, // Number of articles.
                                 //     "editionCode": "crk",  // Wikipedia subdomain code.
                                 //     "source": "wikimedia-api-2026"
                                 //   },
                                 //   "commonVoice": {
                                 //     "validatedHours": 12.5,
                                 //     "totalHours": 25.0,
                                 //     "speakers": 45,
                                 //     "sentences": 1200,
                                 //     "source": "common-voice-20.0"
                                 //   },
                                 //   "tatoeba": {
                                 //     "sentenceCount": 342,
                                 //     "source": "tatoeba-2026"
                                 //   }
                                 // }

  "dialectCount":     null,      // Number of recognized dialects in Glottolog.
                                 // Derived from child_dialect_count in languoid.csv.
                                 // Simple integer. null if 0 or unknown.
                                 // Source: glottolog-5.3.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 6. FORMALITY, REGISTERS & GENDER
  //  How does politeness work in this language? What translation registers
  //  do we offer? How should gender be handled?
  //
  //  This section drives Champollion's register-preset system — the
  //  mechanism by which users select formal/informal/professional tone.
  //  These fields require genuine linguistic research, not automation.
  // ═══════════════════════════════════════════════════════════════════════

  "formality":     null,        // Formality system description. When populated:
                                // {
                                //   "system": "T-V",
                                //       // One of: "T-V", "speech-levels", "keigo", "particles",
                                //       //         "register-levels", "register-and-code-switching",
                                //       //         "code-switching", "none"
                                //   "description": "French uses a vous/tu distinction...",
                                //   "default": "formal-vous"   // Key into the `registers` object.
                                // }

  "registers":     null,        // Translation register presets. When populated, keyed by preset ID:
                                // {
                                //   "formal-vous": {
                                //     "label": "Formal (vouvoiement)",
                                //     "description": "One sentence: when to use this preset.",
                                //     "prompt": "The actual LLM system prompt instruction that
                                //               steers translation tone. Must name specific
                                //               linguistic features (pronouns, verb forms, particles).",
                                //     "deeplFormality": "prefer_more"
                                //       // Only if methodSupport.deepl.formality is true.
                                //       // One of: "prefer_more", "prefer_less", "default".
                                //   }
                                // }

  "gender":        null,        // Grammatical gender and inclusive guidance. When populated:
                                // {
                                //   "grammatical": true,         // Does the language have gram. gender?
                                //   "inclusiveGuidance": "Use gender-neutral forms when possible.
                                //                        Prefer 'iel' (neologism) or rephrase to
                                //                        avoid gendered agreement."
                                // }
                                // For languages without grammatical gender (Turkish, Finnish):
                                // { "grammatical": false, "inclusiveGuidance": null }

  "codeSwitching":  null,       // Code-switching behavior (for languages where mixing with another
                                // language is the norm, not an error). When populated:
                                // {
                                //   "contactLanguage": "Spanish",
                                //   "contactIso639_3": "spa",
                                //   "mixedVarietyName": "Jopará",   // null if no named mixed variety
                                //   "prevalence": "dominant",       // "rare", "common", "dominant"
                                //   "morphologicalIntegration": true,
                                //   "pipelineStrategy": "hybrid-fst",
                                //   "notes": "Jopará IS the everyday language of most Paraguayans..."
                                // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 7. LINGUISTIC PROFILE
  //  What makes this language what it is? What are the specific challenges
  //  for machine translation? What rules govern its typography?
  //  What languages have shaped it through contact?
  //
  //  These fields require genuine linguistic expertise. For many languages
  //  (especially low-resource), this section will remain null until a
  //  qualified researcher or community member contributes.
  // ═══════════════════════════════════════════════════════════════════════

  "linguisticChallenges": null,  // MT-relevant challenges, keyed by challenge ID.
                                 // When populated:
                                 // {
                                 //   "polysynthesis": "Cree is highly polysynthetic. A single verb
                                 //                    can incorporate subject, object, tense...",
                                 //   "animacy": "Verb conjugation changes based on whether the
                                 //              subject/object is animate or inanimate...",
                                 //   "neologisms": "Avoid literal translations of modern software
                                 //                 concepts. Maintain Cree metaphorical logic..."
                                 // }
                                 // Aim for 3–6 challenges per language when researched.

  "contactInfluences": [],       // How other languages have shaped this one. Array of:
                                 // {
                                 //   "source": "English",
                                 //   "sourceIso639_3": "eng",       // null if proto-language/unknown
                                 //   "type": "superstrate",
                                 //       // Enum: "superstrate", "substrate", "adstrate",
                                 //       //       "learned_borrowing", "lexical_borrowing",
                                 //       //       "relexification"
                                 //   "domains": ["education", "government", "technology"],
                                 //   "depth": "heavy",
                                 //       // Enum: "light", "moderate", "heavy", "structural",
                                 //       //       "defining"
                                 //   "period": "1870–present",
                                 //   "notes": "Residential school era and ongoing...",
                                 //   "citation_needed": false
                                 //       // true if no published academic source found.
                                 //       // See language-card-citation-procedure.md.
                                 // }

  "rules":          null,        // Typography, plural, and capitalization rules. When populated:
                                 // {
                                 //   "typography": {
                                 //     "quoteStart": "\u201c",
                                 //     "quoteEnd": "\u201d",
                                 //     "usesSpaces": true,        // false for CJK, Thai, Lao, Khmer
                                 //     "punctuationSpacing": {
                                 //       "doublePunctuation": "none"  // "thin-nbsp" for French
                                 //     }
                                 //   },
                                 //   "plurals": {
                                 //     "categories": ["one", "other"]
                                 //       // From CLDR. Possible values:
                                 //       // "zero", "one", "two", "few", "many", "other"
                                 //   },
                                 //   "capitalization": {
                                 //     "hasCase": true
                                 //       // true for Latin, Cyrillic, Greek, Armenian scripts.
                                 //       // false for CJK, Arabic, Devanagari, etc.
                                 //   }
                                 // }
                                 // Source: CLDR + ISO 15924 derivation.

  "typologicalProfile": null,   // Grambank typological features. When populated:
                                // {
                                //   "featuresDocumented": 195,
                                //   "featuresCoverage": 1.0,   // 0.0–1.0 fraction of features
                                //   "wordOrderDominant": "SVO",
                                //   "hasDefiniteArticle": true,
                                //   "hasIndefiniteArticle": true,
                                //   "hasGenderSystem": true,
                                //   "hasCaseMorphology": true,
                                //   "hasEvidentiality": false,
                                //   "hasToneSystem": false,
                                //   "source": "grambank-1.0.3"
                                // }
                                // Auto-populated by enrich-grambank-typology.mjs.

  "phonologicalInventory": null, // PHOIBLE phoneme inventory. When populated:
                                // {
                                //   "consonants": 24,
                                //   "vowels": 16,
                                //   "tones": 0,
                                //   "totalPhonemes": 40,
                                //   "isTonal": false,
                                //   "inventorySize": "moderately-large",
                                //       // Enum: "small", "moderately-small", "average",
                                //       //       "moderately-large", "large"
                                //   "source": "phoible-2.0"
                                // }
                                // Auto-populated by enrich-phoible-phonemes.mjs.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 8. ENCYCLOPEDIC
  //  General knowledge about the language for human context. History,
  //  dialect situation, institutional resources, representative sayings.
  //  This section is for understanding, not computation.
  // ═══════════════════════════════════════════════════════════════════════

  "encyclopedic":    null,       // General knowledge. When populated:
                                 // {
                                 //   "family": "Algic",             // Redundant with classification
                                 //                                  // but useful for human readers.
                                 //   "dialects": {
                                 //     "split": true,               // Is there significant variation?
                                 //     "classification": "Plains Cree (y-dialect)",
                                 //     "variants": ["crk", "cwd", "csw"]  // ISO codes of variants
                                 //   },
                                 //   "demographics": {
                                 //     "speakers": "Approx. 20,000 active speakers",
                                 //     "regions": ["Saskatchewan", "Alberta", "Manitoba"]
                                 //   },
                                 //   "history": "Plains Cree is the most widely spoken Algonquian
                                 //              language in western Canada...",
                                 //   "resources": {
                                 //     "wikipedia": "https://en.wikipedia.org/wiki/Plains_Cree",
                                 //     "foundations": [{ "name": "ALTLab", "url": "https://..." }],
                                 //     "dictionaries": [{ "name": "itwêwina", "url": "https://..." }]
                                 //   }
                                 // }

  "culturalAphorism": null,      // A representative saying, proverb, or teaching in the language.
                                 // When populated:
                                 // {
                                 //   "text": "ê-wîcêhtonaniwahk kâ-kî-isi-wâpahtamâhk ôma pimâtisiwin",
                                 //   "transliteration": null,       // Romanized form if non-Latin script.
                                 //   "translation": "Through helping each other we come to understand
                                 //                   this life",
                                 //   "literal": "By-helping-one-another we-have-come-to-see this life",
                                 //   "source": "Cree teaching, documented in nêhiyawêwin educational
                                 //              resources"
                                 // }
                                 // Choose sayings that reveal something about the language's
                                 // worldview or structure. Must be sourced.

  "varieties":      [],          // For macrolanguages or languages with significant dialectal
                                 // variation, the individual varieties with their own tool coverage.
                                 // Each entry:
                                 // {
                                 //   "name": "Cusco Quechua",
                                 //   "iso639_3": "quz",
                                 //   "region": "Cusco, Peru",
                                 //   "fstCoverage": true,
                                 //   "corpusCoverage": true,
                                 //   "nllbCoverage": false,
                                 //   "mutualIntelligibility": "Primary variety for this card",
                                 //   "notes": "SQUOIA FST was built for this variety."
                                 // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 9. DIGITAL RESOURCES & TOOLING
  //  What NLP tools, corpora, models, and datasets exist for this language?
  //  What translation APIs support it? What eval benchmarks are available?
  //
  //  This is Champollion's operational core — these fields determine what
  //  we can actually DO with this language.
  // ═══════════════════════════════════════════════════════════════════════

  "resources":      null,        // NLP resources available for this language. When populated:
                                 // {
                                 //   "fsts": [{                     // Finite-state transducers
                                 //     "name": "GiellaLT Plains Cree FST (lang-crk)",
                                 //     "url": "https://github.com/giellalt/lang-crk/releases",
                                 //     "type": "morphological-analyzer"
                                 //   }],
                                 //   "corpora": [{                  // Text corpora
                                 //     "name": "EDTeKLA Cree Language Textbook Corpus",
                                 //     "type": "parallel",          // "parallel", "monolingual"
                                 //     "pairs": ["en-crk"],
                                 //     "url": "https://...",
                                 //     "exposure": "open-web"       // "open-web", "restricted",
                                 //                                  // "holdout"
                                 //   }],
                                 //   "models": [{                   // Pre-trained models
                                 //     "name": "NLLB-200 (crk_Cans)",
                                 //     "url": "https://...",
                                 //     "type": "nmt"
                                 //   }],
                                 //   "tools": [],                   // Other NLP tools
                                 //   "wordlists": [{                // Standardized wordlists
                                 //     "name": "Lexibank",
                                 //     "conceptCount": 200,
                                 //     "source": "lexibank"
                                 //   }],
                                 //   "treebanks": [{                // Syntactic treebanks
                                 //     "name": "UD_Korean-GSD",
                                 //     "tokens": 80000,
                                 //     "source": "universal-dependencies-2.14"
                                 //   }]
                                 // }
                                 // IMPORTANT: Only actual NLP/digital resources belong here.
                                 // "This language has a WALS entry" is NOT a resource — that
                                 // goes in databaseCoverage.

  "databaseCoverage": null,      // Which typological/reference databases cover this language.
                                 // Separated from resources to avoid conflating "has a database
                                 // entry" with "has usable NLP tooling."
                                 // {
                                 //   "wals": true,
                                 //   "grambank": true,
                                 //   "phoible": true,
                                 //   "cldr": true,
                                 //   "lexibank": true,
                                 //   "commonVoice": true,
                                 //   "source": "derived"
                                 // }

  "corpusAvailability": null,    // What text/parallel corpora exist for NLP use?
                                 // {
                                 //   "bibleTranslation": {
                                 //     "textAvailable": true,
                                 //     "audioAvailable": true,
                                 //     "source": "bible-brain-api"
                                 //   },
                                 //   "opusCorpora": ["wikimedia", "ubuntu", "gnome"],
                                 //   "source": "multi-source"
                                 // }

  "keyboardSupport":  null,      // Input method / keyboard availability. When populated:
                                 // {
                                 //   "keymanKeyboards": 3,
                                 //       // Number of Keyman keyboards available.
                                 //   "cldrKeyboard": true,
                                 //       // CLDR has keyboard layout data.
                                 //   "source": "keyman-api + cldr"
                                 // }

  "methodSupport":  {            // REQUIRED. Which Champollion translation methods support this
                                 // language. Each method is an object with at minimum
                                 // { "supported": boolean }.
    "googleTranslate":     { "supported": false },
    "deepl":               { "supported": false },
    "microsoftTranslator": { "supported": false },
    "libreTranslate":      { "supported": false },
    "nllb":                { "supported": false },
                                 // When NLLB is supported, include the code:
                                 // { "supported": true, "code": "crk_Cans" }
    "llm":                 { "supported": true }
                                 // LLM is always true (quality varies by language).
                                 // Optional: "verifiedDate": "2026-06-07" for audit trail.
  },

  "metricModelSupport": null,   // Which MT evaluation models produce reliable scores.
                                // When populated:
                                // {
                                //   "xlmr": "high",          // "high", "medium", or "low"
                                //                            // XLM-R training representation tier.
                                //   "africomet": false        // true if AfriCOMET covers this language.
                                // }
                                // Drives automatic COMET model selection in metrics_comet.py.
                                // Auto-populated by enrich-metric-model-support.mjs.

  "metricPlugins":   null,      // Which per-language metric plugin packs are available.
                                // When populated:
                                // {
                                //   "formalityMarkers": true  // Formality marker resource file exists
                                //                             // at plugins/resources/formality/{code}.json
                                // }
                                // Each key corresponds to a resource pack in
                                // arena/mt_eval_harness/plugins/resources/{packName}/.
                                // To add a new metric pack for a language, create the resource
                                // file and set the flag here. No code changes required.

  "evalPack":       null,        // Evaluation dependency pack for language-specific metrics.
                                 // When populated, declares the Python dependencies and
                                 // post-install steps required by this language's eval standards.
                                 // The harness uses this for dependency gating: if deps are
                                 // missing, the harness warns the user and skips LYSS metrics
                                 // (rather than crashing).
                                 // When populated:
                                 // {
                                 //   "pythonDeps": {
                                 //     "pyhfst": "pyhfst>=1.4",    // PyPI package specs
                                 //     "requests": "requests>=2.28",
                                 //     "spacy": "spacy>=3.7"
                                 //   },
                                 //   "postInstall": [               // Commands to run after pip
                                 //     {
                                 //       "command": "spacy download en_core_web_md",
                                 //       "label": "spaCy English model (for LYSS-sem)"
                                 //     }
                                 //   ],
                                 //   "requiresFst": true,           // true if GiellaLT FST needed
                                 //   "description": "LYSS equivalence linter + FST validation"
                                 // }

  "evalMetrics":    null,        // Language-specific evaluation metrics (LYSS standards).
                                 // When populated, the harness dynamically imports these
                                 // MetricPlugin classes from eval_standards/<lang>/ and applies
                                 // them to every run targeting this language — regardless of
                                 // which method (contestant) is being evaluated.
                                 // Keyed by metric ID:
                                 // {
                                 //   "lyss-eq": {
                                 //     "module": "eval_standards.crk.metrics",
                                 //     "class": "CrkLinterMetric",
                                 //     "description": "LYSS deterministic variant-class linter"
                                 //   },
                                 //   "lyss-sem": {
                                 //     "module": "eval_standards.crk.metrics",
                                 //     "class": "CrkSemanticMetric",
                                 //     "description": "LYSS FST-based semantic validator",
                                 //     "dependencies": ["spacy>=3.7"],
                                 //     "spacy_models": ["en_core_web_md"]
                                 //   }
                                 // }
                                 // Architecture: eval standards are referees, not contestants.
                                 // They live in the harness (eval_standards/), not in method
                                 // plugins. This ensures all methods are scored equally.
                                 // Discovery: plugin_discovery.py reads this field via
                                 // language_cards.get_eval_metrics() and instantiates metrics
                                 // using importlib. Dependencies are checked against evalPack.

  "omt1600":        null,        // Meta's OMT-1600 (One Model for Translation) coverage assessment.
                                 // When populated:
                                 // {
                                 //   "covered": true,
                                 //   "tier": "R1",                  // Meta's resource tier
                                 //   "evalMetrics": ["chrF++", "BLASER-3"],
                                 //   "notes": "Plains Cree: no web-crawled bitext..."
                                 // }

  "evalDatasets":   [],          // Evaluation dataset IDs available for this language.
                                 // Example: ["flores-plus-devtest", "edtekla-dev-v1"].
                                 // Empty means no standardized eval set exists.

  "pipelineReadiness": null,     // Assessment of readiness for Champollion's translation pipeline.
                                 // When populated:
                                 // {
                                 //   "tier": "tier-2-feasible",
                                 //       // "watch-list"       — cataloged but no path to translation
                                 //       // "tier-3-cataloged" — basic metadata present
                                 //       // "tier-2-feasible"  — tools exist, pipeline possible
                                 //       // "tier-1-ready"     — pipeline operational
                                 //   "hasFST": true,
                                 //   "hasParallelCorpus": true,
                                 //   "hasEvalBenchmark": true,
                                 //   "blockers": ["Syllabics post-processing validation"],
                                 //   "notes": "FST-gated pipeline operational. EDTeKLA corpus..."
                                 // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 10. PROVENANCE & METADATA
  //  Where does this data come from? Who reviewed it? When was it
  //  generated? What's its overall quality level?
  //
  //  This section exists to make the card auditable. Every automated
  //  enrichment, every human review, every source consulted should
  //  leave a trace here.
  // ═══════════════════════════════════════════════════════════════════════

  "dataSources":   [],           // REQUIRED. Sources consulted for this card's data.
                                 // Can be a flat array (backwards-compatible):
                                 //   ["iso639-3-2024", "glottolog-5.3", "wikidata"]
                                 //
                                 // Or a structured per-field object (preferred for new cards):
                                 //   {
                                 //     "classification": ["glottolog-5.3"],
                                 //     "vitality": ["glottolog-aes-5.3", "unesco-atlas-2024"],
                                 //     "speakerEstimates": ["wikidata", "census-ca-2021"],
                                 //     "rules": ["cldr-48"],
                                 //     "methodSupport": ["google-translate-2026-06"]
                                 //   }

  "supportTier":   "cataloged",  // Auto-derived tier summarizing the card's depth:
                                 //   "cataloged"   — identity + classification only
                                 //   "emerging"    — + vitality + speakerEstimates
                                 //   "developing"  — + resources + methodSupport
                                 //   "supported"   — full research: registers, challenges, etc.

  "humanReviewed": null,         // null until a qualified human reviews the card. When populated:
                                 // {
                                 //   "reviewer": "Prof. Kenneth Jamandre",
                                 //   "affiliation": "University of the Philippines Diliman",
                                 //   "date": "2026-06-08",
                                 //   "scope": "full",             // "full", "partial", "vitality-only"
                                 //   "notes": "Verified speaker count, vitality assessment,
                                 //             and contact influences for Tagalog."
                                 // }

  "notes":         null,         // Free-text notes about this language or this card's data quality.
                                 // Example: "Low-resource language under active development.
                                 //           Translation pipeline uses FST-gated approach."

  "firstDocumented": null,       // Year of first known documentation. Negative for BCE.
                                 // Example: -1500 (Sanskrit, ~1500 BCE), 1787 (some languages).
                                 // Source: Glottolog CLDF.

  "lastDocumented":  null,       // Year of last known documentation (relevant for extinct languages).
                                 // Source: Glottolog CLDF.

  "_generated":    null          // Auto-populated by enrichment scripts. When populated:
                                 // {
                                 //   "by": "generate-all-cards.mjs",
                                 //   "at": "2026-06-07T12:34:56Z",
                                 //   "sources": ["iso639-3", "glottolog-5.3", "wikidata"],
                                 //   "completeness": "partial",
                                 //       // "partial"     — has identity + classification + coords
                                 //       // "substantial" — + vitality + speakerEstimates + script
                                 //       // "complete"    — all automatable fields populated
                                 //   "lastEnriched": "2026-06-07"
                                 // }
}
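
Because dataSources may be either a flat array or a per-field object, code that reads it has to accept both shapes. The following is a minimal sketch of one way to do that; the helper name listSources is hypothetical and is not taken from the Champollion codebase.

```javascript
// Hypothetical helper: returns the unique source IDs named in a card's
// dataSources field, whichever of the two documented shapes it uses.
function listSources(dataSources) {
  if (Array.isArray(dataSources)) {
    // Flat shape: ["iso639-3-2024", "glottolog-5.3", ...]
    return [...new Set(dataSources)];
  }
  if (dataSources !== null && typeof dataSources === "object") {
    // Per-field shape: { fieldName: [sourceId, ...], ... }
    return [...new Set(Object.values(dataSources).flat())];
  }
  return []; // null or an unexpected type: nothing to report
}

const flat = listSources(["iso639-3-2024", "glottolog-5.3", "wikidata"]);
const structured = listSources({
  classification: ["glottolog-5.3"],
  speakerEstimates: ["wikidata", "census-ca-2021"],
});
console.log(flat.length, structured.length); // 3 3
```

Normalizing at read time keeps older flat-array cards usable without rewriting them.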

Field Reference

§ 1. Identity Fields

Field	Type	Required	Automatable	Source
`code`	`string`	✅	✅	ISO 639-3 registry
`name`	`string`	✅	✅	ISO 639-3 registry
`nativeName`	`string \| null`	—	✅	Wikidata P1705
`alternateNames`	`string[]`	—	✅	Glottolog, Ethnologue
`iso639_3`	`string`	✅	✅	ISO 639-3 registry
`iso639_1`	`string \| null`	—	✅	ISO 639-1
`bcp47`	`string \| null`	—	Partial	IANA subtag registry
`aliases`	`string[]`	—	❌	Manual curation
`isoScope`	`string`	✅	✅	ISO 639-3 registry
`isoType`	`string`	✅	✅	ISO 639-3 registry
`macrolanguage`	`string \| null`	—	✅	ISO 639-3 macrolanguages.tab
`extends`	`string \| null`	—	❌	Manual curation

§ 2. Classification Fields

Field	Type	Required	Automatable	Source
`glottocode`	`string \| null`	—	✅	Glottolog
`classification`	`object \| null`	—	✅	Glottolog
`isIsolate`	`boolean`	—	✅	Glottolog CLDF

§ 3. Geography Fields

Field	Type	Required	Automatable	Source
`macroarea`	`string \| null`	—	✅	Glottolog CLDF
`coordinates`	`object \| null`	—	✅	Glottolog
`countries`	`string[]`	—	✅	Glottolog
`regions`	`object[]`	—	❌	Census, Ethnologue, manual
`arealContext`	`object \| null`	—	✅	Coordinates + linguistic area zones

§ 4. Writing System Fields

Field	Type	Required	Automatable	Source
`script`	`string \| null`	—	✅	Wikidata P282
`scriptUnicodeName`	`string \| null`	—	✅	Derived from `script` via ISO 15924 → Unicode mapping
`scripts`	`object[]`	—	Partial	Wikidata, manual
`dir`	`string \| null`	—	✅	Derivable from script
`scriptConverter`	`string \| null`	—	❌	Manual
`orthographicStatus`	`object \| null`	—	Partial	Ethnologue, manual

§ 5. Demographic & Vitality Fields

Field	Type	Required	Automatable	Source
`speakerEstimates`	`object[]`	—	✅	Wikidata, Ethnologue, census
`vitality`	`object \| null`	—	✅	Glottolog AES, UNESCO

§ 5.5 Documentation & Digital Presence Fields

Field	Type	Required	Automatable	Source
`documentationDepth`	`object \| null`	—	✅	Glottolog references
`digitalPresence`	`object \| null`	—	✅	Wikipedia, Common Voice, Tatoeba
`dialectCount`	`number \| null`	—	✅	Glottolog

§ 6. Formality, Register & Gender Fields

Field	Type	Required	Automatable	Source
`formality`	`object \| null`	—	❌	Linguistic research
`registers`	`object \| null`	—	❌	Linguistic research
`gender`	`object \| null`	—	❌	Linguistic research
`codeSwitching`	`object \| null`	—	❌	Linguistic research

§ 7. Linguistic Profile Fields

Field	Type	Required	Automatable	Source
`linguisticChallenges`	`object \| null`	—	❌	Linguistic research
`contactInfluences`	`object[]`	—	❌	Published linguistics
`rules`	`object \| null`	—	✅	CLDR
`typologicalProfile`	`object \| null`	—	✅	Grambank 1.0.3 — auto-populated by `enrich-grambank-typology.mjs`
`phonologicalInventory`	`object \| null`	—	✅	PHOIBLE 2.0 — auto-populated by `enrich-phoible-phonemes.mjs`

§ 8. Encyclopedic Fields

Field	Type	Required	Automatable	Source
`encyclopedic`	`object \| null`	—	❌	Manual research
`culturalAphorism`	`object \| null`	—	❌	Community contribution
`varieties`	`object[]`	—	❌	Manual research

§ 9. Digital Resource Fields

Field	Type	Required	Automatable	Source
`resources`	`object \| null`	—	Partial	Manual + automated
`databaseCoverage`	`object \| null`	—	✅	Derived from enrichment
`corpusAvailability`	`object \| null`	—	✅	Bible Brain, OPUS, Lexibank
`keyboardSupport`	`object \| null`	—	✅	Keyman API, CLDR
`methodSupport`	`object`	✅	Partial	API verification
`metricModelSupport`	`object \| null`	—	✅	XLM-R paper, AfriCOMET paper
`metricPlugins`	`object \| null`	—	✅	Card enrichment; declares which metric plugin packs apply (e.g., `{ formalityMarkers: true }`)
`omt1600`	`object \| null`	—	✅	Meta assessment
`evalDatasets`	`string[]`	—	✅	Dataset registry
`pipelineReadiness`	`object \| null`	—	Partial	Derived + manual

resources.fsts[].install: FST entries in the resources object may include an install sub-object with the fields repo, releaseTag, assetPattern, format, maturity, and optionally bundlePattern. This replaces the former hardcoded GIELLALT_FST_REGISTRY dict. See get_fst_install_info() in language_cards.py.

§ 10. Provenance Fields

Field	Type	Required	Automatable	Source
`dataSources`	`array \| object`	✅	✅	Auto + manual
`supportTier`	`string`	—	✅	Derived from card completeness
`humanReviewed`	`object \| null`	—	❌	Human reviewer
`notes`	`string \| null`	—	❌	Manual
`firstDocumented`	`number \| null`	—	✅	Glottolog CLDF
`lastDocumented`	`number \| null`	—	✅	Glottolog CLDF
`_generated`	`object \| null`	—	✅	Enrichment scripts

Language Code Policy

Champollion uses ISO 639-3 as the canonical identifier. Other standard codes are registered as aliases and resolved to the ISO 639-3 code at runtime.

Priority	Standard	Example	Field	Usage
1 (canonical)	ISO 639-3	`crk`	`code`	Card filename, config keys, API params
2 (alias)	ISO 639-1	`iu`	`aliases[]`	Accepted in the CLI, resolved to ISO 639-3
3 (alias)	BCP 47	`fil`	`aliases[]`	Accepted in the CLI, resolved to ISO 639-3
Reference	Glottocode	`plai1258`	`glottocode`	Classification only, not for runtime use

Resolution order: When a user supplies a code:

Direct match on card.code → found
Match in card.aliases[] → found, return the canonical card
Match on card.iso639_1 → found (fallback)
Not found → error
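
The resolution order can be sketched as a small lookup function. This is an illustration under stated assumptions, not the project's resolveCode(): the name resolveCodeSketch, the explicit cardList parameter, and the three sample cards are hypothetical.

```javascript
// Hypothetical in-memory card list; real cards carry many more fields.
const cards = [
  { code: "fra", iso639_1: "fr", aliases: ["fr"] },
  { code: "crk", iso639_1: null, aliases: [] },
  { code: "iku", iso639_1: "iu", aliases: [] }, // reachable only via the fallback
];

// Assumed signature: takes the card list explicitly so the sketch is
// self-contained.
function resolveCodeSketch(input, cardList) {
  // 1. Direct match on card.code
  const direct = cardList.find((c) => c.code === input);
  if (direct) return direct.code;

  // 2. Match in card.aliases[]: return the canonical code
  const byAlias = cardList.find((c) => c.aliases.includes(input));
  if (byAlias) return byAlias.code;

  // 3. Fallback: match on card.iso639_1
  const byIso1 = cardList.find((c) => c.iso639_1 === input);
  if (byIso1) return byIso1.code;

  // 4. Not found
  throw new Error(`Unknown language code: ${input}`);
}

console.log(resolveCodeSketch("fr", cards)); // fra
console.log(resolveCodeSketch("iu", cards)); // iku
```

Checking card.code first means a canonical code can never be shadowed by another card's alias.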

Migration History: ISO 639-1 → ISO 639-3

Before v8, card filenames used ISO 639-1 codes where available (fr.json, de.json, ja.json). In the 639-3 migration, every card was renamed to its ISO 639-3 equivalent:

Before	After	Why
`fr.json`	`fra.json`	639-3 is canonical
`de.json`	`deu.json`	639-3 is canonical
`zh.json`	`cmn.json`	Macrolanguage → default individual
`ar.json`	`arb.json`	Macrolanguage → Modern Standard Arabic
`ms.json`	`zsm.json`	Macrolanguage → Standard Malay

What happened to the old codes?

The old 639-1 code is stored in card.iso639_1
The old 639-1 code is also listed in card.aliases[]
resolveCode("fr") returns "fra" at runtime, so existing usage stays backwards compatible
Users can still write "fr" in their config; it is resolved transparently

What changed in the architecture:

_deepMerge() now skips null values (they inherit from the parent)
_deepMerge() now has an identity field set (code, extends, and aliases are never inherited)
formality.default is now derived from the register isDefault: true flags
205 Grambank-derived cards received the structural formality.default fix
38 genus/family/macrolanguage cards provide inheritance targets
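
The two merge rules (null inherits from the parent, identity fields never do) can be illustrated with a small sketch. It is not the actual _deepMerge(): the name deepMergeSketch is hypothetical, and treating arrays as replaced wholesale is an assumption the source does not state.

```javascript
// Identity fields named in the spec: never inherited from the parent.
const IDENTITY_FIELDS = new Set(["code", "extends", "aliases"]);

function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Hypothetical merge: child values win, null child values fall back to the
// parent, nested objects merge recursively. Inherited values are shared
// references, not copies.
function deepMergeSketch(parent, child, isTopLevel = true) {
  const out = {};
  const keys = new Set([...Object.keys(parent), ...Object.keys(child)]);
  for (const key of keys) {
    if (isTopLevel && IDENTITY_FIELDS.has(key)) {
      if (key in child) out[key] = child[key]; // child only, no inheritance
      continue;
    }
    const p = parent[key];
    const c = child[key];
    if (c === null || c === undefined) {
      out[key] = p === undefined ? null : p; // null means "inherit"
    } else if (isPlainObject(p) && isPlainObject(c)) {
      out[key] = deepMergeSketch(p, c, false);
    } else {
      out[key] = c; // scalars and arrays: assumed to replace wholesale
    }
  }
  return out;
}

// Illustrative cards only; field values are not taken from real card files.
const parentCard = { code: "cre", extends: null, aliases: ["cr"], dir: "ltr",
  rules: { capitalization: { hasCase: false } } };
const childCard = { code: "crk", extends: "cre", aliases: [], dir: null, rules: null };

const merged = deepMergeSketch(parentCard, childCard);
console.log(merged.code, merged.aliases, merged.dir); // crk [] ltr
```

Here the child's empty aliases array is kept as-is, while its null dir and rules are filled from the parent.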

Edge Cases

Sign Languages

Sign languages (e.g., ase, American Sign Language) are legitimate languages with ISO 639-3 codes. They have geography and speaker counts, but:

script is usually null (no standard written form)
scripts may include "Sgnw" (SignWriting) if a notation system is in use
dir is null
linguisticChallenges should address spatial grammar, classifiers, etc.
gender.grammatical is usually false

Ancient and Historical Languages

Languages such as Latin (lat, isoType H) and Sanskrit (san, isoType H) are still used in specific contexts (liturgical, academic) but have no native speakers:

vitality may record "no native speakers" with "trend": "stable" (not declining: the community that uses the language is stable, only small)
speakerEstimates should note that these are L2 speakers, not L1
firstDocumented / lastDocumented place them in time

Constructed Languages

Esperanto (epo, isoType C), Lojban, etc.:

classification may point to a "constructed" family or be null
contactInfluences shows the source material (e.g., Esperanto draws on Romance, Germanic, and Slavic)
vitality is unusual: a growing speaker community but no native homeland

Macrolanguages

Arabic (ara), Chinese (zho), Cree (cre), and Quechua (que) are macrolanguages that cover many individual languages:

isoScope: "M"
varieties should list the individual languages with their ISO codes
methodSupport should reflect what the macrolanguage card supports (usually the standardized variety)
Individual varieties should also have their own cards

Languages Without a Standardized Orthography

Many languages (especially those with an oral tradition) have no standardized writing system, or have competing orthographies:

script is null
scripts is []
dir is null
notes should explain the orthographic situation
linguisticChallenges should record how this affects MT (e.g., no training data)

Diglossia

Languages such as Arabic (MSA vs. dialects) or Guaraní (Jopará vs. pure Guaraní):

codeSwitching captures the mixed-variety situation
registers may offer presets for the different levels
varieties may list the diglossic pair

Contact Influence Types

Type	Meaning	Example
`superstrate`	Dominant language imposed on a community	French → English (post-1066)
`substrate`	Native language influencing an imposed language	Celtic → English
`adstrate`	Neighboring language with mutual influence	Norse → English
`learned_borrowing`	Borrowings through education/scholarship	Latin → English
`lexical_borrowing`	Direct vocabulary loans through contact	Spanish → Filipino
`relexification`	Wholesale vocabulary replacement	Portuguese → Papiamentu

Contact Influence Depths

Depth	Meaning
`light`	A few loanwords, minimal structural impact
`moderate`	Significant vocabulary in specific domains
`heavy`	Pervasive vocabulary and some structural features
`structural`	Grammar, syntax, and phonology affected
`defining`	Core identity shaped by contact (creoles, mixed languages)

Writing Good Register Presets

Good preset prompts:

Name the formality feature explicitly (e.g., "해요체", "vous-form", "siz-form")
Explain the specific pronoun or verb form to use
Give context for when this register is appropriate
Mention script considerations where relevant

Do not put gender-inclusive guidance in the preset prompt. Gender guidance belongs in card.gender.inclusiveGuidance, which is injected separately.

❌ Bad:  "Standard Thai. Professional register."
✔ Good: "Professional Thai. Use คุณ (khun) for second person, เรา (rao)
         for first person when needed. Clear, concise phrasing
         appropriate for digital interfaces."

Preset Naming Convention

Preset keys should be descriptive and lowercase-hyphenated:

T-V languages: formal-vous, informal-tu, formal-Sie, casual-du
Speech levels: polite-haeyo, formal-hapsyo, casual-hae
Neutral: professional, neutral-professional
Code-switching: taglish-professional, pure-filipino

Enrichment Procedure

Per-Card Processing Order

When enriching a card, consult sources in the following order. Document every source consulted, even if it returned no data.

ISO 639-3 registry → code, name, isoScope, isoType
ISO 639-3 macrolanguages.tab → macrolanguage
Glottolog languoid.csv → glottocode, classification, coordinates, countries
Glottolog CLDF → macroarea, isIsolate, firstDocumented, lastDocumented
Glottolog AES → vitality (endangerment status)
Wikidata SPARQL → nativeName, speakerEstimates, script, scripts, dir
CLDR → rules (typography, plurals, capitalization)
NLLB-200 / FLORES+ → methodSupport.nllb, evalDatasets
API verification → remaining methodSupport entries
ML model papers → metricModelSupport (XLM-R training data, AfriCOMET coverage). Script: node scripts/enrich-metric-model-support.mjs
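
The fixed ordering and the rule to document every source consulted can be sketched as a loop over source steps. Everything named here is an assumption for illustration: `enrichCard`, the `lookup` callbacks, and the shape of the `consulted` log are hypothetical, and the stub values are placeholders rather than real registry data. The merge follows the merge-only principle, so existing non-empty values are never overwritten.

```javascript
// Hypothetical sketch: run source steps in a fixed order, log every source
// consulted (even when it returns nothing), and only fill empty fields.
function isEmpty(value) {
  return value === null || value === undefined ||
    (Array.isArray(value) && value.length === 0);
}

function enrichCard(card, steps) {
  const result = { ...card };
  const consulted = [];
  for (const { source, lookup } of steps) {
    const data = lookup(result.code) || {};
    const filled = [];
    for (const [field, value] of Object.entries(data)) {
      // Merge only: never overwrite an existing (possibly human-curated) value.
      if (isEmpty(result[field]) && !isEmpty(value)) {
        result[field] = value;
        filled.push(field);
      }
    }
    consulted.push({ source, returnedData: Object.keys(data).length > 0, filled });
  }
  return { card: result, consulted };
}

// Usage with stubbed lookups and placeholder values (not real registry data).
const steps = [
  { source: "ISO 639-3 registry", lookup: () => ({ name: "Example Name", isoScope: "I" }) },
  { source: "Glottolog languoid.csv", lookup: () => ({ glottocode: "abcd1234", name: "Other Name" }) },
  { source: "CLDR", lookup: () => null },
];
const { card, consulted } = enrichCard(
  { code: "crk", name: null, isoScope: null, glottocode: null },
  steps
);
console.log(card, consulted);
```

In this run the name from the first source is kept when the second source offers a different one, and the CLDR step is still logged even though it returned nothing.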

Conflict Handling

When sources disagree:

Store both with source attribution
Do NOT average or pick a side
Record the discrepancy in the appropriate note field
Prefer the most recent primary source only when a single value is needed for computation
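
These rules can be sketched as follows. The `{ value, source, year }` entry shape and the helper names are assumptions for illustration, not the card schema. The figures reuse the Wikidata/Ethnologue example from the design principles, and the years are placeholders.

```javascript
// Hypothetical sketch of conflict handling for speaker estimates.
// Entry shape { value, source, year } is assumed, not taken from the schema.

// Store a new estimate alongside existing ones; never average or replace.
function addEstimate(estimates, entry) {
  const duplicate = estimates.some(
    (e) => e.source === entry.source && e.year === entry.year && e.value === entry.value
  );
  return duplicate ? estimates : [...estimates, entry];
}

// Build a note recording the discrepancy, or null when sources agree.
function discrepancyNote(estimates) {
  const distinct = new Set(estimates.map((e) => e.value));
  if (distinct.size < 2) return null;
  return "Sources disagree: " +
    estimates.map((e) => `${e.source} (${e.year}): ${e.value}`).join("; ");
}

// Only for computations that need one number: take the most recent entry.
function valueForComputation(estimates) {
  if (estimates.length === 0) return null;
  return estimates.reduce((latest, e) => (e.year > latest.year ? e : latest)).value;
}

// Years below are placeholders for illustration only.
let estimates = [];
estimates = addEstimate(estimates, { value: 50000, source: "Wikidata", year: 2020 });
estimates = addEstimate(estimates, { value: 20000, source: "Ethnologue", year: 2023 });
console.log(discrepancyNote(estimates));
console.log(valueForComputation(estimates));
```

Both estimates stay in the stored array; the single value is derived on demand and never written back over them.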

Validation

Run the linter after any enrichment or manual edit:

node scripts/lint-language-cards.mjs              # all cards
node scripts/lint-language-cards.mjs --lang crk    # single card

PR Checklist

When submitting a new or modified language card:

File named <code>.json in shared/language-cards/
All top-level fields from the canonical template are present
classification populated from Glottolog (not hand-built)
dataSources lists all sources consulted
methodSupport entries verified against actual API language lists
contactInfluences entries have published sources or citation_needed: true
linguisticChallenges has 3–6 MT-relevant challenges (if researched)
rules populated from CLDR (if locale data exists)
Linter passes with no errors
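
A few of these checklist items are mechanical enough to pre-check before running the linter. The sketch below is not the project linter: `preflight` is a hypothetical helper, `REQUIRED_FIELDS` is a small illustrative subset rather than the canonical template, and the `sources` property on a contactInfluences entry is an assumed name.

```javascript
// Hypothetical pre-check for a few PR checklist items (not the real linter).
// Illustrative subset only; the canonical template defines the full list.
const REQUIRED_FIELDS = ["classification", "dataSources", "methodSupport",
  "contactInfluences", "linguisticChallenges", "rules"];

function preflight(card) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    // A field may be null or [], but it must exist.
    if (!(field in card)) problems.push(`missing field: ${field}`);
  }
  for (const [i, entry] of (card.contactInfluences || []).entries()) {
    const hasSources = Array.isArray(entry.sources) && entry.sources.length > 0;
    if (!hasSources && entry.citation_needed !== true) {
      problems.push(`contactInfluences[${i}]: needs sources or citation_needed: true`);
    }
  }
  const challenges = card.linguisticChallenges;
  if (Array.isArray(challenges) && challenges.length > 0 &&
      (challenges.length < 3 || challenges.length > 6)) {
    problems.push("linguisticChallenges: expected 3 to 6 entries");
  }
  return problems;
}
```

An empty result means only that these few items look fine; the remaining checklist items (Glottolog-derived classification, API verification, CLDR rules) still need the linter and a human reviewer.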

Professional References

Standard	Maintained By	Our Use
ISO 639-3	SIL International	Canonical language codes, macrolanguage relationships
Glottolog	Max Planck Institute	Classification, coordinates, AES endangerment
WALS	Max Planck Institute	Genus definitions, typological features
ISO 15924	Unicode/ISO	Script codes
CLDR	Unicode Consortium	Locale data, plural rules, typography
Wikidata	Wikimedia Foundation	Speaker counts, endonyms, script data
Ethnologue	SIL International	EGIDS, speaker estimates, DLS
UNESCO Atlas	UNESCO	Endangerment classification
Katig Collective	UP Diliman	Philippine language capsules

See also: Language Card Citation Methodology for detailed source-by-source guidance.
