Language Card Specification

Single source of truth. This document defines the canonical shape of every language card. Every card MUST contain all top-level fields listed here, even when the value is null or []. A card with a missing field is nonconforming. This uniformity is what lets automated tools, linters, enrichment scripts, and human reviewers rely on the card's structure.
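The conformance rule above can be checked mechanically. The following is a minimal sketch, not the project's actual linter: the function name findMissingFields and the short REQUIRED_FIELDS list are assumptions made for illustration, and a real check would use the full field list from the canonical template.

```javascript
// Hypothetical subset of the required top-level fields. The authoritative
// list is the canonical template in this document.
const REQUIRED_FIELDS = [
  "code", "name", "nativeName", "alternateNames",
  "iso639_3", "aliases", "extends",
];

// Returns the names of required fields that are absent from the card.
// A field set to null or [] counts as present; only a missing key is an error.
function findMissingFields(card, requiredFields = REQUIRED_FIELDS) {
  return requiredFields.filter(
    (field) => !Object.prototype.hasOwnProperty.call(card, field)
  );
}

// Example: this draft card has every listed field except "extends".
const draftCard = {
  code: "crk",
  name: "Plains Cree",
  nativeName: null,
  alternateNames: [],
  iso639_3: "crk",
  aliases: [],
};
console.log(findMissingFields(draftCard)); // the only entry is "extends"
```

Note that the check uses key presence, not truthiness, so null and [] pass while an omitted key fails.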

Design Principles

Uniform shape. All 8,000+ cards have the same top-level fields. Unknown values are null, empty arrays are [], and empty objects are null (not {}). As a result, code never has to ask "does this field exist?", only "is it populated?"
Source everything. Every factual claim traces back to a named, versioned primary source. Unsourced claims are unverifiable claims. The dataSources field (and the per-field source annotations in sub-objects) make provenance explicit.
Preserve disagreement. When authorities disagree (Wikidata says 50,000 speakers, Ethnologue says 20,000), we store both with source attribution. We do not average, resolve, or pick sides. Users can navigate the nuance.
Null means unknown, not inapplicable. If a field is null, it means "we have not found data for this yet." If a field genuinely does not apply (e.g., grammatical gender for a sign language), the value must say so: { "grammatical": false, "inclusiveGuidance": "Not applicable: ASL has no grammatical gender." }
Merge only. Enrichment scripts add data and never overwrite it. Manually curated values take precedence over automated data.
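The "uniform shape" and "merge only" principles combine naturally in enrichment code. The sketch below is illustrative only: isPopulated and enrichMergeOnly are hypothetical names, not functions from the repository, and the sketch works on top-level fields only (it does not append to arrays that already hold data).

```javascript
// A field is populated when it is neither null/undefined nor an empty array.
// Because every card has every field, this is the only check callers need.
function isPopulated(value) {
  if (value === null || value === undefined) return false;
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Merge-only enrichment: copy a value from `enrichment` only when the card's
// current value is unpopulated. Existing values, which may be hand-curated,
// are never overwritten. Returns a new object; the input card is untouched.
function enrichMergeOnly(card, enrichment) {
  const result = { ...card };
  for (const [field, value] of Object.entries(enrichment)) {
    if (!isPopulated(result[field]) && isPopulated(value)) {
      result[field] = value;
    }
  }
  return result;
}

// Example: "script" is already set, so the automated value is ignored,
// while the unpopulated "macroarea" and "countries" fields are filled in.
const curatedCard = { code: "crk", script: "Cans", macroarea: null, countries: [] };
const enrichedCard = enrichMergeOnly(curatedCard, {
  script: "Latn",
  macroarea: "North America",
  countries: ["CA", "US"],
});
```

For fields such as speakerEstimates, where disagreement must be preserved, an enrichment script would append a new attributed entry rather than replace the array; that case is outside this sketch.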

Three-Layer Architecture

Layer	Location	Purpose
Language cards	`shared/language-cards/<code>.json`	Per-language configuration: identity, classification, resources, everything
Genus cards	`shared/language-cards/genera/<genus>.json`	Shared runtime properties for related languages (curated, not auto-generated)
Language tree	`shared/language-cards/language-tree.json`	Full Glottolog hierarchy: reference data for the Lab UI and language discovery

Inheritance Model

When a card sets "extends": "family-dravidian", the runtime merges the parent card into the child card using _deepMerge() (in lib/registers.js). This lets genus cards define shared registers, formality systems, and gender guidance that flow down to all member languages, without duplicating data across hundreds of individual cards.

Merge Semantics

Child value	Behavior	Why
`null`	Inherit from parent	`null` means "I don't define this", so the parent's value flows through
Non-null	Override parent	The child's data is more specific, so it takes precedence
Nested object	Recursive merge	Child fields override; parent fields are preserved
Array	Replace entirely	Arrays are not merged element by element; the child's array wins
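The four rows of the table translate into a short recursive function. This is a sketch of the described semantics under stated assumptions, not the actual _deepMerge() from lib/registers.js, whose source is not shown here; deepMergeSketch and isPlainObject are hypothetical names.

```javascript
// True for non-null, non-array objects (the "nested object" row of the table).
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Merge `child` over `parent` following the merge semantics table.
function deepMergeSketch(parent, child) {
  // null (or a missing key): inherit from the parent. If the parent has
  // nothing either, fall back to null to keep the uniform card shape.
  if (child === null || child === undefined) {
    return parent === undefined ? null : parent;
  }
  // Nested object: merge recursively. Child fields override, and parent
  // fields the child does not mention are preserved.
  if (isPlainObject(parent) && isPlainObject(child)) {
    const merged = { ...parent };
    for (const key of Object.keys(child)) {
      merged[key] = deepMergeSketch(parent[key], child[key]);
    }
    return merged;
  }
  // Scalars and arrays: the child's value replaces the parent's entirely.
  return child;
}

// Example covering each row of the table.
const parentCard = {
  formality: { system: "T-V", default: "formal" },
  countries: ["CA", "US"],
  script: "Latn",
};
const childCard = {
  formality: { default: "informal" }, // nested object: recursive merge
  countries: ["CA"],                  // array: replaces the parent's array
  script: null,                       // null: inherits "Latn"
};
const mergedCard = deepMergeSketch(parentCard, childCard);
// mergedCard.formality is { system: "T-V", default: "informal" }
```

One consequence of the array rule is that a child cannot extend a parent's list; it either inherits the whole array (by leaving the field null) or supplies a complete replacement.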

Identity Fields (Never Inherited)

Some fields belong to the card itself and must NEVER be inherited from a parent:

code, extends, _migration, aliases, iso639_1, iso639_3

Even if a parent card defines aliases: ["macro-code"], a child card will NOT inherit those aliases. These fields always hold the child's own values (including null if unset).

Why: Without this rule, every Cree language would inherit aliases: ["cre"] from the macrolanguage parent, making every variety an alias of the macrolanguage.
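One way to enforce the identity rule is to reset those fields after the merge has run. The sketch below assumes that approach; restoreIdentityFields is a hypothetical helper, the parent key "macrolanguage-cre" is invented for the example, and the real runtime may instead skip these fields during the merge itself.

```javascript
// The identity fields listed above. They always come from the child card.
const IDENTITY_FIELDS = [
  "code", "extends", "_migration", "aliases", "iso639_1", "iso639_3",
];

// After a parent/child merge, overwrite every identity field with the
// child's own value. A field the child leaves unset becomes null instead
// of silently keeping the parent's value.
function restoreIdentityFields(mergedCard, childCard) {
  const result = { ...mergedCard };
  for (const field of IDENTITY_FIELDS) {
    result[field] = childCard[field] === undefined ? null : childCard[field];
  }
  return result;
}

// Example: a naive merge let the parent's code and aliases leak into crk,
// because the child had aliases: null ("inherit" under the merge table).
const naiveMerged = { code: "cre", aliases: ["cre"], iso639_3: "cre", name: "Plains Cree" };
const crkOwnCard = {
  code: "crk",
  extends: "macrolanguage-cre", // hypothetical parent key
  aliases: null,
  iso639_1: null,
  iso639_3: "crk",
  name: "Plains Cree",
};
const fixedCard = restoreIdentityFields(naiveMerged, crkOwnCard);
// fixedCard.code is "crk" and fixedCard.aliases is null, not ["cre"].
```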

Example: How a Cree Card Resolves

┌───────────────────────┐
│  family-algic.json    │  formality: null, registers: null
│  (no registers)       │
└──────────┬────────────┘
           │ extends
┌──────────┴────────────┐
│  genus-cree.json      │  formality: { system: "obviative-animate", ... }
│  (sourced registers)  │  registers: { formal: {...}, informal: {...} }
└──────────┬────────────┘
           │ extends
┌──────────┴────────────┐
│  crk.json             │  code: "crk", extends: "genus-cree"
│  (Plains Cree)        │  formality: null → inherits from genus-cree
│                       │  registers: null → inherits from genus-cree
│                       │  script: "Cans"  → own value, no inheritance
│                       │  code: "crk"     → identity field, never inherited
└───────────────────────┘

At runtime, getLanguageCard("crk") returns a merged object with registers from genus-cree + properties from family-algic (if any) + crk's own identity and metadata.
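The chain in the diagram can be walked recursively: resolve the parent first, then merge the child over it. The sketch below is an assumption-laden illustration, not the real getLanguageCard(): it uses an in-memory CARDS object in place of the JSON files, trimmed-down card contents, and hypothetical helper names (mergeCards, resolveCard). The cycle check is an addition of this sketch and is not described in the source.

```javascript
const IDENTITY = ["code", "extends", "_migration", "aliases", "iso639_1", "iso639_3"];

// In-memory stand-in for the card files, reduced to a few fields.
const CARDS = {
  "family-algic": { code: "family-algic", extends: null, formality: null, registers: null, script: null },
  "genus-cree": {
    code: "genus-cree", extends: "family-algic",
    formality: { system: "obviative-animate" },
    registers: { formal: { label: "Formal (Proximate)" }, informal: { label: "Informal" } },
    script: null,
  },
  "crk": { code: "crk", extends: "genus-cree", formality: null, registers: null, script: "Cans" },
};

// Compact merge: null inherits, objects recurse, everything else replaces.
function mergeCards(parent, child) {
  if (child === null || child === undefined) return parent === undefined ? null : parent;
  const bothObjects = [parent, child].every(
    (v) => v !== null && typeof v === "object" && !Array.isArray(v)
  );
  if (!bothObjects) return child;
  const merged = { ...parent };
  for (const key of Object.keys(child)) merged[key] = mergeCards(parent[key], child[key]);
  return merged;
}

// Resolve a card by following "extends" up to the root, then merging down.
function resolveCard(code, cards = CARDS, seen = new Set()) {
  if (seen.has(code)) throw new Error(`Circular extends chain at "${code}"`);
  seen.add(code);
  const own = cards[code];
  if (!own) throw new Error(`Unknown card "${code}"`);
  if (!own.extends) return own; // root of the chain: nothing to inherit
  const parent = resolveCard(own.extends, cards, seen);
  const merged = mergeCards(parent, own);
  // Identity fields always come from the card itself, never from a parent.
  for (const field of IDENTITY) merged[field] = own[field] === undefined ? null : own[field];
  return merged;
}

const resolvedCrk = resolveCard("crk");
// resolvedCrk.formality and resolvedCrk.registers come from genus-cree;
// resolvedCrk.script ("Cans") and resolvedCrk.code ("crk") are crk's own.
```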

Genus Card Template

Genus cards live in shared/language-cards/genera/ and define shared properties for a group of languages. They follow the same schema as regular cards but with different conventions:

{
  // Identity — genus cards use a prefixed code, NOT an ISO 639-3 code
  "code": "genus-cree",           // "genus-", "family-", or "macrolanguage-" prefix
  "name": "Cree Languages",      // Human-readable group name
  "extends": "family-algic",     // Genus cards can extend family cards (chaining)

  // Formality — shared across the group, sourced from typological databases
  "formality": {
    "system": "obviative-animate",
    "description": "Cree languages use an obviative/proximate system...",
    "default": "formal",
    "source": "WALS 37A, 38A + Wolfart 1973"
  },

  // Registers — shared presets, if the group shares a formality system
  "registers": {
    "formal": {
      "label": "Formal (Proximate)",
      "description": "...",
      "prompt": "...",
      "isDefault": true
    },
    "informal": {
      "label": "Informal",
      "description": "...",
      "prompt": "..."
    }
  },

  // Gender — shared grammatical gender behavior
  "gender": {
    "grammatical": false,       // Cree doesn't have grammatical gender
    "inclusiveGuidance": null   //   so no inclusive guidance needed
  },

  // Everything else is null — individual cards provide their own
  // classification, geography, resources, etc.
  "classification": null,
  "methodSupport": null,
  // ...
}

Key rule: Genus cards must ONLY contain data that is genuinely shared across the whole group and drawn from authoritative references. If a formality system varies between members, it belongs in the individual cards, not in the genus card.
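Whether data is genuinely shared is an editorial judgment, but the naming convention from the template's code comment ("genus-", "family-", or "macrolanguage-" prefix) can be linted. The following is a minimal sketch; isGroupCardCode is a hypothetical name, and the requirement that something follow the prefix is an assumption of this sketch.

```javascript
// Prefixes that mark a shared (non-ISO) card, per the template comment.
const GROUP_PREFIXES = ["genus-", "family-", "macrolanguage-"];

// True when `code` starts with one of the group prefixes and has at least
// one character after it. Plain ISO 639-3 codes such as "crk" return false.
function isGroupCardCode(code) {
  return (
    typeof code === "string" &&
    GROUP_PREFIXES.some(
      (prefix) => code.startsWith(prefix) && code.length > prefix.length
    )
  );
}

// Example: group cards pass, an individual language card does not.
const prefixChecks = ["genus-cree", "family-algic", "crk"].map(isGroupCardCode);
// prefixChecks is [true, true, false]
```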

Canonical Template

Every card MUST have this exact top-level shape. Sub-object schemas are documented in the Field Reference below.

{
  // ═══════════════════════════════════════════════════════════════════════
  //  § 1. IDENTITY
  //  Who is this language? What codes identify it?
  //  Sources: ISO 639-3 registry, ISO 639-1, BCP 47/IANA.
  // ═══════════════════════════════════════════════════════════════════════

  "code":          "xxx",       // REQUIRED. ISO 639-3 code. This IS the card ID and filename.
  "name":          "English Name",  // REQUIRED. English reference name from ISO 639-3 registry.
  "nativeName":    null,        // Endonym (name in the language itself). Source: Wikidata P1705.
                                // Examples: "nêhiyawêwin / ᓀᐦᐃᔭᐍᐏᐣ", "日本語", "Esperanto".
  "alternateNames": [],         // Other names this language is known by. Source: Glottolog, Ethnologue.
                                // Not aliases (those are code-level). These are name-level variants.
                                // Example: ["Qafar af", "Afaraf", "'Afar Af"] for Afar (aar).
  "iso639_3":      "xxx",      // REQUIRED. Three-letter ISO 639-3 code. Same as `code`.
  "iso639_1":      null,        // Two-letter ISO 639-1 code (e.g., "en", "fr"). null if none.
  "bcp47":         null,        // IETF BCP 47 tag. Often same as iso639_1. Can include subtags
                                // (e.g., "iu-Cans-CA"). null if unknown.
  "aliases":       [],          // Alternative code-level identifiers that resolve to this card.
                                // Example: ["fil"] for tl (Tagalog), ["iu"] for iku (Inuktitut).
                                // Used by code resolution: user types "fil", system loads tl.json.
  "isoScope":      "I",        // REQUIRED. ISO 639-3 scope:
                                //   "I" = Individual language
                                //   "M" = Macrolanguage (e.g., Chinese, Arabic, Cree)
                                //   "S" = Special (e.g., mis, mul, zxx)
  "isoType":       "L",        // REQUIRED. ISO 639-3 type:
                                //   "L" = Living    "E" = Extinct    "A" = Ancient
                                //   "H" = Historical    "C" = Constructed
  "macrolanguage": null,        // If this language is part of a macrolanguage, the macrolanguage
                                // ISO 639-3 code (e.g., "cre" for Plains Cree, "ara" for Arabic
                                // varieties). Source: ISO 639-3 macrolanguages.tab.
  "extends":       null,        // Genus card key if shared properties are inherited from a genus
                                // card (e.g., "genus-cree", "genus-eskimo-aleut").
                                // null for most languages.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 2. CLASSIFICATION
  //  Where does this language sit in the family tree?
  //  Source: Glottolog. NEVER hand-build classifications.
  // ═══════════════════════════════════════════════════════════════════════

  "glottocode":      null,      // Glottolog identifier (e.g., "plai1258", "stan1293").
                                // null if the language is not in Glottolog.
  "classification":  null,      // Genealogical classification from Glottolog. When populated:
                                // {
                                //   "family": "Algic",              // Top-level family. null for isolates.
                                //   "familyGlottocode": "algi1248", // Glottocode of the family.
                                //   "genus": "Plains Creeic",       // WALS-style genus.
                                //   "genusGlottocode": "plai1264",  // Glottocode of the genus.
                                //   "ancestry": ["Algic", "Algonquian-Blackfoot", "Algonquian",
                                //                "Cree-Montagnais-Naskapi", "Cree", "Plains Creeic"]
                                // }
                                // For isolates: family = language name, genus = language name,
                                // ancestry = [language name].
  "isIsolate":       false,     // true if a language isolate (no known genetic relatives).
                                // Source: Glottolog CLDF.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 3. GEOGRAPHY
  //  Where is this language spoken?
  //  Sources: Glottolog (coordinates, countries), census data, Ethnologue.
  // ═══════════════════════════════════════════════════════════════════════

  "macroarea":     null,        // Glottolog macroarea. One of: "Africa", "Australia",
                                // "Eurasia", "North America", "Papunesia", "South America".
                                // null if unknown. Source: Glottolog CLDF.
  "coordinates":   null,        // Representative geographic point. When populated:
                                // { "lat": 52.1, "lng": -106.6, "source": "glottolog-5.3" }
                                // This is a representative point, not a boundary.
  "countries":     [],          // ISO 3166-1 alpha-2 country codes where this language is spoken.
                                // Example: ["CA", "US"]. Source: Glottolog.
  "regions":       [],          // Detailed regional breakdown with admin codes & speaker estimates.
                                // Each entry:
                                // {
                                //   "country": "Canada",
                                //   "countryCode": "CA",
                                //   "officialStatus": "recognized",  // official, co-official,
                                //                                    // recognized, none
                                //   "region": "Saskatchewan, Alberta, Manitoba",
                                //   "speakerEstimate": "~20,000",
                                //   "coordinates": [-106.6, 52.1],   // [lng, lat]
                                //   "admin1Codes": ["CA-SK", "CA-AB", "CA-MB"]
                                // }

  "arealContext":  null,         // Linguistic area / Sprachbund membership. DISTINCT from
                                // contactInfluences (which is language-specific contact history).
                                // This field captures zone-level typological convergence patterns
                                // — i.e., what linguistic area the language exists within and
                                // what features are common across that area.
                                // {
                                //   "zone": "Mainland Southeast Asian Sprachbund",
                                //   "arealFeatures": "Tonal convergence, classifier systems,
                                //     topic-prominence, monosyllabicity trend.",
                                //   "typicalContacts": ["Classical Chinese", "Sanskrit/Pali"],
                                //   "source": "areal-linguistics (Enfield 2005)"
                                // }
                                // NOT the same as contactInfluences. A language can exist within
                                // a convergence area without having specific contact history with
                                // any particular language in that area.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 4. WRITING SYSTEMS
  //  How is this language written?
  //  Sources: Wikidata P282, ISO 15924, manual research.
  //  Note: Some languages have NO standardized orthography. Some have
  //  competing orthographies. Some use multiple scripts routinely (e.g.,
  //  Serbian: Cyrillic + Latin; Japanese: Kanji + Hiragana + Katakana).
  //  Sign languages may use notation systems (SignWriting, HamNoSys) or
  //  none at all.
  // ═══════════════════════════════════════════════════════════════════════

  "script":        null,        // Primary ISO 15924 script code (e.g., "Latn", "Cyrl", "Cans",
                                // "Jpan"). null if no written form or unknown.
  "scriptUnicodeName": null,    // Unicode script block name derived from the script field.
                                // e.g., "Latin", "Cyrillic", "Canadian_Aboriginal", "CJK".
                                // Used by code_switching metric plugin. Auto-populated by
                                // enrich-script-unicode-names.mjs. null if script is null.
  "scripts":       [],          // All writing systems with detail. Array of:
                                // {
                                //   "code": "Cans",
                                //   "name": "Unified Canadian Aboriginal Syllabics",
                                //   "primary": true
                                // }
                                // A language with multiple scripts has multiple entries.
                                // A language with no written form has [].
  "dir":           null,        // Writing direction: "ltr" (left-to-right) or "rtl" (right-to-left).
                                // null if no written form or unknown.
  "scriptConverter": null,      // Script converter key if we have a converter for this language
                                // (e.g., "crk" for SRO↔Syllabics). null for most languages.
  "orthographicStatus": null,   // Writing system standardization status. When populated:
                                // {
                                //   "status": "standardized",
                                //       // "standardized" — official/agreed orthography exists
                                //       // "competing"    — multiple orthographies in active use
                                //       // "emerging"     — orthography under development
                                //       // "none"         — primarily oral, no standard writing
                                //   "notes": "Uses SIL-developed Latin orthography since 1960s.",
                                //   "source": "ethnologue" // or "manual-curation"
                                // }
                                // Crucial for LRLs where orthographic variation directly impacts
                                // MT training data quality and evaluation consistency.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 5. DEMOGRAPHICS & VITALITY
  //  How many people speak this language? Is it endangered?
  //  Sources: Census, Ethnologue, UNESCO Atlas, Wikidata, Glottolog AES.
  //
  //  CRITICAL: Store ALL estimates separately with source attribution.
  //  Never average or "resolve" conflicting data. Speaker counts are
  //  politically contested for many languages. Present the evidence,
  //  let the reader assess.
  // ═══════════════════════════════════════════════════════════════════════

  "speakerEstimates": [],       // Array of speaker count estimates from different authorities.
                                // Each entry:
                                // {
                                //   "source": "wikidata",              // or "ethnologue-28",
                                //                                      // "census-ph-2020", etc.
                                //   "count": 20000,                    // Point estimate. null if range-only.
                                //   "date": "2026-06-07",              // When this data was retrieved.
                                //   "countRange": { "min": 15000, "max": 25000 },  // Optional range.
                                //   "note": "Wikidata has 2 estimates: 15,000 and 25,000"
                                // }
                                // Empty array means we have not yet found speaker count data.

  "vitality":      null,        // Endangerment / vitality assessment. When populated:
                                // {
                                //   "unescoStatus": "severely-endangered",
                                //       // Enum: "safe", "vulnerable", "definitely-endangered",
                                //       //       "severely-endangered", "critically-endangered",
                                //       //       "extinct"
                                //   "aesStatus": "shifting",
                                //       // Glottolog AES label (free text from AES data).
                                //   "egids": "6b",
                                //       // Ethnologue Expanded Graded Intergenerational Disruption
                                //       // Scale. Levels: 0 (international) to 10 (extinct).
                                //   "trend": "declining",
                                //       // Qualitative trend: "stable", "growing", "declining",
                                //       //                     "shifting", "moribund", "awakening"
                                //   "source": "glottolog-aes-5.3",
                                //   "notes": "Intergenerational transmission breaking down."
                                // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 5.5. DOCUMENTATION & DIGITAL PRESENCE
  //  How well-documented is this language? What digital footprint does it
  //  have? These fields answer the practical question: "What can I
  //  actually DO with this language?"
  //  Sources: Glottolog (references), Wikipedia, Common Voice, Tatoeba.
  // ═══════════════════════════════════════════════════════════════════════

  "documentationDepth": null,    // How well-documented is this language in the literature?
                                 // {
                                 //   "referenceCount": 42,
                                 //       // Number of published references in Glottolog.
                                 //   "med": "grammar",
                                 //       // Most Extensive Description type. One of:
                                 //       // "long_grammar", "grammar", "grammar_sketch",
                                 //       // "dictionary", "phonology", "text", "wordlist",
                                 //       // "comparative", "minimal", "unknown"
                                 //   "source": "glottolog-5.3"
                                 // }

  "digitalPresence":  null,      // Digital footprint across web platforms. When populated:
                                 // {
                                 //   "wikipedia": {
                                 //     "edition": true,      // Has its own Wikipedia edition?
                                 //     "articleCount": 75000, // Number of articles.
                                 //     "editionCode": "crk",  // Wikipedia subdomain code.
                                 //     "source": "wikimedia-api-2026"
                                 //   },
                                 //   "commonVoice": {
                                 //     "validatedHours": 12.5,
                                 //     "totalHours": 25.0,
                                 //     "speakers": 45,
                                 //     "sentences": 1200,
                                 //     "source": "common-voice-20.0"
                                 //   },
                                 //   "tatoeba": {
                                 //     "sentenceCount": 342,
                                 //     "source": "tatoeba-2026"
                                 //   }
                                 // }

  "dialectCount":     null,      // Number of recognized dialects in Glottolog.
                                 // Derived from child_dialect_count in languoid.csv.
                                 // Simple integer. null if 0 or unknown.
                                 // Source: glottolog-5.3.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 6. FORMALITY, REGISTERS & GENDER
  //  How does politeness work in this language? What translation registers
  //  do we offer? How should gender be handled?
  //
  //  This section drives Champollion's register-preset system — the
  //  mechanism by which users select formal/informal/professional tone.
  //  These fields require genuine linguistic research, not automation.
  // ═══════════════════════════════════════════════════════════════════════

  "formality":     null,        // Formality system description. When populated:
                                // {
                                //   "system": "T-V",
                                //       // One of: "T-V", "speech-levels", "keigo", "particles",
                                //       //         "register-levels", "register-and-code-switching",
                                //       //         "code-switching", "none"
                                //   "description": "French uses a vous/tu distinction...",
                                //   "default": "formal-vous"   // Key into the `registers` object.
                                // }

  "registers":     null,        // Translation register presets. When populated, keyed by preset ID:
                                // {
                                //   "formal-vous": {
                                //     "label": "Formal (vouvoiement)",
                                //     "description": "One sentence: when to use this preset.",
                                //     "prompt": "The actual LLM system prompt instruction that
                                //               steers translation tone. Must name specific
                                //               linguistic features (pronouns, verb forms, particles).",
                                //     "deeplFormality": "prefer_more"
                                //       // Only if methodSupport.deepl.formality is true.
                                //       // One of: "prefer_more", "prefer_less", "default".
                                //   }
                                // }

  "gender":        null,        // Grammatical gender and inclusive guidance. When populated:
                                // {
                                //   "grammatical": true,         // Does the language have gram. gender?
                                //   "inclusiveGuidance": "Use gender-neutral forms when possible.
                                //                        Prefer 'iel' (neologism) or rephrase to
                                //                        avoid gendered agreement."
                                // }
                                // For languages without grammatical gender (Turkish, Finnish):
                                // { "grammatical": false, "inclusiveGuidance": null }

  "codeSwitching":  null,       // Code-switching behavior (for languages where mixing with another
                                // language is the norm, not an error). When populated:
                                // {
                                //   "contactLanguage": "Spanish",
                                //   "contactIso639_3": "spa",
                                //   "mixedVarietyName": "Jopará",   // null if no named mixed variety
                                //   "prevalence": "dominant",       // "rare", "common", "dominant"
                                //   "morphologicalIntegration": true,
                                //   "pipelineStrategy": "hybrid-fst",
                                //   "notes": "Jopará IS the everyday language of most Paraguayans..."
                                // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 7. LINGUISTIC PROFILE
  //  What makes this language what it is? What are the specific challenges
  //  for machine translation? What rules govern its typography?
  //  What languages have shaped it through contact?
  //
  //  These fields require genuine linguistic expertise. For many languages
  //  (especially low-resource), this section will remain null until a
  //  qualified researcher or community member contributes.
  // ═══════════════════════════════════════════════════════════════════════

  "linguisticChallenges": null,  // MT-relevant challenges, keyed by challenge ID.
                                 // When populated:
                                 // {
                                 //   "polysynthesis": "Cree is highly polysynthetic. A single verb
                                 //                    can incorporate subject, object, tense...",
                                 //   "animacy": "Verb conjugation changes based on whether the
                                 //              subject/object is animate or inanimate...",
                                 //   "neologisms": "Avoid literal translations of modern software
                                 //                 concepts. Maintain Cree metaphorical logic..."
                                 // }
                                 // Aim for 3–6 challenges per language when researched.

  "contactInfluences": [],       // How other languages have shaped this one. Array of:
                                 // {
                                 //   "source": "English",
                                 //   "sourceIso639_3": "eng",       // null if proto-language/unknown
                                 //   "type": "superstrate",
                                 //       // Enum: "superstrate", "substrate", "adstrate",
                                 //       //       "learned_borrowing", "lexical_borrowing",
                                 //       //       "relexification"
                                 //   "domains": ["education", "government", "technology"],
                                 //   "depth": "deep",
                                 //       // Enum: "light", "moderate", "heavy", "structural",
                                 //       //       "defining"
                                 //   "period": "1870–present",
                                 //   "notes": "Residential school era and ongoing...",
                                 //   "citation_needed": false
                                 //       // true if no published academic source found.
                                 //       // See language-card-citation-procedure.md.
                                 // }

  "rules":          null,        // Typography, plural, and capitalization rules. When populated:
                                 // {
                                 //   "typography": {
                                 //     "quoteStart": "\u201c",
                                 //     "quoteEnd": "\u201d",
                                 //     "usesSpaces": true,        // false for CJK, Thai, Lao, Khmer
                                 //     "punctuationSpacing": {
                                 //       "doublePunctuation": "none"  // "thin-nbsp" for French
                                 //     }
                                 //   },
                                 //   "plurals": {
                                 //     "categories": ["one", "other"]
                                 //       // From CLDR. Possible values:
                                 //       // "zero", "one", "two", "few", "many", "other"
                                 //   },
                                 //   "capitalization": {
                                 //     "hasCase": true
                                 //       // true for Latin, Cyrillic, Greek, Armenian scripts.
                                 //       // false for CJK, Arabic, Devanagari, etc.
                                 //   }
                                 // }
                                 // Source: CLDR + ISO 15924 derivation.

  "typologicalProfile": null,   // Grambank typological features. When populated:
                                // {
                                //   "featuresDocumented": 195,
                                //   "featuresCoverage": 1,     // 0.0–1.0 fraction of features
                                //   "wordOrderDominant": "SVO",
                                //   "hasDefiniteArticle": true,
                                //   "hasIndefiniteArticle": true,
                                //   "hasGenderSystem": true,
                                //   "hasCaseMorphology": true,
                                //   "hasEvidentiality": false,
                                //   "hasToneSystem": false,
                                //   "source": "grambank-1.0.3"
                                // }
                                // Auto-populated by enrich-grambank-typology.mjs.

  "phonologicalInventory": null, // PHOIBLE phoneme inventory. When populated:
                                // {
                                //   "consonants": 24,
                                //   "vowels": 16,
                                //   "tones": 0,
                                //   "totalPhonemes": 40,
                                //   "isTonal": false,
                                //   "inventorySize": "moderately-large",
                                //       // Enum: "small", "moderately-small", "average",
                                //       //       "moderately-large", "large"
                                //   "source": "phoible-2.0"
                                // }
                                // Auto-populated by enrich-phoible-phonemes.mjs.

  // ═══════════════════════════════════════════════════════════════════════
  //  § 8. ENCYCLOPEDIC
  //  General knowledge about the language for human context. History,
  //  dialect situation, institutional resources, representative sayings.
  //  This section is for understanding, not computation.
  // ═══════════════════════════════════════════════════════════════════════

  "encyclopedic":    null,       // General knowledge. When populated:
                                 // {
                                 //   "family": "Algic",             // Redundant with classification
                                 //                                  // but useful for human readers.
                                 //   "dialects": {
                                 //     "split": true,               // Is there significant variation?
                                 //     "classification": "Plains Cree (y-dialect)",
                                 //     "variants": ["crk", "cwd", "csw"]  // ISO codes of variants
                                 //   },
                                 //   "demographics": {
                                 //     "speakers": "Approx. 20,000 active speakers",
                                 //     "regions": ["Saskatchewan", "Alberta", "Manitoba"]
                                 //   },
                                 //   "history": "Plains Cree is the most widely spoken Algonquian
                                 //              language in western Canada...",
                                 //   "resources": {
                                 //     "wikipedia": "https://en.wikipedia.org/wiki/Plains_Cree",
                                 //     "foundations": [{ "name": "ALTLab", "url": "https://..." }],
                                 //     "dictionaries": [{ "name": "itwêwina", "url": "https://..." }]
                                 //   }
                                 // }

  "culturalAphorism": null,      // A representative saying, proverb, or teaching in the language.
                                 // When populated:
                                 // {
                                 //   "text": "ê-wîcêhtonaniwahk kâ-kî-isi-wâpahtamâhk ôma pimâtisiwin",
                                 //   "transliteration": null,       // Romanized form if non-Latin script.
                                 //   "translation": "Through helping each other we come to understand
                                 //                   this life",
                                 //   "literal": "By-helping-one-another we-have-come-to-see this life",
                                 //   "source": "Cree teaching, documented in nêhiyawêwin educational
                                 //              resources"
                                 // }
                                 // Choose sayings that reveal something about the language's
                                 // worldview or structure. Must be sourced.

  "varieties":      [],          // For macrolanguages or languages with significant dialectal
                                 // variation, the individual varieties with their own tool coverage.
                                 // Each entry:
                                 // {
                                 //   "name": "Cusco Quechua",
                                 //   "iso639_3": "quz",
                                 //   "region": "Cusco, Peru",
                                 //   "fstCoverage": true,
                                 //   "corpusCoverage": true,
                                 //   "nllbCoverage": false,
                                 //   "mutualIntelligibility": "Primary variety for this card",
                                 //   "notes": "SQUOIA FST was built for this variety."
                                 // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 9. DIGITAL RESOURCES & TOOLING
  //  What NLP tools, corpora, models, and datasets exist for this language?
  //  What translation APIs support it? What eval benchmarks are available?
  //
  //  This is Champollion's operational core — these fields determine what
  //  we can actually DO with this language.
  // ═══════════════════════════════════════════════════════════════════════

  "resources":      null,        // NLP resources available for this language. When populated:
                                 // {
                                 //   "fsts": [{                     // Finite-state transducers
                                 //     "name": "GiellaLT Plains Cree FST (lang-crk)",
                                 //     "url": "https://github.com/giellalt/lang-crk/releases",
                                 //     "type": "morphological-analyzer"
                                 //   }],
                                 //   "corpora": [{                  // Text corpora
                                 //     "name": "EDTeKLA Cree Language Textbook Corpus",
                                 //     "type": "parallel",          // "parallel", "monolingual"
                                 //     "pairs": ["en-crk"],
                                 //     "url": "https://...",
                                 //     "exposure": "open-web"       // "open-web", "restricted",
                                 //                                  // "holdout"
                                 //   }],
                                 //   "models": [{                   // Pre-trained models
                                 //     "name": "NLLB-200 (crk_Cans)",
                                 //     "url": "https://...",
                                 //     "type": "nmt"
                                 //   }],
                                 //   "tools": [],                   // Other NLP tools
                                 //   "wordlists": [{                // Standardized wordlists
                                 //     "name": "Lexibank",
                                 //     "conceptCount": 200,
                                 //     "source": "lexibank"
                                 //   }],
                                 //   "treebanks": [{                // Syntactic treebanks
                                 //     "name": "UD_Korean-GSD",
                                 //     "tokens": 80000,
                                 //     "source": "universal-dependencies-2.14"
                                 //   }]
                                 // }
                                 // IMPORTANT: Only actual NLP/digital resources belong here.
                                 // "This language has a WALS entry" is NOT a resource — that
                                 // goes in databaseCoverage.

  "databaseCoverage": null,      // Which typological/reference databases cover this language.
                                 // Separated from resources to avoid conflating "has a database
                                 // entry" with "has usable NLP tooling."
                                 // {
                                 //   "wals": true,
                                 //   "grambank": true,
                                 //   "phoible": true,
                                 //   "cldr": true,
                                 //   "lexibank": true,
                                 //   "commonVoice": true,
                                 //   "source": "derived"
                                 // }

  "corpusAvailability": null,    // What text/parallel corpora exist for NLP use?
                                 // {
                                 //   "bibleTranslation": {
                                 //     "textAvailable": true,
                                 //     "audioAvailable": true,
                                 //     "source": "bible-brain-api"
                                 //   },
                                 //   "opusCorpora": ["wikimedia", "ubuntu", "gnome"],
                                 //   "source": "multi-source"
                                 // }

  "keyboardSupport":  null,      // Input method / keyboard availability. When populated:
                                 // {
                                 //   "keymanKeyboards": 3,
                                 //       // Number of Keyman keyboards available.
                                 //   "cldrKeyboard": true,
                                 //       // CLDR has keyboard layout data.
                                 //   "source": "keyman-api + cldr"
                                 // }

  "methodSupport":  {            // REQUIRED. Which Champollion translation methods support this
                                 // language. Each method is an object with at minimum
                                 // { "supported": boolean }.
    "googleTranslate":     { "supported": false },
    "deepl":               { "supported": false },
    "microsoftTranslator": { "supported": false },
    "libreTranslate":      { "supported": false },
    "nllb":                { "supported": false },
                                 // When NLLB is supported, include the code:
                                 // { "supported": true, "code": "crk_Cans" }
    "llm":                 { "supported": true }
                                 // LLM is always true (quality varies by language).
                                 // Optional: "verifiedDate": "2026-06-07" for audit trail.
  },

  "metricModelSupport": null,   // Which MT evaluation models produce reliable scores.
                                // When populated:
                                // {
                                //   "xlmr": "high",          // "high", "medium", or "low"
                                //                            // XLM-R training representation tier.
                                //   "africomet": false        // true if AfriCOMET covers this language.
                                // }
                                // Drives automatic COMET model selection in metrics_comet.py.
                                // Auto-populated by enrich-metric-model-support.mjs.

  "metricPlugins":   null,      // Which per-language metric plugin packs are available.
                                // When populated:
                                // {
                                //   "formalityMarkers": true  // Formality marker resource file exists
                                //                             // at plugins/resources/formality/{code}.json
                                // }
                                // Each key corresponds to a resource pack in
                                // arena/mt_eval_harness/plugins/resources/{packName}/.
                                // To add a new metric pack for a language, create the resource
                                // file and set the flag here. No code changes required.

  "evalPack":       null,        // Evaluation dependency pack for language-specific metrics.
                                 // When populated, declares the Python dependencies and
                                 // post-install steps required by this language's eval standards.
                                 // The harness uses this for dependency gating: if deps are
                                 // missing, the harness warns the user and skips LYSS metrics
                                 // (rather than crashing).
                                 // When populated:
                                 // {
                                 //   "pythonDeps": {
                                 //     "pyhfst": "pyhfst>=1.4",    // PyPI package specs
                                 //     "requests": "requests>=2.28",
                                 //     "spacy": "spacy>=3.7"
                                 //   },
                                 //   "postInstall": [               // Commands to run after pip
                                 //     {
                                 //       "command": "spacy download en_core_web_md",
                                 //       "label": "spaCy English model (for LYSS-sem)"
                                 //     }
                                 //   ],
                                 //   "requiresFst": true,           // true if GiellaLT FST needed
                                 //   "description": "LYSS equivalence linter + FST validation"
                                 // }

  "evalMetrics":    null,        // Language-specific evaluation metrics (LYSS standards).
                                 // When populated, the harness dynamically imports these
                                 // MetricPlugin classes from eval_standards/<lang>/ and applies
                                 // them to every run targeting this language — regardless of
                                 // which method (contestant) is being evaluated.
                                 // Keyed by metric ID:
                                 // {
                                 //   "lyss-eq": {
                                 //     "module": "eval_standards.crk.metrics",
                                 //     "class": "CrkLinterMetric",
                                 //     "description": "LYSS deterministic variant-class linter"
                                 //   },
                                 //   "lyss-sem": {
                                 //     "module": "eval_standards.crk.metrics",
                                 //     "class": "CrkSemanticMetric",
                                 //     "description": "LYSS FST-based semantic validator",
                                 //     "dependencies": ["spacy>=3.7"],
                                 //     "spacy_models": ["en_core_web_md"]
                                 //   }
                                 // }
                                 // Architecture: eval standards are referees, not contestants.
                                 // They live in the harness (eval_standards/), not in method
                                 // plugins. This ensures all methods are scored equally.
                                 // Discovery: plugin_discovery.py reads this field via
                                 // language_cards.get_eval_metrics() and instantiates metrics
                                 // using importlib. Dependencies are checked against evalPack.

  "omt1600":        null,        // Meta's OMT-1600 (One Model for Translation) coverage assessment.
                                 // When populated:
                                 // {
                                 //   "covered": true,
                                 //   "tier": "R1",                  // Meta's resource tier
                                 //   "evalMetrics": ["chrF++", "BLASER-3"],
                                 //   "notes": "Plains Cree: no web-crawled bitext..."
                                 // }

  "evalDatasets":   [],          // Evaluation dataset IDs available for this language.
                                 // Example: ["flores-plus-devtest", "edtekla-dev-v1"].
                                 // Empty means no standardized eval set exists.

  "pipelineReadiness": null,     // Assessment of readiness for Champollion's translation pipeline.
                                 // When populated:
                                 // {
                                 //   "tier": "tier-2-feasible",
                                 //       // "watch-list"       — cataloged but no path to translation
                                 //       // "tier-3-cataloged" — basic metadata present
                                 //       // "tier-2-feasible"  — tools exist, pipeline possible
                                 //       // "tier-1-ready"     — pipeline operational
                                 //   "hasFST": true,
                                 //   "hasParallelCorpus": true,
                                 //   "hasEvalBenchmark": true,
                                 //   "blockers": ["Syllabics post-processing validation"],
                                 //   "notes": "FST-gated pipeline operational. EDTeKLA corpus..."
                                 // }

  // ═══════════════════════════════════════════════════════════════════════
  //  § 10. PROVENANCE & METADATA
  //  Where does this data come from? Who reviewed it? When was it
  //  generated? What's its overall quality level?
  //
  //  This section exists to make the card auditable. Every automated
  //  enrichment, every human review, every source consulted should
  //  leave a trace here.
  // ═══════════════════════════════════════════════════════════════════════

  "dataSources":   [],           // REQUIRED. Sources consulted for this card's data.
                                 // Can be a flat array (backwards-compatible):
                                 //   ["iso639-3-2024", "glottolog-5.3", "wikidata"]
                                 //
                                 // Or a structured per-field object (preferred for new cards):
                                 //   {
                                 //     "classification": ["glottolog-5.3"],
                                 //     "vitality": ["glottolog-aes-5.3", "unesco-atlas-2024"],
                                 //     "speakerEstimates": ["wikidata", "census-ca-2021"],
                                 //     "rules": ["cldr-48"],
                                 //     "methodSupport": ["google-translate-2026-06"]
                                 //   }

  "supportTier":   "cataloged",  // Auto-derived tier summarizing the card's depth:
                                 //   "cataloged"   — identity + classification only
                                 //   "emerging"    — + vitality + speakerEstimates
                                 //   "developing"  — + resources + methodSupport
                                 //   "supported"   — full research: registers, challenges, etc.

  "humanReviewed": null,         // null until a qualified human reviews the card. When populated:
                                 // {
                                 //   "reviewer": "Prof. Kenneth Jamandre",
                                 //   "affiliation": "University of the Philippines Diliman",
                                 //   "date": "2026-06-08",
                                 //   "scope": "full",             // "full", "partial", "vitality-only"
                                 //   "notes": "Verified speaker count, vitality assessment,
                                 //             and contact influences for Tagalog."
                                 // }

  "notes":         null,         // Free-text notes about this language or this card's data quality.
                                 // Example: "Low-resource language under active development.
                                 //           Translation pipeline uses FST-gated approach."

  "firstDocumented": null,       // Year of first known documentation. Negative for BCE.
                                 // Example: -1500 (Sanskrit, ~1500 BCE), 1787 (some languages).
                                 // Source: Glottolog CLDF.

  "lastDocumented":  null,       // Year of last known documentation (relevant for extinct languages).
                                 // Source: Glottolog CLDF.

  "_generated":    null          // Auto-populated by enrichment scripts. When populated:
                                 // {
                                 //   "by": "generate-all-cards.mjs",
                                 //   "at": "2026-06-07T12:34:56Z",
                                 //   "sources": ["iso639-3", "glottolog-5.3", "wikidata"],
                                 //   "completeness": "partial",
                                 //       // "partial"     — has identity + classification + coords
                                 //       // "substantial" — + vitality + speakerEstimates + script
                                 //       // "complete"    — all automatable fields populated
                                 //   "lastEnriched": "2026-06-07"
                                 // }
}
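
The supportTier ladder in § 10 of the template is cumulative, so it can be derived by checking which groups of fields are populated. The Python sketch below is one possible reading of those tier descriptions, not the project's actual derivation: the function names are hypothetical, and treating registers plus linguisticChallenges as the marker of "full research" is an assumption.

```python
def is_populated(value):
    """True when a field holds data: not null, not an empty list or object."""
    return value is not None and value != [] and value != {}


def derive_support_tier(card):
    """Walk the tier ladder and stop at the first rung the card does not reach."""
    def has(*fields):
        return all(is_populated(card.get(name)) for name in fields)

    if not has("vitality", "speakerEstimates"):
        return "cataloged"
    if not has("resources", "methodSupport"):
        return "emerging"
    # Assumption: these two fields stand in for "full research".
    if not has("registers", "linguisticChallenges"):
        return "developing"
    return "supported"


card = {
    "vitality": {"status": "threatened"},      # illustrative values only
    "speakerEstimates": [{"count": 20000}],
    "resources": None,
    "methodSupport": {"llm": {"supported": True}},
    "registers": None,
    "linguisticChallenges": None,
}
print(derive_support_tier(card))  # emerging
```

Because every card carries every top-level field, the check never has to ask whether a key exists, only whether it is populated.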

Field Reference

§ 1. Identity Fields

Field	Type	Required	Automatable	Source
`code`	`string`	✅	✅	ISO 639-3 registry
`name`	`string`	✅	✅	ISO 639-3 registry
`nativeName`	`string \| null`	—	✅	Wikidata P1705
`alternateNames`	`string[]`	—	✅	Glottolog, Ethnologue
`iso639_3`	`string`	✅	✅	ISO 639-3 registry
`iso639_1`	`string \| null`	—	✅	ISO 639-1
`bcp47`	`string \| null`	—	Partial	IANA subtag registry
`aliases`	`string[]`	—	❌	Manual curation
`isoScope`	`string`	✅	✅	ISO 639-3 registry
`isoType`	`string`	✅	✅	ISO 639-3 registry
`macrolanguage`	`string \| null`	—	✅	ISO 639-3 macrolanguages.tab
`extends`	`string \| null`	—	❌	Manual curation

§ 2. Classification Fields

Field	Type	Required	Automatable	Source
`glottocode`	`string \| null`	—	✅	Glottolog
`classification`	`object \| null`	—	✅	Glottolog
`isIsolate`	`boolean`	—	✅	Glottolog CLDF

§ 3. Geography Fields

Field	Type	Required	Automatable	Source
`macroarea`	`string \| null`	—	✅	Glottolog CLDF
`coordinates`	`object \| null`	—	✅	Glottolog
`countries`	`string[]`	—	✅	Glottolog
`regions`	`object[]`	—	❌	Census, Ethnologue, manual
`arealContext`	`object \| null`	—	✅	Coordinates + linguistic-area zones

§ 4. Writing System Fields

Field	Type	Required	Automatable	Source
`script`	`string \| null`	—	✅	Wikidata P282
`scriptUnicodeName`	`string \| null`	—	✅	Derived from `script` via the ISO 15924 → Unicode mapping
`scripts`	`object[]`	—	Partial	Wikidata, manual
`dir`	`string \| null`	—	✅	Derivable from the script
`scriptConverter`	`string \| null`	—	❌	Manual
`orthographicStatus`	`object \| null`	—	Partial	Ethnologue, manual

§ 5. Demographics and Vitality Fields

Field	Type	Required	Automatable	Source
`speakerEstimates`	`object[]`	—	✅	Wikidata, Ethnologue, census
`vitality`	`object \| null`	—	✅	Glottolog AES, UNESCO

§ 5.5 Documentation and Digital Presence Fields

Field	Type	Required	Automatable	Source
`documentationDepth`	`object \| null`	—	✅	Glottolog references
`digitalPresence`	`object \| null`	—	✅	Wikipedia, Common Voice, Tatoeba
`dialectCount`	`number \| null`	—	✅	Glottolog

§ 6. Formality, Register, and Gender Fields

Field	Type	Required	Automatable	Source
`formality`	`object \| null`	—	❌	Linguistic research
`registers`	`object \| null`	—	❌	Linguistic research
`gender`	`object \| null`	—	❌	Linguistic research
`codeSwitching`	`object \| null`	—	❌	Linguistic research

§ 7. Linguistic Profile Fields

Field	Type	Required	Automatable	Source
`linguisticChallenges`	`object \| null`	—	❌	Linguistic research
`contactInfluences`	`object[]`	—	❌	Published linguistics
`rules`	`object \| null`	—	✅	CLDR
`typologicalProfile`	`object \| null`	—	✅	Grambank 1.0.3; auto-populated by `enrich-grambank-typology.mjs`
`phonologicalInventory`	`object \| null`	—	✅	PHOIBLE 2.0; auto-populated by `enrich-phoible-phonemes.mjs`

§ 8. Encyclopedic Fields

Field	Type	Required	Automatable	Source
`encyclopedic`	`object \| null`	—	❌	Manual research
`culturalAphorism`	`object \| null`	—	❌	Community contribution
`varieties`	`object[]`	—	❌	Manual research

§ 9. Digital Resource Fields

Field	Type	Required	Automatable	Source
`resources`	`object \| null`	—	Partial	Manual + automated
`databaseCoverage`	`object \| null`	—	✅	Derived from enrichment
`corpusAvailability`	`object \| null`	—	✅	Bible Brain, OPUS, Lexibank
`keyboardSupport`	`object \| null`	—	✅	Keyman API, CLDR
`methodSupport`	`object`	✅	Partial	API verification
`metricModelSupport`	`object \| null`	—	✅	XLM-R paper, AfriCOMET paper
`metricPlugins`	`object \| null`	—	✅	Card enrichment; declares which metric plugin packs apply (e.g., `{ formalityMarkers: true }`)
`omt1600`	`object \| null`	—	✅	Meta's OMT-1600 assessment
`evalDatasets`	`string[]`	—	✅	Dataset registry
`pipelineReadiness`	`object \| null`	—	Partial	Derived + manual

resources.fsts[].install: FST entries in the resources object may include an install subobject with the fields repo, releaseTag, assetPattern, format, maturity, and optionally bundlePattern. This replaces the former hardcoded GIELLALT_FST_REGISTRY dict. See get_fst_install_info() in language_cards.py.
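
To make the shape of the install subobject concrete, here is a minimal Python sketch that walks a card's FST entries and checks that each install block carries the five non-optional fields. It is not the get_fst_install_info() implementation: the helper name fst_install_entries is hypothetical, and every value in the sample card is a placeholder rather than real release metadata.

```python
REQUIRED_INSTALL_KEYS = ("repo", "releaseTag", "assetPattern", "format", "maturity")


def fst_install_entries(card):
    """Yield (fst name, install dict) for each FST entry that declares install info."""
    resources = card.get("resources") or {}
    for fst in resources.get("fsts") or []:
        install = fst.get("install")
        if install is None:
            continue  # install is optional on an FST entry
        missing = [key for key in REQUIRED_INSTALL_KEYS if key not in install]
        if missing:
            raise ValueError(f"{fst.get('name')}: install is missing {missing}")
        yield fst["name"], install


card = {
    "resources": {
        "fsts": [
            {
                "name": "GiellaLT Plains Cree FST (lang-crk)",
                "install": {                      # placeholder values only
                    "repo": "giellalt/lang-crk",
                    "releaseTag": "example-tag",
                    "assetPattern": "example-*.hfstol",
                    "format": "hfstol",
                    "maturity": "example",
                },
            },
            {"name": "FST without install metadata"},
        ]
    }
}

for name, install in fst_install_entries(card):
    print(name, "->", install["repo"])
```

bundlePattern is left out of the required tuple because the text above marks it as optional.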

§ 10. Provenance Fields

Field	Type	Required	Automatable	Source
`dataSources`	`array \| object`	✅	✅	Auto + manual
`supportTier`	`string`	—	✅	Derived from card completeness
`humanReviewed`	`object \| null`	—	❌	Human reviewer
`notes`	`string \| null`	—	❌	Manual
`firstDocumented`	`number \| null`	—	✅	Glottolog CLDF
`lastDocumented`	`number \| null`	—	✅	Glottolog CLDF
`_generated`	`object \| null`	—	✅	Enrichment scripts

Language Code Policy

Champollion uses ISO 639-3 as its canonical identifier. Other standard codes are recorded as aliases and resolved to the ISO 639-3 code at runtime.

Priority	Standard	Example	Field	Use
1 (canonical)	ISO 639-3	`crk`	`code`	Card filename, config keys, API parameters
2 (alias)	ISO 639-1	`iu`	`aliases[]`	Accepted on the CLI, resolved to ISO 639-3
3 (alias)	BCP 47	`fil`	`aliases[]`	Accepted on the CLI, resolved to ISO 639-3
Reference	Glottocode	`plai1258`	`glottocode`	Classification only, not used at runtime

Resolution order: When a user supplies a code, the checks run in this order:

Direct match on card.code → found
Match in card.aliases[] → found, return the canonical card
Match on card.iso639_1 → found (fallback)
No match → error
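
The resolution order can be sketched in a few lines of Python. This is an illustration of the lookup sequence only, not Champollion's resolveCode(): the function name resolve_code, the in-memory cards dict keyed by canonical code, and the two sample cards are assumptions made for the example.

```python
def resolve_code(user_code, cards):
    """Resolve a user-supplied code to a canonical ISO 639-3 code.

    `cards` maps canonical code -> card dict (hypothetical in-memory index).
    """
    # 1. Direct match on card.code
    if user_code in cards:
        return user_code
    # 2. Match in card.aliases[]
    for card in cards.values():
        if user_code in card.get("aliases", []):
            return card["code"]
    # 3. Fallback: match on card.iso639_1
    for card in cards.values():
        if card.get("iso639_1") == user_code:
            return card["code"]
    # 4. Nothing matched
    raise KeyError(f"unknown language code: {user_code!r}")


cards = {
    "fra": {"code": "fra", "aliases": ["fr"], "iso639_1": "fr"},
    "iku": {"code": "iku", "aliases": [], "iso639_1": "iu"},  # no alias entry
}

print(resolve_code("fra", cards))  # fra (direct match)
print(resolve_code("fr", cards))   # fra (alias)
print(resolve_code("iu", cards))   # iku (iso639_1 fallback)
```

Running the alias pass over all cards before the iso639_1 pass keeps curated aliases ahead of the fallback, which matches the order listed above.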

Migration History: ISO 639-1 → ISO 639-3

Before v8, card filenames used ISO 639-1 codes where available (fr.json, de.json, ja.json). In the 639-3 migration, all cards were renamed to their ISO 639-3 equivalents:

Before	After	Why
`fr.json`	`fra.json`	639-3 is canonical
`de.json`	`deu.json`	639-3 is canonical
`zh.json`	`cmn.json`	Macrolanguage → default individual language
`ar.json`	`arb.json`	Macrolanguage → Modern Standard Arabic
`ms.json`	`zsm.json`	Macrolanguage → Standard Malay

What happened to the old codes?

The old 639-1 code is stored in card.iso639_1
The old 639-1 code is also listed in card.aliases[]
resolveCode("fr") returns "fra" at runtime, so existing callers keep working
Users can still write "fr" in their config; it resolves transparently

What changed architecturally:

_deepMerge() now skips null values (the child inherits from the parent)
_deepMerge() now has an identity-field set (code, extends, and aliases are never inherited)
formality.default is now derived from the register flagged isDefault: true
205 Grambank-derived cards received a structural formality.default fix
38 genus/family/macrolanguage cards provide inheritance targets
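
The two _deepMerge() rules (null inherits, identity fields never inherit) can be illustrated with a small Python sketch. It is not the project's _deepMerge(): the function name deep_merge, the decision to treat only null (not []) as "inherit", and the sample parent and child cards are assumptions for the example.

```python
IDENTITY_FIELDS = {"code", "extends", "aliases"}


def deep_merge(parent, child, _top=True):
    """Merge a child card over its parent (genus/family/macrolanguage) card."""
    merged = dict(parent)
    for key, value in child.items():
        if _top and key in IDENTITY_FIELDS:
            merged[key] = value          # identity always comes from the child
        elif value is None:
            continue                     # null in the child: inherit from parent
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value, _top=False)
        else:
            merged[key] = value
    if _top:
        # Identity fields the child does not define must not leak in from the parent.
        for key in IDENTITY_FIELDS - set(child):
            merged.pop(key, None)
    return merged


parent = {
    "code": "example-genus",             # hypothetical parent card
    "aliases": ["eg"],
    "dir": "ltr",
    "formality": {"default": "formal", "levels": 2},
}
child = {
    "code": "crk",
    "extends": "example-genus",
    "aliases": [],
    "dir": None,
    "formality": {"default": None, "levels": 3},
}

print(deep_merge(parent, child))
```

In the merged result, code, extends, and aliases come from the child, dir and formality.default are inherited from the parent, and formality.levels takes the child's value.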

Special Cases

Sign Languages

Sign languages (e.g., ase, American Sign Language) are legitimate languages with ISO 639-3 codes. They have geography and speaker counts, but:

script is typically null (no standard written form)
scripts may include "Sgnw" (SignWriting) if a notation system is in use
dir is null
linguisticChallenges should address spatial grammar, classifiers, etc.
gender.grammatical is typically false

Ancient and Historical Languages

Languages such as Latin (lat, isoType H) and Sanskrit (san, isoType H) are still used in specific contexts (liturgical, academic) but have no native speakers:

vitality may note "no native speakers" with "trend": "stable" (not declining: the community that uses the language is stable, just small)
speakerEstimates should note that these are L2 speakers, not L1
firstDocumented / lastDocumented place them in time

Constructed Languages

Esperanto (epo, isoType C), Lojban, etc.:

classification may point to a "constructed" family or be null
contactInfluences reflects the source material (e.g., Esperanto draws on Romance, Germanic, and Slavic)
vitality is unusual: a growing speaker community but no native homeland

Macrolanguages

Arabic (ara), Chinese (zho), Cree (cre), and Quechua (que) are macrolanguages that span multiple individual languages:

isoScope: "M"
varieties should list the individual languages with their ISO codes
methodSupport should reflect what the macrolanguage card supports (usually the standardized variety)
Individual varieties should also have their own cards

Languages Without a Standardized Orthography

Many languages (especially those with an oral tradition) have no standardized writing system, or have competing orthographies:

script is null
scripts is []
dir is null
notes should explain the orthographic situation
linguisticChallenges should note how this affects MT (e.g., no training data)

Diglossia

Languages such as Arabic (MSA vs. dialects) or Guaraní (Jopará vs. pure Guaraní):

codeSwitching captures the mixed-variety situation
registers may offer presets for the different levels
varieties may list the diglossic pair

Contact Influence Types

Type	Meaning	Example
`superstrate`	Dominant language imposed on a community	French → English (post-1066)
`substrate`	Native language influencing an imposed language	Celtic → English
`adstrate`	Neighboring language with mutual influence	Norse → English
`learned_borrowing`	Borrowing through education or scholarship	Latin → English
`lexical_borrowing`	Direct vocabulary borrowing through contact	Spanish → Filipino
`relexification`	Wholesale vocabulary replacement	Portuguese → Papiamentu

Contact Influence Depths

Depth	Meaning
`light`	A few loanwords, minimal structural impact
`moderate`	Significant vocabulary in specific domains
`heavy`	Pervasive vocabulary and some structural features
`structural`	Grammar, syntax, and phonology affected
`defining`	Core identity shaped by contact (creoles, mixed languages)

Writing Good Register Presets

Good preset prompts:

Name the formality feature explicitly (e.g., "해요체", "vous form", "siz form")
Explain the specific pronoun or verb form to use
Give context for when this register is appropriate
Mention script considerations where applicable

Do not put gender-inclusive guidance in the preset prompt. Gender guidance belongs in card.gender.inclusiveGuidance, which is injected separately.

❌ Bad:  "Standard Thai. Professional register."
✔ Good: "Professional Thai. Use คุณ (khun) for second person, เรา (rao)
         for first person when needed. Clear, concise phrasing
         appropriate for digital interfaces."

Preset Naming Convention

Preset keys should be descriptive, lowercase, and hyphenated:

T-V languages: formal-vous, informal-tu, formal-Sie, casual-du
Speech levels: polite-haeyo, formal-hapsyo, casual-hae
Neutral: professional, neutral-professional
Code-switching: taglish-professional, pure-filipino

Enrichment Procedure

Per-Card Processing Order

When enriching a card, consult sources in this order. Document every source consulted, even if it returned no data.

ISO 639-3 registry → code, name, isoScope, isoType
ISO 639-3 macrolanguages.tab → macrolanguage
Glottolog languoid.csv → glottocode, classification, coordinates, countries
Glottolog CLDF → macroarea, isIsolate, firstDocumented, lastDocumented
Glottolog AES → vitality (endangerment status)
Wikidata SPARQL → nativeName, speakerEstimates, script, scripts, dir
CLDR → rules (typography, plurals, capitalization)
NLLB-200 / FLORES+ → methodSupport.nllb, evalDatasets
API verification → remaining methodSupport entries
ML model documentation → metricModelSupport (XLM-R training data, AfriCOMET coverage). Script: node scripts/enrich-metric-model-support.mjs
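
The order above, together with the requirement to record every source consulted, could be sketched as follows. The source names and field lists come from the list. The `lookups` map, the `consultSources` function, and the shape of each log entry are hypothetical stand-ins and do not describe the real enrichment scripts or the actual `dataSources` format.

```javascript
// Sources in the documented consultation order, with the fields each one feeds.
const ENRICHMENT_ORDER = [
  { source: "ISO 639-3 registry", fields: ["code", "name", "isoScope", "isoType"] },
  { source: "ISO 639-3 macrolanguages.tab", fields: ["macrolanguage"] },
  { source: "Glottolog languoid.csv", fields: ["glottocode", "classification", "coordinates", "countries"] },
  { source: "Glottolog CLDF", fields: ["macroarea", "isIsolate", "firstDocumented", "lastDocumented"] },
  { source: "Glottolog AES", fields: ["vitality"] },
  { source: "Wikidata SPARQL", fields: ["nativeName", "speakerEstimates", "script", "scripts", "dir"] },
  { source: "CLDR", fields: ["rules"] },
  { source: "NLLB-200 / FLORES+", fields: ["methodSupport.nllb", "evalDatasets"] },
  { source: "API verification", fields: ["methodSupport"] },
  { source: "ML model documentation", fields: ["metricModelSupport"] },
];

// `lookups` maps a source name to a function (code) => object | null.
// Returns one log entry per source, in order, including sources that
// returned nothing. The entry shape is an assumption for illustration.
function consultSources(code, lookups) {
  return ENRICHMENT_ORDER.map(({ source, fields }) => {
    const lookup = lookups[source];
    const consulted = typeof lookup === "function";
    const found = consulted ? lookup(code) : null;
    return {
      source,
      fields,
      consulted,
      returnedData: found != null && Object.keys(found).length > 0,
    };
  });
}
```

A source whose lookup returns `null` still produces an entry with `returnedData: false`, so an empty result stays visible in the log.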

Conflict Handling

When sources disagree:

Store both values with source attribution
Do NOT average or pick sides
Note the discrepancy in the relevant note field
Prefer the most recent primary source only when a single value is needed for computation
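
A minimal sketch of these rules, assuming each stored value is an object with `value`, `source`, and `year`. The `year` field and all three function names are hypothetical. `year` stands in for whatever version or date information the card actually records for a source.

```javascript
// Merge-only: append an attributed value unless the same source and value
// are already recorded. Existing entries are never modified.
function addAttributedValue(entries, entry) {
  const list = entries ?? [];
  const duplicate = list.some(
    (e) => e.source === entry.source && e.value === entry.value
  );
  return duplicate ? list : [...list, entry];
}

// True when the recorded sources disagree, i.e. a note should be written.
function hasDiscrepancy(entries) {
  return new Set((entries ?? []).map((e) => e.value)).size > 1;
}

// For computation only: pick the value from the most recent source.
// `year` is a hypothetical field standing in for the source's version date.
function pickForComputation(entries) {
  if (!entries || entries.length === 0) return null;
  return entries.reduce((best, e) => (e.year > best.year ? e : best)).value;
}

// Usage with the speaker-count disagreement used earlier in this document.
// The years are illustrative placeholders, not real publication dates.
let estimates = null;
estimates = addAttributedValue(estimates, { value: 50000, source: "Wikidata", year: 2021 });
estimates = addAttributedValue(estimates, { value: 20000, source: "Ethnologue", year: 2023 });
```

Both estimates remain stored with their attribution. `pickForComputation` is meant to be called only where a single number is unavoidable, and its result is not written back to the card.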

Validation

Run the linter after any enrichment or manual edit:

node scripts/lint-language-cards.mjs              # all cards
node scripts/lint-language-cards.mjs --lang crk    # single card

PR Checklist

When submitting a new or modified language card:

File named <code>.json in shared/language-cards/
All top-level fields from the canonical template are present
classification populated from Glottolog (not hand-built)
dataSources lists all sources consulted
methodSupport entries verified against actual API language lists
contactInfluences entries have published sources or citation_needed: true
linguisticChallenges has 3–6 MT-relevant challenges (if researched)
rules populated from CLDR (if locale data exists)
Linter passes with no errors
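
A few of the checklist items are mechanical and could be pre-checked before opening a PR. The following is only a sketch and does not replace the linter: `checkCard` is a hypothetical function, `REQUIRED_FIELDS` is an assumed subset of the canonical template (treating every listed name as a top-level field), and `sources` is an assumed field name for published sources on a `contactInfluences` entry.

```javascript
// Hypothetical subset of the canonical top-level fields; the full list lives
// in the canonical template.
const REQUIRED_FIELDS = [
  "code", "classification", "dataSources", "methodSupport",
  "contactInfluences", "linguisticChallenges", "rules",
];

// Returns a list of human-readable problems; an empty list means these
// particular checks passed.
function checkCard(card, fileName) {
  const problems = [];
  if (fileName !== `${card.code}.json`) {
    problems.push("file name does not match code");
  }
  for (const field of REQUIRED_FIELDS) {
    // Presence check: null and [] are fine, a missing key is not.
    if (!(field in card)) problems.push(`missing field: ${field}`);
  }
  for (const influence of card.contactInfluences ?? []) {
    // `sources` is an assumed field name for published sources.
    const sourced = Array.isArray(influence.sources) && influence.sources.length > 0;
    if (!sourced && influence.citation_needed !== true) {
      problems.push("contactInfluences entry lacks sources or citation_needed: true");
    }
  }
  const challenges = card.linguisticChallenges ?? [];
  // Empty is read as "not researched"; otherwise the checklist asks for 3-6.
  if (challenges.length > 0 && (challenges.length < 3 || challenges.length > 6)) {
    problems.push("linguisticChallenges should list 3-6 items");
  }
  return problems;
}
```

The presence check uses `in` rather than a truthiness test, so fields that are `null` or `[]` still count as present, in line with the uniform-shape principle.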

Professional References

Standard	Maintained By	Our Use
ISO 639-3	SIL International	Canonical language codes, macrolanguage relationships
Glottolog	Max Planck Institute	Classification, coordinates, AES endangerment
WALS	Max Planck Institute	Genus definitions, typological features
ISO 15924	Unicode/ISO	Script codes
CLDR	Unicode Consortium	Locale data, plural rules, typography
Wikidata	Wikimedia Foundation	Speaker counts, endonyms, script data
Ethnologue	SIL International	EGIDS, speaker estimates, DLS
UNESCO Atlas	UNESCO	Endangerment classification
Katig Collective	UP Diliman	Philippine language capsules

See also: Language Card Citation Procedure for detailed source-by-source guidance.
