also: inflectional paradigm · paradigms
The full set of inflected forms a word can take, such as a verb conjugation table. Paradigms range from a single invariant form (English 'must') to thousands of forms in polysynthetic languages.
Why it matters for MT: When paradigms are large, most individual word forms are rare or unseen in training data.
also: parallel text · bitext · parallel corpora · parallel data
A collection of texts paired with their translations, aligned sentence by sentence. Parallel corpora are the primary fuel for training and evaluating MT systems.
Why it matters for MT: The size and domain of the available parallel data are among the strongest predictors of MT quality for a language pair.
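As a rough illustration of what "aligned sentence by sentence" means in practice, the sketch below pairs line i of a source text with line i of its translation. The function name `pair_lines` and the two sample sentences are assumptions made for this example, not part of any particular toolkit, and real corpora usually need alignment and filtering steps that are not shown here.

```python
from typing import List, Tuple


def pair_lines(source_lines: List[str], target_lines: List[str]) -> List[Tuple[str, str]]:
    """Pair line i of the source with line i of the target (hypothetical helper)."""
    if len(source_lines) != len(target_lines):
        # Line-aligned bitext only works when both sides have the same length.
        raise ValueError("source and target must have the same number of lines")
    pairs = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:  # drop pairs where either side is empty
            pairs.append((src, tgt))
    return pairs


# Toy data, invented for illustration only.
english = ["The window was broken.", "She wrote the letter."]
german = ["Das Fenster wurde zerbrochen.", "Sie schrieb den Brief."]
bitext = pair_lines(english, german)
```

Each element of `bitext` is one (source, target) sentence pair, which is the unit MT systems are trained and evaluated on.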
also: participles · participial
A verb form that acts like an adjective or builds compound tenses — 'the running water', 'has eaten'. Languages differ in how many participles they have and what they are used for.
Why it matters for MT: Where one language uses a participial clause, another often requires a relative clause (or the reverse), so MT must convert between the two structures.
also: particles · sentence-final particle · discourse particle · topic marker
A small, uninflected function word that adds grammatical or attitudinal meaning — question markers, topic markers, politeness softeners. East Asian languages make heavy use of sentence-final particles.
Why it matters for MT: Particles carry meaning (questionhood, attitude, topic) that MT must re-express by entirely different means.
also: passive voice · passive constructions
A construction that promotes the object to subject and demotes or drops the doer: 'the window was broken (by the boy)'. Many languages lack a passive entirely or use other strategies to background the agent.
Why it matters for MT: Passive-less target languages force MT to restructure passives into actives, inventing or recovering the agent.
also: perfective aspect
Aspect presenting an event as a complete whole — 'she wrote the letter' viewed as one finished fact. Often paired with imperfective in a grammatical opposition.
Why it matters for MT: Choosing the wrong aspect (perfective vs imperfective) is among the most common errors when translating into Slavic languages.
also: pharyngeals · pharyngeal consonants
Consonants made by squeezing the throat (pharynx), like Arabic ʿayn (ع). They are rare worldwide and hard for non-native speakers to hear or produce.
Why it matters for MT: Pharyngeals are romanized in many ways (ʿ, ', 3, or nothing), so the same word appears under several spellings in informal text.
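One way to reduce this variation before training is to map the written variants to a single symbol. The sketch below is a minimal example under strong assumptions: the input is already known to be romanized Arabic, the variant list `AYN_VARIANTS` is a hypothetical choice for illustration, and the digit 3 counts as ʿayn only when it touches a letter. Spellings that drop the consonant entirely cannot be recovered this way, and an apostrophe that stands for another sound would be mapped wrongly.

```python
import re

# Hypothetical variant list, chosen for illustration only.
AYN_VARIANTS = ["ʿ", "ʻ", "'", "`"]
CANONICAL = "ʿ"

# "3" directly after or directly before a letter (Unicode-aware letter class).
_DIGIT_AYN = re.compile(r"(?<=[^\W\d_])3|3(?=[^\W\d_])")


def normalize_ayn(text: str) -> str:
    """Map common romanizations of ʿayn to one canonical symbol."""
    for variant in AYN_VARIANTS:
        text = text.replace(variant, CANONICAL)
    # Leave free-standing numbers such as "3 books" untouched.
    return _DIGIT_AYN.sub(CANONICAL, text)
```

For example, `normalize_ayn("3arabi")` and `normalize_ayn("'arabi")` both give `ʿarabi`, while `normalize_ayn("3 books")` is returned unchanged. The letter-adjacency rule is crude: a token like "13th" would also be rewritten, so a real pipeline would need language identification first.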
also: phonemes · phonemic · phoneme inventory · consonant inventory · vowel inventory
A speech sound that distinguishes words in a particular language — swap one phoneme for another and you get a different word (pat vs bat). A language's phoneme inventory ranges from about a dozen sounds to well over a hundred.
Why it matters for MT: Inventory size and content determine how foreign names and loanwords get reshaped in the language.
also: pidgins
A simplified contact language with no native speakers, created for trade or work between groups with no common tongue. When children grow up speaking one natively, it becomes a creole.
Why it matters for MT: Pidgins have high variability and thin text data, making consistent MT especially hard.
also: pitch-accent
A system where pitch distinguishes words, but only one syllable per word carries the distinctive pitch — Japanese háshi 'chopsticks' vs hashí 'bridge'. Lighter than full tone, heavier than pure stress.
Why it matters for MT: Like tone, pitch accent is rarely written, so homographs multiply in text.
also: politeness distinctions · formality · formality system · politeness levels
The linguistic encoding of social relationships — through pronoun choice, verb endings, particles, or vocabulary. Languages range from no grammatical politeness to elaborate multi-level systems.
Why it matters for MT: A translation can be lexically perfect and still fail by choosing the wrong politeness level for the situation.
also: polypersonalism · polypersonal
Verb agreement with more than one participant at once — the verb carries markers for both subject and object (and sometimes more). Basque, Georgian, and Algonquian languages do this systematically.
Why it matters for MT: The verb form encodes who acts on whom, so MT must resolve both roles before it can produce a single correct verb.
also: polysynthetic · polysynthetic language
A word-building style where a single verb can contain what other languages express as a whole sentence — subject, object, location, instrument and more, all as parts of one word. Many Indigenous American languages work this way.
Why it matters for MT: Polysynthetic words rarely repeat exactly, so word-level MT sees an endless stream of unknown tokens.
Example: in Plains Cree (crk), a single verb can incorporate subject/object pronouns, instrumentals, locations, and actions (card field linguisticChallenges.polysynthesis).
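The effect on vocabulary coverage can be made concrete with an out-of-vocabulary (OOV) rate: the share of test tokens never seen in training. The sketch below compares whole-word tokens with morpheme-like pieces. The hyphenated English-style forms are toy data invented for this example (they are not Plains Cree), the hyphen stands in for a segmentation that a real system would have to learn, and the helper names `oov_rate` and `split_morphemes` are hypothetical.

```python
from typing import Iterable, List


def oov_rate(train_tokens: Iterable[str], test_tokens: List[str]) -> float:
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def split_morphemes(words: Iterable[str]) -> List[str]:
    """Split toy words on '-' (an assumed, pre-marked morpheme boundary)."""
    return [piece for word in words for piece in word.split("-")]


# Toy data: every test word is new, but built from pieces seen in training.
train_words = ["re-write-s", "re-read-ing", "un-do-ing"]
test_words = ["re-do-ing", "un-write-s"]

word_level = oov_rate(train_words, test_words)
morpheme_level = oov_rate(split_morphemes(train_words), split_morphemes(test_words))
```

On this toy data `word_level` is 1.0 and `morpheme_level` is 0.0: every whole word is unknown, yet every piece has been seen. This is the usual motivation for subword or morphological segmentation with morphologically rich languages.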
also: possessive · possessives · alienable · inalienable · possessive affixes
How a language expresses 'my X / your X'. Many languages distinguish inalienable possession (body parts, kin — things you cannot give away) from alienable, marking them with different constructions.
Why it matters for MT: Inalienable possession often requires obligatory possessor marking that English sources omit.
also: prefixes · prefixing
An affix that attaches to the front of a word stem, like re- in 'rewrite'. Some languages, including many Bantu and Athabaskan languages, carry most of their grammar in strings of prefixes.
Why it matters for MT: Prefix-heavy languages put grammatical information at the start of words, the opposite of what tokenizers and segmentation heuristics tuned on suffixing languages expect.