Documentation: Language Codes

From UniLang Wiki

For a full list of all language codes currently in use by UniLang, go to http://home.unilang.org/main/langlist.php

The full specification for picking language codes follows:

  • Our language codes are first picked from ISO 639-1 (two letters) if the language is available there. This is what most operating systems use to specify locales, and it is also used in (X)HTML. Listing: http://www.loc.gov/standards/iso639-2/englangn.html
  • If the language is not available there, a code from ISO 639-3/SIL is picked (three letters). This is what Ethnologue and mediaglyphs use. Listing: http://www.ethnologue.com/language_code_index.asp

This covers practically all (98%) of the languages we need, but there are some notable exceptions. For ancient extinct languages (Phoenician, Akkadian, Ugaritic, etc.) we often have no choice but to resort to yet another standard, ISO 639-2 (three letters). This currently applies to the codes in the array returned by extinctlangs() (lang_optionstrings.php).

There are also some meta-codes for languages that are loosely grouped, such as Zapotec; here we also fall back on ISO 639-2. These language codes are the ones in the array returned by grouplangs().
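The fallback order described above can be sketched in a few lines. This is only an illustration, not UniLang's actual implementation: the three lookup tables and the function name pick_code are hypothetical, and each table holds just a handful of sample entries rather than the full ISO listings.

```python
# Hypothetical sample tables; real data would come from the ISO listings.
# They are tiny on purpose and only contain what the example needs.
ISO_639_1 = {"Portuguese": "pt", "English": "en"}
ISO_639_3 = {"Portuguese": "por", "English": "eng", "Cantonese": "yue"}
ISO_639_2 = {"Akkadian": "akk", "Zapotec": "zap"}


def pick_code(language):
    """Return the first code found, trying the standards in priority order.

    Order assumed from the text: ISO 639-1, then ISO 639-3, then ISO 639-2.
    """
    for table in (ISO_639_1, ISO_639_3, ISO_639_2):
        if language in table:
            return table[language]
    raise KeyError(f"no code known for {language!r}")
```

With these sample tables, Portuguese resolves to its two-letter code, Cantonese falls through to the three-letter ISO 639-3 code, and Zapotec is only found in the ISO 639-2 table.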


Next, writing style variants are marked by a dot, with the most common or institutionalized version being the default.

Example:

  • zh - Mandarin Chinese, Simplified Script
  • zh.t - Mandarin Chinese, Traditional Script
  • yue - Cantonese Chinese, Simplified Script
  • yue.t - Cantonese Chinese, Traditional Script

Regional variants are denoted with a hyphen and an uppercase country code (ISO 3166) or region code:

  • pt - Portuguese (general)
  • pt-PT - European Portuguese
  • pt-BR - Brazilian Portuguese

Other variants, mostly temporal variants, may be specified with an underscore:

  • en_old - Old English
  • en_mid - Middle English
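The three separators (dot, hyphen, underscore) can be split apart mechanically. The following is a minimal sketch and not part of the UniLang code base: the names LangCode and parse_code are hypothetical, and the pattern assumes that when several suffixes are combined they appear in the order script, region, other variant, which this page does not specify. It also assumes region codes consist of uppercase letters only.

```python
import re
from typing import NamedTuple, Optional


class LangCode(NamedTuple):
    base: str                # ISO 639 code, e.g. "zh" or "yue"
    script: Optional[str]    # after a dot, e.g. "t"
    region: Optional[str]    # after a hyphen, e.g. "BR"
    variant: Optional[str]   # after an underscore, e.g. "old"


# Assumed order: base[.script][-REGION][_variant]
_PATTERN = re.compile(
    r"^(?P<base>[a-z]{2,3})"
    r"(?:\.(?P<script>[a-z]+))?"
    r"(?:-(?P<region>[A-Z]+))?"
    r"(?:_(?P<variant>[a-z]+))?$"
)


def parse_code(code: str) -> LangCode:
    """Split a code into its parts; absent parts come back as None."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a recognised language code: {code!r}")
    return LangCode(**match.groupdict())
```

For instance, parse_code("pt-BR") yields a base of "pt" and a region of "BR", while parse_code("en_old") yields a base of "en" and a variant of "old".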