Documentation: Language Codes
From UniLang Wiki
For a full list of all language codes currently in use by UniLang, go to http://home.unilang.org/main/langlist.php
The rules for picking language codes are as follows:
- Our language codes are first picked from ISO 639-1 (two letters) if the language is available there. This is what most operating systems use to specify locales, and it is also used in (X)HTML. Listing: http://www.loc.gov/standards/iso639-2/englangn.html
- If not available, ISO 639-3/SIL is picked (three letters). This is what Ethnologue and MediaGlyphs use. Listing: http://www.ethnologue.com/language_code_index.asp
This covers practically all (98%) of the languages we need, with some notable exceptions. For ancient extinct languages (Phoenician, Akkadian, Ugaritic, etc.) we often have no choice but to resort to yet another standard, ISO 639-2 (three letters). This currently applies to the codes in the array returned by extinctlangs() (lang_optionstrings.php).
There are also some meta-codes for languages that are loosely grouped, such as Zapotec; here we also fall back on ISO 639-2. These codes are in the array returned by grouplangs().
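The selection order described above can be sketched in Python. This is only an illustration under stated assumptions: the lookup tables are tiny hypothetical excerpts, the function name pick_code is invented here, and checking the extinct/group exceptions before ISO 639-3 is one possible reading of the rules, not the actual logic of lang_optionstrings.php.

```python
# Hypothetical excerpts of the three standards (illustration only).
ISO_639_1 = {"Portuguese": "pt", "English": "en"}
ISO_639_3 = {"Cantonese": "yue"}
ISO_639_2 = {"Phoenician": "phn", "Akkadian": "akk",
             "Ugaritic": "uga", "Zapotec": "zap"}

# Stand-ins for the arrays returned by extinctlangs() and grouplangs().
EXTINCT_LANGS = {"Phoenician", "Akkadian", "Ugaritic"}
GROUP_LANGS = {"Zapotec"}


def pick_code(name: str) -> str:
    """Return a language code for `name` following the documented order."""
    # 1. Prefer the two-letter ISO 639-1 code when one exists.
    if name in ISO_639_1:
        return ISO_639_1[name]
    # 2. Exceptions (assumed to take priority over ISO 639-3):
    #    extinct languages and loose groups use ISO 639-2.
    if name in EXTINCT_LANGS or name in GROUP_LANGS:
        return ISO_639_2[name]
    # 3. Otherwise fall back to the three-letter ISO 639-3 code.
    if name in ISO_639_3:
        return ISO_639_3[name]
    raise KeyError(f"no code known for {name!r}")
```

With these sample tables, pick_code("Portuguese") yields the two-letter code, while pick_code("Akkadian") and pick_code("Zapotec") come from the ISO 639-2 table.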
Next, writing-style variants are marked by a dot followed by a suffix; the most common or institutionalized version is the default and carries no suffix.
Example:
- zh - Mandarin Chinese, Simplified Script
- zh.t - Mandarin Chinese, Traditional Script
- yue - Cantonese Chinese, Simplified Script
- yue.t - Cantonese Chinese, Traditional Script
Regional variants are denoted with a hyphen and an upper-case country code (ISO 3166) or region code:
- pt - Portuguese (general)
- pt-PT - European Portuguese
- pt-BR - Brazilian Portuguese
Other variants, mostly temporal variants, may be specified with an underscore:
- en_old - Old English
- en_mid - Middle English
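The three kinds of suffix (dot, hyphen, underscore) can be separated mechanically. The following is a minimal Python sketch, not UniLang's own parser: the names LangCode and parse_code are hypothetical, the order in which suffixes may be combined (style, then region, then other variant) is an assumption since this page only shows one suffix at a time, and region codes are assumed to consist of upper-case letters only.

```python
import re
from typing import NamedTuple, Optional

# base code, then optional ".style", "-REGION", "_variant" (assumed order).
CODE_RE = re.compile(
    r"(?P<base>[a-z]{2,3})"
    r"(?:\.(?P<style>[a-z]+))?"
    r"(?:-(?P<region>[A-Z]{2,}))?"
    r"(?:_(?P<variant>[a-z]+))?"
)


class LangCode(NamedTuple):
    base: str                # ISO 639 code, e.g. "zh" or "yue"
    style: Optional[str]     # writing-style variant, e.g. "t"
    region: Optional[str]    # regional variant, e.g. "BR"
    variant: Optional[str]   # other (mostly temporal) variant, e.g. "old"


def parse_code(code: str) -> LangCode:
    """Split a code such as "zh.t", "pt-BR" or "en_old" into its parts."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a recognised language code: {code!r}")
    # Groups that did not take part in the match come back as None.
    return LangCode(**match.groupdict())
```

For example, parse_code("pt-BR") gives a base of "pt" and a region of "BR", with the style and variant fields left as None.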
