This module is used to retrieve and manage the languages that can have Wiktionary entries, and the information associated with them. See Wiktionary:Languages for more information.

For the languages and language varieties that may be used in etymologies, see Module:etymology languages. For language families, which sometimes also appear in etymologies, see Module:families.

This module provides access to other modules. To access the information from within a template, see Module:languages/templates.

The information itself is stored in the various data modules that are subpages of this module. These modules should not be used directly by any other module; the data should only be accessed through the functions provided by this module.

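Data submodules:

Module:languages/data2 (two-letter codes)
Module:languages/data3/ followed by the first letter of the code (three-letter codes)
Module:languages/datax (all other codes)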

Finding and retrieving languages

The module exports a number of functions that are used to find languages.

getByCode

getByCode(code, paramForError, allowEtymLang, allowFamily)

Finds the language whose code matches the one provided. If it exists, it returns a Language object representing the language. Otherwise, it returns nil, unless paramForError is given, in which case an error is generated. If paramForError is true, a generic error message mentioning the bad code is generated; otherwise paramForError should be a string or number specifying the parameter that the code came from, and this parameter will be mentioned in the error message along with the bad code. If allowEtymLang is specified, etymology language codes are allowed and looked up along with normal language codes. If allowFamily is specified, language family codes are allowed and looked up along with normal language codes.
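
The lookup order and error behaviour described above can be sketched in plain Lua. This is only an illustration: the tables `languages`, `etym_languages` and `families` and the function `lookup` are hypothetical stand-ins for the real data modules and for getByCode, and the error wording is invented for the example.

```lua
-- Hypothetical sample data standing in for the real data modules.
local languages      = { fr = "French", la = "Latin" }
local etym_languages = { ["fr-CA"] = "Canadian French" }
local families       = { roa = "Romance" }

-- Sketch of the lookup order: regular languages first, then (optionally)
-- etymology languages, then (optionally) families.
local function lookup(code, paramForError, allowEtymLang, allowFamily)
	local result = languages[code]
	if not result and allowEtymLang then
		result = etym_languages[code]
	end
	if not result and allowFamily then
		result = families[code]
	end
	if not result and paramForError then
		-- `true` gives a generic message; anything else names the parameter.
		local where = paramForError == true and ""
			or (" in parameter " .. tostring(paramForError))
		error("The language code \"" .. code .. "\"" .. where .. " is not valid.")
	end
	return result
end
```

With this sketch, `lookup("roa")` returns nil, `lookup("roa", nil, false, true)` finds the family, and `lookup("zz", 2)` raises an error that mentions parameter 2.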

getByCanonicalName

getByCanonicalName(name, errorIfInvalid, allowEtymLang, allowFamily)

Finds the language whose canonical name (the name used to represent that language on Wiktionary) matches the one provided. If it exists, it returns a Language object representing the language. Otherwise, it returns nil, unless errorIfInvalid is given, in which case an error is generated. If allowEtymLang is specified, etymology language names are allowed and looked up along with normal language names. If allowFamily is specified, language family names are allowed and looked up along with normal language names.

The canonical name of languages should always be unique (it is an error for two languages on Wiktionary to share the same canonical name), so this is guaranteed to give at most one result.

This function is powered by Module:languages/canonical names, which contains a pre-generated mapping of non-etymology-language canonical names to codes. It is generated by going through the Category:Language data modules for non-etymology languages. When allowEtymLang is specified for the above function, Module:etymology languages/by name may also be used, and when allowFamily is specified for the above function, Module:families/by name may also be used.
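
A name-to-code mapping of this kind can be produced by inverting the code-to-data tables. The sketch below assumes that the canonical name is the first field of each data entry; the `sample_data` table and the `build_canonical_names` function are hypothetical and are not the actual generation script.

```lua
-- Hypothetical sample in the shape of a data module:
-- the canonical name is the first field of each entry.
local sample_data = {
	fr = { "French" },
	la = { "Latin" },
	sh = { "Serbo-Croatian" },
}

-- Build a name -> code table, refusing duplicate canonical names.
local function build_canonical_names(data)
	local by_name = {}
	for code, entry in pairs(data) do
		local name = entry[1]
		if by_name[name] then
			error("Duplicate canonical name: " .. name)
		end
		by_name[name] = code
	end
	return by_name
end

local canonical_names = build_canonical_names(sample_data)
```

Because canonical names must be unique, the sketch treats a repeated name as an error rather than silently overwriting the earlier code.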

getByName


Like getByCanonicalName(), except it also looks at the otherNames listed in the non-etymology language data modules, and does not (currently) have options to look up etymology languages and families.

iterateAll


This function is expensive

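Returns an iterator function that produces a Language object for each language in turn. The languages are visited in the order produced by pairs, which is not guaranteed to be sorted.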

This function searches through the whole database of languages, and is therefore relatively resource-intensive. It should be used sparingly.

Language objects

A Language object is returned from one of the functions above. It is a Lua representation of a language and the data associated with it. It has a number of methods that can be called on it, using the : syntax. For example:

local m_languages = require("Module:languages")
local lang = m_languages.getByCode("fr")
local name = lang:getCanonicalName()
-- "name" will now be "French"

Language:getCode


Returns the language code of the language. Example: "fr" for French.

Language:getCanonicalName


Returns the canonical name of the language. This is the name used to represent that language on Wiktionary, and is guaranteed to be unique to that language alone. Example: "French" for French.

Language:getAllNames


Returns a table of all names that the language is known by, including the canonical name. The names are not guaranteed to be unique; sometimes more than one language is known by the same name. Example: {"French", "Modern French"} for French.

Language:getType


Returns the type of language, which can be "regular", "reconstructed" or "appendix-constructed".

Language:getWikimediaLanguages


Returns a table containing WikimediaLanguage objects (see Module:wikimedia languages), which represent languages and their codes as they are used in Wikimedia projects for interwiki linking and such. More than one object may be returned, as a single Wiktionary language may correspond to multiple Wikimedia languages. For example, Wiktionary's single code sh (Serbo-Croatian) maps to four Wikimedia codes: sh (Serbo-Croatian), bs (Bosnian), hr (Croatian) and sr (Serbian).

The code for the Wikimedia language is retrieved from the wikimedia_codes property in the data modules. If that property is not present, the code of the current language is used. If none of the available codes is actually a valid Wikimedia code, an empty table is returned.

Language:getWikipediaArticle


Returns the name of the Wikipedia article for the language. If the property wikipedia_article is present in the data module it will be used first, otherwise a sitelink will be generated from :getWikidataItem (if set). Otherwise :getCategoryName is used as fallback.

Language:getWikidataItem


Returns the Wikidata item id for the language or nil. This corresponds to the second field in the data modules.

Language:getScripts


Returns a table of Script objects for all scripts that the language is written in. See Module:scripts.

Language:getScriptCodes


Returns the table of script codes in the language's data file.

Language:getFamily


Returns a Family object for the language family that the language belongs to. See Module:families.

Language:getAncestors


Returns a table of Language objects for all languages that this language is directly descended from. Generally this is only a single language, but creoles, pidgins and mixed languages can have multiple ancestors.

Language:getCategoryName


Returns the name of the main category of that language. Example: "French language" for French, whose category is at Category:French language.

Language:makeCategoryLink


Creates a link to the category; the link text is the canonical name.

Language:makeEntryName


Converts the given term into the form used in the names of entries. This removes diacritical marks from the term if they are not considered part of the normal written form of the language and are therefore not permitted in page names. It also removes certain punctuation characters, such as final question marks or periods, which are never present in page names. Example for Latin: "amō" → "amo" (macron is removed).

The replacements made by this function are defined by the entry_name setting for each language in the data modules.
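
As a rough illustration of how a from/to replacement list of this kind behaves, the following plain-Lua sketch strips macrons from a Latin term. The `latin_entry_name` table and the `apply_replacements` function are hypothetical; the real module works with mw.ustring functions and can also apply a remove_diacritics setting, neither of which is reproduced here.

```lua
-- Hypothetical entry_name-style setting: each pattern in `from` is
-- replaced by the string at the same position in `to`.
local latin_entry_name = {
	from = { "ā", "ē", "ī", "ō", "ū" },
	to   = { "a", "e", "i", "o", "u" },
}

-- Apply every from/to pair in order. Plain string.gsub is enough here
-- because the patterns are literal byte sequences without magic characters.
local function apply_replacements(text, replacements)
	for i, from in ipairs(replacements.from) do
		local to = replacements.to[i] or ""
		text = text:gsub(from, to)
	end
	return text
end

local entry = apply_replacements("amō", latin_entry_name)
-- entry is now "amo"
```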

Language:makeSortKey


Creates a sort key for the given entry name, following the rules appropriate for the language. This removes diacritical marks from the entry name if they are not considered significant for sorting, and may perform some other changes. Any initial hyphens are also removed, as are any parentheses.

The sort_key setting for each language in the data modules defines the replacements made by this function, or it gives the name of the module that takes the entry name and returns a sortkey.
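
The generic part of this process (everything except the language-specific sort_key rules) can be approximated in plain Lua for ASCII input. The function name `simple_sort_key` is hypothetical, and the sketch uses the standard string library rather than mw.ustring, so it is only meaningful for ASCII entry names.

```lua
-- ASCII-only sketch of the generic sort key steps.
local function simple_sort_key(name)
	name = name:lower()
	-- Remove initial hyphens and asterisks, keeping the character after them.
	name = name:gsub("^[-*]+(.)", "%1")
	-- Remove parentheses that are preceded or followed by something.
	name = name:gsub("(.)[()]+", "%1")
	name = name:gsub("[()]+(.)", "%1")
	return name:upper()
end

local key = simple_sort_key("colo(u)r")
-- key is now "COLOUR"
```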

Language:transliterate

:transliterate(text, sc, module_override)

Transliterates the text from the given script into the Latin script (see Wiktionary:Transliteration and romanization). The language must have the translit_module property for this to work; if it is not present, nil is returned.

The sc parameter is handled by the transliteration module, and how it is handled is specific to that module. Some transliteration modules may tolerate nil as the script, others require it to be one of the possible scripts that the module can transliterate, and will show an error if it's not one of them. For this reason, the sc parameter should always be provided when writing non-language-specific code.

The module_override parameter is used to override the default module that is used to provide the transliteration. This is useful in cases where you need to demonstrate a particular module in use, but there is no default module yet, or you want to demonstrate an alternative version of a transliteration module before making it official. It should not be used in real modules or templates, only for testing. All uses of this parameter are tracked by Template:tracking/module_override.

Language:hasTranslit


Returns true if the language has a transliteration module, false if it doesn't.

Language:getRawData


This function is not for use in entries or other content pages.

Returns a blob of data about the language. The format of this blob is undocumented, and perhaps unstable; it's intended for things like the module's own unit-tests, which are "close friends" with the module and will be kept up-to-date as the format changes.

Error function

err(lang, param, text)

Looks at a supposed language code passed through a template parameter and returns a helpful error message depending on whether the language code has a valid form (two or three lowercase basic Latin letters, or two or three groups of three lowercase basic Latin letters separated by hyphens).

Add the parameter value in argument #1 and the parameter name in argument #2. For instance, if parameter 1 of the template is supposed to be a language code, this function can be called the following way:

local m_languages = require("Module:languages")
local lang = m_languages.getByCode(frame.args[1]) or m_languages.err(frame.args[1], 1)

If you would like the error message to say something other than "language code", place the phrase in argument #3.
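
The notion of a code having a valid form can be sketched with Lua patterns. The helper `has_language_code_form` below is hypothetical; it only checks the shape of a code, not whether the code actually exists.

```lua
-- Returns true if `code` has the shape of a language code: two or three
-- lowercase letters, or two or three hyphen-separated groups of three
-- lowercase letters.
local function has_language_code_form(code)
	return code:find("^%l%l%l?$") ~= nil
		or code:find("^%l%l%l%-%l%l%l$") ~= nil
		or code:find("^%l%l%l%-%l%l%l%-%l%l%l$") ~= nil
end
```

A code such as "fr" or "gem-pro" passes this check, while "FR" or "french" does not, which is what lets the error function choose between an "invalid code" and a "missing code" message.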

See also

local export = {}

--[==[
Throw an error for an invalid language code or script code.

`lang_code` (required) is the bad code and can be nil or a non-string.

`param` (required) is the name of the parameter in which the code was contained. It can be a string, a number
	(for a numeric param, in which case the param will show up in the error message as an ordinal such as
	"first" or "second"), or `true` if no parameter can be clearly identified.

`code_desc` (optional) is text describing what the code is; by default, "language code".

`template_tag` (optional) is a string specifying the template that generated the error, or a function
	to generate this string. If given, it will be displayed in the error message.

`not_real_lang` (optional), if given, indicates that the code is not in the form of a language code
	(e.g. it's a script code). Normally, this function checks for things that could plausibly be a language code:
	two or three lowercase letters, two or three groups of three lowercase letters with hyphens between them.
	If such a pattern is found, a different error message is displayed (indicating an invalid code) than otherwise
	(indicating a missing code). If `not_real_lang` is given, this check is suppressed.
]==]

function export.err(lang_code, param, code_desc, template_tag, not_real_lang)
	local ordinals = {
		"first", "second", "third", "fourth", "fifth", "sixth",
		"seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth",
		"thirteenth", "fourteenth", "fifteenth", "sixteenth", "seventeenth",
		"eighteenth", "nineteenth", "twentieth"
	}
	code_desc = code_desc or "language code"
	if not template_tag then
		template_tag = ""
	else
		if type(template_tag) ~= "string" then
			template_tag = template_tag()
		end
		template_tag = " (Original template: " .. template_tag .. ")"
	end
	local function err(msg)
		error(msg .. template_tag, 3)
	end
	local param_type = type(param)
	local in_the_param
	if param == true then
		-- handled specially below
		in_the_param = ""
	else
		if param_type == "number" then
			param = ordinals[param] .. " parameter"
		elseif param_type == "string" then
			param = 'parameter "' .. param .. '"'
		else
			err("The parameter name is "
					.. (param_type == "table" and "a table" or tostring(param))
					.. ", but it should be a number or a string.")
		end
		in_the_param = " in the " .. param
	end
	if not lang_code or lang_code == "" then
		if param == true then
			err("The " .. code_desc .. " is missing.")
		else
			err("The " .. param .. " (" .. code_desc .. ") is missing.")
		end
	elseif type(lang_code) ~= "string" then
		err("The " .. code_desc .. in_the_param .. " is supposed to be a string but is a " .. type(lang_code) .. ".")
	-- Can use string.find because language codes only contain ASCII.
	elseif not_real_lang or lang_code:find("^%l%l%l?$")
			or lang_code:find("^%l%l%l%-%l%l%l$")
			or lang_code:find("^%l%l%l%-%l%l%l%-%l%l%l$") then
		err("The " .. code_desc .. " \"" .. lang_code .. "\"" .. in_the_param .. " is not valid.")
	else
		err("Please specify a " .. code_desc .. in_the_param .. ". The value \"" .. lang_code .. "\" is not valid.")
	end
end

local function do_entry_name_or_sort_key_replacements(text, replacements)
	if replacements.from then
		for i, from in ipairs(replacements.from) do
			-- A missing replacement means the match is deleted.
			local to = replacements.to[i] or ""
			text = mw.ustring.gsub(text, from, to)
		end
	end
	if replacements.remove_diacritics then
		-- Decompose, strip the listed combining characters, then recompose.
		text = mw.ustring.toNFD(text)
		text = mw.ustring.gsub(text,
			'[' .. replacements.remove_diacritics .. ']',
			'')
		text = mw.ustring.toNFC(text)
	end
	return text
end

local Language = {}

function Language:getCode()
	return self._code
end

function Language:getCanonicalName()
	return self._rawData[1] or self._rawData.canonicalName
end

function Language:getDisplayForm()
	return self:getCanonicalName()
end

function Language:getOtherNames(onlyOtherNames)
	return require("Module:language-like").getOtherNames(self, onlyOtherNames)
end

function Language:getAliases()
	-- Make sure the extra data is loaded before reading from it.
	self:loadInExtraData()
	return self._extraData.aliases or {}
end

function Language:getVarieties(flatten)
	return require("Module:language-like").getVarieties(self, flatten)
end

function Language:getType()
	return self._rawData.type or "regular"
end

function Language:getWikimediaLanguages()
	if not self._wikimediaLanguageObjects then
		local m_wikimedia_languages = require("Module:wikimedia languages")
		self._wikimediaLanguageObjects = {}
		-- Fall back to the language's own code if no Wikimedia codes are listed.
		local wikimedia_codes = self._rawData.wikimedia_codes or { self._code }
		for _, wlangcode in ipairs(wikimedia_codes) do
			table.insert(self._wikimediaLanguageObjects, m_wikimedia_languages.getByCode(wlangcode))
		end
	end
	return self._wikimediaLanguageObjects
end

function Language:getWikipediaArticle()
	if self._rawData.wikipedia_article then
		return self._rawData.wikipedia_article
	elseif self._wikipedia_article then
		return self._wikipedia_article
	elseif self:getWikidataItem() and mw.wikibase then
		self._wikipedia_article = mw.wikibase.sitelink(self:getWikidataItem(), 'enwiki')
	end
	if not self._wikipedia_article then
		self._wikipedia_article = mw.ustring.gsub(self:getCategoryName(), "Creole language", "Creole")
	end
	return self._wikipedia_article
end

function Language:makeWikipediaLink()
	return "[[w:" .. self:getWikipediaArticle() .. "|" .. self:getCanonicalName() .. "]]"
end

function Language:getWikidataItem()
	local item = self._rawData[2]
	-- A bare number is stored without its "Q" prefix.
	if type(item) == "number" then
		return "Q" .. item
	else
		return item
	end
end

function Language:getScripts()
	if not self._scriptObjects then
		local m_scripts = require("Module:scripts")
		self._scriptObjects = {}
		for _, sc in ipairs(self:getScriptCodes()) do
			table.insert(self._scriptObjects, m_scripts.getByCode(sc))
		end
	end
	return self._scriptObjects
end

function Language:getScriptCodes()
	return self._rawData.scripts or self._rawData[4] or { "None" }
end

function Language:getFamily()
	if self._familyObject then
		return self._familyObject
	end
	local family = self._rawData[3] or self._rawData.family
	if family then
		self._familyObject = require("Module:families").getByCode(family)
	end
	return self._familyObject
end

function Language:getAncestors()
	if not self._ancestorObjects then
		self._ancestorObjects = {}
		if self._rawData.ancestors then
			for _, ancestor in ipairs(self._rawData.ancestors) do
				table.insert(self._ancestorObjects, export.getByCode(ancestor) or require("Module:etymology languages").getByCode(ancestor))
			end
		else
			local fam = self:getFamily()
			local protoLang = fam and fam:getProtoLanguage() or nil
			-- For the case where the current language is the proto-language
			-- of its family, we need to step up a level higher right from the start.
			if protoLang and protoLang:getCode() == self:getCode() then
				fam = fam:getFamily()
				protoLang = fam and fam:getProtoLanguage() or nil
			end
			while not protoLang and not (not fam or fam:getCode() == "qfa-not") do
				fam = fam:getFamily()
				protoLang = fam and fam:getProtoLanguage() or nil
			end
			table.insert(self._ancestorObjects, protoLang)
		end
	end
	return self._ancestorObjects
end

local function iterateOverAncestorTree(node, func)
	for _, ancestor in ipairs(node:getAncestors()) do
		if ancestor then
			local ret = func(ancestor) or iterateOverAncestorTree(ancestor, func)
			if ret then
				return ret
			end
		end
	end
end

function Language:getAncestorChain()
	if not self._ancestorChain then
		self._ancestorChain = {}
		-- Only follow the chain while there is exactly one ancestor.
		local step = #self:getAncestors() == 1 and self:getAncestors()[1] or nil
		while step do
			table.insert(self._ancestorChain, 1, step)
			step = #step:getAncestors() == 1 and step:getAncestors()[1] or nil
		end
	end
	return self._ancestorChain
end

function Language:hasAncestor(otherlang)
	local function compare(ancestor)
		return ancestor:getCode() == otherlang:getCode()
	end
	return iterateOverAncestorTree(self, compare) or false
end

function Language:getCategoryName()
	-- No "nyelv" suffix in category names
	local name = self:getCanonicalName():gsub("[Nn]yelv$", "")

	if name == "magyar" then
		return "magyar szótár"
	else
		return name .. "-magyar szótár"
	end
end

function Language:makeCategoryLink()
	return "[[:Category:" .. self:getCategoryName() .. "|" .. self:getDisplayForm() .. "]]"
end

function Language:getStandardCharacters()
	return self._rawData.standardChars
end

function Language:makeEntryName(text)
	text = mw.ustring.match(text, "^[¿¡]?(.-[^%s%p].-)%s*[؟?!;՛՜ ՞ ՟?!︖︕।॥။၊་།]?$") or text
	if self:getCode() == "ar" then
		local U = mw.ustring.char
		local taTwiil = U(0x640)
		local waSla = U(0x671)
		-- diacritics ordinarily removed by entry_name replacements
		local Arabic_diacritics = U(0x64B, 0x64C, 0x64D, 0x64E, 0x64F, 0x650, 0x651, 0x652, 0x670)
		if text == waSla or mw.ustring.find(text, "^" .. taTwiil .. "?[" .. Arabic_diacritics .. "]" .. "$") then
			return text
		end
	end
	if type(self._rawData.entry_name) == "table" then
		text = do_entry_name_or_sort_key_replacements(text, self._rawData.entry_name)
	end
	return text
end

-- Return true if the language has display processing enabled, i.e. lang:makeDisplayText()
-- does non-trivial processing.
function Language:hasDisplayProcessing()
	return not not self._rawData.display
end

-- Apply display-text replacements to `text`, if any.
function Language:makeDisplayText(text)
	if type(self._rawData.display) == "table" then
		text = do_entry_name_or_sort_key_replacements(text, self._rawData.display)
	end
	return text
end

-- Add to data tables?
local has_dotted_undotted_i = {
	["az"] = true,
	["crh"] = true,
	["gag"] = true,
	["kaa"] = true,
	["tt"] = true,
	["tr"] = true,
	["zza"] = true,
}

function Language:makeSortKey(name, sc)
	if has_dotted_undotted_i[self:getCode()] then
		-- Map capital I to dotless ı before lowercasing.
		name = name:gsub("I", "ı")
	end
	name = mw.ustring.lower(name)
	-- Remove initial hyphens and *
	local hyphens_regex = "^[-־ـ*]+(.)"
	name = mw.ustring.gsub(name, hyphens_regex, "%1")
	-- If there are language-specific rules to generate the key, use those
	if type(self._rawData.sort_key) == "table" then
		name = do_entry_name_or_sort_key_replacements(name, self._rawData.sort_key)
	elseif type(self._rawData.sort_key) == "string" then
		name = require("Module:" .. self._rawData.sort_key).makeSortKey(name, self:getCode(), sc and sc:getCode())
	end
	-- Remove parentheses, as long as they are either preceded or followed by something
	name = mw.ustring.gsub(name, "(.)[()]+", "%1")
	name = mw.ustring.gsub(name, "[()]+(.)", "%1")
	if has_dotted_undotted_i[self:getCode()] then
		-- Map lowercase i to dotted İ before uppercasing.
		name = name:gsub("i", "İ")
	end
	return mw.ustring.upper(name)
end

function Language:overrideManualTranslit()
	if self._rawData.override_translit then
		return true
	else
		return false
	end
end

function Language:transliterate(text, sc, module_override)
	if not ((module_override or self._rawData.translit_module) and text) then
		return nil
	end
	if module_override then
		-- Track uses of module_override, as described in the documentation.
		require("Module:debug").track("module_override")
	end
	return require("Module:" .. (module_override or self._rawData.translit_module)).tr(text, self:getCode(), sc and sc:getCode() or nil)
end

function Language:hasTranslit()
	return self._rawData.translit_module and true or false
end

function Language:link_tr()
	return self._rawData.link_tr and true or false
end

function Language:toJSON()
	local entryNamePatterns = nil
	local entryNameRemoveDiacritics = nil
	if self._rawData.entry_name then
		entryNameRemoveDiacritics = self._rawData.entry_name.remove_diacritics
		if self._rawData.entry_name.from then
			entryNamePatterns = {}
			for i, from in ipairs(self._rawData.entry_name.from) do
				local to = self._rawData.entry_name.to[i] or ""
				table.insert(entryNamePatterns, { from = from, to = to })
			end
		end
	end
	local ret = {
		ancestors = self._rawData.ancestors,
		canonicalName = self:getCanonicalName(),
		categoryName = self:getCategoryName("nocap"),
		code = self._code,
		entryNamePatterns = entryNamePatterns,
		entryNameRemoveDiacritics = entryNameRemoveDiacritics,
		family = self._rawData[3] or self._rawData.family,
		otherNames = self:getOtherNames(true),
		aliases = self:getAliases(),
		varieties = self:getVarieties(),
		scripts = self._rawData.scripts or self._rawData[4],
		type = self:getType(),
		wikimediaLanguages = self._rawData.wikimedia_codes,
		wikidataItem = self:getWikidataItem(),
	}
	return require("Module:JSON").toJSON(ret)
end

-- Do NOT use these methods!
-- All uses should be pre-approved on the talk page!
function Language:getRawData()
	return self._rawData
end

function Language:getRawExtraData()
	-- Make sure the extra data is loaded before returning it.
	self:loadInExtraData()
	return self._extraData
end

Language.__index = Language

function export.getDataModuleName(code)
	if code:find("^%l%l$") then
		return "languages/data2"
	elseif code:find("^%l%l%l$") then
		local prefix = code:sub(1, 1)
		return "languages/data3/" .. prefix
	elseif code:find("^[%l-]+$") then
		return "languages/datax"
	else
		return nil
	end
end

function export.getExtraDataModuleName(code)
	if code:find("^%l%l$") then
		return "languages/extradata2"
	elseif code:find("^%l%l%l$") then
		local prefix = code:sub(1, 1)
		return "languages/extradata3/" .. prefix
	elseif code:find("^[%l-]+$") then
		return "languages/extradatax"
	else
		return nil
	end
end

local function getRawLanguageData(code)
	local modulename = export.getDataModuleName(code)
	return modulename and mw.loadData("Module:" .. modulename)[code] or nil
end

local function getRawExtraLanguageData(code)
	local modulename = export.getExtraDataModuleName(code)
	return modulename and mw.loadData("Module:" .. modulename)[code] or nil
end

function Language:loadInExtraData()
	if not self._extraData then
		-- Load the extra data from its module and cache it on this object
		-- (not on the shared metatable, which would leak it to other languages).
		-- Use an empty table as a fallback if the extra data is nil.
		self._extraData = getRawExtraLanguageData(self._code) or {}
	end
end

function export.makeObject(code, data)
	if data and data.deprecated then
		require("Module:debug").track {
			"languages/deprecated/" .. code
		}
	end
	return data and setmetatable({ _rawData = data, _code = code }, Language) or nil
end

function export.getByCode(code, paramForError, allowEtymLang, allowFamily)
	if type(code) ~= "string" then
		error("The function getByCode expects a string as its first argument, but received " .. (code == nil and "nil" or "a " .. type(code)) .. ".")
	end
	local retval = export.makeObject(code, getRawLanguageData(code))
	if not retval and allowEtymLang then
		retval = require("Module:etymology languages").getByCode(code)
	end
	if not retval and allowFamily then
		retval = require("Module:families").getByCode(code)
	end
	if not retval and paramForError then
		local codetext = nil
		if allowEtymLang and allowFamily then
			codetext = "language, etymology language or family code"
		elseif allowEtymLang then
			codetext = "language or etymology language code"
		elseif allowFamily then
			codetext = "language or family code"
		else
			codetext = "language code"
		end
		export.err(code, paramForError, codetext)
	end
	return retval
end

function export.getByName(name, errorIfInvalid)
	local byName = mw.loadData("Module:languages/by name")
	local code = byName.all and byName.all[name] or byName[name]
	if not code then
		if errorIfInvalid then
			error("The language name \"" .. name .. "\" is not valid.")
		else
			return nil
		end
	end
	return export.makeObject(code, getRawLanguageData(code))
end

function export.getByCanonicalName(name, errorIfInvalid, allowEtymLang, allowFamily)
	local byName = mw.loadData("Module:languages/canonical names")
	local code = byName and byName[name]

	local retval = code and export.makeObject(code, getRawLanguageData(code)) or nil
	if not retval and allowEtymLang then
		retval = require("Module:etymology languages").getByCanonicalName(name)
	end
	if not retval and allowFamily then
		-- Accept both "X" and "X languages" as a family name.
		local famname = name:match("^(.*) languages$")
		famname = famname or name
		retval = require("Module:families").getByCanonicalName(famname)
	end
	if not retval and errorIfInvalid then
		local text
		if allowEtymLang and allowFamily then
			text = "language, etymology language or family name"
		elseif allowEtymLang then
			text = "language or etymology language name"
		elseif allowFamily then
			text = "language or family name"
		else
			text = "language name"
		end
		error("The " .. text .. " \"" .. name .. "\" is not valid.")
	end
	return retval
end

function export.iterateAll()
	local m_data = mw.loadData("Module:languages/alldata")
	local func, t, var = pairs(m_data)
	return function()
		local code, data = func(t, var)
		-- Advance the control variable so that each call yields the next entry.
		var = code
		return export.makeObject(code, data)
	end
end

--[[	If language is an etymology language, iterates through parent languages
		until it finds a non-etymology language. ]]
function export.getNonEtymological(lang)
	while lang:getType() == "etymology language" do
		local parentCode = lang:getParentCode()
		lang = export.getByCode(parentCode)
			or require("Module:etymology languages").getByCode(parentCode)
			or require("Module:families").getByCode(parentCode)
	end
	return lang
end

return export