Modul:headword
This module is used to show headword lines, along with any annotations like genders, transliterations and inflections. It's used by the template {{head}}, via the submodule Module:headword/templates. It's also used by many other headword modules; for a full list, see Kategória:Címszómodulok. Some of the data used by this module is found in Module:headword/data.
full_headword
This is the primary external entry point. NOTE: The values passed in below will be destructively modified. You are warned.
full_headword(data)
This is used by {{head}} and various language-specific headword templates (e.g. {{ru-adj}} for Russian adjectives, {{de-noun}} for German nouns, etc.) to display an entire headword line.
The sole argument, data, is a table containing the following items (WARNING: they will be destructively modified):
{
	lang = language_object,
	sc = script_object,
	heads = { [1] = "head1", [2] = "head2", ... },
	translits = { [1] = "translit1", [2] = "translit2", ... },
	inflections = {
		{ label = "grammatical_category", "inflected_form" },
		...
	},
	genders = { "gender1", "gender2", ... },
	pos_category = "plural_part_of_speech",
	categories = { "category1", "category2", ... },
	sort_key = "sort_key",
}
Further explanation:
- data.lang is required and is a Language object from Module:languages corresponding to a given language. For example, use require("Module:languages").getByCode("ru") to retrieve the object corresponding to Russian.
- data.sc is a script object from Module:scripts corresponding to a given script. Most of the time you can omit this item, and Module:scripts will determine the script using the list of scripts in the language's data file.
- data.heads is a table listing the heads of the headword, each of which is a string. An empty string means to use a default head based on the page name. It is also possible to pass in a single string for a single headword, or to omit the item entirely, which is equivalent to passing in a single empty string (i.e. only one head, based on the page name).
- data.translits is a table listing the transliterations corresponding to each headword in data.heads. The Nth numbered entry should either be a string specifying the transliteration of headword N, or be omitted, either to display no transliteration or to generate an automatic one using the language's transliteration module. (For languages with a transliteration module, pass in "-" to suppress the transliteration entirely.) It is also possible to pass in a single string (equivalent to a one-element list) or to omit the item entirely (equivalent to an empty list). Note that, if there are multiple headwords, the table in data.translits might have entries in the middle of the list that are nil. A list of this sort cannot be created with table.insert(), as attempting to insert nil this way does nothing. Instead, each transliteration must be explicitly assigned using a number as index: { [1] = "string", [3] = "string", ... }. (Here, item 2 is nil, because no value was assigned to it.)
- data.genders is a table listing the gender or number strings for the headwords. This can be omitted for no genders or numbers. The accepted values for genders or numbers are given in Module:gender and number. See format_genders below for an example of this argument.
- data.inflections is a table listing the inflections to be displayed in the headword entry. The format of this table is somewhat complex and is described below under format_inflections.
- data.pos_category is the part-of-speech category for the entry. This is one of the lemma and nonlemma parts of speech listed in Module:headword/data. It should be in the plural: for example, "nouns". If this item is omitted, the part-of-speech category must be included as the first item in data.categories.
- data.categories is a table listing the categories to which the entry containing the headword will be added. The first category should be a part-of-speech category, with the canonical name of the language at the beginning (for example "Russian nouns"), unless the part of speech is given in the field data.pos_category.
- data.sort_key is a string specifying a sort key for the categories listed in data.categories. Sort keys should usually be omitted, because the format_categories function in Module:utilities will generate a suitable sort key in most cases. The sort key is used to ensure that the page is listed in the correct order in the categories to which it belongs.
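The remark about nil gaps in data.translits can be checked in plain Lua. The following is a minimal sketch and not part of the module; the helper name build_translits is hypothetical and the strings are placeholders.

```lua
-- Hypothetical helper (not part of Module:headword): builds two translits
-- tables for three heads where only heads 1 and 3 have a transliteration.
local function build_translits()
	-- Attempt 1: table.insert cannot leave a gap.
	local inserted = {}
	table.insert(inserted, "translit1")
	table.insert(inserted, nil)          -- stores nothing at index 2
	table.insert(inserted, "translit3")  -- ends up at index 2, not 3

	-- Attempt 2: explicit numeric keys keep index 2 empty.
	local explicit = { [1] = "translit1", [3] = "translit3" }

	return inserted, explicit
end

local inserted, explicit = build_translits()
print(inserted[2], inserted[3])
print(explicit[2], explicit[3])
```

With the first table, the transliteration meant for the third head would be attached to the second head; only the explicitly indexed table keeps the intended pairing.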
Examples
A simple example
full_headword{
	lang = require("Module:languages").getByCode("en"), -- language code
	heads = {"book"}, -- headwords
	inflections = {
		{label = "plural", "books"} -- inflections
	},
	categories = {"English nouns"}, -- part-of-speech category
}
might give (depending on the page it's run on):
<strong class="Latn headword" lang="en">book</strong> (''plural'' <b class="Latn" lang="en">[[books#English|books]]</b>)[[Category:English lemmas|HEADWORD]][[Category:English nouns|HEADWORD]]
which displays as:
- book (plural books)
A fuller example
full_headword{
	lang = require("Module:languages").getByCode("de"),
	heads = {"Hund"},
	genders = {"m"},
	inflections = {
		{label = "genitive", "Hundes", "Hunds"},
		{label = "diminutive",
			{term = "Hündchen", genders = {"n"}},
			{nolink=true, term = "Hündlein", genders = {"n"}}
		}
	},
	categories = {"German nouns"},
}
might give (depending on the page it's run on):
<strong class="Latn headword" lang="de">Hund</strong> <span class="gender"><abbr title="masculine gender">m</abbr></span> (''genitive'' <b class="Latn" lang="de">[[Hundes#German|Hundes]]</b> ''or'' <b class="Latn" lang="de">[[Hunds#German|Hunds]]</b>, ''diminutive'' <b class="Latn" lang="de">[[Hündchen#German|Hündchen]]</b> <span class="gender"><abbr title="neuter gender">n</abbr></span> ''or'' <b class="Latn" lang="de">Hündlein</b> <span class="gender"><abbr title="neuter gender">n</abbr></span>)[[Category:German lemmas|HEADWORD]][[Category:German nouns|HEADWORD]]
which displays as:
- Hund m (genitive Hundes or Hunds, diminutive Hündchen n or Hündlein n)
An example in a non-Latin script
This example is in Russian, which has automatic transliteration:
full_headword{
	lang = require("Module:languages").getByCode("ru"),
	heads = {"кни́га"},
	genders = {"f-in"},
	inflections = {
		{label = "genitive", "кни́ги"},
		{label = "nominative plural", "кни́ги"},
		{label = "genitive plural", "книг"}
	},
	categories = {"Russian nouns"},
}
might give (depending on the page it's run on):
<strong class="Cyrl headword" lang="ru">кни́га</strong> [[Wiktionary:Russian transliteration|•]] (<span class="tr" lang=""><span class="tr" lang="">kníga</span></span>) <span class="gender"><abbr title="feminine gender">f</abbr> <abbr title="inanimate">inan</abbr></span> (''genitive'' <b class="Cyrl" lang="ru">[[книги#Russian|кни́ги]]</b>, ''nominative plural'' <b class="Cyrl" lang="ru">[[книги#Russian|кни́ги]]</b>, ''genitive plural'' <b class="Cyrl" lang="ru">[[книг#Russian|книг]]</b>)[[Category:Russian lemmas|HEADWORD]][[Category:Russian nouns|HEADWORD]]
which displays as:
- кни́га • (kníga) f inan (genitive кни́ги, nominative plural кни́ги, genitive plural книг)
A fuller example in a non-Latin script
This example is in Russian, with two headwords, each of which requires manual transliteration:
full_headword{
	lang = require("Module:languages").getByCode("ru"),
	heads = {"интервьюе́р", "интервью́ер"},
	translits = {"intɛrvʹjuér", "intɛrvʹjújer"},
	genders = {"m-an"},
	inflections = {
		{label = "genitive", "интервьюе́ра", "интервью́ера"},
		{label = "nominative plural", "интервьюе́ры", "интервью́еры"},
		{label = "genitive plural", "интервьюе́ров", "интервью́еров"},
	},
	categories = {"Russian nouns"},
}
might give (depending on the page it's run on):
<strong class="Cyrl headword" lang="ru">интервьюе́р</strong> ''or'' <strong class="Cyrl headword" lang="ru">интервью́ер</strong> [[Wiktionary:Russian transliteration|•]] (<span class="tr" lang=""><span class="tr" lang="">intɛrvʹjuér</span> ''or'' <span class="tr" lang="">intɛrvʹjújer</span></span>) <span class="gender"><abbr title="masculine gender">m</abbr> <abbr title="animate">anim</abbr></span> (''genitive'' <b class="Cyrl" lang="ru">[[интервьюера#Russian|интервьюе́ра]]</b> ''or'' <b class="Cyrl" lang="ru">[[интервьюера#Russian|интервью́ера]]</b>, ''nominative plural'' <b class="Cyrl" lang="ru">[[интервьюеры#Russian|интервьюе́ры]]</b> ''or'' <b class="Cyrl" lang="ru">[[интервьюеры#Russian|интервью́еры]]</b>, ''genitive plural'' <b class="Cyrl" lang="ru">[[интервьюеров#Russian|интервьюе́ров]]</b> ''or'' <b class="Cyrl" lang="ru">[[интервьюеров#Russian|интервью́еров]]</b>)[[Category:Russian lemmas|HEADWORD]][[Category:Russian nouns|HEADWORD]]
which displays as
- интервьюе́р or интервью́ер • (intɛrvʹjuér or intɛrvʹjújer) m anim (genitive интервьюе́ра or интервью́ера, nominative plural интервьюе́ры or интервью́еры, genitive plural интервьюе́ров or интервью́еров)
Another fuller example in a non-Latin script
This example is in Arabic, with embedded links in the headword and manual transliteration in an inflection (note that Arabic also has automatic transliteration, and is one of the languages that will display automatic transliterations of inflections in the headword, unlike e.g. Russian):
full_headword{
lang = require("Module:languages").getByCode("ar"),
heads = {"[[غُدّة]] [[بَصَلِيّ|بَصَلِيّة]] [[إحْلِيلِيّ|إحْلِيلِيّة]]"},
translits = {"ḡudda baṣaliyya ʾiḥlīliyya"},
genders = {"f"},
inflections = {
{label = "plural", {term="غُدَد بَصَلِيَّة إِحْلِيلِيَة", translit="ḡudad baṣaliyya ʾiḥlīliyya"}},
},
categories = {"Arabic nouns"},
}
might give (depending on the page it's run on):
<strong class="Arab headword" lang="ar">[[غدة#Arabic|غُدّة]] [[بصلي#Arabic|بَصَلِيّة]] [[إحليلي#Arabic|إحْلِيلِيّة]]</strong> [[Wiktionary:Arabic transliteration|•]] (<span class="tr" lang=""><span class="tr" lang="">ḡudda baṣaliyya ʾiḥlīliyya</span></span>) <span class="gender"><abbr title="feminine gender">f</abbr></span> (''plural'' <b class="Arab" lang="ar">[[غدد بصلية إحليلية#Arabic|غُدَد بَصَلِيَّة إِحْلِيلِيَة]]</b> (<span lang="" class="tr">ḡudad baṣaliyya ʾiḥlīliyya</span>))[[Category:Arabic lemmas|HEADWORD]][[Category:Arabic nouns|HEADWORD]]
which displays as
- غُدّة بَصَلِيّة إحْلِيلِيّة • (ḡudda baṣaliyya ʾiḥlīliyya) f (plural غُدَد بَصَلِيَّة إِحْلِيلِيَة (ḡudad baṣaliyya ʾiḥlīliyya))
format_headword
NOTE: This documentation is outdated.
format_headword(data)
Formats a headword, using the format appropriate for the language object and script (see Module:script utilities#tag_text) contained in the data table.
The sole argument is a table containing the same items as the table supplied to full_headword. Only a subset of the items in the table are used by this function: heads, lang, sc and translits.
The data.heads parameter can either be a single string or a table of strings. If it's a table, then each string in the table is shown as a headword, separated by "or". This allows you to show multiple alternative headwords, such as when the same written form can be accented in different ways.
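The accepted shapes of data.heads (a string, a table of strings, or nothing) can be reduced to a single list before formatting. The following is a minimal plain-Lua sketch of that normalization; normalize_heads and join_heads are hypothetical names, and the "or" separator is taken from the description above rather than from any particular version of the module.

```lua
-- Hypothetical stand-alone helpers, not the module's own functions.
local function normalize_heads(heads)
	if type(heads) ~= "table" then
		-- A string becomes a one-element list; nil becomes an empty list.
		heads = { heads }
	end
	if #heads == 0 then
		-- No heads at all: one empty head, meaning "use the page name".
		heads = { "" }
	end
	return heads
end

-- Join the heads with "or", as the description above says.
local function join_heads(heads)
	return table.concat(normalize_heads(heads), " or ")
end

print(join_heads({ "head1", "head2" }))
```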
It has special behaviour in certain cases as well:
- If an item in the heads parameter contains wikilinks, they are converted into language-section links for the given language (using Module:links#language_link, which is also used by {{l}}). For example, giving "[[give]] [[up]]", if the language provided is English, will produce: "[[give#English|give]] [[up#English|up]]". If the string is prefixed with * or if any of the links are, then they are interpreted as reconstructed terms and links to the Reconstruction namespace are created as appropriate.
- If heads is empty (nil or the empty table), it will default to the subpage name (mw.title.getCurrentTitle().subpageText, equivalent to the magic word {{SUBPAGENAME}}).
  - If the page name contains spaces or punctuation marks (except for punctuation marks that are used inside of words), it is split and each individual word is automatically wikilinked as above.
- If the current page is in the appendix namespace, and the language's type (in Module:languages) is not "appendix-constructed", then an asterisk "*" will be prepended to the headword to indicate that it is a reconstructed term.
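The two link-related cases above can be illustrated in plain Lua. This is only a rough sketch under stated assumptions: add_language_sections and link_words are hypothetical names, the patterns are byte-based and treat only "." and "*" as word-internal punctuation, and reconstructed terms are ignored. The module itself relies on Module:links#language_link and on Unicode-aware matching.

```lua
-- Simplified, ASCII-only stand-ins (assumptions, not the module's patterns).
local SPACING_PUNCT = "[%s%p]+"
local NOT_WORD_PUNCT = "[^.*]+"  -- "." and "*" count as word-internal here

-- Turn [[target]] or [[target|display]] into a language-section link.
local function add_language_sections(text, lang_name)
	return (text:gsub("%[%[(.-)%]%]", function(inner)
		local target, display = inner:match("^([^|]*)|(.*)$")
		if not target then
			target, display = inner, inner
		end
		return "[[" .. target .. "#" .. lang_name .. "|" .. display .. "]]"
	end))
end

-- Wrap each word of an unlinked multiword page name in [[...]].
local function link_words(pagename)
	local linked = "[[" .. pagename:gsub(SPACING_PUNCT, function(sep)
		-- Close the link before a real word break and reopen it afterwards.
		return (sep:gsub(NOT_WORD_PUNCT, "]]%0[["))
	end) .. "]]"
	-- Drop empty links left at the start or end of the string.
	return (linked:gsub("%[%[%]%]", ""))
end

print(add_language_sections(link_words("give up"), "English"))
```

In this sketch a page name such as "e.g." stays a single link, because every punctuation run in it consists only of word-internal characters.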
format_transliteration
NOTE: This documentation is outdated.
format_transliteration(tr, lang)
If the transliteration is specified and non-empty, wraps it in a span element and parentheses, and prepends a bullet that links to the language's transliteration policy page. For example, if the transliteration is "foo" and the language is Hebrew, produces
[[Wiktionary:Hebrew transliteration|•]] (<span lang="">foo</span>)
which looks like “• (foo)”.
(Note: the bullet linking to a transliteration policy page is only added if the page actually exists.)
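As a rough illustration of the output described above, here is a plain-Lua sketch. It is not the module's implementation: format_transliteration_sketch is a hypothetical name, and because plain Lua cannot check whether a wiki page exists, that fact is passed in as an assumed boolean argument.

```lua
-- Hypothetical sketch; policy_page_exists stands in for the page lookup.
local function format_transliteration_sketch(tr, lang_name, policy_page_exists)
	if not tr or tr == "" then
		return ""  -- nothing to show
	end
	local text = '(<span lang="">' .. tr .. "</span>)"
	if policy_page_exists then
		-- The bullet links to the language's transliteration policy page.
		text = "[[Wiktionary:" .. lang_name .. " transliteration|•]] " .. text
	end
	return text
end

print(format_transliteration_sketch("foo", "Hebrew", true))
```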
format_genders
NOTE: This documentation is up-to-date. Keep in mind, however, that this function is not currently exported, and the contents of the argument data will be overwritten.
format_genders(data)
Format gender specifications using Module:gender and number. For example:
format_genders({genders = {"m-in", "m-an-p"},
	lang=require("Module:languages").getByCode("ru")})
gives:
<span class="gender"><abbr title="masculine gender">m</abbr> <abbr title="inanimate">inan</abbr>, <abbr title="masculine gender">m</abbr> <abbr title="animate">anim</abbr> <abbr title="plural number">pl</abbr></span>
displays as:
- m inan, m anim pl
The argument is a table, consisting of elements .genders and .lang. NOTE: The table will be overwritten!
The value of .genders is a list of gender/number strings, in the form required by Module:gender and number.
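To show how hyphen-separated gender/number strings map to the displayed text, here is a small plain-Lua sketch. It is an assumption-laden stand-in for Module:gender and number: display_genders is a hypothetical name, the abbreviation table only covers codes that appear in the examples on this page, and the abbr markup and categories produced by the real module are left out.

```lua
-- Abbreviations taken from the example output on this page (partial list).
local ABBREVIATIONS = {
	m = "m", f = "f", n = "n",
	["in"] = "inan", an = "anim", p = "pl",
}

-- Hypothetical helper: plain display text for a list of gender strings.
local function display_genders(genders)
	local formatted = {}
	for _, spec in ipairs(genders) do
		local parts = {}
		for code in spec:gmatch("[^-]+") do
			-- Unknown codes are shown unchanged in this sketch.
			table.insert(parts, ABBREVIATIONS[code] or code)
		end
		table.insert(formatted, table.concat(parts, " "))
	end
	return table.concat(formatted, ", ")
end

print(display_genders({ "m-in", "m-an-p" }))
```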
format_inflections
NOTE: This documentation is up-to-date. Keep in mind, however, that this function is not currently exported, and the contents of the argument data will be overwritten.
format_inflections(data)
Format a list (table) of inflections, which are then concatenated together with commas and surrounded by parentheses. For example:
format_inflections({inflections = {
	{label = "diminutive", "Hündchen"}
}, lang=require("Module:languages").getByCode("de")})
gives:
(''diminutive'' <b class="Latn" lang="de">[[Hündchen#German|Hündchen]]</b>)
displays as:
- (diminutive Hündchen)
The argument is a table, consisting of elements .inflections, .lang, and optionally .sc. NOTE: The table will be overwritten!
The value of .inflections is a list of labeled inflections, each of which is a table:
- The table must have a .label value which contains the label. It is displayed in italics and not linked.
- The value of .enable_auto_translit may be set if transliteration of inflections is desired (it is off by default).
- The table may optionally have a .accel value. This value is used to support accelerated entry creation using WT:ACCEL. The "form-of" and "lang-(code)" classes are added automatically, so only the "(form)-form-of" class needs to be given, along with any other classes that may be needed.
- Numbered values in the table are the actual forms. They are normally formatted in bold text and converted to a link to the term (but see below). If a term already contains a link, it is converted into a section link using Module:links#language_link, just like in format_headword.
- Forms are optional. If the table contains only the .label, then just the label is shown with no forms. If there is more than one form, they are shown with "or" between them.
For example:
format_inflections({inflections = {
	{label = "present", "krama"},
	{label = "past", "kramade"},
	{label = "past participle", "kramat"}
}, lang=require("Module:languages").getByCode("sv")})
format_inflections({inflections = {
	{label = "plural", accel = "plural-form-of", "voorbeelden"},
}, lang=require("Module:languages").getByCode("nl")})
gives:
(''present'' <b class="Latn" lang="sv">[[krama#Swedish|krama]]</b>, ''past'' <b class="Latn" lang="sv">[[kramade#Swedish|kramade]]</b>, ''past participle'' <b class="Latn" lang="sv">[[kramat#Swedish|kramat]]</b>)
(''plural'' <span class="form-of lang-nl plural-form-of "><b class="Latn" lang="nl">[[voorbeelden#Dutch|voorbeelden]]</b></span>)
displays as:
- (present krama, past kramade, past participle kramat)
- (plural voorbeelden)
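The way labels and forms are combined into the displayed text can be sketched in plain Lua. This is a simplified illustration only: display_inflections is a hypothetical name, forms are assumed to be plain strings, and the italics, bold text, links and accelerator classes of the real output are omitted.

```lua
-- Hypothetical helper: plain display text for a list of labeled inflections.
local function display_inflections(inflections)
	local groups = {}
	for _, infl in ipairs(inflections) do
		local forms = {}
		for _, form in ipairs(infl) do  -- numbered entries are the forms
			table.insert(forms, form)
		end
		local text = infl.label
		if #forms > 0 then
			-- Several forms under one label are joined with "or".
			text = text .. " " .. table.concat(forms, " or ")
		end
		table.insert(groups, text)
	end
	if #groups == 0 then
		return ""
	end
	-- Labeled groups are separated by commas and wrapped in parentheses.
	return "(" .. table.concat(groups, ", ") .. ")"
end

print(display_inflections({
	{ label = "present", "krama" },
	{ label = "past", "kramade" },
	{ label = "past participle", "kramat" },
}))
```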
It is also possible, but optional, to supply a table instead of a term. This table can contain the keys .term (the actual term), .alt (alternative display form), .sc (script), .translit (transliteration), .id (sense id), .genders (list of genders), .nolink (if true, the function will not link to the term, but only display it boldfaced), .hypothetical (if true, the function will not link to the term, but display it italicized and preceded by a *), and .accel (same as .accel in the outer table but applies only to the given term; if both accelerators are specified, both will appear as CSS classes). Most of these are used the same way as for full_link in Module:links, and are passed directly to it.
Example:
format_inflections({inflections = {
	{label = "diminutive",
		{term = "Hündchen", genders = {"n"}},
		{nolink=true, term = "Hündlein", genders = {"n"}}
	}}, lang=require("Module:languages").getByCode("de")})
gives:
(''diminutive'' <b class="Latn" lang="de">[[Hündchen#German|Hündchen]]</b> <span class="gender"><abbr title="neuter gender">n</abbr></span> ''or'' <b class="Latn" lang="de">Hündlein</b> <span class="gender"><abbr title="neuter gender">n</abbr></span>)
displays as:
- (diminutive Hündchen n or Hündlein n)
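How a form given as a string or as a table might be brought into one shape before linking can be sketched as follows. This is a plain-Lua illustration, not the module's code: normalize_part is a hypothetical name, and only the .term, .alt, .genders, .nolink and .hypothetical keys described above are handled.

```lua
-- Hypothetical helper: bring a form into a uniform table shape.
local function normalize_part(part)
	if type(part) ~= "table" then
		part = { term = part }  -- a bare string is just the term
	end
	local nolink = part.hypothetical or part.nolink
	return {
		-- Unlinked forms carry no link target, only display text.
		term = (not nolink) and part.term or nil,
		alt = part.alt or (nolink and part.term or nil),
		genders = part.genders,
		-- Hypothetical forms get their own face; all others are bold.
		face = part.hypothetical and "hypothetical" or "bold",
	}
end

local linked = normalize_part("Hündchen")
local unlinked = normalize_part({ nolink = true, term = "Hündlein", genders = { "n" } })
print(linked.term, linked.face)
print(unlinked.term, unlinked.alt, unlinked.face)
```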
Proposed/planned changes
- Checking for invalid genders, given a list of genders that are valid for a particular language.
local export = {}
local m_data = mw.loadData("Module:headword/data")
local title = mw.title.getCurrentTitle()
local isLemma = m_data.lemmas
local isNonLemma = m_data.nonlemmas
local notranslit = m_data.notranslit
local toBeTagged = m_data.toBeTagged
-- If set to true, categories always appear, even in non-mainspace pages
local test_force_categories = false
local function test_script(text, script_code)
if type(text) == "string" and type(script_code) == "string" then
local sc = require("Module:scripts").getByCode(script_code)
local characters
if sc then
characters = sc:getCharacters()
end
local out
if characters then
text = mw.ustring.gsub(text, "%W", "")
out = mw.ustring.find(text, "[" .. characters .. "]")
end
if out then
return true
else
return false
end
else
mw.log("Parameters to test_script were incorrect.")
return nil
end
end
local spacingPunctuation = "[%s%p]+"
--[[ List of punctuation or spacing characters that are found inside of words.
Used to exclude characters from the regex above. ]]
local wordPunc = "־׳״.·*་•"
local notWordPunc = "[^" .. wordPunc .. "]+"
-- Return true if the given head is multiword according to the algorithm used
-- in full_headword().
function export.head_is_multiword(head)
for possibleWordBreak in mw.ustring.gmatch(head, spacingPunctuation) do
if mw.ustring.find(possibleWordBreak, notWordPunc) then
return true
end
end
return false
end
-- Add links to a multiword head.
function export.add_multiword_links(head)
local function workaround_to_exclude_chars(s)
return mw.ustring.gsub(s, notWordPunc, "]]%1[[")
end
head = "[["
.. mw.ustring.gsub(
head,
spacingPunctuation,
workaround_to_exclude_chars
)
.. "]]"
--[=[
use this when workaround is no longer needed:
head = "[["
.. mw.ustring.gsub(head, WORDBREAKCHARS, "]]%1[[")
.. "]]"
]=]
-- Remove any empty links, which could have been created above
-- at the beginning or end of the string.
head = mw.ustring.gsub(head, "%[%[%]%]", "")
return head
end
local function non_categorizable()
return (title:inNamespace("") and title.text:find("^Unsupported titles/"))
-- or (title:inNamespace("Appendix") and title.text:find("^Gestures/"))
end
local function preprocess(data, postype)
--[=[
[[Special:WhatLinksHere/Template:tracking/headword/heads-not-table]]
[[Special:WhatLinksHere/Template:tracking/headword/translits-not-table]]
]=]
if type(data.heads) ~= "table" then
if data.heads then
require("Module:debug/track")("headword/heads-not-table")
end
data.heads = { data.heads }
end
if type(data.translits) ~= "table" then
if data.translits then
require("Module:debug/track")("headword/translits-not-table")
end
data.translits = { data.translits }
end
if type(data.transcriptions) ~= "table" then
if data.transcriptions then
require("Module:debug/track")("headword/transcriptions-not-table")
end
data.transcriptions = { data.transcriptions }
end
if not data.heads or #data.heads == 0 then
data.heads = {""}
end
-- Determine if term is reconstructed
local is_reconstructed = data.lang:getType() == "reconstructed"
or title.nsText == "Reconstruction"
-- Create a default headword.
local subpagename = title.subpageText
local pagename = title.text
local default_head
if is_reconstructed then
default_head = require("Module:utilities").plain_gsub(pagename, data.lang:getCanonicalName() .. "/", "")
else
default_head = subpagename
end
local unmodified_default_head = default_head
-- Add links to multi-word page names when appropriate
if data.lang:getCode() ~= "zh" and (not is_reconstructed) and
export.head_is_multiword(default_head) then
default_head = export.add_multiword_links(default_head)
end
if is_reconstructed then
default_head = "*" .. default_head
end
-- If a head is the empty string "", then replace it with the default
for i, head in ipairs(data.heads) do
if head == "" then
head = default_head
else
if head == default_head and data.lang:getCanonicalName() == "English" then
--table.insert(data.categories, data.lang:getCanonicalName() .. " terms with redundant head parameter")
end
end
data.heads[i] = head
end
-- If the first head is multiword (after removing links), maybe insert into "LANG multiword terms"
if not data.nomultiwordcat and postype == "lemmá" and not m_data.no_multiword_cat[data.lang:getCode()] then
-- Check for spaces or hyphens, but exclude prefixes and suffixes.
-- Use the pagename, not the head= value, because the latter may have extra
-- junk in it, e.g. superscripted text that throws off the algorithm.
local checkpattern = ".[%s%-]."
if m_data.hyphen_not_multiword_sep[data.lang:getCode()] then
-- Exclude hyphens if the data module states that they should for this language
checkpattern = ".[%s]."
end
if mw.ustring.find(unmodified_default_head, checkpattern) and not non_categorizable() then
table.insert(data.categories, data.lang:getCanonicalName() .. " kifejezések")
end
end
--[[ Try to detect the script if it was not provided
We use the first headword for this, and assume
that all of them have the same script
This *should* always be true, right? ]]
if not data.sc then
data.sc = require("Module:scripts").findBestScript(data.heads[1], data.lang)
end
for i, val in pairs(data.translits) do
data.translits[i] = {display = val, is_manual = true}
end
-- Make transliterations
for i, head in ipairs(data.heads) do
local translit = data.translits[i]
-- Try to generate a transliteration if necessary
-- Generate it if the script is not Latn or similar, and if no transliteration was provided
if translit and translit.display == "-" then
translit = nil
elseif not translit and not (data.sc:getCode():find("Latn", nil, true) or data.sc:getCode() == "Latinx" or data.sc:getCode() == "None") and (not data.sc or data.sc:getCode() ~= "Imag") then
translit = data.lang:transliterate(require("Module:links").remove_links(head), data.sc)
-- There is still no transliteration?
-- Add the entry to a cleanup category.
if not translit and not notranslit[data.lang:getCode()] then
translit = "<small>nincs latin átírás</small>"
table.insert(data.categories, data.lang:getCanonicalName() .. " - nincs átírás")
end
if translit then
translit = {display = translit, is_manual = false}
end
end
-- Link to the transliteration entry for languages that require this
if translit and data.lang:link_tr() then
translit.display = require("Module:links").full_link{
term = translit.display,
lang = data.lang,
sc = require("Module:scripts").getByCode("Latn"),
tr = "-"
}
end
data.translits[i] = translit
end
if data.id and type(data.id) ~= "string" then
error("The id in the data table should be a string.")
end
end
-- Format a headword with transliterations
local function format_headword(data)
local m_links = require("Module:links")
local m_scriptutils = require("Module:script utilities")
-- Are there non-empty transliterations?
-- Need to do it this way because translit[1] might be nil while translit[2] is not
local has_translits = false
local has_manual_translits = false
-- Format the headwords
for i, head in ipairs(data.heads) do
if data.translits[i] or data.transcriptions[i] then
has_translits = true
end
if data.translits[i] and data.translits[i].is_manual or data.transcriptions[i] then
has_manual_translits = true
end
-- Apply processing to the headword, for formatting links and such
if head:find("[[", nil, true) and (not data.sc or data.sc:getCode() ~= "Imag") then
head = m_links.language_link({term = head, lang = data.lang}, false)
end
-- Add language and script wrapper
if i == 1 then
head = m_scriptutils.tag_text(head, data.lang, data.sc, "head", nil, data.id)
else
head = m_scriptutils.tag_text(head, data.lang, data.sc, "head", nil)
end
data.heads[i] = head
end
local translits_formatted = ""
if has_manual_translits then
-- [[Special:WhatLinksHere/Template:tracking/headword/has-manual-translit/LANG]]
require("Module:debug/track")("headword/has-manual-translit/" .. data.lang:getCode())
end
if has_translits then
-- Format the transliterations
local translits = data.translits
local transcriptions = data.transcriptions
if translits then
-- using pairs() instead of ipairs() in case there is a gap
for i, _ in pairs(translits) do
if type(i) == "number" then
translits[i] = m_scriptutils.tag_translit(translits[i].display, data.lang:getCode(), "head", nil, translits[i].is_manual)
end
end
end
if transcriptions then
for i, _ in pairs(transcriptions) do
if type(i) == "number" then
transcriptions[i] = m_scriptutils.tag_transcription(transcriptions[i], data.lang:getCode(), "head")
end
end
end
for i = 1, math.max(#translits, #transcriptions) do
local translits_formatted = {}
table.insert(translits_formatted, translits[i] and translits[i] or "")
table.insert(translits_formatted, (translits[i] and transcriptions[i]) and " " or "")
table.insert(translits_formatted, transcriptions[i] and "/" .. transcriptions[i] .. "/" or "")
data.translits[i] = table.concat(translits_formatted)
end
translits_formatted = " (" .. table.concat(data.translits, " <i>vagy</i> ") .. ")"
local transliteration_page = mw.title.new(data.lang:getCanonicalName() .. " átírás", "Wikiszótár")
if transliteration_page then
local success, exists = pcall(function () return transliteration_page.exists end)
if success and exists then
translits_formatted = " [[Wikiszótár:" .. data.lang:getCanonicalName() .. " átírás|•]]" .. translits_formatted
end
end
end
return table.concat(data.heads, " <i>vagy</i> ") .. translits_formatted
end
local function format_genders(data)
if data.genders and #data.genders > 0 then
local pos_for_cat
if not data.nogendercat and not m_data.no_gender_cat[data.lang:getCode()] then
local pos_category = data.pos_category:gsub("^reconstructed ", "")
pos_for_cat = m_data.pos_for_gender_number_cat[pos_category]
end
local gen = require("Module:gender and number")
local text, cats = gen.format_genders(data.genders, data.lang, pos_for_cat)
for _, cat in ipairs(cats) do
table.insert(data.categories, cat)
end
return " " .. text
else
return ""
end
end
local function format_inflection_parts(data, parts)
local m_links = require("Module:links")
for key, part in ipairs(parts) do
if type(part) ~= "table" then
part = {term = part}
end
local qualifiers
local reftext
if part.qualifiers and #part.qualifiers > 0 then
qualifiers = require("Module:qualifier").format_qualifier(part.qualifiers) .. " "
-- [[Special:WhatLinksHere/Template:tracking/headword/qualifier]]
require("Module:debug/track")("headword/qualifier")
end
if part.refs and #part.refs > 0 then
local refs = {}
for _, ref in ipairs(part.refs) do
if type(ref) ~= "table" then
ref = {text = ref}
end
local refargs
if ref.name or ref.group then
refargs = {name = ref.name, group = ref.group}
end
table.insert(refs, mw.getCurrentFrame():extensionTag("ref", ref.text, refargs))
end
reftext = table.concat(refs)
end
local partaccel = part.accel
local face = part.hypothetical and "hypothetical" or "bold"
local nolink = part.hypothetical or part.nolink
-- Convert the term into a full link
-- Don't show a transliteration here, the consensus seems to be not to
-- show them in headword lines to avoid clutter.
part = m_links.full_link(
{
term = not nolink and part.term or nil,
alt = part.alt or (nolink and part.term or nil),
lang = part.lang or data.lang,
sc = part.sc or parts.sc or (not part.lang and data.sc),
id = part.id,
genders = part.genders,
tr = part.translit or (not (parts.enable_auto_translit or data.inflections.enable_auto_translit) and "-" or nil),
ts = part.transcription,
accel = parts.accel or partaccel,
},
face,
false
)
if qualifiers then
part = qualifiers .. part
end
if reftext then
part = part .. reftext
end
parts[key] = part
end
local parts_output = ""
if #parts > 0 then
parts_output = " " .. table.concat(parts, " <i>vagy</i> ")
elseif parts.request then
parts_output = " <small>[nincs megadva]</small>"
.. require("Module:utilities").format_categories(
{data.lang:getCanonicalName() .. " szavak hiányzó ragozással"},
data.lang,
nil,
nil,
data.force_cat_output or test_force_categories,
data.sc
)
end
return "<i>" .. parts.label .. "</i>" .. parts_output
end
-- Format the inflections following the headword
local function format_inflections(data)
if data.inflections and #data.inflections > 0 then
-- Format each inflection individually
for key, infl in ipairs(data.inflections) do
data.inflections[key] = format_inflection_parts(data, infl)
end
return " (" .. table.concat(data.inflections, ", ") .. ")"
else
return ""
end
end
-- Return "lemmá" if the given POS is a lemma, "ragozott alako" if a non-lemma form, or nil
-- if unknown. These are Hungarian stems: the caller appends "k" to build the category
-- names ("lemmák", "ragozott alakok").
-- The POS passed in must be in its plural form ("nouns", "prefixes", etc.).
-- If you have a POS in its singular form, call pluralize() in [[Module:string utilities]] to
-- pluralize it in a smart fashion that knows when to add '-s' and when to add '-es'.
--
-- If `best_guess` is given and the POS is in neither the lemma nor non-lemma list, guess
-- based on whether it ends in " forms"; otherwise, return nil.
function export.pos_lemma_or_nonlemma(plpos, best_guess)
-- Is it a lemma category?
if isLemma[plpos] or isLemma[plpos:gsub("^reconstructed ", "")] then
return "lemmá"
-- Is it a nonlemma category?
elseif isNonLemma[plpos]
or isNonLemma[plpos:gsub("^reconstructed ", "")]
or isLemma[plpos:gsub("^mutated ", "")]
or isNonLemma[plpos:gsub("^mutated ", "")] then
return "ragozott alako"
elseif best_guess then
return plpos:find(" forms$") and "ragozott alako" or "lemmá"
else
return nil
end
end
local function show_headword_line(data)
local namespace = title.nsText
-- Check the namespace against the language type
if namespace == "" then
if data.lang:getType() == "reconstructed" then
error("Entries for this language must be placed in the Reconstruction: namespace.")
elseif data.lang:getType() == "appendix-constructed" then
error("Entries for this language must be placed in the Appendix: namespace.")
end
end
local tracking_categories = {}
if not data.noposcat then
local pos_category = data.lang:getCanonicalName() .. " " .. data.pos_category
if pos_category ~= "Translingual Han characters" then
table.insert(data.categories, 1, pos_category)
end
end
if data.sccat and data.sc then
table.insert(data.categories, data.lang:getCanonicalName() .. " " .. data.pos_category
.. " in " .. data.sc:getDisplayForm())
end
-- Is it a lemma category?
local postype = export.pos_lemma_or_nonlemma(data.pos_category)
if not postype then
-- We don't know what this category is, so tag it with a tracking category.
--[=[
[[Special:WhatLinksHere/Template:tracking/headword/unrecognized pos]]
]=]
--table.insert(tracking_categories, "head tracking/unrecognized pos")
require("Module:debug").track{
"headword/unrecognized pos",
"headword/unrecognized pos/lang/" .. data.lang:getCode(),
"headword/unrecognized pos/pos/" .. data.pos_category
}
elseif not data.noposcat then
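-- `postype` is a Hungarian stem ("lemmá" or "ragozott alako"); appending "k" yields the
-- plural category names "lemmák" and "ragozott alakok".
table.insert(data.categories, 1, data.lang:getCanonicalName() .. " " .. postype .. "k")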
end
-- Preprocess
preprocess(data, postype)
local m_links = require("Module:links")
if namespace == "" and data.lang:getType() ~= "reconstructed" then
for _, head in ipairs(data.heads) do
if title.prefixedText ~= m_links.getLinkPage(m_links.remove_links(head), data.lang) then
--[=[
[[Special:WhatLinksHere/Template:tracking/headword/pagename spelling mismatch]]
]=]
require("Module:debug").track{
"headword/pagename spelling mismatch",
"headword/pagename spelling mismatch/" .. data.lang:getCode()
}
break
end
end
end
-- Format and return all the gathered information
return
format_headword(data) ..
format_genders(data) ..
format_inflections(data) ..
require("Module:utilities").format_categories(
tracking_categories, data.lang, data.sort_key, nil,
data.force_cat_output or test_force_categories, data.sc
)
end
function export.full_headword(data)
local tracking_categories = {}
-- Script-tags the topmost header.
local pagename = title.text
local fullPagename = title.fullText
local namespace = title.nsText
if not data.lang or type(data.lang) ~= "table" or not data.lang.getCode then
error('The "lang" field of the data table passed to "full_headword" must be a language object.')
end
if not data.sc then
data.sc = require("Module:scripts").findBestScript(data.heads and data.heads[1] ~= "" and data.heads[1] or pagename, data.lang)
else
-- Track uses of sc parameter
local best = require("Module:scripts").findBestScript(pagename, data.lang)
require("Module:debug/track")("headword/sc")
if data.sc:getCode() == best:getCode() then
require("Module:debug/track")("headword/sc/redundant")
require("Module:debug/track")("headword/sc/redundant/" .. data.sc:getCode())
else
require("Module:debug/track")("headword/sc/needed")
require("Module:debug/track")("headword/sc/needed/" .. data.sc:getCode())
end
end
local displayTitle
-- Assumes that the scripts in "toBeTagged" will never occur in the Reconstruction namespace.
-- Avoid tagging ASCII as Hani even when it is tagged as Hani in the
-- headword, as in [[check]]. The check for ASCII might need to be expanded
-- to a check for any Latin characters and whitespace or punctuation.
if (namespace == "" and data.sc and toBeTagged[data.sc:getCode()]
and not pagename:find "^[%z\1-\127]+$")
or (data.sc:getCode() == "Jpan" and (test_script(pagename, "Hira") or test_script(pagename, "Kana"))) then
-- displayTitle = '<span class="' .. data.sc:getCode() .. '">' .. pagename .. '</span>'
elseif namespace == "Reconstruction" then
-- Keep the match count local so it does not leak into the global environment.
local matched
displayTitle, matched = mw.ustring.gsub(
fullPagename,
"^(Reconstruction:[^/]+/)(.+)$",
function(before, term)
return before ..
require("Module:script utilities").tag_text(
term,
data.lang,
data.sc
)
end
)
if matched == 0 then
displayTitle = nil
end
end
if displayTitle then
local frame = mw.getCurrentFrame()
frame:callParserFunction(
"DISPLAYTITLE",
displayTitle
)
end
if data.force_cat_output then
--[=[
[[Special:WhatLinksHere/Template:tracking/headword/force cat output]]
]=]
require("Module:debug/track")("headword/force cat output")
end
if data.getCanonicalName then
error('The "data" variable supplied to "full_headword" should not be a language object.')
end
-- Were any categories specified?
if data.categories and #data.categories > 0 then
local lang_name = require("Module:string").pattern_escape(data.lang:getCanonicalName())
for _, cat in ipairs(data.categories) do
-- Does the category begin with the language name? If not, tag it with a tracking category.
if not mw.ustring.find(cat, "^" .. lang_name) then
mw.log(cat, data.lang:getCanonicalName())
--table.insert(tracking_categories, "head tracking/no lang category")
--[=[
[[Special:WhatLinksHere/Template:tracking/headword/no lang category]]
]=]
require("Module:debug").track{
"headword/no lang category",
"headword/no lang category/lang/" .. data.lang:getCode()
}
end
end
-- Use the pattern-escaped language name so that names containing magic
-- characters (e.g. hyphens) are matched literally.
if not data.pos_category
and mw.ustring.find(data.categories[1], "^" .. lang_name)
then
data.pos_category = mw.ustring.gsub(data.categories[1], "^" .. lang_name .. " ", "")
table.remove(data.categories, 1)
end
end
if not data.pos_category then
error(
'No valid part-of-speech categories were found in the list '
.. 'of categories passed to the function "full_headword". '
.. 'The part-of-speech category should consist of a language\'s '
.. 'canonical name plus a part of speech.'
)
end
--[[
-- Categorise for unusual characters
local standard = data.lang:getStandardCharacters()
if standard then
if mw.ustring.len(title.subpageText) ~= 1 and not non_categorizable() then
for character in mw.ustring.gmatch(title.subpageText, "([^" .. standard .. "])") do
local upper = mw.ustring.upper(character)
if not mw.ustring.find(upper, "[" .. standard .. "]") then
character = upper
end
table.insert(
data.categories,
data.lang:getCanonicalName() .. " terms spelled with " .. character
)
end
end
end
]]
-- Categorise for palindromes
if title.nsText ~= "Reconstruction" and mw.ustring.len(title.subpageText) > 2
and require('Module:palindromes').is_palindrome(
title.subpageText, data.lang, data.sc
) then
table.insert(data.categories, data.lang:getCanonicalName() .. " palindromok")
end
-- This may add more categories (e.g. gender categories), so make sure it gets
-- evaluated first.
local text = show_headword_line(data)
return
text ..
require("Module:utilities").format_categories(
data.categories, data.lang, data.sort_key, nil,
data.force_cat_output or test_force_categories, data.sc
) ..
require("Module:utilities").format_categories(
tracking_categories, data.lang, data.sort_key, nil,
data.force_cat_output or test_force_categories, data.sc
)
end
return export