- 1 JMdict/EDICT Editorial Policy and Guidelines
- 2 Before Starting
- 3 Dictionary Entry Fields
- 4 Other Issues/Policies
JMdict/EDICT Editorial Policy and Guidelines
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the JMdictDB on-line database system.
Before proposing a new entry or an amendment, you should:
- familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;
- make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or to be variants of existing entries. If it is a variant, add it to the existing entry. Check such things as:
- common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;
- common okurigana variants, e.g. 生花/生け花;
- modern and old kanji, e.g. 合気道/合氣道
- check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?
- verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the Goo site. The Eijiro dictionary at the ALC site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information should be included in the "Reference" section in the form.
- verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.
- decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)
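The check for loanword variants that differ only in a final ー can be partly mechanised. The following is a minimal sketch, not part of JMdictDB; the function names are hypothetical, and it covers only the trailing long-vowel case mentioned above, not ー/イ alternation, okurigana or old kanji forms.

```python
def trailing_choon_variants(word: str) -> set[str]:
    """Return the word both with and without a final long-vowel mark (ー)."""
    base = word[:-1] if word.endswith("ー") else word
    return {base, base + "ー"}


def may_be_same_loanword(a: str, b: str) -> bool:
    """True if the two katakana strings differ at most by a trailing ー."""
    return bool(trailing_choon_variants(a) & trailing_choon_variants(b))


if __name__ == "__main__":
    print(may_be_same_loanword("コンピューター", "コンピュータ"))
```

A positive result only suggests that the existing entry should be looked at before submitting; it is not proof that the two forms belong together.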
Dictionary Entry Fields
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).
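As an illustration of the full-width requirement, the sketch below converts printable ASCII characters to their full-width counterparts by shifting them into the Unicode "Halfwidth and Fullwidth Forms" block. The function name is hypothetical, and treating every printable ASCII character (including punctuation) this way is an assumption; the guideline itself only gives the ＭＰ３プレーヤー example.

```python
def to_fullwidth(text: str) -> str:
    """Map printable ASCII (U+0021..U+007E) to full-width forms (U+FF01..U+FF5E)."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            # Full-width forms sit at a fixed offset from their ASCII originals.
            out.append(chr(code + 0xFEE0))
        else:
            # Kana, kanji and anything else are left untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_fullwidth("MP3プレーヤー"))
```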
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:
- alternative kanji in the word, e.g. 合気道 and 合氣道
- variations in okurigana, e.g., 生け花 and 生花
- part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use.
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.
Some other points to note:
- in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".
- as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.
- for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.
- for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".
- for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.
- for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.
A set of tags, e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.
In this section enter either the reading(s) of the word/phrase in the Kanji section, or the word itself if it is written only in kana. Readings associated with kanji should normally be in hiragana, the main exceptions being:
- Chinese or Korean words and names, which are often transliterated using katakana;
- the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)
- older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).
More than one reading can be entered where alternatives are possible. This can occur when
- a kanji has alternative readings;
- where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;
- where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading. As in the Kanji section, place the more common reading(s) first.
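For species names, the hiragana-first, katakana-with-[nokanji] pairing described above can be produced mechanically. This is a sketch under stated assumptions: the function names are hypothetical, the readings are returned as a plain Python list rather than in any particular form syntax, and the conversion relies on the fixed 0x60 offset between the hiragana and katakana blocks in Unicode.

```python
def to_katakana(hiragana: str) -> str:
    """Convert hiragana (U+3041..U+3096) to the matching katakana."""
    return "".join(
        chr(ord(ch) + 0x60) if "ぁ" <= ch <= "ゖ" else ch
        for ch in hiragana
    )


def species_readings(hiragana: str) -> list[str]:
    """Hiragana reading first, then the katakana reading tagged [nokanji]."""
    return [hiragana, to_katakana(hiragana) + "[nokanji]"]


if __name__ == "__main__":
    print(species_readings("ぜにがたあざらし"))
```

The "[uk]" tag still has to be added to the Meanings field by hand.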
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).
A set of tags, e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of part-of-speech (POS) tags, e.g. [n], [adj-i], and miscellaneous tags, e.g. [abbr] and [col].
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.
- do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.
- where the Japanese has more than one distinct meaning, break the section into senses.
- make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:
- abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"
- conjunctions: "rice field; rice paddy" not "rice field or paddy"
- where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".)
- do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.
- do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.
- make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.
- include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling; however, it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.
- when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.
- provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails
- never create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").
- when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)
- put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".
- when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. For example:
- [fld=comp] floating-point
- short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed.
- where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)
- it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:
- place "[lit]" at the front of the gloss;
- place this gloss last, after the real translation(s).
Note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.
- make sure the meaning agrees with the part-of-speech. If the Japanese word is a noun, don't make the translation a verb (e.g. to xxxx)
- if a term can stand alone (as a noun or participle), list [n] as the first part-of-speech and give the noun form in the translation. Do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".
- if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which meaning will be "to ...".
- if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare).
- when entering a verb, use the infinitive in English (to run, to jump, etc.)
- for adjectives, the English entry should be just the adjective, not the adjective and copula:
- "lucky" not "be lucky" or "is lucky"
- for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.
- if the word comes from another language, mark this next to the English meaning. The format is [lsrc=lng:], where lng is the three-letter code from the ISO 639-2:1998 "Codes for the representation of names of languages" standard:
- アルバイト [n,vs] part-time job [lsrc=ger:Arbeit]
- アールデコ [n] art deco [lsrc=fre:]
Don't do this for (i) common Sino-Japanese vocabulary, (ii) English loan-words where the first translation listed is the source word.
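Several of the gloss conventions above are mechanical enough to check before submitting. The following is a rough sketch of such a check, not an existing JMdictDB feature; the function names and warning codes are invented for illustration, and it covers only three of the rules (leading articles, "colo(u)r"-style patterns, and a comma after e.g. or i.e.).

```python
import re

# A gloss that starts with "a", "an" or "the" followed by whitespace.
LEADING_ARTICLE = re.compile(r"^(a|an|the)\s", re.IGNORECASE)
# Parentheses embedded inside a word, as in "colo(u)r".
EMBEDDED_PARENS = re.compile(r"\w\(\w+\)\w")
# A comma directly after "e.g." or "i.e.".
COMMA_AFTER_ABBREV = re.compile(r"\b(e\.g\.|i\.e\.),")


def split_glosses(sense: str) -> list[str]:
    """Split a sense into its individual ';'-separated translations."""
    return [g.strip() for g in sense.split(";") if g.strip()]


def lint_gloss(gloss: str) -> list[str]:
    """Return warning codes for one gloss; an empty list means no findings."""
    warnings = []
    if LEADING_ARTICLE.search(gloss):
        warnings.append("leading-article")
    if EMBEDDED_PARENS.search(gloss):
        warnings.append("embedded-parens")
    if COMMA_AFTER_ABBREV.search(gloss):
        warnings.append("comma-after-abbrev")
    return warnings


if __name__ == "__main__":
    for gloss in split_glosses("full colour (color); colo(u)r; a rice field"):
        print(gloss, lint_gloss(gloss))
```

These are warnings rather than errors: a leading article, for instance, is allowed where it is needed to make the meaning clear.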
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:
- where one entry is an abbreviation of another, e.g. 学割 and 学生割引.
- where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.
- where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms.
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the detailed instructions). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字]
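To make the two basic patterns concrete, here is a small sketch that pulls [see=...] and [ant=...] tags out of a string and splits a kanji・reading target into its two parts. The `XRef` type and `parse_xrefs` function are hypothetical names, not part of any JMdict tooling, and the sketch deliberately ignores sense-specific targets. It also splits on the first middle dot, so it would misread a katakana target that itself contains one.

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str                # "see" or "ant"
    headword: str            # kanji form, or kana if there is no kanji
    reading: Optional[str]   # present only for kanji・reading targets


# Matches [see=target] or [ant=target], where the target has no brackets.
XREF_PATTERN = re.compile(r"\[(see|ant)=([^\[\]]+)\]")


def parse_xrefs(text: str) -> list[XRef]:
    """Extract all simple cross-reference tags from a string."""
    refs = []
    for kind, target in XREF_PATTERN.findall(text):
        headword, sep, reading = target.partition("・")
        refs.append(XRef(kind, headword, reading if sep else None))
    return refs


if __name__ == "__main__":
    print(parse_xrefs("[see=金本位・かねほんい] [ant=何等]"))
```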
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent Wikipedia article on this.)
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the Japanese WordNet which specifically provide details of large numbers of synonyms. Some systems such as WWWJDIC link to the Japanese WordNet as part of the entry display.
On occasions two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule. For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries.
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other.
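The "two-out-of-three" rule can be expressed as a simple count. The sketch below is only an illustration: the `Entry` class and `may_merge` function are hypothetical, and plain string equality stands in for the editorial judgement of whether two meanings are "the same". A missing kanji form never counts as a match, so kana-only pairs with different kana are left to the editors, who decide whether the forms are related.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kanji: str     # empty string when the entry has no kanji form
    reading: str
    meaning: str


def may_merge(a: Entry, b: Entry) -> bool:
    """Apply the two-out-of-three rule to a pair of candidate entries."""
    same_kanji = bool(a.kanji) and a.kanji == b.kanji
    same_reading = a.reading == b.reading
    same_meaning = a.meaning == b.meaning
    return sum([same_kanji, same_reading, same_meaning]) >= 2


if __name__ == "__main__":
    old = Entry("合氣道", "あいきどう", "aikido")
    new = Entry("合気道", "あいきどう", "aikido")
    print(may_merge(old, new))
```

With this count, 野球 and ベースボール share only a meaning and stay separate, which agrees with the rule given earlier for 外来語 synonyms.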
Is it worth including?
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement, and often leads to some debate between editors before a proposed entry is accepted or rejected. The following is a list of criteria used by the editors to assess whether a proposed entry should be included. Generally, a proposed entry needs to pass one or more of these criteria.
- is its meaning not obvious from the component parts. Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)
- is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese (for example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)
- is it already in one or more dictionaries (other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.)
- does it have a reading which is not obvious from the constituent kanji (some expressions use unusual or irregular readings, often because they are based on archaic forms.)
- is it very, very common, with squillions of hits in WWW pages, etc. (This is a rather weak test, and is mainly used with idiomatic expressions.)
Names of biological species
The rules we are using for biological species are:
- Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: common_name (scientific_name), e.g. European magpie (Pica pica). If the common name is unknown, the preferred format is: scientific_name (description), e.g. Mola mola (a species of sunfish).
- Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".
- Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "Tyrannosaurus rex", not "tyrannosaurus rex" or "Tyrannosaurus Rex"
- Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.
- For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: cinnamon bear (Ursus americanus cinnamomum)
- For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: occluded blindweed (Calystegia sepium subsp. erratica)
- For varieties, the abbreviation "var." must be used.
- For forms, "f." must be used.
- Cultivar epithets should be capitalized and placed in single quotes (e.g. Taxus baccata 'Variegata').
- Do not submit the author name. e.g., raspberry (Rubus idaeus), not raspberry (Rubus idaeus L.) (The "L." stands for Linnaeus.)
- Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.
- Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed after the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (Phoca vitulina)/harbour seal/common seal
- where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.
- Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae
- When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (Anas bahamensis) is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess.
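The capitalization and subspecies rules above can be summarised in a short formatting helper. This is a sketch only: the function names and the `plant` flag are hypothetical, and it handles just the genus/species/subspecies cases, not varieties, forms or cultivars.

```python
from typing import Optional


def scientific_name(genus: str, species: str,
                    subspecies: Optional[str] = None,
                    plant: bool = False) -> str:
    """Build a binomial or trinomial name with the required capitalization."""
    # Generic names are capitalized; specific epithets never are.
    parts = [genus.capitalize(), species.lower()]
    if subspecies:
        if plant:
            # Plant subspecies take the "subsp." abbreviation; animals do not.
            parts.append("subsp.")
        parts.append(subspecies.lower())
    return " ".join(parts)


def species_gloss(common_name: str, sci_name: str) -> str:
    """Preferred gloss format: common_name (scientific_name)."""
    return f"{common_name} ({sci_name})"


if __name__ == "__main__":
    print(species_gloss("European magpie", scientific_name("Pica", "pica")))
    print(scientific_name("Calystegia", "sepium", "erratica", plant=True))
```

The common name is passed through unchanged, since deciding which words in it are proper nouns is left to the submitter.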