JMdict/EDICT Editorial Policy and Guidelines
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the JMdictDB on-line database system.
Before proposing a new entry or an amendment, you should:
- familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;
- make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it is a variant, add it to the existing entry. Check such things as:
- common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;
- common okurigana variants, e.g. 生花/生け花;
- modern and old kanji, e.g. 合気道/合氣道
- check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?
- verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the Goo site. The Eijiro dictionary at the ALC site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information must be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors.
- verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.
- decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section "Is it worth including?" below.)
Dictionary Entry Fields
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).
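As a rough illustration of the full-width requirement, the following Python sketch converts half-width ASCII letters, digits and punctuation to their full-width equivalents. The function name to_fullwidth is hypothetical and is not part of JMdictDB; the sketch simply relies on the fixed offset between the ASCII range and the Unicode full-width forms.

```python
# Hypothetical helper, not part of JMdictDB: map half-width ASCII
# characters (U+0021..U+007E) to their full-width forms (U+FF01..U+FF5E).
# The two ranges are a constant 0xFEE0 apart.
_FULLWIDTH_TABLE = {code: code + 0xFEE0 for code in range(0x21, 0x7F)}


def to_fullwidth(text: str) -> str:
    """Return text with ASCII characters replaced by full-width forms.

    Characters outside the ASCII range (kana, kanji, etc.) are left alone.
    """
    return text.translate(_FULLWIDTH_TABLE)


print(to_fullwidth("MP3プレーヤー"))
```

Spaces are deliberately left untouched here, as the guidelines do not say how they should be handled.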
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:
- alternative kanji in the word, e.g. 合気道 and 合氣道
- variations in okurigana, e.g., 生け花 and 生花
- part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use.
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.
Some other points to note:
- in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".
- as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.
- for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.
- for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".
- for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.
- for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.
A set of tags, e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.
In the Reading section enter either:
- the reading(s) of the word/phrase in the Kanji section, or
- the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.
Readings associated with kanji should normally be in hiragana; the main exceptions being:
- Chinese or Korean words and names, which are often transliterated using katakana;
- the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)
- older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).
More than one reading can be entered where alternatives are possible. This can occur when
- a kanji has alternative readings;
- where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;
- where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading. As in the Kanji section, place the more common reading(s) first.
外来語 (in katakana) are entered in this section. Do not enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.
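The middle-dot rule can be sketched in Python as follows. This assumes that the "JIS middle-dot" is the katakana middle dot U+30FB; the list of look-alike characters and the function names are illustrative choices only, not something defined by JMdictDB.

```python
JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT (assumed to be the accepted one)

# Look-alike dots that are easy to type by mistake (an illustrative,
# non-exhaustive selection).
LOOKALIKE_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
    "\u2022": "BULLET",
    "\u2219": "BULLET OPERATOR",
    "\u22c5": "DOT OPERATOR",
}


def find_wrong_dots(reading: str) -> list:
    """Return the names of any look-alike dots found in the reading."""
    return [LOOKALIKE_DOTS[ch] for ch in reading if ch in LOOKALIKE_DOTS]


def dot_variants(dotted: str) -> str:
    """Build the 'undotted;dotted' pair of readings from the dotted form."""
    return dotted.replace(JIS_MIDDLE_DOT, "") + ";" + dotted


print(dot_variants("アームレス・チェア"))
print(find_wrong_dots("アームレス\u00b7チェア"))
```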
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).
A set of tags, e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of part-of-speech (POS) tags, e.g. [n], [adj-i], and miscellaneous tags, e.g. [abbr] and [col].
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.
- do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.
- where the Japanese has more than one distinct meaning, break the section into senses.
- make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:
- abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"
- conjunctions: "rice field; rice paddy" not "rice field or paddy"
- where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".)
- do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.
- do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.
- when putting numbers into translations be consistent and concise. In general:
- if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);
- if the numbers are in something more descriptive or narrative, use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels).
- avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.
- make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.
- include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.
- when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.
- provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails
- never create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").
- when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)
- put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".
- when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is here. For example:
- [fld=comp] floating-point
- when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.
- short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed.
- where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)
- it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:
- place "[lit]" at the front of the gloss;
- place this gloss last, after the real translation(s).
- note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.
- on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is not used if it is the only gloss in the sense. (At present the "expl" tag is only used in the database - it is not yet exported to JMdict or EDICT.)
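Several of the wording rules above are mechanical enough to check automatically. The Python sketch below is a hypothetical linter, not a JMdictDB feature: it splits a meaning into glosses on ";" and flags a leading article, spellings of the "colo(u)r" kind, a comma after "e.g." or "i.e.", and an "e.g."/"i.e." that does not directly follow an opening parenthesis. The checks are crude heuristics and their results are hints for a human, nothing more.

```python
import re


def split_glosses(meaning: str) -> list:
    """Split a meaning into separate glosses on ';' (ignores tags and notes)."""
    return [g.strip() for g in meaning.split(";") if g.strip()]


def lint_gloss(gloss: str) -> list:
    """Return a list of possible style problems found in one gloss."""
    problems = []
    # Leading "a", "an" or "the" (allowed only when really needed).
    if re.match(r"(a|an|the)\s", gloss, re.IGNORECASE):
        problems.append("starts with an article")
    # Patterns such as "colo(u)r", which cannot be searched for.
    if re.search(r"\w\(\w+\)\w", gloss):
        problems.append("optional-letter spelling such as colo(u)r")
    # "e.g.," or "i.e.," with a trailing comma.
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        problems.append("comma after e.g. or i.e.")
    # "e.g." or "i.e." not immediately preceded by "(".
    if re.search(r"(?<!\()\b(e\.g\.|i\.e\.)", gloss):
        problems.append("e.g./i.e. not in parentheses")
    return problems


for gloss in split_glosses("hand game, e.g., rock, paper, scissors; colo(u)r"):
    print(gloss, "->", lint_gloss(gloss))
```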
Which Reference Is Best?
On occasions references will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:
- our goal is to reflect modern Japanese, so precedence should be given to sources that indicate up-to-date usage;
- the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;
- 広辞苑 lists its meanings in historical order, so use its material with caution;
- in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;
- if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable number of WWW hits;
- WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant
Part-Of-Speech (POS) Issues
- make sure the meaning agrees with the part-of-speech. If the Japanese word is a noun, don't make the translation a verb (e.g. to xxxx)
- if a term can stand alone (as a noun or participle), list [n] as the first part-of-speech and give the noun form in the translation. Do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".
- if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which meaning will be "to ...".
- if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare).
- when entering a verb, use the infinitive in English (to run, to jump, etc.)
- for adjectives, the English entry should be just the adjective, not the adjective and copula:
- "lucky" not "be lucky" or "is lucky"
- for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.
- there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should only be used for archaic words which never use a modern conjugation, e.g. 崇まふ.
- if the word comes from another language, mark this next to the English meaning. The format is [lsrc=lng:], where lng is the three-letter code from the ISO 639-2:1998 "Codes for the representation of names of languages" standard:
- アルバイト [n,vs] part-time job [lsrc=ger:Arbeit]
- アールデコ [n] art deco [lsrc=fre:]
Don't do this for (i) common Sino-Japanese vocabulary, (ii) English loan-words where the first translation listed is the source word; (iii) words/terms which are translations from other languages.
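For illustration, the [lsrc=lng:word] pattern shown above can be picked out of a line of text with a regular expression. The Python sketch below covers only the form given in these guidelines (a three-letter lowercase code, a colon, and an optional source word); any further syntax the database may accept is not handled, and the function name parse_lsrc is hypothetical.

```python
import re

# Matches "[lsrc=ger:Arbeit]" as well as "[lsrc=fre:]" (empty source word).
LSRC_PATTERN = re.compile(r"\[lsrc=([a-z]{3}):([^\]]*)\]")


def parse_lsrc(text: str) -> list:
    """Return (language_code, source_word) pairs found in the text."""
    return LSRC_PATTERN.findall(text)


print(parse_lsrc("part-time job [lsrc=ger:Arbeit]"))
print(parse_lsrc("art deco [lsrc=fre:]"))
```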
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:
- where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).
- where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.
- where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms.
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the detailed instructions). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字]
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent Wikipedia article on this.)
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the Japanese WordNet which specifically provide details of large numbers of synonyms. Some systems such as WWWJDIC link to the Japanese WordNet as part of the entry display.
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:
- add the tag "[abbr]" to indicate it is an abbreviation;
- add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".
Where it is useful, a cross-reference back from the full form to the abbreviation may also be added.
The Reference field is where you indicate the sources for the entry. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry came from.
It is very important that something be put in this field. If nothing is entered, the editors will have to go searching themselves (which will make them grumpy and less inclined to feel positive about the suggestion.)
The best references are to other dictionaries, and the more the better. Usually just the name of the dictionary will do, where the proposed entry is already an entry in the reference. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:
- koj: Kôjien, 広辞苑 - a major medium-sized 国語辞典.
- daijr: Daijirin, 大辞林 - another major medium-sized 国語辞典.
- daijs: Daijisen, 大辞泉 - another major medium-sized 国語辞典.
- nikk: Nikkoku 日国/日本国語大辞典 - a major multi-volume 国語辞典.
- GG5: Kenkyusha 新和英大辞典第５版 - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)
- ＫＯＤ追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site
- ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary
- 新和英中辞典: medium Kenkyusha JE dictionary
- リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary
- 新英和大辞典: large Kenkyusha English-Japanese dictionary
- 新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary
- eij or alc: Eijiro, 英辞郎 - large word/phrase collection, available online at the ALC site.
- JWN: Japanese WordNet
- LSD: Life Sciences Dictionary - major biomedical terminology dictionary
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please do not give the URL to the site as a reference. Instead state the actual dictionary being used for the reference (大辞林, 日国, etc.)
If the reference includes a Wikipedia article, quote the URL.
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.
Use the Comments field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified. Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses cannot be seen by people browsing the database; they are only visible to editors who have logged into the system.
Although the database supporting the dictionary uses Unicode coding, and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:
- the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.
- the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.
The JMdict database is in Unicode, and thus can contain any valid Unicode characters.
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:
- any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.
- for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.
- if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however they will not be propagated into the EDICT/EDICT2 distributions.
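A contributor who wants a rough idea of which distribution a piece of text will survive in can approximate the check with Python's standard codecs. The sketch below assumes that the iso2022_jp codec is a reasonable stand-in for "JIS X 0208 only" and that euc_jp is a stand-in for "JIS X 0208 plus JIS X 0212". Both codecs also accept ASCII and a few other characters, so this is an approximation and not the test the export scripts actually apply; the function names and return labels are made up for this example.

```python
def _encodable(text: str, codec: str) -> bool:
    """True if every character in text can be encoded with the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def classify_charset(text: str) -> str:
    """Roughly classify text by the smallest character set that holds it.

    "jisx0208"     - should survive in EDICT, EDICT2 and JMdict
    "jisx0212"     - needs JIS X 0212, so EDICT2 and JMdict only
    "unicode-only" - outside both sets, so JMdict only
    """
    if _encodable(text, "iso2022_jp"):  # approximation of JIS X 0208
        return "jisx0208"
    if _encodable(text, "euc_jp"):      # adds JIS X 0212 coverage
        return "jisx0212"
    return "unicode-only"


for sample in ("合気道", "é", "한"):
    print(sample, classify_charset(sample))
```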
Merging Entries/Two-out-of-three Rule
On occasions two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a paper in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.
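The bare "two-out-of-three" test can be written down as a small Python sketch. The Entry class and may_merge function are hypothetical, and they make simplifying assumptions: each entry is reduced to its single most common kanji form, reading and meaning, a missing kanji form never counts as a match, and "same meaning" is treated as exact string equality. The judgement calls described above, including related kana-only variants, are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kanji: str    # most common kanji headword ("" if the entry has none)
    reading: str  # most common reading
    meaning: str  # main meaning, assumed already normalized for comparison


def may_merge(a: Entry, b: Entry) -> bool:
    """Apply the two-out-of-three rule to the main forms of two entries."""
    matches = 0
    if a.kanji and a.kanji == b.kanji:  # empty kanji is never a match
        matches += 1
    if a.reading == b.reading:
        matches += 1
    if a.meaning == b.meaning:
        matches += 1
    return matches >= 2


ikebana_1 = Entry("生け花", "いけばな", "flower arrangement")
ikebana_2 = Entry("生花", "いけばな", "flower arrangement")
print(may_merge(ikebana_1, ikebana_2))
```

A result of True only means the entries are candidates for merging; the final decision still rests with the editors.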
Is it worth including?
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement, and often leads to some debate between editors before a proposed entry is accepted or rejected. The following is a list of criteria used by the editors to assess whether a proposed entry should be included. Generally, a proposed entry needs to pass one or more of these criteria.
- is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)
- is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)
- is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.)
- does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)
- is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)
Names of biological species
The rules we are using for biological species are:
- Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: common_name (scientific_name), e.g. European magpie (Pica pica). If the common name is unknown, the preferred format is: scientific_name (description), e.g. Mola mola (a species of sunfish).
- Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".
- Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "Tyrannosaurus rex", not "tyrannosaurus rex" or "Tyrannosaurus Rex"
- Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.
- For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: cinnamon bear (Ursus americanus cinnamomum)
- For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: occluded blindweed (Calystegia sepium subsp. erratica)
- For varieties, the abbreviation "var." must be used.
- For forms, "f." must be used.
- Cultivar epithets should be capitalized and placed in single quotes. (e.g. Taxus baccata 'Variegata')
- Do not submit the author name. e.g., raspberry (Rubus idaeus), not raspberry (Rubus idaeus L.) (The "L." stands for Linnaeus.)
- Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.
- Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed after the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (Phoca vitulina)/harbour seal/common seal
- where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.
- Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae
- When unsure of a kanji headword, it is often easy to determine it from the English translation or the appearance of the species. For example, the white-cheeked pintail (Anas bahamensis) is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess.
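The capitalization and rank rules for scientific names lend themselves to a simple format check. The Python sketch below is a hypothetical helper that accepts only the shapes described in the rules above: a capitalized genus, a lower-case specific epithet, an optional infraspecific epithet (bare, or preceded by "subsp.", "var." or "f."), and an optional cultivar in single quotes. It checks the shape of the string only; it cannot tell whether a name is real, current, or correctly ranked.

```python
import re

_EPITHET = r"[a-z][a-z-]*"

SCIENTIFIC_NAME = re.compile(
    rf"^[A-Z][a-z]+ {_EPITHET}"                     # Genus + specific epithet
    rf"(?: (?:subsp\. |var\. |f\. )?{_EPITHET})?"   # optional infraspecific part
    rf"(?: '[A-Z][^']*')?$"                         # optional cultivar epithet
)


def looks_like_scientific_name(name: str) -> bool:
    """True if name has the layout described in the species rules."""
    return SCIENTIFIC_NAME.match(name) is not None


for candidate in (
    "Tyrannosaurus rex",
    "Tyrannosaurus Rex",
    "Calystegia sepium subsp. erratica",
    "Taxus baccata 'Variegata'",
    "Rubus idaeus L.",
):
    print(candidate, looks_like_scientific_name(candidate))
```

Names with an author abbreviation attached, such as "Rubus idaeus L.", are rejected by this pattern, which matches the rule that the author name should not be submitted.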