https://www.edrdg.org/wiki/api.php?action=feedcontributions&user=JimBreen&feedformat=atomEDRDG Wiki - User contributions [en]2024-03-28T23:25:07ZUser contributionsMediaWiki 1.39.1https://www.edrdg.org/wiki/index.php?title=Sentence-Dictionary_Linking&diff=971Sentence-Dictionary Linking2024-01-16T05:09:18Z<p>JimBreen: /* Index Format */</p>
<hr />
<div><br />
To enable dictionary systems, apps, etc. to use the Japanese-English sentences from the Tanaka Corpus/Tatoeba as examples, a set of word-level indices has been compiled and associated with each sentence (at present about 150,000 sentences have indices). These indices are maintained within the Tatoeba system (there is a special GUI for this), and periodically downloaded for use with dictionary systems. The indices are particularly associated with the JMdict/EDICT2 dictionary files, but may also be used elsewhere.<br />
<br />
==Index Format==<br />
<br />
The indices for a sentence consist of a line of text with space-delimited index elements for each word in the sentence. The following is an example:<br />
<br />
Sentence: その家はかなりぼろ屋になっている。<br />
<br />
Indices: 其の[01]{その} 家(いえ)[01] は 可也{かなり} ぼろ屋[01]~ になる[01]{になっている}<br />
<br />
The format of the index elements is as follows:<br />
* the usual headword as it appears in the dictionary. Even if the word is usually written in kana, the kanji form must be used if it is available. This field is mandatory; however, it may be omitted for proper names not found in the dictionary.<br />
* the reading of the word in kana, or the numerical sequence number of the appropriate entry in the JMdict dictionary in the format "#nnnnnnnn". This is optional, however it '''must''' be used if there are several different dictionary entries with the same headword. This field is in regular parentheses.<br />
* a sense number. This is used when the word has multiple senses in the JMdict/EDICT2 file, and indicates which sense applies in the sentence. It is a two-digit numeric in square parentheses. The field is optional.<br />
* the form in which the word appears in the sentence. This may differ from the indexing word, e.g. if it is an inflected verb or adjective, if the word is usually written in a different way, etc. This field is in "curly" parentheses. It is not mandatory, but should be included where appropriate.<br />
* a "~" character to indicate that the sentence pair is a good and checked example of the usage of the word. Words are marked to enable appropriate sentences to be selected by dictionary software. Typically only one instance per sense of a word will be marked. (The WWWJDIC server displays these sentences below the display of the related dictionary entry.) Note that more than one index element for a sentence can have the "~" tag.<br />
<br />
Some indices are followed by a "|" character and a digit. These are an artefact from a former maintenance system, and can be safely ignored.<br />
<br />
The fields after the indexing headword ()[]{}~ '''must''' be in that order.<br />
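<br />
The field order above lends itself to a single regular expression. The following is a minimal Python sketch of a parser for one line of indices, not a reference implementation: the names <code>IndexElement</code> and <code>parse_indices</code> are hypothetical, and because the exact position of the "|" artefact is not specified here, the sketch simply removes any "|digit" pair wherever it occurs in an element.<br />

```python
import re
from typing import List, NamedTuple, Optional


class IndexElement(NamedTuple):
    headword: str            # dictionary headword ("" if omitted for a proper name)
    reading: Optional[str]   # kana reading or "#nnnnnnnn" sequence number, from ()
    sense: Optional[int]     # sense number, from []
    form: Optional[str]      # form as it appears in the sentence, from {}
    checked: bool            # True if the element carries the "~" tag


# Fields after the headword must appear in the order () [] {} ~.
_ELEMENT = re.compile(
    r"(?P<headword>[^()\[\]{}~|\s]*)"
    r"(?:\((?P<reading>[^)]*)\))?"
    r"(?:\[(?P<sense>\d{2})\])?"
    r"(?:\{(?P<form>[^}]*)\})?"
    r"(?P<checked>~)?"
)
# Artefact from a former maintenance system: "|" followed by a digit.
# Assumption: it is dropped wherever it appears inside an element.
_ARTEFACT = re.compile(r"\|\d")


def parse_indices(line: str) -> List[IndexElement]:
    """Split a line of space-delimited index elements into their fields."""
    elements = []
    for token in line.split():
        match = _ELEMENT.fullmatch(_ARTEFACT.sub("", token))
        if match is None:
            raise ValueError(f"unrecognised index element: {token!r}")
        sense = match["sense"]
        elements.append(IndexElement(
            match["headword"],
            match["reading"],
            int(sense) if sense is not None else None,
            match["form"],
            match["checked"] is not None,
        ))
    return elements
```

Elements whose fields are out of order are rejected with a <code>ValueError</code> rather than silently accepted, which makes the sketch usable as a simple validity check as well.<br />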
<br />
==File Format==<br />
<br />
A file of the Japanese-English sentence pairs with the indices can be downloaded from the [https://downloads.tatoeba.org/exports/wwwjdic.csv Tatoeba site]. This file, which is generated once each week, is in UTF-8 encoding, and has the following format:<br />
<br />
:Jpn_seq_no[TAB]Eng_seq_no[TAB]Japanese sentence[TAB]English sentence[TAB]Indices<br />
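<br />
As an illustration, one line of that file could be split into its five fields as sketched below in Python. The names <code>SentencePair</code> and <code>parse_pair_line</code> are hypothetical, and the sketch assumes that both sequence numbers are plain decimal integers.<br />

```python
from typing import NamedTuple


class SentencePair(NamedTuple):
    jpn_seq_no: int   # sequence number of the Japanese sentence
    eng_seq_no: int   # sequence number of the English sentence
    japanese: str
    english: str
    indices: str      # the unparsed line of index elements


def parse_pair_line(line: str) -> SentencePair:
    """Split one TAB-delimited line into its five fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 TAB-separated fields, got {len(fields)}")
    jpn_seq_no, eng_seq_no, japanese, english, indices = fields
    # Assumption: the sequence numbers are decimal integers.
    return SentencePair(int(jpn_seq_no), int(eng_seq_no), japanese, english, indices)
```

When reading the downloaded file, open it with <code>encoding="utf-8"</code> and pass each line to the function.<br />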
<br />
Another version, which is used by the WWWJDIC servers, has the sentences and indices on separate lines. The format is:<br />
<br />
:A: Japanese sentence[TAB]English sentence#ID=Engseq_Jpnseq<br />
:B: Indices<br />
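<br />
A reader for this two-line layout might look like the Python sketch below. The names <code>Example</code> and <code>read_examples</code> are hypothetical, and the sketch assumes that "A:" and "B:" are each followed by exactly one space and that every "B:" line belongs to the "A:" line immediately before it. The two ID parts are kept as strings.<br />

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Example(NamedTuple):
    japanese: str
    english: str
    eng_seq: str   # first part of the #ID= field
    jpn_seq: str   # second part of the #ID= field
    indices: str


# A: Japanese sentence[TAB]English sentence#ID=Engseq_Jpnseq
_A_LINE = re.compile(
    r"A: (?P<jpn>.*?)\t(?P<eng>.*)#ID=(?P<eng_seq>[^_]+)_(?P<jpn_seq>.+)"
)


def read_examples(lines: Iterable[str]) -> Iterator[Example]:
    """Pair each "A:" line with the "B:" line that follows it."""
    pending = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("A: "):
            pending = _A_LINE.fullmatch(line)   # None if the line is malformed
        elif line.startswith("B: ") and pending is not None:
            yield Example(pending["jpn"], pending["eng"],
                          pending["eng_seq"], pending["jpn_seq"], line[3:])
            pending = None
```

Because the function accepts any iterable of lines, it can be fed directly from a file object opened on the decompressed download.<br />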
<br />
This file can be downloaded in [http://ftp.edrdg.org/pub/Nihongo/examples.gz EUC-JP coding] or [http://ftp.edrdg.org/pub/Nihongo/examples.utf.gz UTF-8 coding.]</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_Board&diff=970Editorial Board2024-01-16T04:40:12Z<p>JimBreen: </p>
<hr />
<div>The content of the JMdict database and derived dictionaries is under the control of the JMdict Editorial Board. This group of people was initially set up by invitation in 2010 by Jim Breen. It manages its own membership, co-opting as members people who are able and willing to participate in maintaining and enhancing the quality of the entries in the database, and in formulating policies, etc. for the handling of the [[Editorial_Process| editorial process]]. Members of the Board often discuss issues "off-line" by email.<br />
<br />
Editorial Board members are registered in the database as "editors", i.e. they have the power to approve new entries and changes to existing entries, and to delete entries.<br />
<br />
There is no formally established procedure for becoming a member of the Board, however the usual practice is for the Board to extend an invitation to a contributor who has demonstrated through the quality of their submissions of new entries and amendments that they are capable of carrying out the role. <br />
<br />
The current Board members (not all of whom are active) are:<br />
<br />
*[http://nihongo.monash.edu/index.html Jim Breen]<br />
*[http://www2.unb.ca/~rmalenf1/index.html René Malenfant]<br />
*Marcus Richert<br />
*Robin Scott<br />
*Johan Råde<br />
*Stephen Kraus<br />
*Syed Raza<br />
*Jean-Luc Léger<br />
*Paul Blay<br />
*Paul Upchurch<br />
*Richard Warmington</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Main_Page&diff=969Main Page2023-11-26T22:12:40Z<p>JimBreen: /* The Tanaka Corpus */</p>
<hr />
<div>==Electronic Dictionary Research and Development Group==<br />
<br />
Welcome to the Wiki of the [[About EDRDG | Electronic Dictionary Research and Development Group]]. The Wiki has been developed as a repository of information and documentation about the Group's work and projects.<br />
<br />
==User Accounts==<br />
<br />
Sorry, but we no longer provide user accounts. We were hit by link spammers, which led to the disabling of self-creation of accounts, and it is all too much of a distraction.<br />
<br />
If you have any edits you would like to suggest, email Jim Breen (jimbreen-at-gmail.com) with the details.<br />
<br />
==The JMdict/EDICT Project==<br />
<br />
This project is to build and maintain a freely-usable general Japanese electronic dictionary database. <br />
<br />
===History===<br />
<br />
The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and new editions of the files were generated daily.<br />
<br />
In July 2010 maintenance of the JMdict data moved to an [[JMdictDB_Project|online database]], from which the daily distributions are prepared. In September 2014 the maintenance of the [[http://www.edrdg.org/wiki/index.php/Main_Page#The_ENAMDICT.2FJMnedict_Project JMnedict]] named-entity data was moved to that database too.<br />
<br />
===Documentation and Links===<br />
<br />
Some useful links are:<br />
<br />
*the main [[JMdict-EDICT_Dictionary_Project|documentation of the JMdict/EDICT dictionary files]]<br />
*some help with [[JMdict:_Getting_Started|getting started]] on putting in new entries or editing existing ones.<br />
*the [[Editorial Process]] for handling proposed new entries and amendments<br />
*the [[editorial policy|Editorial Policy]] and guidelines for the JMdict/EDICT files<br />
*the [[Editorial Board]] for JMdict/EDICT<br />
*the [https://github.com/JMdictProject/JMdictIssues/issues JMdict Issues] forum where matters such as structure, format, policies, tags, and other issues concerning dictionary content can be raised and discussed (currently hosted on GitHub.) Do not use this forum to discuss specific entries as these should be raised in the database itself.<br />
*the [https://gitlab.com/yamagoya/jmdictdb/-/issues JMdictDB Issues] site for reporting problems and making feature requests concerning the JMdictDB web pages and software.<br />
*the [https://groups.google.com/search/groups?q=edict-jmdict mailing list] for project discussion. (That page should have a link for asking to join. Alternatively, email [mailto:jimbreen@gmail.com Jim Breen] and ask to be added.)<br />
*the [http://www.edrdg.org/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.<br />
*lists of [[JMdictEDICT_software|packages and servers]] using the JMdict/EDICT files<br />
*an [[Entries Under Development]] page, where people can place incomplete words and phrases for later filling out to become full entries. (Note that this is rather inactive and needs cleaning up.)<br />
<br />
== Current Version &amp; Downloads==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server], (formerly at Monash University) which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_b.gz JMdict_b.gz ] - the basic JMdict file with only English glosses. This file omits several thousand proper name entries from JMnedict;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file.<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
<br />
==JMdictDB Database==<br />
The maintenance of the JMdict/EDICT and JMnedict/ENAMDICT dictionary files is now handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw, and operational since June 2010. For more information see:<br />
* an [[JMdictDB Project|overview]] of the database;<br />
* Stuart's [http://edrdg.org/~smg/ summary page];<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelpq.py quick overview] to editing entries;<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelp.py full help file] for editing entries.<br />
* a [http://www.edrdg.org/jmdictdb/JMdictEntries.html page] showing the current entry counts for the two dictionaries (updated daily).<br />
* project [https://gitlab.com/yamagoya/jmdictdb code] at GitLab.<br />
<br />
==The Tanaka Corpus==<br />
This project is to maintain and extend the [[Tanaka Corpus]] which is a large collection of parallel Japanese/English sentence pairs.<br />
<br />
The Corpus is now maintained within the [http://tatoeba.org/home Tatoeba Project]. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.<br />
<br />
An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], [http://jisho.org/ Denshi Jisho] , etc. This is achieved via a set of indices attached to each sentence pair. There is a [[Sentence-Dictionary Linking|detailed description]] of this process.<br />
<br />
While the Tatoeba Project and Tanaka Corpus are largely independent of the JMdict project there are two areas of overlap:<br />
* in the [[Sentence-Dictionary Linking|indexing of sentences]] the JMdict sequence numbers are occasionally used to distinguish between otherwise identical surface forms;<br />
* a special version of JMdict ([http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz]) is available which has the "priority" example sentences from the Tatoeba Project embedded within the relevant entries. This version is automatically generated.<br />
<br />
==The KANJIDIC Project==<br />
<br />
The [[KANJIDIC Project]] has compiled files of comprehensive information on kanji used in Japanese text processing. The files<br />
cover the kanji in three Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds additional kanji.<br />
<br />
==The COMPDIC Project==<br />
<br />
The COMPDIC project involved the compilation of a glossary of terms used in the computing and telecommunications industries. The file was in the "EDICT" format. See the [http://www.edrdg.org/jmdict/compdic_doc.html brief documentation].<br />
<br />
In 2008 the entries in the COMPDIC file were included in the JMdict/EDICT file. While it is no longer maintained as a separate file, an extract of the entries relating to computing and telecommunications is still generated.<br />
<br />
==The ENAMDICT/JMnedict Project==<br />
<br />
The JMnedict/ENAMDICT files contain about 740,000 proper names in Japanese, covering place-names, surnames, given names, company names, names of artistic and literary works, product names, etc. There is a basic [http://www.edrdg.org/enamdict/enamdict_doc.html documentation page].<br />
<br />
* JMnedict (the Japanese-Multilingual named entity dictionary) is in XML format and is in Unicode/UTF-8 coding. [http://ftp.edrdg.org/pub/Nihongo/JMnedict.xml.gz (download)]<br />
<br />
* ENAMDICT is in a variant of the EDICT format, with part-of-speech and other tags omitted and replaced with some special tags to indicate the type of proper name. [http://ftp.edrdg.org/pub/Nihongo/enamdict.gz (download)]<br />
<br />
The information in the files is held in the same database as the JMdict/EDICT information. To use the online edit system<br />
follow [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= this link] and select "jmnedict" from the drop-down Corpus menu.<br />
<br />
Several thousand common entries from JMnedict are also included in the JMdict distribution.<br />
<br />
==The KRADFILE/RADKFILE Project==<br />
<br />
This project provides a decomposition of kanji into a number of visual elements or radicals to support software that provides a lookup service using kanji components. These elements can be seen in the [http://nihongo.monash.edu/cgi-bin/wwwjdic?1R WWWJDIC] server, the [http://jisho.org/#radical Jisho.org] server, and [http://kanji.sljfaq.org/mr.html Ben Bullock's SLJFAQ] page.<br />
<br />
There is an [http://www.edrdg.org/krad/kradinf.html information page] about the data files. The files can be downloaded - use the links in that page.<br />
<br />
==The WWWJDIC Dictionary Server==<br />
<br />
WWWJDIC is a dictionary WWW server first developed by Jim Breen in 1998. Its (rather clunky) name came about because it is based on code and techniques developed in the earlier JDIC (DOS) and XJDIC (Unix/X11) applications.<br />
<br />
The home site of the server is [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C here], and there are several [http://www.edrdg.org/wwwjdic/wwwjdicmirrors.html mirror sites] which are updated daily from the home site. The server has links at the dictionary entry level to other sites and to the JMdict database for editing entries.<br />
<br />
The main documentation is the WWWJDIC [http://www.edrdg.org/wwwjdic/wwwjdicinf.html User's Guide].<br />
<br />
A number of elements in the server's display can be configured by users, and the interface language can be set to Japanese (as part of the [[WWWJDIC in Japanese]] project.)<br />
<br />
==Wishlist==<br />
<br />
This is a set of [[wishlist]] items for the various projects. Feel free to add suggestions.<br />
<br />
There is also an old [http://nihongo.monash.edu/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.<br />
<br />
==Mailing List==<br />
<br />
There is a [https://groups.google.com/g/edict-jmdict/ mailing list] for people engaged in the EDRDG projects.<br />
<br />
==How Can I Help?==<br />
<br />
From time to time people ask how they can best contribute to the projects. There are many ways of assisting, the main ones being:<br />
<br />
* adding to and enhancing the main (EDICT/JMdict) dictionary file. This is best done by using the [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search] and [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= New Entry] pages of the JMdictDB system.<br />
<br />
* adding extra Japanese-English sentence pairs to the collection based on the Tanaka Corpus. This is done by adding them to the [https://tatoeba.org/eng Tatoeba Project] as a linked sentence pair, then contacting Jim Breen to have them indexed.<br />
<br />
* assisting with the translation of the WWWJDIC interface into other languages. At present the priority is to make it fully available in Japanese. See the [[WWWJDIC in Japanese]] page.<br />
<br />
* working through the lists of words Paul Blay has placed on the [[Talk:Tanaka_Corpus]] page, which could become new dictionary entries.<br />
<br />
* join and participate in the [https://groups.google.com/g/edict-jmdict mailing list] for people engaged in the EDRDG projects.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=968Editorial policy2023-10-27T23:24:00Z<p>JimBreen: /* Other Issues/Policies */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. MP3プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
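<br />
One way to check a form for half-width characters before submitting it is sketched below in Python. This is only an illustration of the rule, not part of the JMdictDB software: the function name <code>narrow_characters</code> is hypothetical, and treating the Unicode East Asian Width classes "Na" (narrow) and "H" (half-width) as the characters to flag is an assumption.<br />

```python
import unicodedata
from typing import List


def narrow_characters(text: str) -> List[str]:
    """Return the characters of text that are not full-width.

    Assumption: a character counts as half-width when its Unicode
    East Asian Width class is "Na" (narrow, e.g. ASCII letters and
    digits) or "H" (half-width, e.g. half-width katakana).
    """
    return [ch for ch in text
            if unicodedata.east_asian_width(ch) in ("Na", "H")]


# Half-width Latin letters and digits are reported ...
print(narrow_characters("MP3\u30d7\u30ec\u30fc\u30e4\u30fc"))
# ... while their full-width counterparts (U+FF2D, U+FF30, U+FF13) are not.
print(narrow_characters("\uff2d\uff30\uff13\u30d7\u30ec\u30fc\u30e4\u30fc"))
```

An empty result means that every character in the form is full-width under this assumption.<br />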
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
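<br />
The two required versions of a multi-word 外来語, and a check for look-alike dots, can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the "JIS middle-dot" is U+30FB (KATAKANA MIDDLE DOT), and its list of look-alike characters is illustrative rather than exhaustive.<br />

```python
from typing import List, Sequence, Tuple

# Assumption: the "JIS middle-dot" is U+30FB KATAKANA MIDDLE DOT.
KATAKANA_MIDDLE_DOT = "\u30fb"

# Some visually similar characters (illustrative, not exhaustive).
LOOKALIKE_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
    "\u2022": "BULLET",
    "\u2027": "HYPHENATION POINT",
    "\u22c5": "DOT OPERATOR",
}


def dot_variants(parts: Sequence[str]) -> List[str]:
    """Return the reading without and with a separating middle dot."""
    return ["".join(parts), KATAKANA_MIDDLE_DOT.join(parts)]


def find_lookalike_dots(reading: str) -> List[Tuple[int, str]]:
    """Return (position, name) for each dot that is not U+30FB."""
    return [(i, LOOKALIKE_DOTS[ch])
            for i, ch in enumerate(reading) if ch in LOOKALIKE_DOTS]
```

For example, <code>dot_variants(["アームレス", "チェア"])</code> yields the two forms that would be joined with ";" in the Readings field.<br />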
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits. If a term is only found in 日国 it should be supported by other references or evidence of non-archaic use;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
***in a number of cases where terms are usually used as adjectives, some Japanese dictionaries will label them as "[名・形動]" or equivalent. In such cases, we typically tag them as "adj-na,n", etc. However, if noun use is rare and one or more Japanese dictionaries have only 形動, the "n" tag can be omitted. <br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
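For developers who need to read these markings, the following is a minimal Python sketch of extracting the language code and source word from the [lsrc=lng:word] pattern shown in the examples above. The function name <code>parse_lsrc</code> and the regular expression are illustrative assumptions, not part of any JMdict tool, and the pattern only covers the simple forms given on this page.<br />

```python
import re

# Matches [lsrc=lng:word], where lng is a three-letter code and the word
# may be empty or wrapped in double quotes (as in the "art déco" example).
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):("?)(.*?)\2\]')

def parse_lsrc(sense_text):
    """Return a list of (language_code, source_word) pairs found in a sense."""
    return [(m.group(1), m.group(3)) for m in LSRC_RE.finditer(sense_text)]

if __name__ == "__main__":
    print(parse_lsrc('[1][n,vs][lsrc=ger:Arbeit] part-time job'))
    print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```

An empty source word, e.g. [lsrc=eng:], is returned as an empty string, which corresponds to the case where the source word is identical to the translation and is not repeated.<br />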
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word, use the format [see=漢字[2]].<br />
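As an illustration of the three forms described above (plain target, kanji・reading, and target with a sense number), here is a small Python sketch that splits a cross-reference into its parts. The name <code>parse_xref</code> and the regular expression are assumptions made for this example; the full syntax accepted by the database is described in the detailed instructions and is richer than this. In particular, the sketch treats the first ・ as the kanji/reading separator, so it would mishandle a target that itself contains ・.<br />

```python
import re

# [see=target], [ant=target], [see=kanji・reading], [see=target[2]]
XREF_RE = re.compile(
    r'\[(see|ant)=([^\[\]・]+)(?:・([^\[\]]+))?(?:\[(\d+)\])?\]'
)

def parse_xref(text):
    """Return the parts of the first cross-reference in text, or None."""
    m = XREF_RE.search(text)
    if m is None:
        return None
    kind, target, reading, sense = m.groups()
    return {
        "kind": kind,                 # "see" or "ant"
        "target": target,             # headword (kanji form where one exists)
        "reading": reading,           # None unless kanji・reading was given
        "sense": int(sense) if sense else None,
    }

if __name__ == "__main__":
    print(parse_xref('[see=金本位・かねほんい]'))
    print(parse_xref('[see=漢字[2]]'))
```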
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) term does not fall into this category. It refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
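The spacing rules above lend themselves to a simple automated check. The following Python sketch flags a gloss that has a number run directly into a unit, or a space between a number and a symbol. The list of units and symbols is a small illustrative assumption, not an official list, and the function name <code>spacing_problems</code> is hypothetical.<br />

```python
import re

# Illustrative unit list only; "c" (cents) is deliberately absent because
# it is treated as a symbol and takes no space.
UNITS = ("km", "cm", "mm", "kg", "m", "g", "l")

# A digit immediately followed by a unit, e.g. "100km".
MISSING_SPACE_RE = re.compile(r'\d(?:%s)\b' % "|".join(UNITS))
# A digit, whitespace, then a symbol, e.g. "15 °C" or "9 %".
EXTRA_SPACE_RE = re.compile(r'\d\s+(?:°C|%)')

def spacing_problems(gloss):
    """Return a list of spacing problems found in a gloss."""
    problems = []
    if MISSING_SPACE_RE.search(gloss):
        problems.append("missing space before unit")
    if EXTRA_SPACE_RE.search(gloss):
        problems.append("unwanted space before symbol")
    return problems

if __name__ == "__main__":
    for gloss in ("100 km", "100km", "15°C", "15 °C", "5c"):
        print(gloss, spacing_problems(gloss))
```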
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
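For tools that generate glosses, the preferred date formats above can be produced with a few lines of Python. This is only a sketch; the function names <code>format_date</code> and <code>format_lifespan</code> are hypothetical, and month names are listed explicitly so that the output does not depend on the system locale.<br />

```python
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

def format_date(month, day, year=None):
    """Return 'March 17' or, when a year is given, 'March 17, 2019'."""
    text = f"{MONTHS[month - 1]} {day}"
    return f"{text}, {year}" if year is not None else text

def format_lifespan(born, died):
    """Return the compact YYYY.MM.DD-YYYY.MM.DD form used for people.

    born and died are (year, month, day) tuples; months and days are
    not zero-padded, following the Yukio Mishima example.
    """
    return "-".join(f"{y}.{m}.{d}" for (y, m, d) in (born, died))

if __name__ == "__main__":
    print(format_date(3, 17))
    print(format_date(3, 17, 2019))
    print(format_lifespan((1925, 1, 14), (1970, 11, 25)))
```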
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: Nikkoku 日国/日本国語大辞典 - a major multi-volume 国語辞典. (Online sites usually only have the abridged edition.)<br />
*meikyo (or mk): Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*iwakoku: Iwanami Kokugo Jiten, 岩波国語辞典 - small-sized 国語辞典<br />
*smk: Shinmeikai Kokugo Jiten 新明解国語辞典<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*chuujiten: 新和英中辞典: medium Kenkyusha JE dictionary<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*prog: プログレッシブ和英中辞典 Progressive J-E dictionary<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*カタカナ新語辞典: a useful dictionary of loanwords from Gakken.<br />
*JWN: [https://bond-lab.github.io/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
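Contributors who want a rough idea of whether some text will survive in the EDICT and EDICT2 distributions can test it against Python's built-in codecs. The sketch below is an approximation under stated assumptions: Python's <code>iso2022_jp</code> codec is used as a stand-in for JIS X 0208 (it also accepts ASCII and JIS X 0201 Roman), and <code>euc_jp</code> as a stand-in for JIS X 0208 plus JIS X 0212 (it also accepts half-width katakana). It is not the code used to build the distributions, and the function names are hypothetical.<br />

```python
def fits_codec(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def distribution_coverage(text):
    """Estimate which legacy distributions could carry text unchanged."""
    in_0208 = fits_codec(text, "iso2022_jp")  # approx. JIS X 0208 only
    in_0212 = fits_codec(text, "euc_jp")      # approx. JIS X 0208 + 0212
    return {"EDICT": in_0208, "EDICT2": in_0208 or in_0212}

if __name__ == "__main__":
    for sample in ("漢字", "é", "한"):
        print(sample, distribution_coverage(sample))
```

Text for which both values are False (such as hangul) would be removed from both distributions, so a romanized version should accompany it as described above.<br />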
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
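The basic two-out-of-three count can be expressed as a short Python sketch. The entry representation (a dictionary holding only the major/common kanji headword, reading and meaning) and the function name <code>may_merge</code> are assumptions for illustration. The sketch deliberately does not count two missing kanji fields as a match, so the kana-only variant case described above, and the common-sense judgement about partial overlaps, are left to the editor.<br />

```python
def may_merge(a, b):
    """Apply the basic two-out-of-three test to two candidate entries.

    a and b are dicts with 'kanji', 'reading' and 'meaning' keys holding
    the major/common form of each field ('kanji' may be None).
    """
    matches = 0
    # A missing kanji headword never counts as a match.
    if a["kanji"] is not None and a["kanji"] == b["kanji"]:
        matches += 1
    if a["reading"] == b["reading"]:
        matches += 1
    if a["meaning"] == b["meaning"]:
        matches += 1
    return matches >= 2

if __name__ == "__main__":
    x = {"kanji": "漢字A", "reading": "よみ", "meaning": "gloss"}
    y = {"kanji": "漢字B", "reading": "よみ", "meaning": "gloss"}
    print(may_merge(x, y))  # same reading and meaning
```

Comparing meanings with simple string equality is a simplification; in practice deciding whether two meanings are "the same" is an editorial judgement.<br />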
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included; ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
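The following Python sketch shows one way an app could follow this suggestion: index every form for searching, but filter out forms tagged "sK" or "sk" before display. The <code>Form</code> class and the function names are hypothetical and do not reflect the JMdict XML schema or any existing library.<br />

```python
from dataclasses import dataclass

SEARCH_ONLY_TAGS = {"sK", "sk"}

@dataclass(frozen=True)
class Form:
    """A surface form (kanji or kana) with its tags."""
    text: str
    tags: frozenset = frozenset()

def search_keys(forms):
    """All forms, including search-only ones, are usable as lookup keys."""
    return {f.text for f in forms}

def display_forms(forms):
    """Forms to show in the entry: everything except sK/sk forms."""
    return [f.text for f in forms if not (f.tags & SEARCH_ONLY_TAGS)]

if __name__ == "__main__":
    forms = [
        Form("3密"),
        Form("三密"),
        Form("三蜜", frozenset({"sK"})),  # search-only form
    ]
    print("三蜜" in search_keys(forms))
    print(display_forms(forms))
```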
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*Where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess.<br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
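
The gloss formats described in the rules above can be sketched in Python. This is only an illustration: the function names <code>format_species_gloss</code> and <code>normalize_binomial</code> are hypothetical and are not part of any JMdict tooling, and the capitalization helper assumes a non-empty plain scientific name without cultivar epithets.

```python
def format_species_gloss(scientific_name, common_name=None, description=None):
    """Build a gloss in the preferred format.

    common_name (scientific_name) when a common name is known,
    otherwise scientific_name (description).
    """
    if common_name:
        return f"{common_name} ({scientific_name})"
    if description:
        return f"{scientific_name} ({description})"
    return scientific_name


def normalize_binomial(name):
    """Capitalize the generic name and lower-case the remaining parts.

    Assumes a non-empty name with no cultivar epithet (those are capitalized).
    """
    genus, *rest = name.split()
    return " ".join([genus.capitalize()] + [part.lower() for part in rest])
```

For example, <code>format_species_gloss("Pica pica", common_name="European magpie")</code> gives "European magpie (Pica pica)", and <code>normalize_binomial("tyrannosaurus Rex")</code> gives "Tyrannosaurus rex".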
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme. For example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, yet foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag; however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.<br />
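
As a rough illustration of the rule above, the following Python sketch flags some mid-line bar characters other than the three permitted ones (U+002D, U+30FC and U+2212). The set of look-alike characters and the function name <code>find_disallowed_bars</code> are assumptions made for this example and are not exhaustive; the issue linked above remains the reference for the full list.

```python
import unicodedata

# Hyphen/dash-like characters that are NOT among the three permitted ones.
# This set is illustrative only and is an assumption of this sketch.
LOOKALIKES = {
    "\u2010",  # HYPHEN
    "\u2013",  # EN DASH
    "\u2014",  # EM DASH
    "\u2015",  # HORIZONTAL BAR
    "\uff0d",  # FULLWIDTH HYPHEN-MINUS
}


def find_disallowed_bars(text):
    """Return (index, character, Unicode name) for each look-alike found."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if char in LOOKALIKES
    ]
```

A string such as "CD-ROM" with the ASCII hyphen, or "ローマ字" with the long vowel symbol, yields an empty list, while a string containing an en dash is reported with its position and Unicode name.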
<br />
===Issues Forum===<br />
There is a [https://github.com/JMdictProject/JMdictIssues/issues JMdict Issues] forum where matters such as structure, format, policies, tags, and other issues concerning dictionary content can be raised and discussed (currently hosted on GitHub). Do not use this forum to discuss specific entries as these should be raised in the database itself.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Main_Page&diff=967Main Page2023-10-27T23:18:28Z<p>JimBreen: /* Documentation and Links */</p>
<hr />
<div>==Electronic Dictionary Research and Development Group==<br />
<br />
Welcome to the Wiki of the [[About EDRDG | Electronic Dictionary Research and Development Group]]. The Wiki has been developed as a repository of information and documentation about the Group's work and projects.<br />
<br />
==User Accounts==<br />
<br />
Sorry, but we no longer provide user accounts. We've been hit by link spammers, which led to the disabling of self-creation of accounts, and it's all too much of a distraction.<br />
<br />
If you have any edits you would like to suggest, email Jim Breen (jimbreen-at-gmail.com) with the details.<br />
<br />
==The JMdict/EDICT Project==<br />
<br />
This project is to build and maintain a freely-usable general Japanese electronic dictionary database. <br />
<br />
===History===<br />
<br />
The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and new editions of the files were generated daily.<br />
<br />
In July 2010 maintenance of the JMdict data moved to an [[JMdictDB_Project|online database]], from which the daily distributions are prepared. In September 2014 the maintenance of the [http://www.edrdg.org/wiki/index.php/Main_Page#The_ENAMDICT.2FJMnedict_Project JMnedict] named-entity data was moved to that database too.<br />
<br />
===Documentation and Links===<br />
<br />
Some useful links are:<br />
<br />
*the main [[JMdict-EDICT_Dictionary_Project|documentation of the JMdict/EDICT dictionary files]]<br />
*some help with [[JMdict:_Getting_Started|getting started]] on putting in new entries or editing existing ones.<br />
*the [[Editorial Process]] for handling proposed new entries and amendments<br />
*the [[editorial policy|Editorial Policy]] and guidelines for the JMdict/EDICT files<br />
*the [[Editorial Board]] for JMdict/EDICT<br />
*the [https://github.com/JMdictProject/JMdictIssues/issues JMdict Issues] forum where matters such as structure, format, policies, tags, and other issues concerning dictionary content can be raised and discussed (currently hosted on GitHub.) Do not use this forum to discuss specific entries as these should be raised in the database itself.<br />
*the [https://gitlab.com/yamagoya/jmdictdb/-/issues JMdictDB Issues] site for reporting problems and making feature requests concerning the JMdictDB web pages and software.<br />
*the [https://groups.google.com/search/groups?q=edict-jmdict mailing list] for project discussion. (That page should have a link for asking to join. Alternatively, email [mailto:jimbreen@gmail.com Jim Breen] and ask to be added.)<br />
*the [http://www.edrdg.org/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.<br />
*lists of [[JMdictEDICT_software|packages and servers]] using the JMdict/EDICT files<br />
*an [[Entries Under Development]] page, where people can place incomplete words and phrases for later filling out to become full entries. (Note that this is rather inactive and needs cleaning up.)<br />
<br />
== Current Version &amp; Downloads==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server] (formerly at Monash University), which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_b.gz JMdict_b.gz ] - the basic JMdict file with only English glosses. This file omits several thousand proper name entries from JMnedict;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file.<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
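
The generation date mentioned above can be checked by looking at the first few lines of a downloaded file. The following is a minimal Python sketch, assuming the file has already been fetched to a local path. The function name <code>head_lines</code> is hypothetical, and because the exact header layout is not described here, the sketch simply returns the opening lines for inspection rather than parsing a date. The default encoding of UTF-8 is an assumption that suits the XML files; other files may need a different <code>encoding</code> argument.

```python
import gzip
from itertools import islice


def head_lines(path, count=10, encoding="utf-8"):
    """Return the first `count` lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as handle:
        # islice stops after `count` lines, so the whole file is never read.
        return [line.rstrip("\n") for line in islice(handle, count)]
```

Calling <code>head_lines("JMdict_e.gz", count=20)</code> on a local copy would return the first twenty lines, which can then be searched for the generation date.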
<br />
==JMdictDB Database==<br />
The maintenance of the JMdict/EDICT and JMnedict/ENAMDICT dictionary files is now handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw, and operational since June 2010. For more information see:<br />
* an [[JMdictDB Project|overview]] of the database;<br />
* Stuart's [http://edrdg.org/~smg/ summary page];<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelpq.py quick overview] to editing entries;<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelp.py full help file] for editing entries.<br />
* a [http://www.edrdg.org/jmdictdb/JMdictEntries.html page] showing the current entry counts for the two dictionaries (updated daily).<br />
* project [https://gitlab.com/yamagoya/jmdictdb code] at GitLab.<br />
<br />
==The Tanaka Corpus==<br />
This project is to maintain and extend the [[Tanaka Corpus]] which is a large collection of parallel Japanese/English sentence pairs.<br />
<br />
The Corpus is now maintained within the [http://tatoeba.org/home Tatoeba Project]. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.<br />
<br />
An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], [http://jisho.org/ Denshi Jisho], etc. This is achieved via a set of indices attached to each sentence pair. There is a [[Sentence-Dictionary Linking|detailed description]] of this process.<br />
<br />
==The KANJIDIC Project==<br />
<br />
The [[KANJIDIC Project]] has compiled files of comprehensive information on kanji used in Japanese text processing. The files cover the kanji in three Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds additional kanji.<br />
<br />
==The COMPDIC Project==<br />
<br />
The COMPDIC project involved the compilation of a glossary of terms used in the computing and telecommunications industries. The file was in the "EDICT" format. See the [http://www.edrdg.org/jmdict/compdic_doc.html brief documentation].<br />
<br />
In 2008 the entries in the COMPDIC file were included in the JMdict/EDICT file. While it is no longer maintained as a separate file, an extract of the entries relating to computing and telecommunications is still generated.<br />
<br />
==The ENAMDICT/JMnedict Project==<br />
<br />
The JMnedict/ENAMDICT files contain about 740,000 proper names in Japanese, covering place-names, surnames, given names, company names, names of artistic and literary works, product names, etc. There is a basic [http://www.edrdg.org/enamdict/enamdict_doc.html documentation page].<br />
<br />
* JMnedict (the Japanese-Multilingual named entity dictionary) is in XML format and is in Unicode/UTF-8 coding. [http://ftp.edrdg.org/pub/Nihongo/JMnedict.xml.gz (download)]<br />
<br />
* ENAMDICT is in a variant of the EDICT format, with part-of-speech and other tags omitted and replaced with some special tags to indicate the type of proper name. [http://ftp.edrdg.org/pub/Nihongo/enamdict.gz (download)]<br />
<br />
The information in the files is held in the same database as the JMdict/EDICT information. To use the online edit system, follow [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= this link] and select "jmnedict" from the drop-down Corpus menu.<br />
<br />
Several thousand common entries from JMnedict are also included in the JMdict distribution.<br />
<br />
==The KRADFILE/RADKFILE Project==<br />
<br />
This project provides a decomposition of kanji into a number of visual elements or radicals to support software that provides a lookup service using kanji components. These elements can be seen in the [http://nihongo.monash.edu/cgi-bin/wwwjdic?1R WWWJDIC] server, the [http://jisho.org/#radical Jisho.org] server, and [http://kanji.sljfaq.org/mr.html Ben Bullock's SLJFAQ] page.<br />
<br />
There is an [http://www.edrdg.org/krad/kradinf.html information page] about the data files. The files can be downloaded - use the links in that page.<br />
<br />
==The WWWJDIC Dictionary Server==<br />
<br />
WWWJDIC is a dictionary WWW server first developed by Jim Breen in 1998. Its (rather clunky) name came about because it is based on code and techniques developed in the earlier JDIC (DOS) and XJDIC (Unix/X11) applications.<br />
<br />
The home site of the server is [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C here], and there are several [http://www.edrdg.org/wwwjdic/wwwjdicmirrors.html mirror sites] which are updated daily from the home site. The server has links at the dictionary entry level to other sites and to the JMdict database for editing entries.<br />
<br />
The main documentation is the WWWJDIC [http://www.edrdg.org/wwwjdic/wwwjdicinf.html User's Guide].<br />
<br />
A number of elements in the server's display can be configured by users, and the interface language can be set to Japanese (as part of the [[WWWJDIC in Japanese]] project.)<br />
<br />
==Wishlist==<br />
<br />
This is a set of [[wishlist]] items for the various projects. Feel free to add suggestions.<br />
<br />
There is also an old [http://nihongo.monash.edu/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.<br />
<br />
==Mailing List==<br />
<br />
There is a [https://groups.google.com/g/edict-jmdict/ mailing list] for people engaged in the EDRDG projects.<br />
<br />
==How Can I Help?==<br />
<br />
From time to time people ask how they can best contribute to the projects. There are many ways of assisting, the main ones being:<br />
<br />
* adding to and enhancing the main (EDICT/JMdict) dictionary file. This is best done by using the [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search] and [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= New Entry] pages of the JMdictDB system.<br />
<br />
* adding extra Japanese-English sentence pairs to the collection based on the Tanaka Corpus. This is done by adding them to the [https://tatoeba.org/eng Tatoeba Project] as a linked sentence pair, then contacting Jim Breen to have them indexed.<br />
<br />
* assisting with the translation of the WWWJDIC interface into other languages. At present the priority is to make it fully available in Japanese. See the [[WWWJDIC in Japanese]] page.<br />
<br />
* work through the lists of words Paul Blay has placed on the [[Talk:Tanaka_Corpus]] page, which could become new dictionary entries.<br />
<br />
* join and participate in the [https://groups.google.com/g/edict-jmdict mailing list] for people engaged in the EDRDG projects.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=966Editorial policy2023-09-02T01:28:35Z<p>JimBreen: /* Part-Of-Speech (POS) Issues */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. MP3プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
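
The bracketed markers described above ("[nokanji]" and "[restr=KKK]") can be separated from the readings themselves with a few lines of Python. This is a sketch under stated assumptions: readings are taken to be separated by ";" and markers to follow the reading directly, and the function name <code>parse_readings</code> is hypothetical. It does not reflect how the JMdictDB software itself parses the field.

```python
import re

# Matches one bracketed marker such as [nokanji] or [restr=...].
_TAG = re.compile(r"\[([^\]]*)\]")


def parse_readings(field):
    """Split a ';'-separated reading field into (reading, markers) pairs.

    Assumes no marker contains a ';' character.
    """
    result = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        markers = _TAG.findall(item)
        reading = _TAG.sub("", item).strip()
        result.append((reading, markers))
    return result
```

For the 仏陀 example above, <code>parse_readings("ぶっだ;ブッダ[nokanji]")</code> returns the hiragana reading with no markers followed by the katakana reading with the "nokanji" marker, preserving the order in which they were entered.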
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
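
The pairing of forms with and without the middle dot can be sketched as follows. The sketch assumes that the "JIS middle-dot" is the katakana middle dot U+30FB and that readings are joined with ";" as in the example above. The names <code>gairaigo_readings</code> and <code>has_other_middle_dot</code> are hypothetical, and the set of rejected look-alike dots is illustrative rather than complete.

```python
KATAKANA_MIDDLE_DOT = "\u30fb"  # assumed to be the "JIS middle-dot"

# Similar-looking dots that this sketch treats as incorrect (illustrative only).
OTHER_DOTS = {
    "\u00b7",  # MIDDLE DOT
    "\u2022",  # BULLET
    "\uff65",  # HALFWIDTH KATAKANA MIDDLE DOT
}


def gairaigo_readings(components):
    """Return the joined form and the middle-dot form, separated by ';'."""
    joined = "".join(components)
    dotted = KATAKANA_MIDDLE_DOT.join(components)
    return f"{joined};{dotted}"


def has_other_middle_dot(reading):
    """Report whether the reading contains one of the look-alike dots."""
    return any(char in OTHER_DOTS for char in reading)
```

Given the components アームレス and チェア, <code>gairaigo_readings</code> produces the two forms shown in the example above.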
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
<br />
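Some of the conventions above are mechanical enough to be checked automatically. The following Python sketch is purely illustrative: the function name <code>check_glosses</code> and the choice of checks are assumptions made for this example and are not part of the JMdictDB software. It flags only three of the points listed above, and any warning still needs human review (e.g. "a cappella" would be wrongly flagged as starting with an article).<br />

```python
import re

def check_glosses(meaning: str) -> list:
    """Return style warnings for a ';'-separated meaning string (hypothetical helper)."""
    warnings = []
    for gloss in (g.strip() for g in meaning.split(";")):
        if not gloss:
            continue
        # Patterns such as "colo(u)r" cannot be searched for successfully.
        if re.search(r"\w\(\w+\)\w", gloss):
            warnings.append(f"embedded optional letters: {gloss!r}")
        # Leading articles should normally be dropped.
        if re.match(r"(a|an|the)\s", gloss, flags=re.IGNORECASE):
            warnings.append(f"leading article: {gloss!r}")
        # No comma should follow "e.g." or "i.e."
        if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
            warnings.append(f"comma after e.g./i.e.: {gloss!r}")
    return warnings
```

For instance, <code>check_glosses("rice field; rice paddy")</code> returns an empty list, while <code>check_glosses("colo(u)r")</code> returns one warning.<br />
<br />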
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits. If a term is only found in 日国 it should be supported by other references or evidence of non-archaic use;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
***in a number of cases where terms are usually used as adjectives, some Japanese dictionaries will label them as "[名・形動]" or equivalent. In such cases, we typically tag them as "adj-na,n", etc. however if noun use is rare and one or more Japanese dictionaries have only 形動 the "n" tag can be omitted. <br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
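For developers who want to pull the source-language information out of an edit-form sense string, the [lsrc=lng:word] pattern described above can be matched with a regular expression. This is a minimal sketch, assuming only the two forms shown in the examples above (a bare word, or a word in double quotes); the names <code>LSRC_RE</code> and <code>parse_lsrc</code> are invented for this example, and the syntax actually accepted by the JMdictDB system may be richer than this.<br />

```python
import re

# Three-letter language code, then either a quoted or an unquoted source word.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')

def parse_lsrc(text):
    """Return (language_code, source_word) pairs found in a sense string."""
    results = []
    for m in LSRC_RE.finditer(text):
        lang = m.group(1)
        # group(2) is the quoted form, group(3) the unquoted form.
        word = m.group(2) if m.group(2) is not None else m.group(3)
        results.append((lang, word))
    return results
```

An empty source word, as in [lsrc=eng:], is returned as an empty string, which corresponds to the case where the source word is identical to the translation.<br />
<br />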
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
<br />
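The three cross-reference forms above (plain target, kanji・reading pair, and target with a sense number) can be separated with a single regular expression. The following Python sketch is an illustration only: <code>XRef</code>, <code>XREF_RE</code> and <code>parse_xrefs</code> are names made up for this example, and it assumes that "・" never occurs inside the target headword itself, which will not hold for some katakana terms.<br />

```python
import re
from typing import NamedTuple, Optional

class XRef(NamedTuple):
    kind: str                # "see" or "ant"
    target: str              # kanji (or kana) form of the target entry
    reading: Optional[str]   # reading, when given as kanji・reading
    sense: Optional[int]     # sense number, when given as [n]

# Assumption: "・" only ever separates the kanji form from the reading.
XREF_RE = re.compile(r"\[(see|ant)=([^\[\]・]+)(?:・([^\[\]]+))?(?:\[(\d+)\])?\]")

def parse_xrefs(text):
    """Return every [see=...] / [ant=...] cross-reference found in text."""
    return [
        XRef(m.group(1), m.group(2), m.group(3),
             int(m.group(4)) if m.group(4) else None)
        for m in XREF_RE.finditer(text)
    ]
```

With this sketch, <code>parse_xrefs("[see=漢字[2]]")</code> yields a single cross-reference whose target is 漢字 and whose sense is 2.<br />
<br />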
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It marks a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), where the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
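The two spacing rules can be expressed as a small helper. This is only a sketch: the name <code>format_quantity</code> and the <code>SYMBOLS</code> set are assumptions for illustration, and the set contains only the symbols mentioned above, so it would need extending for real use.<br />

```python
# Symbols that attach directly to the number; anything else is treated as a unit.
SYMBOLS = {"%", "°C", "c"}

def format_quantity(value, unit):
    """Join a number and its unit or symbol using the spacing rules above."""
    separator = "" if unit in SYMBOLS else " "
    return f"{value}{separator}{unit}"
```

Under these assumptions <code>format_quantity(100, "km")</code> gives "100 km" and <code>format_quantity(15, "°C")</code> gives "15°C".<br />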
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: Nikkoku 日国/日本国語大辞典 - a major multi-volume 国語辞典. (Online sites usually only have the abridged edition.)<br />
*meikyo (or mk): Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*iwakoku: Iwanami Kokugo Jiten, 岩波国語辞典 - small-sized 国語辞典<br />
*smk: Shinmeikai Kokugo Jiten 新明解国語辞典<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*chuujiten: 新和英中辞典: medium Kenkyusha JE dictionary<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*prog: プログレッシブ和英中辞典 Progressive J-E dictionary<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*カタカナ新語辞典: a useful dictionary of loanwords from Gakken.<br />
*JWN: [https://bond-lab.github.io/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
<br />
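A contributor or developer who wants a rough idea of whether some text will survive into the EDICT and EDICT2 distributions can test it against Python's built-in Japanese codecs. The sketch below is an approximation, not the conversion actually used to build the distributions: it assumes that the <code>iso2022_jp</code> codec is a reasonable stand-in for "JIS X 0208 only" and <code>euc_jp</code> for "JIS X 0208 plus JIS X 0212" (both codecs also accept ASCII and a few other characters). The function names are invented for this example.<br />

```python
def fits_codec(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

def edict_compatibility(text):
    """Approximate check of which legacy distributions could carry text."""
    return {
        # Assumption: iso2022_jp approximates the JIS X 0208 repertoire.
        "EDICT": fits_codec(text, "iso2022_jp"),
        # Assumption: euc_jp approximates JIS X 0208 + JIS X 0212.
        "EDICT2": fits_codec(text, "euc_jp"),
    }
```

Under these assumptions a common kanji such as 語 passes both checks, a hangul syllable passes neither, and a Latin letter with a diacritic such as é passes only the EDICT2 check, matching the description above.<br />
<br />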
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
<br />
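The core of the two-out-of-three rule can be written as a simple count of matching fields. The following sketch is deliberately naive and is not how editors actually decide: it assumes each entry is reduced to one major kanji form, one major reading and one meaning string (a hypothetical dict layout), and it treats "the same meaning" as exact string equality, whereas in practice that is an editorial judgement. Kana-only entries are left to human judgement, as described above, so a missing kanji field never counts as a match.<br />

```python
FIELDS = ("kanji", "reading", "meaning")

def fields_in_common(a, b):
    """Count how many of the three fields are present and identical."""
    return sum(
        1 for field in FIELDS
        if a.get(field) is not None and a.get(field) == b.get(field)
    )

def may_merge(a, b):
    """Apply the two-out-of-three rule to two simplified entries."""
    return fields_in_common(a, b) >= 2
```

For example, two simplified entries for 日本 with readings にほん and にっぽん and the same meaning share two fields and may be merged, while 人気 read にんき ("popularity") and 人気 read ひとけ ("sign of life") share only the kanji form and must stay separate.<br />
<br />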
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. In general, a proposed entry needs to pass at least one of these criteria.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included; ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
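<br />
The search-but-do-not-display behaviour described above can be sketched as follows. This is only an illustration: the <code>Form</code> and <code>Entry</code> classes, the function names and the in-memory index are hypothetical and are not part of JMdict or any existing application; only the "sK" and "sk" tag names come from the text above.<br />
<br />
```python
from dataclasses import dataclass, field

# Tag names taken from the text above; everything else here is hypothetical.
SEARCH_ONLY_TAGS = {"sK", "sk"}


@dataclass
class Form:
    """One surface form (kanji or kana) with its tags."""
    text: str
    tags: set = field(default_factory=set)


@dataclass
class Entry:
    """A minimal stand-in for a dictionary entry."""
    forms: list
    gloss: str


def search_keys(entry):
    """Every surface form, search-only or not, is usable as a lookup key."""
    return [form.text for form in entry.forms]


def display_forms(entry):
    """Forms shown to the user: anything tagged sK/sk is left out."""
    return [form.text for form in entry.forms
            if not (form.tags & SEARCH_ONLY_TAGS)]


def build_index(entries):
    """Map each search key to the list of entries it retrieves."""
    index = {}
    for entry in entries:
        for key in search_keys(entry):
            index.setdefault(key, []).append(entry)
    return index


# The 三蜜 example from the text; the gloss is deliberately left as a placeholder.
entry = Entry(
    forms=[Form("3密"), Form("三密"), Form("三蜜", {"sK"})],
    gloss="(gloss omitted)",
)
index = build_index([entry])
```
<br />
With this arrangement a lookup of <code>index["三蜜"]</code> retrieves the entry, while <code>display_forms(entry)</code> lists only 3密 and 三密.<br />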
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
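<br />
As a rough illustration of the capitalization and subspecies rules above, the following sketch assembles a gloss from its parts. The function names and the <code>plant</code> flag are hypothetical; the sketch covers only species and subspecies (not varieties, forms or cultivars) and emits plain text without the italic markup used on this page.<br />
<br />
```python
def scientific_name(genus, species, subspecies=None, plant=False):
    """Assemble a scientific name using the rules listed above.

    The generic name is capitalized and the epithets are lower-cased.
    Animal subspecies simply append the subspecific epithet; plant
    subspecies insert the abbreviation "subsp." before it.
    """
    name = f"{genus.capitalize()} {species.lower()}"
    if subspecies is not None:
        if plant:
            name += f" subsp. {subspecies.lower()}"
        else:
            name += f" {subspecies.lower()}"
    return name


def species_gloss(common_name, sci_name):
    """Preferred gloss format: common_name (scientific_name)."""
    return f"{common_name} ({sci_name})"


# Examples taken from the guidelines above.
print(species_gloss("cinnamon bear",
                    scientific_name("Ursus", "americanus", "cinnamomum")))
print(scientific_name("Calystegia", "sepium", "erratica", plant=True))
print(scientific_name("tyrannosaurus", "Rex"))
```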
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
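<br />
A submission could be checked for stray look-alike characters with something like the sketch below. The three permitted characters are those listed above; the <code>LOOKALIKES</code> set is only an illustrative assumption of commonly confused code points, not the authoritative list from the JMdict issue cited in this section, and the function name is hypothetical.<br />
<br />
```python
# The three characters permitted by the rules above.
ALLOWED = {
    "\u002d": "ASCII hyphen",
    "\u30fc": "long vowel symbol",
    "\u2212": "minus sign",
}

# Assumption: an illustrative set of similar-looking characters
# (hyphen, non-breaking hyphen, figure dash, en dash, em dash,
# horizontal bar, fullwidth hyphen-minus).
LOOKALIKES = {
    "\u2010", "\u2011", "\u2012", "\u2013",
    "\u2014", "\u2015", "\uff0d",
}

# Sanity check: no permitted character is also treated as a look-alike.
assert not (set(ALLOWED) & LOOKALIKES)


def find_disallowed_bars(text):
    """Return (position, character, code point) for each look-alike found."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in LOOKALIKES
    ]


print(find_disallowed_bars("CD-ROM"))        # ASCII hyphen only
print(find_disallowed_bars("ロ\u2015マ字"))  # horizontal bar in place of ー
```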
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Sentence-Dictionary_Linking&diff=965Sentence-Dictionary Linking2023-08-28T04:53:58Z<p>JimBreen: /* File Format */</p>
<hr />
<div><br />
To enable dictionary systems, apps, etc. to use the Japanese-English sentences from the Tanaka Corpus/Tatoeba as examples, a set of word-level indices have been compiled and are associated with each sentence (at present about 150,000 sentences have indices.) These indices are maintained within the Tatoeba system (there is a special GUI for this), and periodically downloaded for use with dictionary systems. The indices are particularly associated with the JMdict/EDICT2 dictionary files, but may also be used elsewhere.<br />
<br />
==Index Format==<br />
<br />
The indices for a sentence consist of a line of text with space-delimited index elements for each word in the sentence. The following is an example:<br />
<br />
Sentence: その家はかなりぼろ屋になっている。<br />
<br />
Indices: 其の[01]{その} 家(いえ)[01] は 可也{かなり} ぼろ屋[01]~ になる[01]{になっている}<br />
<br />
The format of the index elements is as follows:<br />
* the usual headword as it appears in the dictionary. Even if the word is usually written in kana, the kanji form must be used if it is available. This field is mandatory; however, it may be omitted for proper names not found in the dictionary.<br />
* the reading of the word in kana, or the numerical sequence number of the appropriate entry in the JMdict dictionary in the format "#nnnnnnnn". This is optional, however it '''must''' be used if there are several different dictionary entries with the same headword. This field is in regular parentheses.<br />
* a sense number. This is used when the word has multiple senses in the JMdict/EDICT2 file, and indicates which sense applies in the sentence. It is a two-digit numeric in square parentheses. The field is optional.<br />
* the form in which the word appears in the sentence. This may differ from the indexing word, e.g. if it is an inflected verb or adjective, if the word is usually written in a different way, etc. This field is in "curly" parentheses. It is not mandatory, but should be included where appropriate.<br />
* a "~" character to indicate that the sentence pair is a good and checked example of the usage of the word. Words are marked to enable appropriate sentences to be selected by dictionary software. Typically only one instance per sense of a word will be marked. (The WWWJDIC server displays these sentences below the display of the related dictionary entry.) <br />
<br />
Some indices are followed by a "|" character and a digit. These are an artefact from a former maintenance system, and can be safely ignored.<br />
<br />
The fields after the indexing headword ()[]{}~ '''must''' be in that order.<br />
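<br />
The element format described above could be parsed along the following lines. This is a sketch, not the code used by WWWJDIC or Tatoeba: the names <code>IndexElement</code>, <code>parse_element</code> and <code>parse_indices</code> are hypothetical, the "|digit" artefact is simply stripped wherever it occurs (an assumption about its position), and elements with the headword omitted are not handled.<br />
<br />
```python
import re
from typing import NamedTuple, Optional

# headword, then optional (reading or #seq), [sense], {form}, ~ in that order.
ELEMENT_RE = re.compile(
    r"(?P<headword>[^\s()\[\]{}~|]+)"
    r"(?:\((?P<paren>[^)]*)\))?"
    r"(?:\[(?P<sense>\d{2})\])?"
    r"(?:\{(?P<form>[^}]*)\})?"
    r"(?P<checked>~)?"
)


class IndexElement(NamedTuple):
    headword: str
    reading: Optional[str]   # kana reading, if one was given in ()
    seq: Optional[int]       # JMdict sequence number, if () held "#nnnnnnnn"
    sense: Optional[int]     # sense number from []
    form: Optional[str]      # form used in the sentence, from {}
    checked: bool            # True if the element carries a trailing "~"


def parse_element(element: str) -> IndexElement:
    """Parse one space-delimited index element."""
    # Assumption: the legacy "|digit" marker can be dropped wherever it appears.
    cleaned = re.sub(r"\|\d", "", element)
    m = ELEMENT_RE.fullmatch(cleaned)
    if m is None:
        raise ValueError(f"unrecognised index element: {element!r}")
    paren = m.group("paren")
    is_seq = paren is not None and paren.startswith("#")
    return IndexElement(
        headword=m.group("headword"),
        reading=None if (paren is None or is_seq) else paren,
        seq=int(paren[1:]) if is_seq else None,
        sense=int(m.group("sense")) if m.group("sense") else None,
        form=m.group("form"),
        checked=m.group("checked") is not None,
    )


def parse_indices(line: str) -> list:
    """Parse a whole index line into a list of IndexElement tuples."""
    return [parse_element(element) for element in line.split()]


# The example index line from this section.
elements = parse_indices(
    "其の[01]{その} 家(いえ)[01] は 可也{かなり} ぼろ屋[01]~ になる[01]{になっている}"
)
```
<br />
For the example line this yields six elements; only ぼろ屋 has <code>checked</code> set, and 家 is the only element with a reading.<br />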
<br />
==File Format==<br />
<br />
A file of the Japanese-English sentence pairs with the indices can be downloaded from the [https://downloads.tatoeba.org/exports/wwwjdic.csv Tatoeba site]. This file, which is generated once each week, is in UTF-8 encoding, and has the following format:<br />
<br />
:Jpn_seq_no[TAB]Eng_seq_no[TAB]Japanese sentence[TAB]English sentence[TAB]Indices<br />
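<br />
Reading this file is a matter of splitting each line on tabs. The sketch below assumes the five-column layout shown above with numeric sequence numbers; the names <code>SentencePair</code>, <code>parse_export_line</code> and <code>read_export</code> are hypothetical, and the path passed to <code>read_export</code> would be wherever the downloaded file was saved.<br />
<br />
```python
from typing import Iterator, NamedTuple


class SentencePair(NamedTuple):
    jpn_id: int     # Jpn_seq_no
    eng_id: int     # Eng_seq_no
    japanese: str
    english: str
    indices: str    # the raw, space-delimited index elements


def parse_export_line(line: str) -> SentencePair:
    """Split one tab-delimited line into its five fields."""
    jpn_id, eng_id, japanese, english, indices = line.rstrip("\n").split("\t", 4)
    return SentencePair(int(jpn_id), int(eng_id), japanese, english, indices)


def read_export(path: str) -> Iterator[SentencePair]:
    """Yield a SentencePair for every non-empty line of a UTF-8 export file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_export_line(line)
```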
<br />
Another version, which is used by the WWWJDIC servers, has the sentences and indices on separate lines. The format is:<br />
<br />
:A: Japanese sentence[TAB]English sentence#ID=Engseq_Jpnseq<br />
:B: Indices<br />
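<br />
The two-line format can be read by pairing each "A:" line with the "B:" line that follows it. The sketch below assumes a single space after the "A:" and "B:" prefixes and numeric IDs in the order given above (English first, then Japanese); the function name <code>parse_examples</code> is hypothetical.<br />
<br />
```python
import re

# "A: <Japanese>[TAB]<English>#ID=<Engseq>_<Jpnseq>"
A_LINE_RE = re.compile(
    r"A: (?P<japanese>.*?)\t(?P<english>.*)#ID=(?P<eng_id>\d+)_(?P<jpn_id>\d+)\s*"
)


def parse_examples(lines):
    """Yield one dict per A/B pair found in an iterable of lines."""
    pending = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("A: "):
            m = A_LINE_RE.fullmatch(line)
            # An A line that does not match is skipped along with its B line.
            pending = m.groupdict() if m else None
        elif line.startswith("B: ") and pending is not None:
            yield {
                "japanese": pending["japanese"],
                "english": pending["english"],
                "eng_id": int(pending["eng_id"]),
                "jpn_id": int(pending["jpn_id"]),
                "indices": line[3:],
            }
            pending = None
```
<br />
When reading the downloaded file, the UTF-8 version can be opened with <code>encoding="utf-8"</code> and the file object passed directly to <code>parse_examples</code>.<br />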
<br />
This file can be downloaded in [http://ftp.edrdg.org/pub/Nihongo/examples.gz EUC-JP coding] or [http://ftp.edrdg.org/pub/Nihongo/examples.utf.gz UTF-8 coding.]</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Main_Page&diff=964Main Page2023-04-28T06:52:49Z<p>JimBreen: /* The ENAMDICT/JMnedict Project */</p>
<hr />
<div>==Electronic Dictionary Research and Development Group==<br />
<br />
Welcome to the Wiki of the [[About EDRDG | Electronic Dictionary Research and Development Group]]. The Wiki has been developed as a repository of information and documentation about the Group's work and projects.<br />
<br />
==User Accounts==<br />
<br />
Sorry but we no longer provide user accounts. We've been hit by link spammers which led to disabling of self-creation of accounts, and it's all too much a distraction.<br />
<br />
If you have any edits you would like to suggest, email Jim Breen (jimbreen-at-gmail.com) with the details.<br />
<br />
==The JMdict/EDICT Project==<br />
<br />
This project is to build and maintain a freely-usable general Japanese electronic dictionary database. <br />
<br />
===History===<br />
<br />
The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and new editions of the files were generated daily.<br />
<br />
In July 2010 maintenance of the JMdict data moved to an [[JMdictDB_Project|online database]], from which the daily distributions are prepared. In September 2014 the maintenance of the [http://www.edrdg.org/wiki/index.php/Main_Page#The_ENAMDICT.2FJMnedict_Project JMnedict] named-entity data was moved to that database too.<br />
<br />
===Documentation and Links===<br />
<br />
Some useful links are:<br />
<br />
*the main [[JMdict-EDICT_Dictionary_Project|documentation of the JMdict/EDICT dictionary files]]<br />
*some help with [[JMdict:_Getting_Started|getting started]] on putting in new entries or editing existing ones.<br />
*the [[Editorial Process]] for handling proposed new entries and amendments<br />
*the [[editorial policy|Editorial Policy]] and guidelines for the JMdict/EDICT files<br />
*the [[Editorial Board]] for JMdict/EDICT<br />
*the [https://github.com/JMdictProject/JMdictIssues/issues JMdict Issues] forum where matters such as structure, format, policies, tags, and other issues concerning dictionary content can be raised and discussed (currently hosted on GitHub.)<br />
*the [https://gitlab.com/yamagoya/jmdictdb/-/issues JMdictDB Issues] site for reporting problems and making feature requests concerning the JMdictDB web pages and software.<br />
*the [https://groups.google.com/search/groups?q=edict-jmdict mailing list] for project discussion. (That page should have a link for asking to join. Alternatively, email [mailto:jimbreen@gmail.com Jim Breen] and ask to be added.)<br />
*the [http://www.edrdg.org/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.<br />
*lists of [[JMdictEDICT_software|packages and servers]] using the JMdict/EDICT files<br />
*an [[Entries Under Development]] page, where people can place incomplete words and phrases for later filling out to become full entries. (Note that this is rather inactive and needs cleaning up.)<br />
<br />
== Current Version &amp; Downloads==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server], (formerly at Monash University) which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_b.gz JMdict_b.gz ] - the basic JMdict file with only English glosses. This file omits several thousand proper name entries from JMnedict;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file.<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
<br />
==JMdictDB Database==<br />
The maintenance of the JMdict/EDICT and JMnedict/ENAMDICT dictionary files is now handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw, and operational since June 2010. For more information see:<br />
* an [[JMdictDB Project|overview]] of the database;<br />
* Stuart's [http://edrdg.org/~smg/ summary page];<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelpq.py quick overview] to editing entries;<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelp.py full help file] for editing entries.<br />
* a [http://www.edrdg.org/jmdictdb/JMdictEntries.html page] showing the current entry counts for the two dictionaries (updated daily).<br />
* project [https://gitlab.com/yamagoya/jmdictdb code] at GitLab.<br />
<br />
==The Tanaka Corpus==<br />
This project is to maintain and extend the [[Tanaka Corpus]] which is a large collection of parallel Japanese/English sentence pairs.<br />
<br />
The Corpus is now maintained within the [http://tatoeba.org/home Tatoeba Project]. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.<br />
<br />
An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], [http://jisho.org/ Denshi Jisho] , etc. This is achieved via a set of indices attached to each sentence pair. There is a [[Sentence-Dictionary Linking|detailed description]] of this process.<br />
<br />
==The KANJIDIC Project==<br />
<br />
The [[KANJIDIC Project]] has compiled files of comprehensive information on kanji used in Japanese text processing. The files<br />
cover the kanji in three Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds additional kanji.<br />
<br />
==The COMPDIC Project==<br />
<br />
The COMPDIC project involved the compilation of a glossary of terms used in the computing and telecommunications industries. The file was in the "EDICT" format. See the [http://www.edrdg.org/jmdict/compdic_doc.html brief documentation].<br />
<br />
In 2008 the entries in the COMPDIC file were included in the JMdict/EDICT file. While it is no longer maintained as a separate file, an extract of the entries relating to computing and telecommunications is still generated.<br />
<br />
==The ENAMDICT/JMnedict Project==<br />
<br />
The JMnedict/ENAMDICT files contain about 740,000 proper names in Japanese, covering place-names, surnames, given names, company names, names of artistic and literary works, product names, etc. There is a basic [http://www.edrdg.org/enamdict/enamdict_doc.html documentation page].<br />
<br />
* JMnedict (the Japanese-Multilingual named entity dictionary) is in XML format and is in Unicode/UTF-8 coding. [http://ftp.edrdg.org/pub/Nihongo/JMnedict.xml.gz (download)]<br />
<br />
* ENAMDICT is in a variant of the EDICT format, with part-of-speech and other tags omitted and replaced with some special tags to indicate the type of proper name. [http://ftp.edrdg.org/pub/Nihongo/enamdict.gz (download)]<br />
<br />
The information in the files is held in the same database as the JMdict/EDICT information. To use the online edit system<br />
follow [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= this link] and select "jmnedict" from the drop-down Corpus menu.<br />
<br />
Several thousand common entries from JMnedict are also included in the JMdict distribution.<br />
<br />
==The KRADFILE/RADKFILE Project==<br />
<br />
This project provides a decomposition of kanji into a number of visual elements or radicals to support software that provides a lookup service using kanji components. These elements can be seen in the [http://nihongo.monash.edu/cgi-bin/wwwjdic?1R WWWJDIC] server, the [http://jisho.org/#radical Jisho.org] server, and [http://kanji.sljfaq.org/mr.html Ben Bullock's SLJFAQ] page.<br />
<br />
There is an [http://www.edrdg.org/krad/kradinf.html information page] about the data files. The files can be downloaded - use the links in that page.<br />
<br />
==The WWWJDIC Dictionary Server==<br />
<br />
WWWJDIC is a dictionary WWW server first developed by Jim Breen in 1998. Its (rather clunky) name came about because it is based on code and techniques developed in the earlier JDIC (DOS) and XJDIC (Unix/X11) applications.<br />
<br />
The home site of the server is [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C here], and there are several [http://www.edrdg.org/wwwjdic/wwwjdicmirrors.html mirror sites] which are updated daily from the home site. The server has links at the dictionary entry level to other sites and to the JMdict database for editing entries.<br />
<br />
The main documentation is the WWWJDIC [http://www.edrdg.org/wwwjdic/wwwjdicinf.html User's Guide].<br />
<br />
A number of elements in the server's display can be configured by users, and the interface language can be set to Japanese (as part of the [[WWWJDIC in Japanese]] project.)<br />
<br />
==Wishlist==<br />
<br />
This is a set of [[wishlist]] items for the various projects. Feel free to add suggestions.<br />
<br />
There is also an old [http://nihongo.monash.edu/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.<br />
<br />
==Mailing List==<br />
<br />
There is a [https://groups.google.com/g/edict-jmdict/ mailing list] for people engaged in the EDRDG projects.<br />
<br />
==How Can I Help?==<br />
<br />
From time to time people ask how they can best contribute to the projects. There are many ways of assisting, the main ones being:<br />
<br />
* adding to and enhancing the main (EDICT/JMdict) dictionary file. This is best done by using the [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search] and [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= New Entry] pages of the JMdictDB system.<br />
<br />
* adding extra Japanese-English sentence pairs to the collection based on the Tanaka Corpus. This is done by adding them to the [https://tatoeba.org/eng Tatoeba Project] as a linked sentence pair, then contacting Jim Breen to have them indexed.<br />
<br />
* assisting with the translation of the WWWJDIC interface into other languages. At present the priority is to make it fully available in Japanese. See the [[WWWJDIC in Japanese]] page.<br />
<br />
* work through the lists of words Paul Blay has placed on the [[Talk:Tanaka_Corpus]] page, which could become new dictionary entries.<br />
<br />
* join and participate in the [https://groups.google.com/g/edict-jmdict mailing list] for people engaged in the EDRDG projects.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Main_Page&diff=963Main Page2023-04-28T06:51:06Z<p>JimBreen: /* Current Version &amp; Downloads */</p>
<hr />
<div>==Electronic Dictionary Research and Development Group==<br />
<br />
Welcome to the Wiki of the [[About EDRDG | Electronic Dictionary Research and Development Group]]. The Wiki has been developed as a repository of information and documentation about the Group's work and projects.<br />
<br />
==User Accounts==<br />
<br />
Sorry but we no longer provide user accounts. We've been hit by link spammers which led to disabling of self-creation of accounts, and it's all too much a distraction.<br />
<br />
If you have any edits you would like to suggest, email Jim Breen (jimbreen-at-gmail.com) with the details.<br />
<br />
==The JMdict/EDICT Project==<br />
<br />
This project is to build and maintain a freely-usable general Japanese electronic dictionary database. <br />
<br />
===History===<br />
<br />
The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and new editions of the files were generated daily.<br />
<br />
In July 2010 maintenance of the JMdict data moved to an [[JMdictDB_Project|online database]], from which the daily distributions are prepared. In September 2014 the maintenance of the [http://www.edrdg.org/wiki/index.php/Main_Page#The_ENAMDICT.2FJMnedict_Project JMnedict] named-entity data was moved to that database too.<br />
<br />
===Documentation and Links===<br />
<br />
Some useful links are:<br />
<br />
*the main [[JMdict-EDICT_Dictionary_Project|documentation of the JMdict/EDICT dictionary files]]<br />
*some help with [[JMdict:_Getting_Started|getting started]] on putting in new entries or editing existing ones.<br />
*the [[Editorial Process]] for handling proposed new entries and amendments<br />
*the [[editorial policy|Editorial Policy]] and guidelines for the JMdict/EDICT files<br />
*the [[Editorial Board]] for JMdict/EDICT<br />
*the [https://github.com/JMdictProject/JMdictIssues/issues JMdict Issues] forum where matters such as structure, format, policies, tags, and other issues concerning dictionary content can be raised and discussed (currently hosted on GitHub.)<br />
*the [https://gitlab.com/yamagoya/jmdictdb/-/issues JMdictDB Issues] site for reporting problems and making feature requests concerning the JMdictDB web pages and software.<br />
*the [https://groups.google.com/search/groups?q=edict-jmdict mailing list] for project discussion. (That page should have a link for asking to join. Alternatively, email [mailto:jimbreen@gmail.com Jim Breen] and ask to be added.)<br />
*the [http://www.edrdg.org/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.<br />
*lists of [[JMdictEDICT_software|packages and servers]] using the JMdict/EDICT files<br />
*an [[Entries Under Development]] page, where people can place incomplete words and phrases for later filling out to become full entries. (Note that this is rather inactive and needs cleaning up.)<br />
<br />
== Current Version &amp; Downloads==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server], (formerly at Monash University) which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_b.gz JMdict_b.gz ] - the basic JMdict file with only English glosses. This file omits several thousand proper name entries from JMnedict;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file;<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
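<br />
As a rough illustration, the files listed above can be fetched and decompressed with a few lines of Python. This is only a sketch: the function names are invented for this example, and it assumes the file is gzip-compressed UTF-8 text. Other files, such as the legacy EDICT ones, may use a different encoding, so check the file documentation before relying on this.<br />

```python
import gzip
import urllib.request

# Base URL taken from the download links above.
BASE_URL = "http://ftp.edrdg.org/pub/Nihongo/"


def decompress_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decompress gzip data and decode it (the encoding is an assumption)."""
    return gzip.decompress(data).decode(encoding)


def fetch_dictionary(name: str = "JMdict_e.gz", encoding: str = "utf-8") -> str:
    """Download one of the files listed above and return its text.

    Hypothetical helper: performs a plain HTTP GET with no caching or retries.
    """
    with urllib.request.urlopen(BASE_URL + name) as response:
        return decompress_text(response.read(), encoding)
```

Since the files are regenerated daily, a client that polls regularly would probably be better served by the rsync service mentioned above than by repeated full downloads.<br />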
<br />
==JMdictDB Database==<br />
The maintenance of the JMdict/EDICT and JMnedict/ENAMDICT dictionary files is now handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw, and operational since June 2010. For more information see:<br />
* an [[JMdictDB Project|overview]] of the database;<br />
* Stuart's [http://edrdg.org/~smg/ summary page];<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelpq.py quick overview] to editing entries;<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelp.py full help file] for editing entries.<br />
* a [http://www.edrdg.org/jmdictdb/JMdictEntries.html page] showing the current entry counts for the two dictionaries (updated daily).<br />
* project [https://gitlab.com/yamagoya/jmdictdb code] at GitLab.<br />
<br />
==The Tanaka Corpus==<br />
This project is to maintain and extend the [[Tanaka Corpus]] which is a large collection of parallel Japanese/English sentence pairs.<br />
<br />
The Corpus is now maintained within the [http://tatoeba.org/home Tatoeba Project]. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.<br />
<br />
An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], [http://jisho.org/ Denshi Jisho], etc. This is achieved via a set of indices attached to each sentence pair. There is a [[Sentence-Dictionary Linking|detailed description]] of this process.<br />
<br />
==The KANJIDIC Project==<br />
<br />
The [[KANJIDIC Project]] has compiled files of comprehensive information on kanji used in Japanese text processing. The files<br />
cover the kanji in three Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an additional 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds additional kanji.<br />
<br />
==The COMPDIC Project==<br />
<br />
The COMPDIC project involved the compilation of a glossary of terms used in the computing and telecommunications industries. The file was in the "EDICT" format. See the [http://www.edrdg.org/jmdict/compdic_doc.html brief documentation].<br />
<br />
In 2008 the entries in the COMPDIC file were included in the JMdict/EDICT file. While it is no longer maintained as a separate file, an extract of the entries relating to computing and telecommunications is still generated.<br />
<br />
==The ENAMDICT/JMnedict Project==<br />
<br />
The JMnedict/ENAMDICT files contain about 740,000 proper names in Japanese, covering place-names, surnames, given names, company names, names of artistic and literary works, product names, etc. There is a basic [http://www.edrdg.org/enamdict/enamdict_doc.html documentation page].<br />
<br />
* JMnedict (the Japanese-Multilingual named entity dictionary) is in XML format and is in Unicode/UTF-8 coding. [http://ftp.edrdg.org/pub/Nihongo/JMnedict.xml.gz (download)]<br />
<br />
* ENAMDICT is in a variant of the EDICT format, with part-of-speech and other tags omitted and replaced with some special tags to indicate the type of proper name. [http://ftp.edrdg.org/pub/Nihongo/enamdict.gz (download)]<br />
<br />
The information in the files is held in the same database as the JMdict/EDICT information. To use the online edit system<br />
follow [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= this link] and select "jmnedict" from the drop-down Corpus menu.<br />
<br />
==The KRADFILE/RADKFILE Project==<br />
<br />
This project provides a decomposition of kanji into a number of visual elements or radicals to support software that provides a lookup service using kanji components. These elements can be seen in the [http://nihongo.monash.edu/cgi-bin/wwwjdic?1R WWWJDIC] server, the [http://jisho.org/#radical Jisho.org] server, and [http://kanji.sljfaq.org/mr.html Ben Bullock's SLJFAQ] page.<br />
<br />
There is an [http://www.edrdg.org/krad/kradinf.html information page] about the data files. The files can be downloaded - use the links in that page.<br />
<br />
==The WWWJDIC Dictionary Server==<br />
<br />
WWWJDIC is a dictionary WWW server first developed by Jim Breen in 1998. Its (rather clunky) name came about because it is based on code and techniques developed in the earlier JDIC (DOS) and XJDIC (Unix/X11) applications.<br />
<br />
The home site of the server is [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C here], and there are several [http://www.edrdg.org/wwwjdic/wwwjdicmirrors.html mirror sites] which are updated daily from the home site. The server has links at the dictionary entry level to other sites and to the JMdict database for editing entries.<br />
<br />
The main documentation is the WWWJDIC [http://www.edrdg.org/wwwjdic/wwwjdicinf.html User's Guide].<br />
<br />
A number of elements in the server's display can be configured by users, and the interface language can be set to Japanese (as part of the [[WWWJDIC in Japanese]] project.)<br />
<br />
==Wishlist==<br />
<br />
This is a set of [[wishlist]] items for the various projects. Feel free to add suggestions.<br />
<br />
There is also an old [http://nihongo.monash.edu/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.<br />
<br />
==Mailing List==<br />
<br />
There is a [https://groups.google.com/g/edict-jmdict/ mailing list] for people engaged in the EDRDG projects.<br />
<br />
==How Can I Help?==<br />
<br />
From time to time people ask how they can best contribute to the projects. There are many ways of assisting, the main ones being:<br />
<br />
* adding to and enhancing the main (EDICT/JMdict) dictionary file. This is best done by using the [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search] and [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= New Entry] pages of the JMdictDB system.<br />
<br />
* adding extra Japanese-English sentence pairs to the collection based on the Tanaka Corpus. This is done by adding them to the [https://tatoeba.org/eng Tatoeba Project] as a linked sentence pair, then contacting Jim Breen to have them indexed.<br />
<br />
* assisting with the translation of the WWWJDIC interface into other languages. At present the priority is to make it fully available in Japanese. See the [[WWWJDIC in Japanese]] page.<br />
<br />
* work through the lists of words Paul Blay has placed on the [[Talk:Tanaka_Corpus]] page, which could become new dictionary entries.<br />
<br />
* join and participate in the [https://groups.google.com/g/edict-jmdict mailing list] for people engaged in the EDRDG projects.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=962Editorial policy2023-02-21T01:38:36Z<p>JimBreen: /* Which Reference Is Best? */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー, with half-width letters and digits).<br />
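<br />
The full-width requirement can be checked or applied mechanically. The sketch below is one possible approach, not part of the JMdictDB software: the function name is invented, and it relies only on the fact that the Unicode full-width forms U+FF01 to U+FF5E mirror printable ASCII U+0021 to U+007E at a fixed offset. Spaces and all non-ASCII characters are left untouched.<br />

```python
# Offset between printable ASCII and the Unicode full-width forms block.
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth(text: str) -> str:
    """Replace printable ASCII characters with their full-width equivalents.

    Kana, kanji and anything else outside U+0021..U+007E pass through as-is.
    """
    converted = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            converted.append(chr(code + FULLWIDTH_OFFSET))
        else:
            converted.append(ch)
    return "".join(converted)


# Half-width "MP3" becomes full-width; the katakana is unchanged.
print(to_fullwidth("MP3プレーヤー"))
```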
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
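<br />
The ordering rule above can be expressed as a sort key. This is a minimal sketch under stated assumptions: the dictionary keys ("count", "joyo", "tags") and the function name are invented for the example, frequency counts would have to come from a corpus of your choosing, and "equivalent" frequencies are treated here as exactly equal counts, which is stricter than an editor's judgment would be.<br />

```python
# Tags named in the guideline above as irregular or incorrect forms.
IRREGULAR_TAGS = {"iK", "io", "ik"}


def order_forms(forms):
    """Order surface forms: regular before irregular, then by decreasing
    frequency, then preferring forms flagged as using only jōyō kanji.

    Each form is a dict with 'text' and 'count', plus optional 'joyo' (bool)
    and 'tags' (iterable of tag strings). These keys are hypothetical.
    """
    def sort_key(form):
        is_irregular = bool(IRREGULAR_TAGS & set(form.get("tags", ())))
        return (is_irregular, -form["count"], not form.get("joyo", False))

    return sorted(forms, key=sort_key)


# Illustrative data only; the counts are made up.
forms = [
    {"text": "form-A", "count": 10, "tags": {"iK"}},
    {"text": "form-B", "count": 5},
    {"text": "form-C", "count": 5, "joyo": True},
    {"text": "form-D", "count": 7},
]
print([f["text"] for f in order_forms(forms)])
```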
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjective, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
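<br />
Both points above (listing the dotted and undotted variants, and using the right middle dot) are easy to get wrong by hand. The sketch below assumes that the "JIS middle-dot" is U+30FB KATAKANA MIDDLE DOT; the list of look-alike characters is illustrative and not exhaustive, and the function names are invented for this example.<br />

```python
KATAKANA_MIDDLE_DOT = "\u30fb"  # assumed to be the accepted "JIS middle-dot"

# A few visually similar characters (illustrative, not a complete list).
LOOKALIKE_DOTS = {
    "\u00b7",  # MIDDLE DOT
    "\uff65",  # HALFWIDTH KATAKANA MIDDLE DOT
    "\u2022",  # BULLET
    "\u2219",  # BULLET OPERATOR
    "\u22c5",  # DOT OPERATOR
}


def reading_variants(parts):
    """Return the undotted and dotted forms of a multi-word 外来語."""
    return ["".join(parts), KATAKANA_MIDDLE_DOT.join(parts)]


def find_lookalike_dots(reading):
    """Return any look-alike dot characters found in a reading."""
    return [ch for ch in reading if ch in LOOKALIKE_DOTS]


print(";".join(reading_variants(["アームレス", "チェア"])))
```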
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
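<br />
The guideline above on separating translations with ";" exists so that each gloss can be matched exactly. The following sketch shows why: it splits a meaning string into glosses and builds a simple English-to-Japanese reverse index. The function names and the in-memory data layout are assumptions made for the example, not a description of how any particular dictionary client works.<br />

```python
from collections import defaultdict


def split_glosses(meaning):
    """Split a meaning string into individual glosses on ';'."""
    return [gloss.strip() for gloss in meaning.split(";") if gloss.strip()]


def build_reverse_index(entries):
    """Map each lower-cased gloss to the headwords that carry it.

    'entries' is a dict of headword -> meaning string (hypothetical layout).
    """
    index = defaultdict(list)
    for headword, meaning in entries.items():
        for gloss in split_glosses(meaning):
            index[gloss.lower()].append(headword)
    return dict(index)


index = build_reverse_index({"料理": "cooking; cookery; cuisine"})
print(index["cuisine"])

# "TLA" is only an exact-match key when it is a separate gloss.
print(split_glosses("three letter acronym; TLA"))
print(split_glosses("three letter acronym (TLA)"))
```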
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits. If a term is only found in 日国 it should be supported by other references or evidence of non-archaic use;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
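<br />
For anyone processing sense text that uses the source-language marking described above, a regular expression is usually enough to pull out the language code and source word. This is a sketch only: it covers the two shapes shown in the examples (a bare word and a double-quoted phrase), the function name is invented, and it does not validate the code against ISO 639-2.<br />

```python
import re

# Matches [lsrc=xxx:word] and [lsrc=xxx:"quoted phrase"]; the word may be empty.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):("[^"]*"|[^\]"]*)\]')


def parse_lsrc(sense_text):
    """Return (language_code, source_word) pairs found in a sense string."""
    results = []
    for lang, word in LSRC_RE.findall(sense_text):
        results.append((lang, word.strip('"').strip()))
    return results


print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```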
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
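<br />
The cross-reference patterns above can be parsed as follows. This is a hedged sketch rather than the JMdictDB parser: the names are invented, only the forms shown above are handled, and the target is split at the first "・" (assumed to be U+30FB), which would misread a target whose headword itself contains a middle dot.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str               # "see" or "ant"
    headword: str
    reading: Optional[str]  # set for the kanji・reading form
    sense: Optional[int]    # set for the [see=漢字[2]] form


# Target text, optionally followed by a bracketed sense number.
XREF_RE = re.compile(r"\[(see|ant)=([^\[\]]+)(?:\[(\d+)\])?\]")


def parse_xrefs(text):
    """Extract cross-references written as [see=...] or [ant=...]."""
    refs = []
    for kind, target, sense in XREF_RE.findall(text):
        headword, _, reading = target.partition("\u30fb")
        refs.append(XRef(kind, headword, reading or None,
                         int(sense) if sense else None))
    return refs


print(parse_xrefs("[see=金本位\u30fbかねほんい]"))
print(parse_xrefs("[see=漢字[2]]"))
```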
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
A cross-reference back from the full form to the abbreviation may also be appropriate.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
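As a rough illustration of the macron rule above, the following Python sketch replaces circumflexed vowels in a romanized term with the corresponding macron forms. The name <code>use_macrons</code> is hypothetical; it is not part of JMdictDB or any other tool mentioned here.<br />

```python
import unicodedata

# Hypothetical helper for illustration only; not part of JMdictDB.
# Maps precomposed circumflexed vowels to their macron equivalents.
CIRCUMFLEX_TO_MACRON = str.maketrans("âîûêôÂÎÛÊÔ", "āīūēōĀĪŪĒŌ")


def use_macrons(romaji: str) -> str:
    """Return the romanized term with circumflexes replaced by macrons."""
    # Normalize first so that "o" + combining circumflex is also handled.
    composed = unicodedata.normalize("NFC", romaji)
    return composed.translate(CIRCUMFLEX_TO_MACRON)


print(use_macrons("Yôrô"))
```

This should only be applied to romanized Japanese; words from languages such as French legitimately use circumflexes. It does not attempt to detect forms such as "Yourou" or to insert apostrophes, which need the underlying kana.<br />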
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It is used for a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), where the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
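The two spacing rules above could be checked mechanically along the following lines. This is a minimal Python sketch: the function name <code>fix_spacing</code> and the short lists of units and symbols are assumptions made for the example, and a real check would need a much fuller list.<br />

```python
import re

# Illustrative lists only; they are assumptions, not an official inventory.
UNIT_PATTERN = re.compile(r"(\d)(km|cm|mm|kg|m|g)\b")
SYMBOL_PATTERN = re.compile(r"(\d)\s+(°C|%)")


def fix_spacing(gloss: str) -> str:
    """Insert a space before units and remove it before symbols."""
    gloss = UNIT_PATTERN.sub(r"\1 \2", gloss)    # "100km" -> "100 km"
    gloss = SYMBOL_PATTERN.sub(r"\1\2", gloss)   # "15 °C" -> "15°C"
    return gloss


print(fix_spacing("100km"))
print(fix_spacing("15 °C"))
```

Forms such as "5c" and "2am" are left untouched because "c", "am" and "pm" are not in the unit list.<br />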
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
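The date styles above can be produced from date values as in the following Python sketch. The function names are hypothetical, and the life-span format follows the unpadded style of the Mishima example (1925.1.14 rather than 1925.01.14).<br />

```python
from datetime import date

# Month names are spelled out explicitly to avoid locale-dependent output.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def gloss_date(d: date, with_year: bool = True) -> str:
    """Format a date as "March 17" or "March 17, 2019"."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def life_span(born: date, died: date) -> str:
    """Format a person's dates in the brief dotted style."""
    def dotted(d: date) -> str:
        return f"{d.year}.{d.month}.{d.day}"
    return f"({dotted(born)}-{dotted(died)})"


print(gloss_date(date(2019, 3, 17)))
print("Yukio Mishima " + life_span(date(1925, 1, 14), date(1970, 11, 25)))
```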
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: Nikkoku 日国/日本国語大辞典 - a major multi-volume 国語辞典. (Online sites usually only have the abridged edition.)<br />
*meikyo (or mk): Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*iwakoku: Iwanami Kokugo Jiten, 岩波国語辞典 - small-sized 国語辞典<br />
*smk: Shinmeikai Kokugo Jiten 新明解国語辞典<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*chuujiten: 新和英中辞典: medium Kenkyusha JE dictionary<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*prog: プログレッシブ和英中辞典 Progressive J-E dictionary<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*カタカナ新語辞典: a useful dictionary of loanwords from Gakken.<br />
*JWN: [https://bond-lab.github.io/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who the regular contributors of amendments and new entries are, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
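A contributor can get a rough idea of which distributions a string will survive in by trying to encode it. The Python sketch below uses the standard <code>iso2022_jp</code> codec as a stand-in for JIS X 0208 and <code>iso2022_jp_1</code> as a stand-in for JIS X 0208 plus JIS X 0212. This is an assumption made for illustration; it is not how the EDICT/EDICT2 files are actually generated, and both codecs also accept plain ASCII. The function name <code>widest_distribution</code> is hypothetical.<br />

```python
def widest_distribution(text: str) -> str:
    """Classify text by the most constrained format that can hold it."""
    try:
        text.encode("iso2022_jp")      # ASCII + JIS X 0208 (approximation)
        return "EDICT"
    except UnicodeEncodeError:
        pass
    try:
        text.encode("iso2022_jp_1")    # adds JIS X 0212 (approximation)
        return "EDICT2"
    except UnicodeEncodeError:
        return "JMdict only"


for sample in ["学生割引", "é", "한"]:
    print(sample, widest_distribution(sample))
```

A result of "JMdict only" indicates characters that would be dropped from both EDICT and EDICT2, so a romanized equivalent should accompany them.<br />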
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
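The basic two-out-of-three test can be expressed as a simple count of matching fields, as in the Python sketch below. The <code>Candidate</code> class and <code>may_merge</code> function are hypothetical names used only for this example. The sketch compares a single major form per field, and deliberately ignores the judgement calls described above (rare or archaic forms, kanji that apply to only some readings, related kana variants).<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate entry, reduced to its major kanji form, reading and meaning."""
    kanji: str
    reading: str
    meaning: str


def may_merge(a: Candidate, b: Candidate) -> bool:
    """Return True if at least two of the three fields are the same."""
    matches = sum([
        a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ])
    return matches >= 2


modern = Candidate("合気道", "あいきどう", "aikido")
old = Candidate("合氣道", "あいきどう", "aikido")
print(may_merge(modern, old))  # reading and meaning match
```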
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
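For developers, the suggested behaviour amounts to indexing every form but displaying only those without an "sK" or "sk" tag. The Python sketch below assumes each surface form is held as a (text, tags) pair; that structure and the function name <code>split_forms</code> are assumptions for the example and do not reflect the JMdict XML layout or the WWWJDIC implementation.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}


def split_forms(forms):
    """Return (forms to display, forms to index as search keys)."""
    display = [text for text, tags in forms
               if not SEARCH_ONLY_TAGS & set(tags)]
    search_keys = [text for text, _tags in forms]
    return display, search_keys


# The 3密/三密 entry mentioned above, with 三蜜 as a search-only form.
forms = [("3密", []), ("三密", []), ("三蜜", ["sK"])]
display, search_keys = split_forms(forms)
print(display)
print(search_keys)
```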
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
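A submission can be screened for other bar-like characters with a check such as the Python sketch below. It treats the Unicode "Pd" (dash punctuation) category as an approximation of the mid-line bar characters referred to above; that category is an assumption and may not coincide exactly with the nine characters the policy has in mind. The function name <code>disallowed_bars</code> is hypothetical.<br />

```python
import unicodedata

# The three permitted characters: ASCII hyphen, long vowel symbol, minus sign.
ALLOWED = {"\u002d", "\u30fc", "\u2212"}


def disallowed_bars(text: str) -> list:
    """List dash-like characters in text that are not one of the three allowed."""
    return [ch for ch in text
            if ch not in ALLOWED and unicodedata.category(ch) == "Pd"]


print(disallowed_bars("CD\u2013ROM"))  # contains an en dash
print(disallowed_bars("CD-ROM"))       # ASCII hyphen only
```

Any character reported by this check would need to be replaced by whichever of the three permitted characters suits the context.<br />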
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=961Editorial policy2023-02-10T05:43:06Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
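
The ordering rule above can be read as a three-part sort key. The following is a minimal Python sketch only, not part of JMdictDB: the "Form" class, its field names and the counts in the usage note are hypothetical, and "equivalent" frequencies are simplified to exactly equal counts.

```python
from dataclasses import dataclass

# Tags for irregular or incorrect forms named in the guideline above.
IRREGULAR_TAGS = frozenset({"iK", "io", "ik"})


@dataclass(frozen=True)
class Form:
    text: str                      # surface form
    count: int                     # hypothetical frequency count, e.g. from n-grams
    all_joyo: bool = True          # assumed known: every kanji is a 常用漢字
    tags: frozenset = frozenset()  # kanji-form tags such as "iK"


def order_forms(forms):
    """Regular forms first, then decreasing count, then 常用漢字 forms on ties."""
    return sorted(
        forms,
        key=lambda f: (bool(f.tags & IRREGULAR_TAGS), -f.count, not f.all_joyo),
    )
```

With made-up counts, a form tagged "iK" is placed last even if its count is the highest, and of two forms with equal counts the one written entirely in 常用漢字 comes first.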
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
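
As an illustration of the middle-dot rule, here is a small Python sketch that produces the two variants and reports look-alike dots. It assumes that the "JIS middle-dot" corresponds to the Unicode KATAKANA MIDDLE DOT (U+30FB); the function names are hypothetical and the list of look-alikes is not exhaustive.

```python
KATAKANA_MIDDLE_DOT = "\u30fb"  # assumed here to be the accepted JIS middle-dot

# Look-alike dots that are different Unicode characters (not a complete list).
LOOKALIKE_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\u2022": "BULLET",
    "\u2027": "HYPHENATION POINT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
}


def dot_variants(reading):
    """Return the forms without and with the separating middle dot."""
    undotted = reading.replace(KATAKANA_MIDDLE_DOT, "")
    return [undotted, reading] if undotted != reading else [reading]


def find_lookalike_dots(text):
    """List (position, character name) for each dot that is not U+30FB."""
    return [(i, LOOKALIKE_DOTS[ch]) for i, ch in enumerate(text) if ch in LOOKALIKE_DOTS]
```

Joining the result of dot_variants() with ";" gives the form used in the example above.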
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
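
A rough Python sketch of a check for this rule follows. The function names are hypothetical, and the result is only a hint for a human editor: it will also flag the cases covered under "Reading Field Simplification" below, where a single hiragana reading is acceptable.

```python
import re

# Katakana letters (U+30A1 to U+30FA) plus the long-vowel mark (U+30FC).
KATAKANA_RUN = re.compile(r"[\u30a1-\u30fa\u30fc]+")


def katakana_runs(headword):
    """Return the katakana portions of a kanji-section form."""
    return KATAKANA_RUN.findall(headword)


def missing_katakana(headword, reading):
    """Return katakana portions of the headword that do not appear in the reading."""
    return [run for run in katakana_runs(headword) if run not in reading]
```

For 一眼レフ with the reading いちがんレフ nothing is reported, whereas the reading いちがんれふ would be flagged for レフ.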
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails.<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
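
Some of the mechanical points in this section (separating translations with ";", avoiding leading articles, avoiding patterns such as "colo(u)r", and not putting a comma after "e.g." or "i.e.") lend themselves to a simple automated check. The following Python sketch is illustrative only: the function names and messages are hypothetical, it is not the validation done by JMdictDB, and anything it reports still needs an editor's judgement (a leading article is sometimes necessary).

```python
import re


def split_glosses(meaning):
    """Split a meaning string into its separate translations."""
    return [g.strip() for g in meaning.split(";") if g.strip()]


def lint_gloss(gloss):
    """Return a list of possible style problems in a single gloss."""
    problems = []
    if re.match(r"(?:a|an|the)\s", gloss, re.IGNORECASE):
        problems.append("leading article")
    # Letters in parentheses inside a word, e.g. colo(u)r, cannot be searched for.
    if re.search(r"\w\(\w+\)\w", gloss):
        problems.append("embedded optional letters")
    if re.search(r"\b(?:e\.g\.|i\.e\.),", gloss):
        problems.append("comma after e.g. or i.e.")
    return problems
```

For example, "full colour (color)" passes, while "colo(u)r" is reported as having embedded optional letters.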
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
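
For tools that process sense text, the [lsrc=lng:word] pattern described in this section can be picked apart with a regular expression. This Python sketch is based only on the two examples shown above; the function name is hypothetical and it is not the parser used by JMdictDB, which may accept further variations.

```python
import re

# Three-letter language code, then an optional source word which may be quoted.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):("[^"]*"|[^\]]*)\]')


def parse_lsrc(sense_text):
    """Return (language code, source word) pairs found in the sense text.

    The source word is an empty string when it was omitted.
    """
    return [(lang, word.strip('"')) for lang, word in LSRC_RE.findall(sense_text)]
```

Applied to the アルバイト example it yields the pair ("ger", "Arbeit"), and quoted multi-word sources such as "art déco" are returned without the quotes.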
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
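
The three cross-reference shapes described above can be recognised with one regular expression. The following Python sketch is an assumption-laden illustration rather than the JMdictDB parser: the function name and dictionary keys are hypothetical, the separator is assumed to be the katakana middle dot (U+30FB), and a target headword that itself contains that dot would be split incorrectly.

```python
import re

XREF_RE = re.compile(
    r"\[(see|ant)="           # cross-reference class
    r"([^\[\]\u30fb]+)"       # target headword
    r"(?:\u30fb([^\[\]]+))?"  # optional reading after the middle dot
    r"(?:\[(\d+)\])?"         # optional sense number in square brackets
    r"\]"
)


def parse_xrefs(text):
    """Return a list of dicts describing each cross-reference in the text."""
    refs = []
    for kind, headword, reading, sense in XREF_RE.findall(text):
        refs.append({
            "type": kind,
            "headword": headword,
            "reading": reading or None,
            "sense": int(sense) if sense else None,
        })
    return refs
```

For [see=漢字[2]] this gives the headword 漢字 with sense 2 and no reading.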
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
A cross-reference back from the full form to the abbreviation may also be appropriate.
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
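
Where a romanized form has been copied from a source that uses circumflexes, the conversion to macrons is mechanical. A minimal Python sketch follows; the function name is hypothetical and it is meant for romanized strings only, since it swaps every combining circumflex and normalizes the whole string.

```python
import unicodedata


def to_macrons(romaji):
    """Replace circumflex accents with macrons in a romanized string."""
    decomposed = unicodedata.normalize("NFD", romaji)
    # U+0302 is the combining circumflex, U+0304 the combining macron.
    swapped = decomposed.replace("\u0302", "\u0304")
    return unicodedata.normalize("NFC", swapped)
```

This turns "Yôrô" into "Yōrō". It does not deal with forms such as "Yourou" or "Yooroo", which need a human decision.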
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It is used where a term refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
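
The two spacing rules can be sketched as a pair of substitutions. In this Python example the lists of units and symbols are hypothetical and far from complete, and the function name is an assumption; it is intended only to show the shape of the rule.

```python
import re

# Hypothetical, incomplete lists. Longer units come first so "km" wins over "m".
UNITS = r"(?:km|kg|cm|mm|ml|m|g|l)"
SYMBOLS = r"(?:°C|%)"


def fix_spacing(text):
    """Insert a space before units and remove any space before symbols."""
    text = re.sub(rf"(\d)({UNITS})\b", r"\1 \2", text)
    text = re.sub(rf"(\d)\s+({SYMBOLS})", r"\1\2", text)
    return text
```

Thus "100km" becomes "100 km", "15 °C" becomes "15°C", and "5c" is left alone because "c" is not in the list of units.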
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
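
The preferred date and time styles above can be produced as follows. This is a Python sketch with hypothetical function names; month names are spelled out in the code rather than taken from the system locale, and the life-span format follows the unpadded style of the Mishima example.

```python
from datetime import date, time

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def gloss_date(d, with_year=True):
    """Format a date as "March 17" or "March 17, 2019"."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(born, died):
    """Format birth and death dates in the brief style used for people."""
    return f"{born.year}.{born.month}.{born.day}-{died.year}.{died.month}.{died.day}"


def gloss_time(t):
    """Format a time of day as "2am" or "12:30pm"."""
    hour12 = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    return f"{hour12}{suffix}" if t.minute == 0 else f"{hour12}:{t.minute:02d}{suffix}"
```

For example, lifespan(date(1925, 1, 14), date(1970, 11, 25)) gives the string used in the Yukio Mishima example.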
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: Nikkoku 日国/日本国語大辞典 - a major multi-volume 国語辞典. (Online sites usually only have the abridged edition.)<br />
*meikyo (or mk): Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*iwakoku: Iwanami Kokugo Jiten, 岩波国語辞典 - small-sized 国語辞典<br />
*smk: Shinmeikai Kokugo Jiten 新明解国語辞典<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*chuujiten: 新和英中辞典: medium Kenkyusha JE dictionary<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*prog: プログレッシブ和英中辞典 Progressive J-E dictionary<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*カタカナ新語辞典: a useful dictionary of loanwords from Gakken.<br />
*JWN: [https://bond-lab.github.io/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who the regular contributors of amendments and new entries are, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
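<br />
As a rough illustration of the constraints above, the following Python sketch checks which distributions could carry a given string. It assumes that Python's built-in <code>iso2022_jp</code> codec (ASCII, JIS X 0201 Roman and JIS X 0208) approximates the EDICT repertoire and that <code>iso2022_jp_1</code> (which adds JIS X 0212) approximates the EDICT2 repertoire. The function names are hypothetical, and this is not the code used to generate the distributed files.<br />

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def distribution_support(text: str) -> dict:
    """Report which dictionary distributions could hold `text` unchanged.

    Assumption: the two ISO-2022-JP codecs are used only as convenient
    stand-ins for the JIS X 0208 and JIS X 0208 + JIS X 0212 repertoires.
    """
    return {
        "EDICT": encodable(text, "iso2022_jp"),     # JIS X 0208 only
        "EDICT2": encodable(text, "iso2022_jp_1"),  # JIS X 0208 + JIS X 0212
        "JMdict": True,                             # Unicode: any valid character
    }
```

A check of this kind could be run on a note or headword before submission, so that a romanized alternative can be added when, say, hangul would otherwise be dropped from the EDICT/EDICT2 versions.<br />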
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
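<br />
The basic rule can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each candidate is reduced to its main kanji headword, main reading and main meaning, equality is tested by exact string comparison, and the <code>Candidate</code> class and <code>may_merge</code> function are hypothetical names. Kana-only entries are deliberately not handled, since deciding whether two kana forms are "related" is an editorial judgement.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str    # main kanji headword ("" for a kana-only entry)
    reading: str  # main reading
    meaning: str  # main gloss


def may_merge(a: Candidate, b: Candidate) -> bool:
    """Apply the two-out-of-three rule to the main forms of two entries."""
    matches = sum([
        a.kanji != "" and a.kanji == b.kanji,  # empty kanji never counts as a match
        a.reading == b.reading,
        a.meaning == b.meaning,
    ])
    return matches >= 2
```

For example, 生け花 and 生花 with the same reading and meaning would pass, while 橋 and 箸 (same reading only) would not.<br />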
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
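<br />
For developers, the suggested handling might look something like the following Python sketch. It assumes an entry's surface forms are available as (text, tags) pairs; that representation and the function names are hypothetical and are not taken from WWWJDIC or any other implementation.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}


def search_keys(forms):
    """All surface forms, including search-only ones, are usable as look-up keys."""
    return [text for text, _tags in forms]


def display_forms(forms):
    """Only forms without an sK/sk tag are shown as part of the entry."""
    return [text for text, tags in forms if not SEARCH_ONLY_TAGS.intersection(tags)]
```

With the forms 3密, 三密 and 三蜜 (the last tagged "sK"), all three would be indexed for searching but only the first two would be displayed.<br />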
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
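<br />
The capitalization and layout rules for a plain species-level gloss can be checked mechanically. The Python sketch below is a minimal illustration only: it covers simple binomials (no subspecies, "subsp.", "var.", "f." or cultivar names), it omits the wiki italics, and the function names and the regular expression are assumptions made for this example.<br />

```python
import re

# Genus capitalized, specific epithet entirely lower case (hyphens allowed).
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z-]+")


def is_binomial(scientific_name: str) -> bool:
    """True for names such as "Pica pica"; False for "Tyrannosaurus Rex"."""
    return BINOMIAL.fullmatch(scientific_name) is not None


def species_gloss(common_name: str, scientific_name: str) -> str:
    """Build a gloss in the preferred "common_name (scientific_name)" layout."""
    if not is_binomial(scientific_name):
        raise ValueError(f"not a plain binomial: {scientific_name!r}")
    return f"{common_name} ({scientific_name})"
```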
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
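<br />
A submission could be screened for other bar-like characters with something like the Python sketch below. The three permitted code points are those listed above; the set of rejected characters is an assumption chosen for illustration (a few well-known Unicode hyphens and dashes) and is not the definitive list of the remaining characters.<br />

```python
# Permitted: ASCII hyphen, long vowel symbol, minus sign.
ALLOWED_BARS = {"\u002d", "\u30fc", "\u2212"}

# Assumption: an illustrative subset of other dash-like characters
# (hyphen, en dash, em dash, full-width hyphen-minus).
OTHER_BARS = {"\u2010", "\u2013", "\u2014", "\uff0d"}


def disallowed_bars(text: str):
    """Return (position, code point) pairs for bar characters that should be replaced."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in OTHER_BARS]
```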
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=960Editorial policy2023-02-10T01:41:29Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
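<br />
The ordering described above can be expressed as a sort key. The following Python sketch is only an illustration: the <code>SurfaceForm</code> class is hypothetical, the counts would come from whatever frequency source is being used, and "equivalent" frequencies are simplified to exactly equal counts.<br />

```python
from dataclasses import dataclass

IRREGULAR_TAGS = {"iK", "io", "ik"}


@dataclass
class SurfaceForm:
    text: str
    count: int             # frequency of use, e.g. an n-gram count
    all_joyo: bool = True  # True if every kanji in the form is 常用漢字
    tags: tuple = ()


def order_forms(forms):
    """Regular forms first by decreasing count; irregular forms go to the rear."""
    def key(form):
        irregular = any(tag in IRREGULAR_TAGS for tag in form.tags)
        # On equal counts, forms using only 常用漢字 are preferred.
        return (irregular, -form.count, not form.all_joyo)
    return sorted(forms, key=key)
```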
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
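<br />
For anyone processing readings written in this style, the bracketed patterns can be separated from the kana with a small parser. The Python sketch below assumes each reading is a string such as "ゼニガタアザラシ[nokanji]" or "せりか[restr=芹科]"; the function name and the returned structure are hypothetical and are not part of the database software.<br />

```python
import re

# Matches "[name]" or "[name=value]".
TAG = re.compile(r"\[([^\]=]+)(?:=([^\]]*))?\]")


def parse_reading(item: str):
    """Split one reading into its kana and a list of (tag, value) pairs.

    The value is None for tags without "=", such as [nokanji].
    """
    tags = [(m.group(1), m.group(2)) for m in TAG.finditer(item)]
    kana = TAG.sub("", item).strip()
    return kana, tags
```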
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
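<br />
The pair of forms can be generated from the dotted version, as in the Python sketch below. The function name is hypothetical, the JIS middle-dot is taken to be U+30FB, and the two rejected characters (U+00B7 and the half-width U+FF65) are an assumption about which other middle-dots are likely to turn up in pasted text.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"          # katakana middle dot
OTHER_DOTS = ("\u00b7", "\uff65")  # assumption: look-alikes that are not accepted


def loanword_forms(dotted: str) -> str:
    """Return "undotted;dotted" for a multi-word loanword reading."""
    if any(dot in dotted for dot in OTHER_DOTS):
        raise ValueError("use the JIS middle-dot (U+30FB)")
    undotted = dotted.replace(JIS_MIDDLE_DOT, "")
    if undotted == dotted:
        return dotted  # single word: nothing to add
    return f"{undotted};{dotted}"
```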
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails.<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
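A few of the gloss conventions above are mechanical enough to check automatically. The following is a minimal sketch only, not part of any EDRDG tooling: the function name <code>lint_gloss</code> and the warning texts are hypothetical, and it covers just three rules (no "colo(u)r" patterns, no comma after "e.g."/"i.e.", and no leading article). Its warnings are prompts for an editor, since a leading article is sometimes necessary.<br />

```python
import re

# Each check pairs a pattern with a warning message. All names here are illustrative.
CHECKS = [
    # Letters in parentheses embedded inside a word, e.g. "colo(u)r".
    (re.compile(r"\w\(\w+\)\w"), 'embedded optional letters such as "colo(u)r" cannot be searched'),
    # A comma directly after "e.g." or "i.e.".
    (re.compile(r"\b(?:e\.g\.|i\.e\.),"), 'no comma after "e.g." or "i.e."'),
    # A gloss starting with an article.
    (re.compile(r"^(?:a|an|the)\s", re.IGNORECASE), "leading article is usually unnecessary"),
]


def lint_gloss(gloss):
    """Return a list of warnings for a single gloss string (empty if none apply)."""
    return [message for pattern, message in CHECKS if pattern.search(gloss)]


if __name__ == "__main__":
    for gloss in ["full colour (color)", "colo(u)r", "hand game (e.g., rock, paper, scissors)"]:
        print(gloss, "->", lint_gloss(gloss))
```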
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
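For developers handling submitted text, the [lsrc=lng:word] pattern can be picked apart with a regular expression. This is a minimal sketch based only on the two examples above; the function name <code>parse_lsrc</code> is hypothetical, and the handling of the optional double quotes is an assumption drawn from the アールデコ example rather than a statement of the full syntax.<br />

```python
import re

# Three-letter language code, a colon, then an optionally double-quoted source word.
LSRC = re.compile(r'\[lsrc=([a-z]{3}):("?)(.*?)\2\]')


def parse_lsrc(text):
    """Return (language_code, source_word) for the first lsrc tag, or None if absent.

    source_word is None when the tag gives only the language, e.g. "[lsrc=eng:]".
    """
    match = LSRC.search(text)
    if match is None:
        return None
    lang, _quote, word = match.groups()
    return lang, (word or None)


if __name__ == "__main__":
    print(parse_lsrc("アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job"))
    print(parse_lsrc('アールデコ [1][n] [lsrc=fre:"art déco"] art deco'))
```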
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]].<br />
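The three cross-reference shapes described here (plain target, kanji・reading, and a trailing sense number) can be separated with one regular expression. The sketch below is illustrative only: <code>parse_xref</code> is a hypothetical name, the detailed instructions linked above remain the authority on the syntax, and splitting on the first "・" is a naive assumption that would mis-handle a kana target which itself contains that character.<br />

```python
import re

# Tag type, a lazily matched target, and an optional "[n]" sense number before the closing bracket.
XREF = re.compile(r"\[(see|ant)=(.+?)(?:\[(\d+)\])?\]")


def parse_xref(text):
    """Return (type, headword, reading, sense) for the first cross-reference, or None."""
    match = XREF.search(text)
    if match is None:
        return None
    kind, target, sense = match.groups()
    # Naive split: assumes "・" only ever separates the headword from the reading.
    headword, _sep, reading = target.partition("・")
    return kind, headword, (reading or None), (int(sense) if sense else None)


if __name__ == "__main__":
    for tag in ["[see=言葉]", "[see=金本位・かねほんい]", "[see=漢字[2]]"]:
        print(tag, "->", parse_xref(tag))
```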
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It marks a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
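These two spacing rules could be applied mechanically along the following lines. This is a rough sketch under stated assumptions: the function name <code>fix_unit_spacing</code> and the short lists of units and symbols are hypothetical and would need extending before real use.<br />

```python
import re

# Hypothetical, deliberately short lists; extend as needed.
UNITS = r"(?:km|kg|cm|mm|m|g)"
SYMBOLS = r"(?:°C|%)"


def fix_unit_spacing(text):
    """Put a space between a number and a unit, and remove any space before a symbol."""
    # "100km" -> "100 km"; the \b stops "3min" being read as "3 m" + "in".
    text = re.sub(rf"(\d)({UNITS})\b", r"\1 \2", text)
    # "9 %" -> "9%"
    text = re.sub(rf"(\d)\s+({SYMBOLS})", r"\1\2", text)
    return text


if __name__ == "__main__":
    print(fix_unit_spacing("a 100km walk at 15 °C"))
```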
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*meikyo (or mk): Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*iwakoku: Iwanami Kokugo Jiten, 岩波国語辞典 - small-sized 国語辞典<br />
*smk: Shinmeikai Kokugo Jiten 新明解国語辞典<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*chuujiten: 新和英中辞典: medium Kenkyusha JE dictionary<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*prog: プログレッシブ和英中辞典 Progressive J-E dictionary<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*カタカナ新語辞典: a useful dictionary of loanwords from Gakken.<br />
*JWN: [https://bond-lab.github.io/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
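Contributors who want a rough idea of which characters will survive into the EDICT and EDICT2 files can approximate the check with Python's standard codecs. This is only an approximation and not the conversion used to build the distributions: it assumes that the <code>iso2022_jp</code> codec stands in for "JIS X 0208 plus ASCII" and that <code>euc_jp</code> stands in for "JIS X 0208 plus JIS X 0212". Both codecs also accept a few JIS X 0201 characters (e.g. <code>euc_jp</code> accepts half-width katakana), so the result can be over-generous. The function name <code>charset_tier</code> is hypothetical.<br />

```python
def charset_tier(ch):
    """Classify one character as "JIS X 0208", "JIS X 0212" or "other" (approximate)."""
    # Assumption: these two stdlib codecs approximate the two character sets.
    for tier, codec in (("JIS X 0208", "iso2022_jp"), ("JIS X 0212", "euc_jp")):
        try:
            ch.encode(codec)
            return tier
        except UnicodeEncodeError:
            continue
    return "other"


def problem_characters(text):
    """Return the characters in text that fall outside both sets, in order of appearance."""
    return [ch for ch in text if charset_tier(ch) == "other"]


if __name__ == "__main__":
    for ch in "亜é한":
        print(ch, charset_tier(ch))
```

Anything reported as "other" would be dropped from both EDICT and EDICT2 under the rules above, so a romanized equivalent should accompany it.<br />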
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
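The basic two-out-of-three test can be expressed as a count of matching fields. The sketch below is a simplification for illustration: the <code>Entry</code> class and <code>may_merge</code> function are hypothetical, each entry is reduced to its single major kanji form and reading, and "same meaning" is modelled as equality of a placeholder label, whereas in practice it is an editor's judgement. Kana-only pairs such as ダイアモンド/ダイヤモンド are deliberately left to editors, since deciding whether kana forms are related variants is not mechanical.<br />

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    kanji: Optional[str]  # major kanji headword, or None for kana-only entries
    reading: str          # major reading
    meaning: str          # placeholder label standing in for the editor's view of the meaning


def may_merge(a, b):
    """Return True if at least two of kanji, reading and meaning agree."""
    matches = 0
    # A missing kanji form never counts as a match; kana-only cases need an editor.
    if a.kanji is not None and a.kanji == b.kanji:
        matches += 1
    if a.reading == b.reading:
        matches += 1
    if a.meaning == b.meaning:
        matches += 1
    return matches >= 2


if __name__ == "__main__":
    first = Entry("3密", "さんみつ", "meaning-A")
    second = Entry("三密", "さんみつ", "meaning-A")
    print(may_merge(first, second))
```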
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
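For app and site developers, the suggested behaviour amounts to indexing every form but filtering the "sK"/"sk" forms out at display time. The following is a minimal sketch assuming a hypothetical in-memory entry structure (a dictionary with a list of forms, each carrying a list of tags); it does not reflect the JMdict XML layout or the WWWJDIC implementation.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}

# Hypothetical entry structure, using the 3密/三密 example from the text.
ENTRY = {
    "forms": [
        {"text": "3密", "tags": []},
        {"text": "三密", "tags": []},
        {"text": "三蜜", "tags": ["sK"]},  # search-only form
    ]
}


def matches(entry, key):
    """A search key matches if it equals any form, including search-only ones."""
    return any(form["text"] == key for form in entry["forms"])


def display_forms(entry):
    """Return only the forms that should be shown to the user."""
    return [
        form["text"]
        for form in entry["forms"]
        if not SEARCH_ONLY_TAGS.intersection(form["tags"])
    ]


if __name__ == "__main__":
    if matches(ENTRY, "三蜜"):
        print("/".join(display_forms(ENTRY)))
```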
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether these can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded bindweed, a subspecies of hedge bindweed, should be submitted as: "occluded bindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess.<br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
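<br />
The capitalization and subspecies rules above can be sketched in code. The helper names below are hypothetical, and the wiki italics markup around scientific names is left out for brevity; this is an illustration of the rules, not a tool used by the project.<br />

```python
def scientific_name(genus, species, infra=None, rank=None):
    """Build a scientific name following the rules listed above.

    The genus is capitalized and epithets are lower-cased. For animal
    subspecies pass only `infra`; for plants also pass `rank` as
    "subsp.", "var." or "f.".
    """
    name = f"{genus.capitalize()} {species.lower()}"
    if infra:
        if rank:
            name += f" {rank} {infra.lower()}"
        else:
            name += f" {infra.lower()}"
    return name


def species_gloss(common_name, sci_name):
    """Preferred gloss format: common_name (scientific_name)."""
    return f"{common_name} ({sci_name})"


print(species_gloss("European magpie", scientific_name("Pica", "pica")))
# European magpie (Pica pica)
print(scientific_name("Ursus", "americanus", "cinnamomum"))
# Ursus americanus cinnamomum
print(scientific_name("Calystegia", "sepium", "erratica", rank="subsp."))
# Calystegia sepium subsp. erratica
```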
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
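<br />
A submission could be checked for stray bar characters along the following lines. Note the assumptions: the three allowed code points come from the list above, but the set of look-alikes is only an illustrative guess and may not match the characters discussed in the JMdict issue. The function name is also hypothetical.<br />

```python
# The three characters permitted by the guideline above.
ALLOWED_BARS = {
    "\u002d": "ASCII hyphen",
    "\u30fc": "long vowel symbol",
    "\u2212": "minus sign",
}

# Assumed look-alikes (illustrative only): hyphen, figure dash, en dash,
# em dash, horizontal bar and full-width hyphen-minus.
SUSPECT_BARS = {"\u2010", "\u2012", "\u2013", "\u2014", "\u2015", "\uff0d"}


def find_suspect_bars(text):
    """Return (index, code point) pairs for bar characters to review."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in SUSPECT_BARS
    ]


print(find_suspect_bars("CD-ROM"))       # []
print(find_suspect_bars("CD\u2010ROM"))  # [(2, 'U+2010')]
```

The check only reports positions; whether the allowed characters are used in the right context (Meanings field, Japanese text, arithmetic) still needs an editor's judgement.<br />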
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=959Editorial policy2023-02-10T01:37:38Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
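<br />
The する rule in the list above can be sketched as a small helper. This is a rough heuristic for illustration only: the function names are hypothetical, the kanji test covers just the basic CJK Unified Ideographs block, and real submissions still need an editor's judgement.<br />

```python
def is_kanji(ch):
    """True for characters in the basic CJK Unified Ideographs block.

    A simplification: extension blocks are ignored.
    """
    return "\u4e00" <= ch <= "\u9fff"


def suru_headword(form):
    """Suggest (headword, part-of-speech) for a form that may end in する.

    Single-kanji-plus-する verbs keep する and get "vs-s"; other
    noun-plus-する verbs drop する and get "vs". Anything else is
    returned unchanged with no suggestion.
    """
    if not form.endswith("する"):
        return form, None
    stem = form[:-2]
    if len(stem) == 1 and is_kanji(stem):
        return form, "vs-s"
    if stem:
        return stem, "vs"
    return form, None


print(suru_headword("愛する"))    # ('愛する', 'vs-s')
print(suru_headword("勉強する"))  # ('勉強', 'vs')
```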
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
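<br />
As a sketch of the middle-dot rule above, the two reading forms can be generated from the component words. The code assumes that the "JIS middle-dot" corresponds to U+30FB (KATAKANA MIDDLE DOT); the set of rejected look-alike dots and the function names are illustrative assumptions.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT (assumed JIS middle-dot)

# Assumed look-alikes that should not be used: middle dot, half-width
# katakana middle dot, bullet, bullet operator.
OTHER_DOTS = {"\u00b7", "\uff65", "\u2022", "\u2219"}


def gairaigo_readings(parts):
    """Return the joined form and, for multi-word terms, the dotted form."""
    joined = "".join(parts)
    if len(parts) < 2:
        return [joined]
    return [joined, JIS_MIDDLE_DOT.join(parts)]


def has_wrong_dot(text):
    """True if the text contains one of the assumed look-alike dots."""
    return any(ch in OTHER_DOTS for ch in text)


print(";".join(gairaigo_readings(["アームレス", "チェア"])))
# アームレスチェア;アームレス・チェア
```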
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
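<br />
A few of the mechanical rules in the list above (";" as the gloss separator, no leading articles, no "colo(u)r" patterns, no comma after "e.g." or "i.e.") lend themselves to a simple automated check. The following is a hedged sketch with a hypothetical function name; it only produces hints, since the guidelines allow exceptions such as a necessary article.<br />

```python
import re


def lint_glosses(meaning):
    """Return (gloss, problem) pairs for a ';'-separated meaning string."""
    problems = []
    for gloss in (g.strip() for g in meaning.split(";")):
        # Leading "a", "an" or "the" is discouraged unless really needed.
        if re.match(r"(a|an|the)\s", gloss, re.IGNORECASE):
            problems.append((gloss, "leading article"))
        # Patterns such as "colo(u)r" cannot be searched for.
        if re.search(r"\w\(\w\)\w", gloss):
            problems.append((gloss, "unsearchable spelling pattern"))
        # No comma directly after "e.g." or "i.e.".
        if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
            problems.append((gloss, "comma after e.g./i.e."))
    return problems


print(lint_glosses("rice field; rice paddy"))
# []
print(lint_glosses("full colo(u)r"))
# [('full colo(u)r', 'unsearchable spelling pattern')]
```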
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
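<br />
To illustrate the [lsrc=lng:word] pattern described above, the following sketch extracts language codes and source words from a line of entry text. The regular expression and function name are assumptions written for this example; they are not the parser used by JMdictDB.<br />

```python
import re

# [lsrc=xxx:word] or [lsrc=xxx:"quoted words"]; the word may be empty.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')


def parse_lsrc(text):
    """Return (language code, source word) pairs found in `text`."""
    found = []
    for m in LSRC_RE.finditer(text):
        word = m.group(2) if m.group(2) is not None else m.group(3)
        found.append((m.group(1), word))
    return found


print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
# [('ger', 'Arbeit')]
print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
# [('fre', 'art déco')]
```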
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
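<br />
The three cross-reference shapes described above (plain target, kanji・reading pair, and target with a sense number) can be illustrated with a small parser. This is only a sketch of the bracketed notation as written on this page, not the JMdictDB implementation; the <code>XRef</code> type and <code>parse_xref</code> function are hypothetical names introduced for the example.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    """Hypothetical container for one parsed cross-reference."""
    kind: str               # "see" or "ant"
    target: str             # written form of the target (kanji where available)
    reading: Optional[str]  # reading given after the middle dot, if any
    sense: Optional[int]    # target sense number, if any


# Matches [see=TARGET], [ant=TARGET], [see=KANJI・READING] and [see=TARGET[2]].
# Limitation of this sketch: a target that itself contains a middle dot
# (e.g. some katakana compounds) would be split at the first dot.
_XREF = re.compile(
    r"\[(see|ant)=([^\[\]・]+)(?:・([^\[\]・]+))?(?:\[(\d+)\])?\]"
)


def parse_xref(text: str) -> Optional[XRef]:
    """Return the parsed cross-reference, or None if the text does not match."""
    m = _XREF.fullmatch(text.strip())
    if m is None:
        return None
    kind, target, reading, sense = m.groups()
    return XRef(kind, target, reading, int(sense) if sense else None)


if __name__ == "__main__":
    for sample in ("[see=言葉]", "[ant=何等]", "[see=金本位・かねほんい]", "[see=漢字[2]]"):
        print(sample, "->", parse_xref(sample))
```

Anything that does not fit one of these shapes returns <code>None</code>, which a caller could treat as a prompt to check the detailed instructions linked above.<br />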
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
A cross-reference back from the full form to the abbreviation may also be appropriate.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It is used for a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), where the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
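<br />
The two spacing rules can be checked mechanically. The following is a rough sketch only: the <code>UNITS</code> and <code>SYMBOLS</code> lists are illustrative assumptions, not an agreed list, and <code>spacing_problems</code> is a hypothetical helper name.<br />

```python
import re

# Illustrative lists only; a real checker would need a fuller, agreed set.
UNITS = ("km", "cm", "mm", "kg", "mg", "ml")
SYMBOLS = ("°C", "°F", "°", "%")

# A digit immediately followed by a unit, e.g. "100km".
_MISSING_SPACE = re.compile(r"\d(?:%s)\b" % "|".join(UNITS))
# A digit, whitespace, then a symbol, e.g. "15 °C" or "9 %".
_EXTRA_SPACE = re.compile(r"\d\s+(?:%s)" % "|".join(re.escape(s) for s in SYMBOLS))


def spacing_problems(gloss: str) -> list:
    """Return a list of spacing problems found in a gloss (empty if none)."""
    problems = []
    if _MISSING_SPACE.search(gloss):
        problems.append("add a space between the number and its unit")
    if _EXTRA_SPACE.search(gloss):
        problems.append("remove the space between the number and the symbol")
    return problems


if __name__ == "__main__":
    for sample in ("100 km", "100km", "15°C", "15 °C", "9%", "5c"):
        print(repr(sample), spacing_problems(sample))
```

Because "c" is treated as a symbol, "5c" is deliberately not flagged by this sketch.<br />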
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
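<br />
For anyone generating glosses from structured data, the date and time styles above can be produced as follows. This is a minimal sketch; the function names are hypothetical, and the treatment of midnight as "12am" is an assumption as the text above does not cover it.<br />

```python
from datetime import date, time

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def gloss_date(d: date, with_year: bool = True) -> str:
    """Format as 'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan_date(d: date) -> str:
    """Format as YYYY.MM.DD without zero padding, e.g. '1925.1.14'."""
    return f"{d.year}.{d.month}.{d.day}"


def gloss_time(t: time) -> str:
    """Format as '2am' or '12:30pm'."""
    hour = t.hour % 12 or 12          # 0 and 12 both display as 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"


if __name__ == "__main__":
    print(gloss_date(date(2019, 3, 17)))
    print(gloss_date(date(2019, 3, 17), with_year=False))
    print(f"Yukio Mishima ({lifespan_date(date(1925, 1, 14))}-{lifespan_date(date(1970, 11, 25))})")
    print(gloss_time(time(2, 0)), gloss_time(time(12, 30)))
```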
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kōjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*meikyo (or mk): Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*iwakoku: Iwanami Kokugo Jiten, 岩波国語辞典 - small-sized 国語辞典<br />
*smk: Shinmeikai Kokugo Jiten 新明解国語辞典<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*chuujiten: 新和英中辞典: medium Kenkyusha JE dictionary<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*prog: プログレッシブ和英中辞典 Progressive J-E dictionary<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*カタカナ新語辞典: a useful dictionary of loanwords from Gakken.<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
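<br />
Before adding unusual characters it can be useful to test whether they would survive into the EDICT/EDICT2 distributions. The sketch below approximates this with Python's standard codecs, on the assumption that <code>iso2022_jp</code> covers JIS X 0208 (plus ASCII and JIS X 0201 Roman) and that <code>euc_jp</code> covers both JIS X 0208 and JIS X 0212. It is not the conversion code used to build the distributions, and the function names are hypothetical.<br />

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def in_jis_x_0208(text: str) -> bool:
    # Assumption: iso2022_jp is limited to ASCII, JIS X 0201 Roman and JIS X 0208.
    return encodable(text, "iso2022_jp")


def in_jis_x_0208_or_0212(text: str) -> bool:
    # Assumption: euc_jp additionally accepts JIS X 0212 characters.
    return encodable(text, "euc_jp")


if __name__ == "__main__":
    for sample in ("漢字かな", "é", "한"):
        print(repr(sample), in_jis_x_0208(sample), in_jis_x_0208_or_0212(sample))
```

A hangul character fails both checks, which corresponds to the advice above to include a romanized version alongside it.<br />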
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
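<br />
Reduced to its bare form, the two-out-of-three test looks like the sketch below. The <code>Candidate</code> type is hypothetical and holds only the main kanji headword, reading and meaning of each entry. In practice "same meaning" is an editorial judgement rather than a string comparison, and the sketch does not attempt to handle kana-only entries or kanji-headwords that apply to only some readings.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of an entry: main forms only."""
    kanji: str
    reading: str
    meaning: str


def may_merge(a: Candidate, b: Candidate) -> bool:
    """True if at least two of kanji, reading and meaning are the same."""
    matches = sum((
        a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ))
    return matches >= 2


if __name__ == "__main__":
    modern = Candidate("合気道", "あいきどう", "aikido")
    old = Candidate("合氣道", "あいきどう", "aikido")
    print(may_merge(modern, old))  # reading and meaning match
```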
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "rare" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
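<br />
For developers, the suggested handling amounts to indexing every form but displaying only those without an "sK" or "sk" tag. The sketch below assumes each surface form is available as a (text, tags) pair; that representation and the <code>split_forms</code> name are illustrative only and do not reflect any particular JMdict parsing library.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}


def split_forms(forms):
    """Split (text, tags) pairs into forms to display and keys to index.

    Every form is usable as a search key; forms tagged sK/sk are
    left out of the displayed entry.
    """
    display = [text for text, tags in forms if not SEARCH_ONLY_TAGS & set(tags)]
    search_keys = [text for text, _tags in forms]
    return display, search_keys


if __name__ == "__main__":
    forms = [("3密", []), ("三密", []), ("三蜜", ["sK"])]
    display, search_keys = split_forms(forms)
    print("display:", display)
    print("search keys:", search_keys)
```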
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
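The rule above can be checked mechanically. The following is a minimal Python sketch, not part of any JMdict tooling: the <code>SUSPECT</code> table is an assumed, illustrative set of bar-like characters (this page does not enumerate the excluded ones), and the function name is hypothetical.<br />

```python
# The three characters the guideline allows are U+002D (ASCII hyphen),
# U+30FC (long vowel symbol) and U+2212 (minus sign).

# ASSUMPTION: an illustrative set of other bar-like characters. The guideline
# does not list the excluded characters here; consult the linked JMdict issue.
SUSPECT = {
    "\u2010": "hyphen",
    "\u2011": "non-breaking hyphen",
    "\u2012": "figure dash",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2015": "horizontal bar",
    "\uff0d": "fullwidth hyphen-minus",
    "\uff70": "halfwidth katakana prolonged sound mark",
}


def find_suspect_bars(text):
    """Return (index, character, name) for each suspect bar-like character."""
    return [(i, ch, SUSPECT[ch]) for i, ch in enumerate(text) if ch in SUSPECT]
```

An empty result means that none of the assumed look-alikes is present; it does not confirm that the allowed characters are used in the right context.<br />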
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=958Editorial policy2023-02-10T01:37:02Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
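As a rough illustration of the full-width rule, the Python sketch below maps ASCII letters and digits to their full-width Unicode counterparts (a fixed offset of 0xFEE0). It is an assumption that only letters and digits need converting; other characters, such as the ASCII hyphen in "CD-ROM", are left untouched. The function names are hypothetical.<br />

```python
FULLWIDTH_OFFSET = 0xFEE0  # distance from ASCII "A" (U+0041) to full-width "Ａ" (U+FF21)


def has_halfwidth_alnum(text):
    """True if the text contains any ASCII letter or digit."""
    return any(ch.isascii() and ch.isalnum() for ch in text)


def to_fullwidth(text):
    """Replace ASCII letters and digits with full-width forms; keep everything else."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if ch.isascii() and ch.isalnum() else ch
        for ch in text
    )
```

For example, <code>to_fullwidth("MP3プレーヤー")</code> returns the same word with full-width Ｍ, Ｐ and ３.<br />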
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
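The paired readings can be generated from the dotted form. This Python sketch assumes that the "JIS middle-dot" corresponds to U+30FB (KATAKANA MIDDLE DOT); the list of look-alike dots is illustrative only, and the function name is hypothetical.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT (assumed to be the accepted dot)

# ASSUMPTION: a few look-alike dots chosen for illustration, not an official list:
# middle dot, bullet, halfwidth katakana middle dot.
LOOKALIKE_DOTS = ("\u00b7", "\u2022", "\uff65")


def gairaigo_readings(dotted):
    """Return 'undotted;dotted' for a multi-word loanword, normalising the dot."""
    normalized = dotted
    for dot in LOOKALIKE_DOTS:
        normalized = normalized.replace(dot, JIS_MIDDLE_DOT)
    undotted = normalized.replace(JIS_MIDDLE_DOT, "")
    if undotted == normalized:
        return normalized  # single-word loanword: nothing to pair
    return ";".join([undotted, normalized])
```

Given the dotted form アームレス・チェア, the function returns the pair shown in the example above, undotted form first.<br />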
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
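Several of the points above lend themselves to a quick automated check before submitting. The following Python sketch is illustrative only and is not how the JMdictDB validates input: it splits a meaning on ";" and flags a leading article, an inline spelling variant such as "colo(u)r", and a comma after "e.g." or "i.e.". The function and pattern names are hypothetical.<br />

```python
import re

LEADING_ARTICLE = re.compile(r"^(a|an|the)\s", re.IGNORECASE)
INLINE_VARIANT = re.compile(r"\w\(\w+\)\w")          # e.g. colo(u)r
COMMA_AFTER_ABBR = re.compile(r"\b(e\.g\.|i\.e\.),")  # e.g., / i.e.,
GLOSS_TAG = re.compile(r"^\[(lit|fig|expl)\]\s*")


def lint_glosses(meaning):
    """Return (gloss, problem) pairs for a ';'-separated meaning string."""
    problems = []
    for gloss in (part.strip() for part in meaning.split(";")):
        text = GLOSS_TAG.sub("", gloss)  # ignore a leading [lit]/[fig]/[expl] tag
        if LEADING_ARTICLE.match(text):
            problems.append((gloss, "leading article"))
        if INLINE_VARIANT.search(text):
            problems.append((gloss, "inline spelling variant"))
        if COMMA_AFTER_ABBR.search(text):
            problems.append((gloss, "comma after e.g./i.e."))
    return problems
```

A flagged gloss is only a prompt for review; a leading article, for instance, is acceptable where it is needed to make the meaning clear. The sketch also splits on every ";", so a semicolon inside parentheses would be treated as a gloss separator.<br />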
<br />
====Which Reference Is Best?====<br />
<br />
On occasion, references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and as to which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
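For readers who process entry text, the two examples above can be parsed with a regular expression. This is a minimal Python sketch based only on those two examples (a bare word, or a quoted phrase when it contains a space); it is an assumption that this covers the syntax, and it is not the JMdictDB parser. The names are hypothetical.<br />

```python
import re

# [lsrc=lng:word] or [lsrc=lng:"several words"]; the word may be empty.
LSRC = re.compile(
    r'\[lsrc=(?P<lang>[a-z]{3}):(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\]\s]*))\]'
)


def parse_lsrc(sense_text):
    """Return a list of (language code, source word) pairs found in the text."""
    results = []
    for match in LSRC.finditer(sense_text):
        quoted = match.group("quoted")
        word = quoted if quoted is not None else match.group("bare")
        results.append((match.group("lang"), word))
    return results
```

Applied to the アルバイト line above, the function yields the pair ("ger", "Arbeit").<br />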
<br />
Don't do this for (i) common Sino-Japanese vocabulary, (ii) loan-words from English where the source word is among the translations, or (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
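The three cross-reference shapes described above (plain target, kanji・reading, and target with a sense number) can be recognised with one pattern. The Python sketch below is inferred from those examples only; the linked detailed instructions remain the authority on the syntax, and the names used here are hypothetical.<br />

```python
import re

XREF = re.compile(
    r"\[(?P<kind>see|ant)="
    r"(?P<target>[^\[\]・]+)"          # headword (kanji form where one exists)
    r"(?:・(?P<reading>[^\[\]]+))?"    # optional reading after the middle dot
    r"(?:\[(?P<sense>\d+)\])?"         # optional sense number, e.g. [2]
    r"\]"
)


def parse_xrefs(text):
    """Return one dict per [see=...] or [ant=...] cross-reference in the text."""
    return [
        {
            "kind": m.group("kind"),
            "target": m.group("target"),
            "reading": m.group("reading"),
            "sense": int(m.group("sense")) if m.group("sense") else None,
        }
        for m in XREF.finditer(text)
    ]
```

One limitation of this sketch: a target that itself contains a middle dot (such as a dotted 外来語) would be split into target and reading.<br />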
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It marks a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), where the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
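The two spacing rules can be applied with a pair of substitutions. In this Python sketch the <code>UNITS</code> and <code>SYMBOLS</code> lists are small assumed samples for illustration; a real check would need a fuller list and a decision on which abbreviations count as symbols. The function name is hypothetical.<br />

```python
import re

# ASSUMPTION: short illustrative lists, not an official inventory.
UNITS = ("km", "kg", "cm", "mm", "ml", "m", "g", "l")
SYMBOLS = ("°C", "%", "c")

UNIT_NO_SPACE = re.compile(r"(\d)(" + "|".join(UNITS) + r")\b")
SYMBOL_WITH_SPACE = re.compile(
    r"(\d) +(" + "|".join(re.escape(s) for s in SYMBOLS) + r")(?!\w)"
)


def fix_spacing(gloss):
    """Insert a space before units and remove any space before symbols."""
    gloss = UNIT_NO_SPACE.sub(r"\1 \2", gloss)
    return SYMBOL_WITH_SPACE.sub(r"\1\2", gloss)
```

The negative lookahead keeps "5 cm" and "5 cents" intact while still closing up "5 c".<br />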
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
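For anyone generating glosses programmatically, the preferred date and time styles can be produced as follows. This is a Python sketch with hypothetical function names; the month names are spelled out in the code so that the output does not depend on the system locale.<br />

```python
from datetime import date, time

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def gloss_date(d, with_year=True):
    """Format as 'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(born, died):
    """Format as YYYY.MM.DD-YYYY.MM.DD without zero padding, e.g. for named entities."""
    return "-".join(f"{d.year}.{d.month}.{d.day}" for d in (born, died))


def gloss_time(t):
    """Format as '2am' or '12:30pm'."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"
```

For example, <code>lifespan(date(1925, 1, 14), date(1970, 11, 25))</code> gives the range used in the Yukio Mishima example above.<br />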
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*meikyo (or mk): Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*iwakoku: Iwanami Kokugo Jiten, 岩波国語辞典 - small sized 国語辞典<br />
*smk: Shinmeikai Kokugo Jiten 新明解国語辞典<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*chuujiten: 新和英中辞典: medium Kenkyusha JE dictionary<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*prog: プログレッシブ和英中辞典 Progressive J-E dictionary<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*カタカナ新語辞典: a useful dictionary of loanwords from Gakken.<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,355 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
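<br />
As a rough illustration of the constraints above, the following Python sketch classifies a string by the smallest codeset that can hold it. It is not part of the JMdict tooling; it assumes that Python's <code>iso2022_jp</code> codec (ASCII plus JIS X 0208) and <code>iso2022_jp_1</code> codec (which adds JIS X 0212) are close enough approximations of the two sets, and the function name <code>jis_coverage</code> is made up for this example.<br />

```python
def jis_coverage(text: str) -> str:
    """Return the smallest codeset (as modelled by Python codecs) that can encode text.

    Assumption: 'iso2022_jp' approximates JIS X 0208 and 'iso2022_jp_1'
    approximates JIS X 0208 + JIS X 0212. Both codecs also accept ASCII,
    so this is only an approximation of the EDICT/EDICT2 constraints.
    """
    for label, codec in (("JIS X 0208", "iso2022_jp"),
                         ("JIS X 0212", "iso2022_jp_1")):
        try:
            text.encode(codec)
            return label
        except UnicodeEncodeError:
            continue
    return "outside JIS X 0208/0212"


if __name__ == "__main__":
    for sample in ("漢字", "é", "한"):
        print(sample, "->", jis_coverage(sample))
```

A string reported as "outside JIS X 0208/0212" is the kind of content that would be dropped from the EDICT and EDICT2 versions, so a romanized alternative is worth adding alongside it.<br />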
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
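<br />
The core of the two-out-of-three test can be sketched in a few lines of Python. This is only a simplified illustration: the <code>Candidate</code> class and <code>may_merge</code> function are hypothetical, each candidate is reduced to its single major kanji form, reading and meaning, and the editorial judgement described above (rare forms, related kana variants, etc.) is not modelled.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical, simplified view of an entry: major forms only."""
    kanji: str      # empty string for kana-only entries
    reading: str
    meaning: str


def may_merge(a: Candidate, b: Candidate) -> bool:
    """True if at least two of the three fields are identical."""
    shared = sum((a.kanji == b.kanji,
                  a.reading == b.reading,
                  a.meaning == b.meaning))
    return shared >= 2


if __name__ == "__main__":
    modern = Candidate("合気道", "あいきどう", "aikido")
    old = Candidate("合氣道", "あいきどう", "aikido")
    print(may_merge(modern, old))  # reading and meaning match
```

A result of True only means a merger is permitted under the rule; whether it is sensible remains an editorial decision.<br />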
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
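<br />
One possible way for an app to follow this suggestion is sketched below in Python. The entry structure (a dict with a list of form/tag pairs) and the function names are assumptions made for the example and do not reflect the JMdict XML schema or the WWWJDIC implementation; only the "sK" and "sk" tags come from the text above.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}


def is_search_only(tags):
    """True if a surface form carries a search-only tag."""
    return bool(SEARCH_ONLY_TAGS & set(tags))


def display_forms(entry):
    """Forms to show to the user: everything except search-only forms."""
    return [form for form, tags in entry["forms"] if not is_search_only(tags)]


def build_index(entries):
    """Map every surface form, including search-only ones, to entry positions."""
    index = {}
    for entry_id, entry in enumerate(entries):
        for form, _tags in entry["forms"]:
            index.setdefault(form, []).append(entry_id)
    return index


if __name__ == "__main__":
    # Hypothetical in-memory entry; glosses are omitted.
    entries = [{"forms": [("3密", []), ("三密", []), ("三蜜", ["sK"])]}]
    index = build_index(entries)
    hit = entries[index["三蜜"][0]]
    print(display_forms(hit))
```

The search-only form is used as a key but never appears in the displayed list, which matches the concealment behaviour described for 三蜜.<br />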
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded bindweed, a subspecies of hedge bindweed, should be submitted as: "occluded bindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
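<br />
A simple check for stray bar characters could look like the following Python sketch. The three permitted characters are those listed above; the set of "suspect" characters is an assumption chosen for illustration (common Unicode hyphens and dashes) and is not the list given in the JMdict issue.<br />

```python
import unicodedata

# The three characters permitted by the guideline above.
ALLOWED_BARS = {"\u002d", "\u30fc", "\u2212"}

# Assumed examples of other mid-line bars; NOT the official list.
SUSPECT_BARS = {"\u2010", "\u2011", "\u2012", "\u2013",
                "\u2014", "\u2015", "\uff0d"}


def find_disallowed_bars(text):
    """Return (position, code point, Unicode name) for each suspect bar."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.name(ch))
            for i, ch in enumerate(text)
            if ch in SUSPECT_BARS and ch not in ALLOWED_BARS]


if __name__ == "__main__":
    print(find_disallowed_bars("CD-ROM"))      # ASCII hyphen is allowed
    print(find_disallowed_bars("A\u2013B"))    # en dash is flagged
```

Reporting the code point and name makes it easier to see which look-alike character slipped into a field.<br />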
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=957Editorial policy2023-02-08T08:11:09Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer it if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
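<br />
The pairing of undotted and dotted forms can be illustrated with the Python sketch below. It assumes that the "JIS middle-dot" corresponds to U+30FB (KATAKANA MIDDLE DOT); the look-alike set and the function names are hypothetical and chosen only for this example.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # assumed: KATAKANA MIDDLE DOT

# Assumed examples of other dots that look similar; not an official list.
LOOKALIKE_DOTS = {"\u00b7", "\uff65", "\u2022"}


def gairaigo_forms(parts):
    """Return the joined form followed by the middle-dot form."""
    return ["".join(parts), JIS_MIDDLE_DOT.join(parts)]


def has_wrong_middle_dot(reading):
    """True if the reading contains a look-alike dot instead of U+30FB."""
    return any(ch in LOOKALIKE_DOTS for ch in reading)


if __name__ == "__main__":
    print(";".join(gairaigo_forms(["アームレス", "チェア"])))
    print(has_wrong_middle_dot("アームレス\u00b7チェア"))
```

The first call produces the two readings in the order shown in the example above, separated by a semicolon.<br />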
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails.<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
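A few of the mechanical points above (separating translations with ";", avoiding leading articles, and avoiding unsearchable patterns such as "colo(u)r") can be checked automatically. The following is only a minimal sketch; the function names are hypothetical and are not part of the JMdictDB system, and the article check is a warning only, since the guideline allows articles where they are needed for clarity.<br />

```python
import re

# Hypothetical helpers for illustration; not part of JMdictDB.
LEADING_ARTICLE = re.compile(r"^(a|an|the)\s", re.IGNORECASE)
# Letters wrapped in parentheses inside a word, e.g. "colo(u)r".
OPTIONAL_LETTER = re.compile(r"\w\(\w+\)\w")


def split_glosses(meaning):
    """Split a meaning string into separate glosses on ';'."""
    return [g.strip() for g in meaning.split(";") if g.strip()]


def lint_gloss(gloss):
    """Return a list of possible style problems found in one gloss."""
    problems = []
    if LEADING_ARTICLE.match(gloss):
        problems.append("leading article")
    if OPTIONAL_LETTER.search(gloss):
        problems.append("optional-letter spelling pattern")
    return problems


if __name__ == "__main__":
    for gloss in split_glosses("full colour (color); colo(u)r; the oracle"):
        print(gloss, lint_gloss(gloss))
```

Note that a parenthesised alternative such as "full colour (color)" is not flagged, because the parenthesis follows a space rather than sitting inside a word.<br />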
<br />
====Which Reference Is Best?====<br />
<br />
On occasion, references (see the list of dictionaries, etc. below) will differ as to the meanings of entries and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning is given as "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
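For developers who need to read these markers back out of an entry's text, the [lsrc=lng:word] pattern can be extracted with a regular expression. This is a minimal sketch based only on the two examples above; the function name is hypothetical, and the real JMdictDB parser may accept more variations than this does.<br />

```python
import re

# Three-letter language code, a colon, then an optional source word
# (possibly wrapped in double quotes when it contains spaces).
LSRC = re.compile(r"\[lsrc=([a-z]{3}):([^\]]*)\]")


def parse_lsrc(text):
    """Return (language code, source word) pairs found in the text."""
    results = []
    for lang, word in LSRC.findall(text):
        results.append((lang, word.strip().strip('"')))
    return results


if __name__ == "__main__":
    print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
    print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```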
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word, use the format [see=漢字[2]].<br />
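The three cross-reference shapes described above (plain target, kanji・reading pair, and target with a sense number) can be recognised with a single regular expression. The following is a rough sketch only; the XRef type and parse_xrefs function are hypothetical names, and the detailed instructions linked above remain the authority on what the entry form actually accepts.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str                # "see" or "ant"
    target: str              # kanji form, or kana form if there is no kanji
    reading: Optional[str]   # present only for kanji・reading references
    sense: Optional[int]     # present only for sense-specific references


XREF = re.compile(r"\[(see|ant)=([^\[\]・]+)(?:・([^\[\]]+))?(?:\[(\d+)\])?\]")


def parse_xrefs(text):
    """Return every [see=...] / [ant=...] reference found in the text."""
    refs = []
    for m in XREF.finditer(text):
        kind, target, reading, sense = m.groups()
        refs.append(XRef(kind, target, reading, int(sense) if sense else None))
    return refs


if __name__ == "__main__":
    print(parse_xrefs("[see=金本位・かねほんい] [see=漢字[2]] [ant=何等]"))
```

One limitation of this sketch: a katakana target that itself contains a middle dot would be split at the dot and misread as a kanji・reading pair, so such targets would need separate handling.<br />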
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It marks a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), where the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
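The spacing rule can be expressed as a small lookup: symbols attach directly to the number, and everything else is treated as a unit and gets a space. This is only an illustrative sketch; the set of symbols below contains just the three examples mentioned above, and the function name is hypothetical.<br />

```python
# Symbols that attach directly to the number (taken from the examples above;
# this is not an exhaustive list).
NO_SPACE_SYMBOLS = {"°C", "%", "c"}


def format_quantity(number, unit):
    """Join a number and a unit or symbol using the spacing rule."""
    separator = "" if unit in NO_SPACE_SYMBOLS else " "
    return f"{number}{separator}{unit}"


if __name__ == "__main__":
    print(format_quantity(100, "km"))
    print(format_quantity(15, "°C"))
    print(format_quantity(5, "c"))
```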
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
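As an illustration, both date styles can be produced from a date value as follows. This is a sketch with hypothetical function names; month names are spelled out in a tuple so that the output does not depend on the system locale.<br />

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def gloss_date(d, with_year=True):
    """Format a date as 'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(born, died):
    """Format a person's dates in the compact YYYY.MM.DD style."""
    def compact(d):
        # No zero padding, matching the "1925.1.14" example above.
        return f"{d.year}.{d.month}.{d.day}"
    return f"({compact(born)}-{compact(died)})"


if __name__ == "__main__":
    print(gloss_date(date(2019, 3, 17)))
    print("Yukio Mishima", lifespan(date(1925, 1, 14), date(1970, 11, 25)))
```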
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*meikyo: Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*iwakoku: Iwanami Kokugo Jiten, 岩波国語辞典 - small-sized 国語辞典<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
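Contributors and developers who want a rough idea of which characters would survive into the EDICT/EDICT2 files can test them against an EUC-JP encoder. The sketch below assumes Python's built-in "euc_jp" codec, which encodes JIS X 0208 characters as two bytes and JIS X 0212 characters as three bytes with a 0x8F prefix; its mapping tables may differ in small details from those used when the distribution files are generated, so treat the result as indicative only. The function names are hypothetical.<br />

```python
def classify_char(ch):
    """Classify one character by the JIS character set it falls in."""
    try:
        encoded = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return "other"               # not in any of the JIS sets covered here
    if len(encoded) == 1:
        return "ASCII"
    if encoded[0] == 0x8E:
        return "JIS X 0201 kana"     # half-width katakana
    if encoded[0] == 0x8F:
        return "JIS X 0212"          # kept in EDICT2, not in legacy EDICT
    return "JIS X 0208"              # usable in both EDICT and EDICT2


def outside_edict2(text):
    """List the characters that fall outside JIS X 0208 and JIS X 0212."""
    return [ch for ch in text if classify_char(ch) == "other"]


if __name__ == "__main__":
    for ch in "漢é한":
        print(ch, classify_char(ch))
```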
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
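The core of the two-out-of-three test can be sketched as a simple count of matching fields. This is an illustration only: the dictionary layout and function name are hypothetical, each entry is reduced to its main kanji form, main reading and a single meaning string, and exact string comparison is a crude stand-in for the editorial judgement that real merge decisions require.<br />

```python
FIELDS = ("kanji", "reading", "meaning")


def merge_candidate(entry_a, entry_b):
    """Return True if at least two of the three main fields are the same."""
    matches = sum(entry_a[field] == entry_b[field] for field in FIELDS)
    return matches >= 2


if __name__ == "__main__":
    nihon = {"kanji": "日本", "reading": "にほん", "meaning": "Japan"}
    nippon = {"kanji": "日本", "reading": "にっぽん", "meaning": "Japan"}
    print(merge_candidate(nihon, nippon))  # same kanji and meaning
```

A result of True only marks the pair as a candidate; the cautions above about rare or archaic forms still apply before any merger is carried out.<br />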
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included; ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
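<br />
The suggestion above can be sketched in a few lines of Python. This is only an illustration: the <code>SurfaceForm</code> and <code>Entry</code> classes and the two helper functions are hypothetical and do not reflect the actual JMdict data model or the WWWJDIC implementation; only the "sK" and "sk" tags and the 三蜜 example come from the text.<br />

```python
from dataclasses import dataclass, field

# Tags named in the text: "sK" for forms containing kanji, "sk" for kana-only forms.
SEARCH_ONLY_TAGS = {"sK", "sk"}


@dataclass
class SurfaceForm:
    """Hypothetical container for one surface form and its tags."""
    text: str
    tags: set = field(default_factory=set)


@dataclass
class Entry:
    """Hypothetical entry holding its surface forms in dictionary order."""
    forms: list


def search_keys(entry):
    """Every form, including search-only ones, may be used as a look-up key."""
    return [f.text for f in entry.forms]


def display_forms(entry):
    """Forms to show to the user: everything not tagged sK or sk."""
    return [f.text for f in entry.forms if not (f.tags & SEARCH_ONLY_TAGS)]


# Illustrative data based on the example in the text.
entry = Entry(forms=[
    SurfaceForm("3密"),
    SurfaceForm("三密"),
    SurfaceForm("三蜜", {"sK"}),
])

print(search_keys(entry))
print(display_forms(entry))
```

Keeping the two views separate means the index can grow with search-only forms without changing what users see in the entry.<br />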
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded bindweed, a subspecies of hedge bindweed, should be submitted as: "occluded bindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
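<br />
The capitalization and subspecies rules above can be sketched as a small formatter. The function names and parameters are hypothetical and are not part of any JMdict tool; the sketch produces plain text (no italics), and it handles only species and subspecies, not varieties, forms or cultivars.<br />

```python
def scientific_name(genus, species, subspecies=None, plant=False):
    """Build a binomial or trinomial name without an author citation.

    The generic name is capitalized and epithets are lower-cased.
    For plant subspecies, "subsp." is placed before the subspecific epithet;
    for animal subspecies the epithet is simply appended.
    """
    parts = [genus.capitalize(), species.lower()]
    if subspecies:
        if plant:
            parts.append("subsp.")
        parts.append(subspecies.lower())
    return " ".join(parts)


def species_gloss(common_name, sci_name):
    """Preferred gloss format: common_name (scientific_name)."""
    return f"{common_name} ({sci_name})"


print(species_gloss("raspberry", scientific_name("Rubus", "idaeus")))
print(species_gloss("cinnamon bear",
                    scientific_name("Ursus", "americanus", "cinnamomum")))
print(scientific_name("Calystegia", "sepium", "erratica", plant=True))
```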
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
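<br />
A checker for stray bar characters might look like the following sketch. The three allowed code points are the ones listed above; the <code>SUSPECT_BARS</code> table is an illustrative sample of look-alike characters chosen for this example, and is not claimed to be the full set referred to in this section. The function name is hypothetical.<br />

```python
# The three characters permitted by the guideline above.
ALLOWED_BARS = {
    "\u002d": "ASCII hyphen",
    "\u30fc": "long vowel symbol",
    "\u2212": "minus sign",
}

# Illustrative sample of look-alikes (an assumption, not the guideline's list).
SUSPECT_BARS = {
    "\u2010": "HYPHEN",
    "\u2013": "EN DASH",
    "\u2014": "EM DASH",
    "\u2015": "HORIZONTAL BAR",
    "\uff0d": "FULLWIDTH HYPHEN-MINUS",
}


def find_suspect_bars(text):
    """Return (index, code point, name) for each look-alike bar found in text."""
    return [
        (i, f"U+{ord(ch):04X}", SUSPECT_BARS[ch])
        for i, ch in enumerate(text)
        if ch in SUSPECT_BARS
    ]


print(find_suspect_bars("CD-ROM"))        # ASCII hyphen: nothing reported
print(find_suspect_bars("CD\u2010ROM"))   # U+2010 is reported
```

The sketch only detects characters; whether a minus sign is being used in an arithmetic context still needs an editor's judgement.<br />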
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=956Editorial policy2023-02-07T23:28:13Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
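<br />
The ordering rule above can be expressed as a sort key, as in the sketch below. The dictionary layout, the <code>joyo</code> flag and the counts are placeholders invented for the example; in practice the counts would come from an n-gram corpus, and "equivalent" frequencies would need a tolerance rather than the exact equality used here.<br />

```python
# Tags for irregular or incorrect forms named in the text.
IRREGULAR_TAGS = {"iK", "io", "ik"}


def order_forms(forms):
    """Order surface forms for an entry.

    Each form is a dict with 'text', 'count' (frequency), 'joyo' (True if the
    form uses only jōyō kanji) and 'tags' (a set of tag strings).
    """
    return sorted(
        forms,
        key=lambda f: (
            bool(f["tags"] & IRREGULAR_TAGS),  # irregular forms go to the rear
            -f["count"],                       # then decreasing frequency
            not f["joyo"],                     # ties: prefer jōyō kanji forms
        ),
    )


# Placeholder data; names and counts are made up for illustration.
forms = [
    {"text": "irregular-but-popular", "count": 900, "joyo": True, "tags": {"iK"}},
    {"text": "non-joyo", "count": 500, "joyo": False, "tags": set()},
    {"text": "joyo", "count": 500, "joyo": True, "tags": set()},
    {"text": "rare", "count": 20, "joyo": True, "tags": set()},
]

print([f["text"] for f in order_forms(forms)])
```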
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
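<br />
As a sketch of the middle-dot rule, the following Python generates both versions of a multi-word 外来語 and flags other dot characters. It assumes that the "JIS middle-dot" corresponds to Unicode U+30FB (KATAKANA MIDDLE DOT); the <code>OTHER_DOTS</code> table is a sample of look-alikes chosen for the example, and the function names are hypothetical.<br />

```python
# Assumption: the JIS middle-dot maps to U+30FB KATAKANA MIDDLE DOT.
JIS_MIDDLE_DOT = "\u30fb"

# Illustrative sample of other dots that would not be accepted.
OTHER_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
    "\u2022": "BULLET",
}


def gairaigo_readings(components):
    """Return the joined form and, for multi-word terms, the dotted form."""
    joined = "".join(components)
    if len(components) < 2:
        return [joined]
    return [joined, JIS_MIDDLE_DOT.join(components)]


def has_wrong_dot(reading):
    """True if the reading contains a dot other than the JIS middle-dot."""
    return any(ch in OTHER_DOTS for ch in reading)


print(";".join(gairaigo_readings(["アームレス", "チェア"])))
print(has_wrong_dot("アームレス\u00b7チェア"))
```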
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
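<br />
Several of the mechanical points in the list above lend themselves to automated checking. The following is a minimal sketch of such a check, not a tool used by the editors; the function names and messages are hypothetical, and the checks are advisory only (for example, a leading article is sometimes necessary).<br />

```python
import re


def split_glosses(meaning):
    """Split a meaning into separate translations at each ';'."""
    return [g.strip() for g in meaning.split(";") if g.strip()]


def lint_gloss(gloss):
    """Return a list of advisory messages for one gloss."""
    problems = []
    # Leading "a", "an" or "the" is normally omitted.
    if re.match(r"(a|an|the)\s", gloss, flags=re.IGNORECASE):
        problems.append("leading article")
    # Patterns such as "colo(u)r" cannot be searched for successfully.
    if re.search(r"\w\(\w+\)\w", gloss):
        problems.append("unsearchable spelling pattern")
    # No comma after "e.g." or "i.e.".
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        problems.append("comma after e.g./i.e.")
    return problems


for gloss in split_glosses("full colour (color); colo(u)r; a rice field"):
    print(gloss, lint_gloss(gloss))
```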
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verbs have an archaic equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
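<br />
The agreement between the first part-of-speech tag and the form of the gloss can be checked with a simple heuristic, sketched below. It assumes that every tag beginning with "v" is a verb tag and that verb glosses begin with "to "; both are simplifications made for this example, and the function name is hypothetical.<br />

```python
def check_gloss_form(pos_tags, gloss):
    """Return an advisory message, or None if the gloss fits the first POS tag."""
    first = pos_tags[0]
    is_verb = first.startswith("v")          # assumption: "v..." means a verb tag
    starts_with_to = gloss.startswith("to ")
    if is_verb and not starts_with_to:
        return "verb gloss should use the infinitive ('to ...')"
    if not is_verb and starts_with_to:
        return "gloss reads as a verb but the first POS tag is not a verb"
    if first.startswith("adj") and gloss.split(" ", 1)[0] in {"be", "is"}:
        return "adjective gloss should omit the copula"
    return None


print(check_gloss_form(["n", "vs"], "cooking"))
print(check_gloss_form(["n", "vs"], "to cook"))
print(check_gloss_form(["adj-na"], "be lucky"))
```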
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for: (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
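<br />
As a rough illustration of the tag format described above, the following Python sketch extracts the language code and source word from [lsrc=...] tags in a submission line. The function name and the regular expression are assumptions made for this example; they are not part of the JMdict tooling.<br />

```python
import re

# Hypothetical helper, not part of any JMdict tool: matches tags such as
# [lsrc=ger:Arbeit] or [lsrc=fre:"art déco"]. The source word may be empty,
# as when it is identical to the translation.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')

def parse_lsrc(text):
    """Return a list of (language_code, source_word) pairs found in text."""
    pairs = []
    for m in LSRC_RE.finditer(text):
        # group(2) is the quoted form, group(3) the unquoted form
        word = m.group(2) if m.group(2) is not None else m.group(3)
        pairs.append((m.group(1), word.strip()))
    return pairs
```

For example, applying parse_lsrc to the アルバイト line above would yield the pair ("ger", "Arbeit").<br />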
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word, use the format [see=漢字[2]].<br />
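<br />
The three patterns above can be told apart mechanically. The following Python sketch is one possible way to do so; the function name, the returned dictionary and the regular expression are assumptions for illustration only, and the sketch would mis-split a target that itself contains a middle dot (e.g. a katakana headword such as ブル・シャーク).<br />

```python
import re

# Hypothetical parser for [see=...] / [ant=...] tags as described above.
XREF_RE = re.compile(
    r'\[(see|ant)=([^\[\]・]+)(?:・([^\[\]]+))?(?:\[(\d+)\])?\]'
)

def parse_xref(tag):
    """Split a cross-reference tag into type, target, reading and sense."""
    m = XREF_RE.fullmatch(tag)
    if m is None:
        return None
    kind, target, reading, sense = m.groups()
    return {
        "type": kind,          # "see" or "ant"
        "target": target,      # kanji form (or kana if there is no kanji)
        "reading": reading,    # None unless the kanji・reading form was used
        "sense": int(sense) if sense else None,
    }
```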
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It indicates that the entry refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
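<br />
The two spacing rules can be applied mechanically to a gloss. The Python sketch below is only an illustration: the list of units and symbols is an assumption made for the example (the guidelines do not enumerate them), and forms such as "5c" are deliberately left untouched.<br />

```python
import re

# Illustrative unit list only; longer units come first so "mm" wins over "m".
UNITS = ("km", "cm", "mm", "kg", "mg", "ml", "m", "g", "l")
NO_SPACE_UNIT = re.compile(r'\b(\d+(?:\.\d+)?)(' + "|".join(UNITS) + r')\b')
SPACE_BEFORE_SYMBOL = re.compile(r'(\d)\s+(%|°C|°F)')

def fix_spacing(gloss):
    """Insert a space before units and remove the space before symbols."""
    gloss = NO_SPACE_UNIT.sub(r'\1 \2', gloss)       # "100km" -> "100 km"
    gloss = SPACE_BEFORE_SYMBOL.sub(r'\1\2', gloss)  # "9 %"   -> "9%"
    return gloss
```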
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
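<br />
For anyone generating glosses programmatically, the formats above can be produced as in the following Python sketch. The function names are hypothetical, and the lifespan format follows the Mishima example (no zero padding of month and day), which is an interpretation of "YYYY.MM.DD" rather than a stated rule.<br />

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def gloss_date(d, with_year=True):
    """Return 'March 17, 2019', or 'March 17' when the year is omitted."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text

def lifespan(born, died):
    """Return a '1925.1.14-1970.11.25' style range for a person's dates."""
    def fmt(d):
        return f"{d.year}.{d.month}.{d.day}"
    return f"{fmt(born)}-{fmt(died)}"

def gloss_time(hour, minute=0):
    """Return '2am' or '12:30pm' style text from a 24-hour clock value."""
    suffix = "am" if hour < 12 else "pm"
    h12 = hour % 12 or 12  # 0 and 12 are both shown as 12
    if minute == 0:
        return f"{h12}{suffix}"
    return f"{h12}:{minute:02d}{suffix}"
```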
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*jitsuyo: [http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*meikyo: Meikyo Kokugo Jiten, 明鏡国語辞典 - small-sized 国語辞典<br />
*sankoku: Sanseido Kokugo Jiten, 三省堂国語辞典 - small-sized 国語辞典<br />
*saito: Saito's Japanese-English Dictionary, NEW斎藤和英大辞典 - large Japanese-English dictionary published in 1928<br />
*DDB: Digital Dictionary of Buddhism - online Buddhism dictionary<br />
*DBJG/DIJG/DAJG: A Dictionary of Beginner/Intermediate/Advanced Japanese Grammar - grammar dictionaries by Seiichi Makino and Michio Tsutsui<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
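<br />
Contributors who want to check in advance whether a character will survive in the EDICT/EDICT2 versions could use something like the following Python sketch. It relies on Python's standard iso2022_jp codec (ASCII plus JIS X 0208) and iso2022_jp_1 codec (which adds JIS X 0212) as an approximation of the two codesets; this is an assumption of the example and not how the EDICT files are actually generated. Note that ASCII characters are also reported as "0208" here.<br />

```python
def jis_level(ch):
    """Return '0208', '0212', or None for a single character.

    Approximation only: uses Python's ISO-2022-JP codecs as a stand-in
    for membership of JIS X 0208 / JIS X 0212.
    """
    for codec, level in (("iso2022_jp", "0208"), ("iso2022_jp_1", "0212")):
        try:
            ch.encode(codec)
            return level
        except UnicodeEncodeError:
            continue
    return None

def outside_jis(text):
    """List the characters in text that are in neither codeset."""
    return [ch for ch in text if jis_level(ch) is None]
```

Characters returned by outside_jis, such as hangul in a note, are the ones that would need a romanized version alongside them.<br />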
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
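<br />
The basic test can be expressed in a few lines. The Python sketch below is a simplification for illustration: it compares only the major/common form of each entry, treats meanings as equal only when the strings are identical, and does not attempt the kana-only case described above, which needs editorial judgement. The dictionary layout and the glosses in the usage example are assumptions.<br />

```python
def may_merge(a, b):
    """Apply the two-out-of-three rule to two candidate entries.

    Each entry is a dict with 'kanji', 'reading' and 'meaning' keys holding
    its major/common form; 'kanji' may be None for kana-only entries.
    """
    same_kanji = a["kanji"] is not None and a["kanji"] == b["kanji"]
    same_reading = a["reading"] == b["reading"]
    same_meaning = a["meaning"] == b["meaning"]
    return (same_kanji + same_reading + same_meaning) >= 2

# Same kanji, but different reading and meaning: keep as separate entries.
ichiba = {"kanji": "市場", "reading": "いちば", "meaning": "marketplace"}
shijou = {"kanji": "市場", "reading": "しじょう", "meaning": "market (financial)"}
print(may_merge(ichiba, shijou))  # False
```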
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally, a proposed entry needs to pass one or more of these criteria.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
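<br />
For developers, the suggested behaviour amounts to indexing every form but filtering the "sK"/"sk" forms out of the display. The Python sketch below illustrates this with a made-up in-memory entry structure (a dict with a "forms" list); it is not the JMdict XML schema, and the function names are hypothetical.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}

def build_index(entries):
    """Map every surface form, including search-only ones, to its entries."""
    index = {}
    for entry in entries:
        for form in entry["forms"]:
            index.setdefault(form["text"], []).append(entry)
    return index

def display_forms(entry):
    """Return only the forms that should be shown to the user."""
    return [f["text"] for f in entry["forms"]
            if not SEARCH_ONLY_TAGS & set(f.get("tags", ()))]

# Usage with the 3密 example mentioned above.
entry = {"forms": [{"text": "3密"}, {"text": "三密"},
                   {"text": "三蜜", "tags": ["sK"]}]}
index = build_index([entry])
found = index["三蜜"][0]       # the search-only form still finds the entry
print(display_forms(found))    # the search-only form is not displayed
```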
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded bindweed, a subspecies of hedge bindweed, should be submitted as: "occluded bindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, yet foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
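The rule above lends itself to a mechanical check. The following is a minimal sketch, not part of any JMdict tooling: the set of look-alike characters is an assumption (this section does not enumerate the other bar characters), and the function name is hypothetical.<br />

```python
import unicodedata

# The three characters the guideline allows.
ALLOWED_BARS = {
    "\u002d",  # ASCII hyphen-minus
    "\u30fc",  # katakana-hiragana prolonged sound mark (long vowel symbol)
    "\u2212",  # minus sign
}

# Assumed set of look-alike bars to reject; adjust to match the linked issue.
LOOKALIKE_BARS = {
    "\u2010", "\u2011", "\u2012", "\u2013",
    "\u2014", "\u2015", "\uff0d", "\uff70",
}


def find_disallowed_bars(text):
    """Return (index, code point, Unicode name) for each look-alike bar found."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in LOOKALIKE_BARS
    ]
```

A submission tool could run such a check on each field and ask the contributor to substitute one of the three permitted characters.<br />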
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=955Editorial policy2023-02-07T22:45:33Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. ＭＰ３プレーヤー, not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
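The ordering rule above can be read as a sort key. The sketch below is illustrative only: the <code>Form</code> class, its field names and the frequency counts are hypothetical, and it treats "equivalent" frequencies as exactly equal counts, which is stricter than an editor's judgement would be.<br />

```python
from dataclasses import dataclass

# Tags marking irregular forms, as named in the guideline above.
IRREGULAR_TAGS = {"iK", "io", "ik"}


@dataclass
class Form:
    text: str
    count: int            # hypothetical corpus frequency
    jouyou: bool = True   # True if the form uses only 常用漢字
    tags: tuple = ()


def order_forms(forms):
    """Order surface forms: regular before irregular, then by falling
    frequency, then 常用漢字 forms ahead of others on equal counts."""
    def key(form):
        irregular = any(tag in IRREGULAR_TAGS for tag in form.tags)
        return (irregular, -form.count, not form.jouyou)
    return sorted(forms, key=key)
```

An irregular form therefore goes to the rear even when its count is the highest in the list.<br />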
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
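For species names, the katakana reading can be derived mechanically from the hiragana one. The sketch below assumes the readings are joined with ";" and that the tag is written directly after the katakana form, as in the examples in this section; the function names are hypothetical.<br />

```python
def hira_to_kata(text):
    """Convert hiragana to katakana; other characters pass through unchanged."""
    # Hiragana U+3041..U+3096 map onto katakana U+30A1..U+30F6 (offset 0x60).
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def species_readings(hiragana):
    """Build the reading list for a species name: hiragana first,
    then the katakana form tagged [nokanji]."""
    return f"{hiragana};{hira_to_kata(hiragana)}[nokanji]"
```

For example, <code>species_readings("ぜにがたあざらし")</code> produces the hiragana reading followed by ゼニガタアザラシ[nokanji]. The "[uk]" tag still has to be added by hand in the Meanings field.<br />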
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
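The two required variants can be generated from the component words. This is a sketch under stated assumptions: the JIS middle-dot is taken to be U+30FB (KATAKANA MIDDLE DOT), the set of rejected look-alikes is an illustrative guess, and both function names are hypothetical.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Assumed look-alike dots that should not be used in readings.
OTHER_DOTS = {"\u00b7", "\u2022", "\u2027", "\uff65"}


def gairaigo_readings(components):
    """Return the joined form and the middle-dot form, separated by ';'."""
    return "".join(components) + ";" + JIS_MIDDLE_DOT.join(components)


def has_wrong_dot(reading):
    """True if the reading contains a middle dot other than U+30FB."""
    return any(ch in OTHER_DOTS for ch in reading)
```

Calling <code>gairaigo_readings(["アームレス", "チェア"])</code> gives the pair shown in the paragraph above.<br />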
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
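Several of the mechanical points in the list above can be checked automatically before submission. The following is a rough sketch covering only three of them (leading articles, "colo(u)r"-style patterns, and a comma after "e.g." or "i.e."); the function name and messages are hypothetical, and a flagged gloss may still be acceptable where the guideline allows exceptions.<br />

```python
import re

LEADING_ARTICLE = re.compile(r"^(a|an|the)\s", re.IGNORECASE)
INLINE_OPTIONAL = re.compile(r"[A-Za-z]\([A-Za-z]+\)[A-Za-z]")  # e.g. colo(u)r
COMMA_AFTER_EG = re.compile(r"\b(e\.g\.|i\.e\.),")


def lint_gloss(gloss):
    """Return a list of warnings for a single gloss (empty if none apply)."""
    warnings = []
    if LEADING_ARTICLE.search(gloss):
        warnings.append("starts with an article")
    if INLINE_OPTIONAL.search(gloss):
        warnings.append("optional letters in parentheses cannot be searched")
    if COMMA_AFTER_EG.search(gloss):
        warnings.append("comma after e.g. or i.e.")
    return warnings
```

The check is applied per gloss, i.e. after the meanings have been split on ";".<br />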
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
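The agreement between the first part-of-speech tag and the form of the gloss, as described in the list above, can be approximated with a simple check. This is a sketch only: it assumes that every verb POS tag begins with "v" and every adjective tag with "adj", and the function name is hypothetical.<br />

```python
def check_gloss_against_pos(pos_tags, gloss):
    """Compare a gloss with the FIRST POS tag of its sense.

    pos_tags must be a non-empty list such as ["n", "vs"] or ["v5r"].
    Returns a list of problems (empty if none were detected).
    """
    problems = []
    first = pos_tags[0]
    is_verb = first.startswith("v")
    starts_with_to = gloss.startswith("to ")
    if is_verb and not starts_with_to:
        problems.append("verb gloss should be an infinitive (to ...)")
    if not is_verb and starts_with_to:
        problems.append("non-verb gloss should not be an infinitive")
    if first.startswith("adj") and gloss.split(" ", 1)[0] in ("be", "is"):
        problems.append("adjective gloss should not include the copula")
    return problems
```

With the 料理 example, <code>["n", "vs"]</code> and "cooking" pass, while "to cook" would be flagged because the first tag is a noun.<br />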
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
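Client software that reads text in this form needs to pull the language code and source word out of the tag. The following is a minimal sketch based only on the two examples above (an unquoted word, and a quoted phrase containing a space); the regular expression and function name are assumptions, not the JMdictDB parser.<br />

```python
import re

# [lsrc=xxx:word] or [lsrc=xxx:"several words"]; the word may be empty.
LSRC = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]"]*))\]')


def parse_lsrc(text):
    """Return a list of (language code, source word) pairs found in text."""
    pairs = []
    for match in LSRC.finditer(text):
        quoted, bare = match.group(2), match.group(3)
        pairs.append((match.group(1), quoted if quoted is not None else bare))
    return pairs
```

Applied to the アルバイト line above, this yields <code>[("ger", "Arbeit")]</code>.<br />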
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
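The three forms described above (plain target, kanji・reading pair, and target with a sense number) can be recognised with one pattern. This sketch covers only those forms and is not the JMdictDB parser; in particular it would misread a kana-only target that itself contains a middle-dot. The names are hypothetical.<br />

```python
import re

# [see=TARGET], [see=KANJI・READING], [see=TARGET[n]]; "ant" likewise.
XREF = re.compile(
    r"\[(see|ant)=([^\[\]\u30fb]+)(?:\u30fb([^\[\]]+))?(?:\[(\d+)\])?\]"
)


def parse_xrefs(text):
    """Return (kind, target, reading or None, sense number or None) tuples."""
    results = []
    for match in XREF.finditer(text):
        kind, target, reading, sense = match.groups()
        results.append((kind, target, reading, int(sense) if sense else None))
    return results
```

For the authoritative syntax, follow the detailed instructions linked above.<br />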
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
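The apostrophe rule in the last point can be tested directly on the kana: an apostrophe is needed when ん is followed by a vowel or by や, ゆ or よ. The sketch below only detects the condition and does not perform the romanization itself; the function name is hypothetical.<br />

```python
# Kana that would merge ambiguously with a preceding "n" in Hepburn.
AMBIGUOUS_AFTER_N = set("あいうえおやゆよアイウエオヤユヨ")


def needs_apostrophe(kana):
    """True if the romanized form needs an apostrophe after a syllabic n."""
    return any(
        current in "んン" and following in AMBIGUOUS_AFTER_N
        for current, following in zip(kana, kana[1:])
    )
```

Thus ほんやく and しんいち require an apostrophe ("hon'yaku", "shin'ichi"), whereas a word such as ほんだ does not.<br />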
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
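Both spacing rules can be checked with regular expressions. In the sketch below the unit and symbol lists are small illustrative samples rather than a complete inventory, and the function name is hypothetical.<br />

```python
import re

UNITS = ("km", "kg", "cm", "mm", "ml", "m", "g")   # assumed sample of units
SYMBOLS = ("%", "°C")                              # assumed sample of symbols

_unit_alt = "|".join(UNITS)
_symbol_alt = "|".join(re.escape(s) for s in SYMBOLS)

MISSING_SPACE = re.compile(rf"\d(?:{_unit_alt})\b")   # e.g. "100km"
EXTRA_SPACE = re.compile(rf"\d\s+(?:{_symbol_alt})")  # e.g. "9 %"


def check_spacing(gloss):
    """Return a list of spacing problems found in a gloss."""
    problems = []
    if MISSING_SPACE.search(gloss):
        problems.append("missing space between number and unit")
    if EXTRA_SPACE.search(gloss):
        problems.append("unwanted space between number and symbol")
    return problems
```

Note that "5c" passes, since "c" is treated as a symbol and is deliberately absent from the unit list.<br />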
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
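The preferred date and time styles above can be produced as follows. This is an illustrative sketch with hypothetical function names; month names are spelled out in a tuple so that the output does not depend on the system locale.<br />

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def format_date(d, with_year=True):
    """'March 17, 2019', or 'March 17' when the year is omitted."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def format_lifespan(born, died):
    """Compact YYYY.MM.DD range for a person, without zero padding."""
    return (f"{born.year}.{born.month}.{born.day}"
            f"-{died.year}.{died.month}.{died.day}")


def format_time(hour, minute=0):
    """12-hour style from a 24-hour time: '2am', '12:30pm'."""
    suffix = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12
    if minute == 0:
        return f"{hour12}{suffix}"
    return f"{hour12}:{minute:02d}{suffix}"
```

For instance, <code>format_lifespan(date(1925, 1, 14), date(1970, 11, 25))</code> reproduces the range used in the Yukio Mishima example.<br />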
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary. (Sometimes abbreviated as "RP".)<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who the regular contributors of amendments and new entries are, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,355 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
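<br />
The checks described above can be approximated before submitting an entry. The following is a minimal sketch, not the conversion code actually used for the EDICT/EDICT2 builds. It assumes that Python's "iso2022_jp" codec is an acceptable stand-in for JIS X 0208 alone and that "euc_jp" stands in for JIS X 0208 plus JIS X 0212; the function names are hypothetical.<br />

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def unsupported_chars(text: str, codec: str = "euc_jp") -> list[str]:
    """List the characters of text that the given codec cannot represent.

    Assumption: "euc_jp" approximates JIS X 0208 + JIS X 0212 (EDICT2) and
    "iso2022_jp" approximates JIS X 0208 only (legacy EDICT).
    """
    return [ch for ch in text if not encodable(ch, codec)]


# Hangul lies outside both codesets, so a romanized form should accompany it.
print(unsupported_chars("한글 hangul"))
# A Latin letter with a diacritic is outside JIS X 0208.
print(unsupported_chars("é", codec="iso2022_jp"))
```

Any characters reported by such a check are candidates for removal or replacement in the EDICT/EDICT2 versions, so it is worth adding a romanized or plain-ASCII alternative alongside them.<br />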
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
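<br />
As a rough illustration of the "two-out-of-three" rule, the sketch below counts how many of the three fields two candidate entries share. It is only a first filter under simplifying assumptions: each candidate is reduced to its major/common kanji form, reading and meaning, fields are compared by exact equality, and two kana-only entries are treated as having the same (empty) kanji field. The class and function names are hypothetical, and the editorial judgement described above (related kana forms, rare or archaic forms, common sense) is not modelled.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str    # major/common kanji headword, "" for kana-only entries
    reading: str  # major/common reading
    meaning: str  # a normalized gloss used for comparison


def shared_fields(a: Candidate, b: Candidate) -> int:
    """Count how many of the kanji, reading and meaning fields are identical."""
    return sum([a.kanji == b.kanji, a.reading == b.reading, a.meaning == b.meaning])


def may_merge(a: Candidate, b: Candidate) -> bool:
    """Apply the two-out-of-three test; a True result still needs editorial review."""
    return shared_fields(a, b) >= 2


# Same reading and meaning, different kanji: merge candidate.
print(may_merge(Candidate("暖かい", "あたたかい", "warm"),
                Candidate("温かい", "あたたかい", "warm")))
# Only the reading matches: must stay separate.
print(may_merge(Candidate("橋", "はし", "bridge"),
                Candidate("箸", "はし", "chopsticks")))
```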
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
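<br />
One way an application might follow this suggestion is sketched below: every surface form, including the "sK"/"sk" ones, goes into the lookup index, but tagged forms are filtered out when the entry is displayed. The data model (the Form and Entry classes and the function names) is hypothetical and is not the structure used by WWWJDIC or by the JMdict XML; the gloss in the example is a placeholder.<br />

```python
from dataclasses import dataclass

SEARCH_ONLY_TAGS = frozenset({"sK", "sk"})


@dataclass
class Form:
    text: str
    tags: frozenset[str] = frozenset()


@dataclass
class Entry:
    forms: list[Form]
    gloss: str


def build_index(entries: list[Entry]) -> dict[str, list[Entry]]:
    """Index every surface form, search-only forms included, for lookup."""
    index: dict[str, list[Entry]] = {}
    for entry in entries:
        for form in entry.forms:
            index.setdefault(form.text, []).append(entry)
    return index


def display_forms(entry: Entry) -> list[str]:
    """Return only the forms that should be shown to the user."""
    return [f.text for f in entry.forms if not (f.tags & SEARCH_ONLY_TAGS)]


entry = Entry([Form("3密"), Form("三密"), Form("三蜜", frozenset({"sK"}))],
              gloss="(gloss omitted)")
index = build_index([entry])
found = index["三蜜"][0]      # the search-only form still finds the entry
print(display_forms(found))   # but it is not shown as part of the entry
```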
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example, "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded bindweed, a subspecies of hedge bindweed, should be submitted as: "occluded bindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it can often be determined from the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+002D). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c, Unicode U+30FC). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d, Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
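<br />
A submission could be screened for stray bar-like characters along the following lines. This is a sketch only: the SUSPECT set is an assumption listing some common look-alikes, not the authoritative list of excluded characters (for that, see the linked issue), and the function name is hypothetical.<br />

```python
# The three characters permitted in the database, per the list above.
ALLOWED = {
    "\u002d": "ASCII hyphen",
    "\u30fc": "long vowel symbol",
    "\u2212": "minus sign",
}

# Assumed look-alikes (illustrative, not authoritative): hyphen,
# non-breaking hyphen, figure dash, en dash, em dash, horizontal bar,
# fullwidth hyphen-minus.
SUSPECT = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2015", "\uff0d"}


def find_suspect_bars(text: str) -> list[tuple[int, str]]:
    """Return (position, code point) pairs for look-alike bar characters."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in SUSPECT]


print(find_suspect_bars("CD-ROM"))        # ASCII hyphen: nothing reported
print(find_suspect_bars("CD\u2010ROM"))   # U+2010 HYPHEN is flagged
```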
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=KANJIDIC_Project&diff=954KANJIDIC Project2023-01-22T10:26:43Z<p>JimBreen: /* Content & Format */</p>
<hr />
<div>=The KANJIDIC Project=<br />
<br />
''(Note that this page is in the process of being rewritten, so be patient with any aspects that seem incomplete.)''<br />
<br />
==Introduction==<br />
<br />
The KANJIDIC project, which began in 1991, has the goal of compiling and distributing comprehensive information on the kanji used in Japanese text processing. It covers the 13,108 kanji in three main Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 952 additional kanji.<br />
<br />
Three data files are distributed by this project:<br />
* the KANJIDIC2 file, which is in XML format and [https://en.wikipedia.org/wiki/UTF-8 Unicode/UTF-8] coding, and contains information about all 13,108 kanji. ([http://www.edrdg.org/kanjidic/kanjidic2.xml.gz download])<br />
* the KANJIDIC file, which is in [https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP EUC-JP] coding and covers the 6,355 kanji in JIS X 0208. ([http://www.edrdg.org/kanjidic/kanjidic.gz download])<br />
* the KANJD212 file, which also is in EUC-JP coding and covers the 5,801 kanji in JIS X 0212. ([http://www.edrdg.org/kanjidic/kanjd212.gz download])<br />
<br />
==Content & Format==<br />
The database and distributed data files contain an entry for each of the kanji, with each entry containing a number of fields of data about the kanji. The data is described in the following table. The format of the distributed files is as follows:<br />
* the KANJIDIC and KANJD212 files are text files with one line per kanji and the information fields separated by spaces. The format of each line is:<br />
** the kanji itself followed by the hexadecimal form of the JIS ''ku-ten'' coding, e.g. "亜 3021" (the decimal ''ku-ten'' code is 16-01);<br />
** information fields beginning with one or two-letter codes as per the table below. For example "S10" indicates a stroke count of 10;<br />
** the Japanese readings of the kanji. ON readings (音読み) are generally in ''katakana'' and KUN readings (訓読み) in ''hiragana''. An exception is the set of ''kokuji'' for measurements such as centimetres, where the reading is in ''katakana''. Hyphens are used to indicate prefixes/suffixes, and '.' indicates the portion of the reading that is ''okurigana''. There may be several classes of reading fields, with ordinary readings first, followed by members of the other classes, if any. The current other classes, and their tagging, are:<br />
***where the kanji has special ''nanori'' (i.e. name) readings, these are preceded by the marker "T1";<br />
***where the kanji is a radical, and the radical name is not already a reading, the radical name is preceded by the marker "T2".<br />
** the meanings (usually in English). Each field begins with an open brace '{' and ends at the next close brace '}'.<br />
* the KANJIDIC2 file is in XML and is structured according to its [http://www.edrdg.org/kanjidic/kanjidic2_dtdh.html DTD] (Document Type Definition). The DTD contains extensive annotations and is intended to be the primary documentation for the file. This [http://www.edrdg.org/kanjidic/kd2examph.html sample] illustrates the structure of a typical entry. Information fields are grouped by type within entities such as <dic_number> and <query_code>, with specific values indicated by an attribute code. For example the kanji 亜 has the number 43 in the original Nelson kanji dictionary and 81 in the New Nelson. This is recorded in the XML file as:<br/><br />
:<dic_number><br />
::<dic_ref dr_type="nelson_c">43</dic_ref><br />
::<dic_ref dr_type="nelson_n">81</dic_ref><br />
:: ....<br />
:</dic_number><br />
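<br />
The dictionary reference numbers in the fragment above could be read with Python's standard xml.etree.ElementTree module, as in the sketch below. The SAMPLE string is a cut-down illustration, not a complete entry: the enclosing <character> element and the <literal> element are assumptions about the surrounding structure and should be checked against the DTD, and the helper name dic_refs is hypothetical.<br />

```python
import xml.etree.ElementTree as ET

# Cut-down illustrative entry; the real file carries many more fields.
SAMPLE = """<character>
  <literal>亜</literal>
  <dic_number>
    <dic_ref dr_type="nelson_c">43</dic_ref>
    <dic_ref dr_type="nelson_n">81</dic_ref>
  </dic_number>
</character>"""


def dic_refs(character: ET.Element) -> dict[str, str]:
    """Map each dr_type attribute to its dictionary reference number."""
    return {ref.get("dr_type"): ref.text for ref in character.iter("dic_ref")}


character = ET.fromstring(SAMPLE)
print(character.findtext("literal"), dic_refs(character))
```

For the full KANJIDIC2 file, which is large, an incremental parser such as ET.iterparse may be a better fit than loading the whole document at once.<br />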
<br />
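The line format of the KANJIDIC and KANJD212 files described above can be split into its parts roughly as follows. This is a minimal sketch under stated assumptions: the line has already been decoded from EUC-JP, information codes are recognised simply as tokens starting with an ASCII letter or digit, and the SAMPLE line is an abbreviated illustration rather than a line copied from the distributed file. The function names are hypothetical.<br />

```python
import re


def jis_hex_to_kuten(jis_hex: str) -> tuple[int, int]:
    """Convert a JIS hex code such as "3021" to its decimal ku-ten pair."""
    code = int(jis_hex, 16)
    return (code >> 8) - 0x20, (code & 0xFF) - 0x20


def parse_kanjidic_line(line: str) -> dict:
    """Split one decoded KANJIDIC line into codes, readings and meanings."""
    meanings = re.findall(r"\{([^}]*)\}", line)
    tokens = re.sub(r"\{[^}]*\}", " ", line).split()
    entry = {"kanji": tokens[0], "jis": tokens[1], "codes": [],
             "readings": [], "nanori": [], "radical_names": [],
             "meanings": meanings}
    target = "readings"
    for tok in tokens[2:]:
        if tok == "T1":            # nanori (name) readings follow
            target = "nanori"
        elif tok == "T2":          # radical names follow
            target = "radical_names"
        elif tok[0].isascii() and tok[0].isalnum():
            entry["codes"].append(tok)   # e.g. "S7" for a stroke count of 7
        else:
            entry[target].append(tok)    # kana, possibly with "-" or "."
    return entry


SAMPLE = "亜 3021 U4e9c S7 N43 V81 ア つ.ぐ {Asia} {rank next}"
entry = parse_kanjidic_line(SAMPLE)
print(entry["codes"], entry["readings"], entry["meanings"])
print(jis_hex_to_kuten(entry["jis"]))   # ku 16, ten 1, i.e. 16-01
```
<br />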
{| class="wikitable sortable"<br />
|+ Kanjidic Information Fields<br />
|-<br />
! Field<br />
! Kanjidic Code<br/>(if any)<br />
! Group Entity<br />
! Entity plus Attribute(s)<br/>(if any)<br />
! Comment<br />
|-<br />
| Kanji<br />
| none<br />
| literal<br />
| <br />
|<br />
|-<br />
| JIS code-point<br />
| none<br />
| codepoint<br />
| cp_value cp_type="jis208" (or "jis212" or "jis213")<br />
| e.g. 亜 is "3021" in KANJIDIC and<br/>"1-16-01" in KANJIDIC2<br />
|-<br />
| Unicode code-point<br />
| U<br />
|codepoint<br />
| cp_value cp_type="ucs"<br />
| <br />
|-<br />
| Radical (Classical) (See Note 1 below)<br />
| B/C<br />
| radical<br />
| rad_value rad_type="classical"<br />
| Where Nelson uses the classical radical this has a "B" code, otherwise it has a "C" code<br />
|-<br />
| Radical (Nelson)<br />
| B<br />
| radical<br />
| rad_value rad_type="nelson_c"<br />
| <br />
|-<br />
| Grade<br />
| G<br />
| misc<br />
| grade<br />
| The "grade" of the kanji. <br/>- G1 to G6 indicates the grade level as specified by the Japanese Ministry of Education for kanji that are to be taught in elementary school (1026 Kanji). These are sometimes called the ''kyōiku'' (education) kanji and are part of the set of ''jōyō'' (daily use) kanji;<br/>- G8 indicates the remaining ''jōyō'' kanji that are to be taught in secondary school (additional 1,110 Kanji). Note that 1,106 of the G8 kanji are in the KANJIDIC file, a further two are in the KANJD212 file and the remaining two are only in the KANJIDIC2 XML file; <br/>- G9 and G10 indicate ''jinmeiyō'' ("for use in names") kanji which in addition to the ''jōyō'' kanji are approved for use in family name registers and other official documents. G9 (649 kanji, of which 640 are in KANJIDIC) indicates the kanji is a "regular" name kanji, and G10 (212 kanji of which 130 are in KANJIDIC) indicates the kanji is a variant of a ''jōyō'' kanji.<br />
|-<br />
| Stroke count<br />
| S<br />
| misc<br />
| stroke_count<br />
| The stroke count of the kanji. If more than one, the first is considered the accepted count, while subsequent ones are common miscounts. (See the section later in this document on counting strokes for some of the rules applied especially to radicals.)<br />
|-<br />
| Frequency-of-use ranking<br />
| F<br />
| misc<br />
| freq<br />
| The 2,501 most-used characters have a ranking which expresses the relative frequency of occurrence of a character in modern Japanese. The data is based on an analysis of word frequencies in the Mainichi Shimbun over 4 years by Alexandre Girardi. Note: (a) these frequencies are biased towards words and kanji used in newspaper articles, and (b) the relative frequencies for the last few hundred kanji so graded are quite imprecise.<br />
|-<br />
| Variant JIS 0208 kanji<br />
| XJ0<br />
| misc<br />
| variant var_type="jis208"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0212 kanji<br />
| XJ1<br />
| misc<br />
| variant var_type="jis212"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0213 kanji<br />
| XJ2<br />
| misc<br />
| variant var_type="jis213"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the plane number (P: 1 or 2) plus the JIS hex code is used, and in the XML file the equivalent "P-nn-nn" kuten code is used.)<br />
|-<br />
| Variant kanji (De Roo index)<br />
| XJD<br />
| misc<br />
| variant var_type="deroo"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (NJECD index)<br />
| XH<br />
| misc<br />
| variant var_type="halpern_njecd"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (S&H index)<br />
| XI<br />
| misc<br />
| variant var_type="s_h"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (Nelson index)<br />
| XN<br />
| misc<br />
| variant var_type="nelson_c"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (O'Neill index)<br />
| XO<br />
| misc<br />
| variant var_type="oneill"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Radical name(s)<br />
| none<br />
| misc<br />
| rad_name<br />
| The name of the radical in ''hiragana''. In the KANJIDIC edition these are placed after the readings and preceded by the "T2" tag.<br />
|-<br />
| JLPT Level<br />
| J<br />
| misc<br />
| jlpt<br />
| The pre-2010 level of the Japanese Language Proficiency Test (JLPT) in which the kanji occurs (1-4). Note that the JLPT test levels changed in 2010, with a new 5-level system (N1 to N5) being introduced. No official kanji lists are available for the new levels. The new levels are regarded as being similar to the old levels except that the old level 2 is now divided between N2 and N3, and the old levels 3 and 4 are now N4 and N5. <br />
|-<br />
| Nelson (Classic) number<br />
| N<br />
| dic_number<br />
| dic_ref dr_type="nelson_c"<br />
| The index number in "The Modern Reader's Japanese-English Character Dictionary", edited by Andrew Nelson. If not present, the character is not in Nelson, or is considered to be a non-standard version, in which case it may have a variant. Note that many kanji glyphs currently used are what Nelson described as "non-standard".<br />
|-<br />
| Nelson (New) number<br />
| V<br />
| dic_number<br />
| dic_ref dr_type="nelson_n"<br />
| The index number in "The New Nelson Japanese-English Character Dictionary", edited by John Haig.<br />
|-<br />
| NJECD number<br />
| H<br />
| dic_number<br />
| dic_ref dr_type="halpern_njecd"<br />
| The index number in the "New Japanese-English Character Dictionary" (1990), edited by Jack Halpern.<br />
|-<br />
| Kodansha Kanji Dictionary number<br />
| DP<br />
| dic_number<br />
| dic_ref dr_type="halpern_kkd"<br />
| The index numbers used by Jack Halpern in the "Kodansha Kanji Dictionary" (2013), which is the revised version of the "New Japanese-English Character Dictionary" of 1990.<br />
|-<br />
|Kanji Learners Dictionary number<br />
|DK<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld"<br />
|The index numbers used by Jack Halpern in the "Kanji Learners Dictionary", published by Kodansha in 1999.<br />
|-<br />
|Kanji Learners Dictionary number (2nd ed)<br />
|DL<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld_2ed"<br />
|The index numbers used by Jack Halpern in the 2nd edition of the "Kanji Learners Dictionary", published by Kodansha in 2013. <br />
|-<br />
|Remembering The Kanji number<br />
|L<br />
|dic_number <br />
|dic_ref dr_type="heisig"<br />
|The index number used in "Remembering The Kanji" by James Heisig.<br />
|-<br />
|Remembering The Kanji number (6th ed)<br />
|DN<br />
|dic_number <br />
|dic_ref dr_type="heisig6"<br />
|The index number used in "Remembering The Kanji, 6th Edition" by James Heisig. <br />
|-<br />
|Gakken number<br />
|K<br />
|dic_number <br />
|dic_ref dr_type="gakken"<br />
|The index number in the Gakken Kanji Dictionary ("A New Dictionary of Kanji Usage"). Some of the numbers relate to the list at the back of the book, jouyou kanji not contained in the dictionary, and various historical tables at the end.<br />
|-<br />
|O'Neill's Japanese Names number<br />
|O<br />
|dic_number <br />
|dic_ref dr_type="oneill_names"<br />
|The index number in "Japanese Names", by P.G. O'Neill. (Weatherhill, 1972) (Note: some of the numbers end with 'A'.)<br />
|-<br />
|O'Neill's Essential Kanji number<br />
|DO<br />
|dic_number <br />
|dic_ref dr_type="oneill_kk"<br />
|The index numbers used in P.G. O'Neill's "Essential Kanji".<br />
|-<br />
|Morohashi number<br />
|MN/MP<br />
|dic_number <br />
|dic_ref dr_type="moro" m_vol m_page<br />
|The index number and volume.page respectively of the kanji in the 13-volume Morohashi Daikanwajiten. A terminal `P` in the number, e.g. 4879P, indicates that it is 4879' in the original. In some 500 cases, the number is terminated with an `X`, to indicate that the kanji in Morohashi has a close, but not identical, glyph to the form in the JIS X 0208 standard.<br/>In the XML the volume and page are attribute values.<br />
|-<br />
|Henshall number<br />
|E<br />
|dic_number <br />
|dic_ref dr_type="henshall"<br />
|The index number used in "A Guide To Remembering Japanese Characters" by Kenneth G. Henshall.<br />
|-<br />
|Kanji & Kana number<br />
|IN<br />
|dic_number <br />
|dic_ref dr_type="sh_kk"<br />
|The index number used in Spahn & Hadamitzky's "Kanji & Kana", 2nd edition (Tuttle).<br />
|-<br />
|Kanji & Kana number (2011 ed)<br />
|DA<br />
|dic_number <br />
|dic_ref dr_type="sh_kk2"<br />
|The index number used in the 2011 edition of Spahn & Hadamitzky's "Kanji & Kana".<br />
|-<br />
|Sakade number<br />
|DS<br />
|dic_number <br />
|dic_ref dr_type="sakade"<br />
|The index numbers used in the early editions of "A Guide To Reading and Writing Japanese", edited by Florence Sakade.<br />
|-<br />
|Japanese Kanji Flashcards number<br />
|DF<br />
|dic_number <br />
|dic_ref dr_type="jf_cards"<br />
|The index numbers used in the "Japanese Kanji Flashcards", by Max Hodges and Tomoko Okazaki (White Rabbit Press). <br />
|-<br />
|Henshall Guide number<br />
|DH<br />
|dic_number <br />
|dic_ref dr_type="henshall3"<br />
|The index numbers used in the 3rd edition of "A Guide To Reading and Writing Japanese" edited by Ken Henshall et al. <br />
|-<br />
|Tuttle Kanji Cards number<br />
|DT<br />
|dic_number <br />
|dic_ref dr_type="tutt_cards"<br />
|The index numbers used in the Tuttle Kanji Cards, compiled by Alexander Kask.<br />
|-<br />
|Crowley number<br />
|DC<br />
|dic_number <br />
|dic_ref dr_type="crowley"<br />
|The index numbers used in "The Kanji Way to Japanese Language Power" by Dale Crowley. <br />
|-<br />
|Kanji in Context number<br />
|DJ<br />
|dic_number <br />
|dic_ref dr_type="kanji_in_context"<br />
|The index numbers used in "Kanji in Context" by Nishiguchi and Kono.<br />
|-<br />
|Kodansha Compact Kanji Guide number<br />
|DG<br />
|dic_number <br />
|dic_ref dr_type="kodansha_compact"<br />
|The index numbers used in the "Kodansha Compact Kanji Guide".<br />
|-<br />
|Japanese For Busy People number<br />
|DB<br />
|dic_number <br />
|dic_ref dr_type="busy_people"<br />
|The index numbers used in "Japanese For Busy People" vols I-III, published by the AJLT. The codes are the volume.chapter. <br />
|-<br />
|Maniette number<br />
|DM<br />
|dic_number <br />
|dic_ref dr_type="maniette"<br />
|The numbers in Yves Maniette's "Les Kanjis dans la tête", the French adaptation of Heisig's "Remembering The Kanji".<br />
|-<br />
|SKIP code<br />
|P<br />
|query_code <br />
|q_code qc_type="skip"<br />
|The SKIP (System of Kanji Indexing by Patterns) developed by Jack Halpern. The code is of the form "l-m-n". See [[#SKIP_Codes|SKIP Codes]] section for more information.<br />
|-<br />
|S&H descriptor<br />
|I<br />
|query_code <br />
|q_code qc_type="sh_desc"<br />
|The index code in "The Kanji Dictionary" (Tuttle 1996), by Spahn & Hadamitzky. It is the form nxnn.n, e.g. 3k11.2, where the kanji has 3 strokes in the identifying radical, it is radical "k" in the S&H classification system, there are 11 other strokes, and it is the 2nd kanji in the 3k11 sequence. <br />
|-<br />
|Four Corner code<br />
|Q<br />
|query_code <br />
|q_code qc_type="four_corner"<br />
|The Four Corner code for the kanji. See the [[#Four_Corner_Codes|Four Corner codes]] section for more information.<br />
|-<br />
|De Roo code<br />
|DR<br />
|query_code <br />
|q_code qc_type="deroo"<br />
|The codes developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). See the [[#De_Roo_Codes|De Roo Codes]] section for more information.<br />
|-<br />
|Misclassification code<br />
|ZPP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="posn"<br />
|SKIP misclassification by position.<br />
|-<br />
|Misclassification code<br />
|ZSP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_count"<br />
|SKIP misclassification by stroke count.<br />
|-<br />
|Misclassification code<br />
|ZBP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_and_posn"<br />
|SKIP misclassification by both position and stroke count.<br />
|-<br />
|Misclassification code<br />
|ZRP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_diff"<br />
|SKIP misclassification by differing opinions on stroke counts.<br />
|-<br />
|Chinese reading<br />
|Y<br />
|rmgroup<br />
|reading r_type="pinyin"<br />
|The PinYin (Chinese) reading of the kanji.<br />
|-<br />
|Korean reading (romanized)<br />
|W<br />
|rmgroup<br />
|reading r_type="korean_r"<br />
|The Korean reading of the kanji in the (Republic of Korea) Ministry of Education style.<br />
|-<br />
|Korean reading (hangul)<br />
|not included<br />
|rmgroup<br />
|reading r_type="korean_h"<br />
|The Korean reading of the kanji in the hangul script.<br />
|-<br />
|Vietnamese reading (chữ quốc ngữ)<br />
|not included<br />
|rmgroup<br />
|reading r_type="vietnam"<br />
|The Vietnamese reading of the kanji in chữ quốc ngữ.<br />
|-<br />
|Japanese on reading (''katakana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_on"<br />
| In the KANJIDIC edition the readings are placed between the information fields and the meanings.<br />
|-<br />
|Japanese kun reading (''usu. hiragana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_kun"<br />
|<br />
|-<br />
| Meanings<br />
| none<br />
| rmgroup<br />
| meaning m_lang="xx"<br />
| The kanji meaning(s). For languages other than English the m_lang attribute is used with two-letter ISO 639-1 language codes. In the KANJIDIC edition the meanings are placed at the end of the line.<br />
|-<br />
| Name reading(s) (''hiragana'')<br />
| T1<br />
| <br />
| nanori<br />
| The readings only associated with named-entities. In the KANJIDIC edition the first of these is preceded by the "T1" tag.<br />
|}<br />
Note 1: For the sake of consistency the classical radical is the one indicated in the JIS漢字字典 (日本規格協会).<br />
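<br />
The field codes in the table can be used to take apart a line of the KANJIDIC or KANJD212 text files. The following is a minimal sketch, not a reference parser. It assumes the line layout described under "Content & Format" (kanji, JIS hex code, information fields, readings with optional "T1"/"T2" markers, then meanings in braces). The function names are hypothetical, only a subset of the codes is listed, and the sample line is abbreviated and hand-written for illustration rather than copied from the file.<br />

```python
import re

# Field codes copied from the table above; only a subset is listed here.
FIELD_CODES = sorted(
    ["S", "F", "J", "N", "V", "H", "L", "K", "O", "E", "P", "I", "Q", "Y", "W",
     "DP", "DK", "DL", "DN", "DO", "DR", "MN", "MP", "IN",
     "XJ0", "XJ1", "XJ2", "ZPP", "ZSP", "ZBP", "ZRP"],
    key=len,
    reverse=True,  # try longer codes first so "IN1616" is not read as code "I"
)


def split_field(token):
    """Split an information field such as 'S10' into ('S', '10')."""
    for code in FIELD_CODES:
        if token.startswith(code):
            return code, token[len(code):]
    return None, token  # code is not in the subset above


def jis_hex_to_kuten(jis_hex):
    """Convert a JIS hex code such as '3021' to its decimal ku-ten form '16-01'."""
    value = int(jis_hex, 16)
    return "{:02d}-{:02d}".format((value >> 8) - 0x20, (value & 0xFF) - 0x20)


def parse_kanjidic_line(line):
    """Parse one KANJIDIC/KANJD212 line into a dict (hypothetical helper)."""
    # Meanings may contain spaces, so pull out the {...} groups first.
    meanings = re.findall(r"\{([^}]*)\}", line)
    tokens = re.sub(r"\{[^}]*\}", " ", line).split()
    entry = {
        "kanji": tokens[0],
        "jis_hex": tokens[1],
        "fields": [],
        "readings": [],
        "nanori": [],
        "radical_names": [],
        "meanings": meanings,
    }
    bucket = "readings"
    for token in tokens[2:]:
        if token == "T1":        # name readings follow
            bucket = "nanori"
        elif token == "T2":      # radical names follow
            bucket = "radical_names"
        elif "A" <= token[0] <= "Z":
            entry["fields"].append(split_field(token))
        else:                    # kana reading, possibly with '-' or '.'
            entry[bucket].append(token)
    return entry


# Abbreviated, illustrative line (not the full entry from the file).
sample = "亜 3021 S7 N43 V81 DR3273 ア つ.ぐ T1 や {Asia} {rank next}"
entry = parse_kanjidic_line(sample)
print(entry["kanji"], jis_hex_to_kuten(entry["jis_hex"]), entry["fields"])
```

Codes are matched longest-first because some codes are prefixes of others (for example "I" and "IN"); a field whose code is not in the list is returned with a code of None so that nothing is silently dropped.<br />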
<br />
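The XML element and attribute names in the table can be used in the same way with the KANJIDIC2 file. The sketch below is only an illustration: the entry is abbreviated and hand-written, the enclosing <character> and <literal> elements are assumptions about the DTD rather than something stated in the table, and the helper name is hypothetical.<br />

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written entry for illustration; real entries carry many more fields.
SAMPLE = """
<character>
  <literal>亜</literal>
  <misc>
    <stroke_count>7</stroke_count>
  </misc>
  <dic_number>
    <dic_ref dr_type="nelson_c">43</dic_ref>
    <dic_ref dr_type="nelson_n">81</dic_ref>
  </dic_number>
  <query_code>
    <q_code qc_type="deroo">3273</q_code>
  </query_code>
</character>
"""


def typed_values(character, tag, type_attr):
    """Map the type attribute of each <tag> element to its text value."""
    return {el.get(type_attr): el.text for el in character.iter(tag)}


character = ET.fromstring(SAMPLE)
dic_refs = typed_values(character, "dic_ref", "dr_type")
q_codes = typed_values(character, "q_code", "qc_type")
# The first stroke count is the accepted one; later ones are common miscounts.
stroke_counts = [int(el.text) for el in character.iter("stroke_count")]

print(dic_refs.get("nelson_c"), dic_refs.get("nelson_n"), q_codes.get("deroo"))
```

Note that a dictionary keyed on the type attribute keeps only one value per type. That is too simple for SKIP codes, where several q_code elements with qc_type="skip" can occur (the extra ones carrying a skip_misclass attribute), so a fuller reader would need to keep those apart.<br />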
==Radical and Stroke Counting Rules==<br />
<br />
These rules apply to:<br />
#the stroke-counts themselves;<br />
#the stroke counts in the SKIP codes. Where this results in a SKIP which differs from that in the NJECD, or in the non-NJECD SKIPs provided by Jack Halpern, the Jack Halpern version is included prefixed with "ZR".<br />
===Radicals===<br />
The radicals listed below are ones where the various references differ in how they count the strokes. The stroke counting in this file does not strictly follow any reference, but tends to be more aligned to Halpern.<br />
#B54 ENNYOU - 廴. Traditionally counted as 3 strokes, but more recently often counted as 2. S&H count this as 2; Nelson, Halpern, Koujien, etc., count it as 3. I treat it as 3.<br />
#B97 URI - 瓜. Traditionally counted as 5 strokes, as the middle portion looks like a katakana ム. Modern glyphs invariably make it look like 6 strokes. Nelson says it is 5 strokes. Halpern does too, but then counts the shape as 6 in other kanji. Koujien says 6, as do S&H. I treat it as 6.<br />
#B113 SHIMESU e.g. 礼, is counted as 4 strokes in that form, and 5 strokes in its older form, 祀 [http://www.edrdg.org/~jwb/U7940old.png (image)]. 18 kanji are in the 4-stroke form and 20 are in the 5-stroke form. (Nelson and S&H count it as 4; Halpern counts it as 4 or 5. [See Note 1.])<br />
#B131 SHIN/KERAI 臣. Counted as 7 (Nelson counts it as 6, Halpern as 7 (in the book), and S&H as both for different kanji.)<br />
#B136 MAI ASHI 舛. Counted as 7 (traditionally counted as 6, in accordance with the older writing of `ヰ'. Nelson counts as 6, S&H as 7, and Halpern as 7 for 常用 and 人名用漢字 and 6 for the rest.) Note this is also applied to counting 絳 and for kanji with the 韋 pattern.<br />
#B140 KUSA-KANMURI e.g. 苛 always counted as 3 strokes (Halpern counts this as 4 strokes for the (mostly level 2) kanji where the older form is often printed.) Note that this has been carried through to kanji where this element is not the indexing radical, such as 朦.<br />
#B162 SHIN-NYUU e.g. 遙 or 逢 counted as 3 or 4 strokes. (Nelson and S&H count it as 2 strokes, and Halpern as either 3 or 4.) [See Note 1 below.]<br />
#B163 OOZATOZUKIRI & B170 KOZATO-HEN 邦 and 阡 always counted as 3 strokes (Nelson and S&H count it as 2, Halpern as 3.) This also applies where it appears mid-kanji, such as in 橢.<br />
#B184 SHOKU HEN 食, 飢, etc. is counted as 8 strokes in the 飢 form, and as 9 strokes in the 飭 and 餐 forms. (Nelson and S&H count it as 8 strokes, and Halpern as 8 or 9.) [See Note 1 below.]<br />
#B199 MUGI 麦 always counted as 7 strokes, except for 麥 & 麩 where it is counted as 11. (Nelson and Halpern do the same, and S&H avoid treating it as a radical, but count it as 12 in the remainder.)<br />
#The ROO or OI radical (老) has a variant consisting of the top 4 strokes. For example, it is in 者. Traditionally, this variant had an extra dot, and was counted as 5 strokes. I'm counting it as 4 throughout.<br />
<br />
===Other Stroke Patterns===<br />
# While the pattern 臼 is a 6-stroke radical, the top half of 叟 is made up of three distinct parts totalling 8 strokes. Note that this also is the case with 嫂, 溲, 艘 and 痩 despite the simplification in the JIS glyphs.<br />
#牙 (KIBA HEN) is a problem. It is classically counted as 4 strokes, but these days has a flick that makes it effectively 5. Halpern, Nelson and S&H usually have it as 5 strokes, so I'm standardizing on that.<br />
#Another little horror is 旡 (MU or NASHI), which is classically counted as 4 strokes. The most common variant has 5 strokes, but looks like 6. Halpern, S&H and the Classical Nelson count this as 4 strokes, and the New Nelson as 5. I'm making it 5 too.<br />
#The JUU or ASHIATO radical is at the bottom of 禽 and 禺. It is traditionally counted as 5 strokes, although sometimes it looks like 4. I'm using 5 throughout.<br />
#A related shape is ム, as in 瓜, 孤, 弧, etc. This is sometimes counted as two strokes (both Nelsons) and sometimes as three strokes (Halpern, S&H). Classically it is regarded as two strokes. I am using 6 strokes for 瓜.<br />
#The pattern to the left of 敝, which appears in several kanji, e.g. 幣 and 瞥, has 8 strokes. (There are 3 strokes at the top as in 尚.)<br />
#The "east" pattern (東) has 8 strokes. There is an older form in which there are two strokes in the box (柬). It is counted as 8 strokes here in the 東 form (e.g. 諌) and 9 in the 柬 form, as in 諫.<br />
#The pattern at the bottom of 雋 is counted as 4 strokes in modern dictionaries, although traditionally it was 5.<br />
#The pattern 巻, which appears in several kanji, is counted as 9 strokes. Several dictionaries count it as either 8 or 9.<br />
#The pattern on the left of 収 is variously handled as 2 strokes or 3 strokes. As more recent dictionaries make it 4, I will do so too.<br />
#The 攵 pattern has 3 and 4-stroke versions, and sometimes the glyphs can be confusing as to which is used. In the 緻 kanji, for example, it is traditionally counted as 3, but Spahn & Hadamitzky count it as 4 and the Nelsons include both.<br />
<br />
Note: The JIS X 0208-1990 standard does not formally specify the precise glyphs used for kanji, however the glyphs it uses in the published version have become de facto standards for many font compilations. In the published standard, for several kanji, e.g. 辿/迚, 礼/祀, 飢/飭, the JIS level one kanji use the simpler form, and the Level 2 kanji use the older more complex form. Just to make matters worse, many fonts for JIS X 0208 kanji are based on the bit-maps specified in JIS X 9051-1984 standard, which defines the 16x16 patterns for JIS X 0208-1983 characters. According to Ken Lunde: "This standard was not very good, and JSA is no longer supporting it." Anyway, JIS X 9051-1984 had the simpler form for all these bushu in both Levels 1 and 2, as well as having simplifications of kanji like 濾. Thus, as the font foundries have freedom to choose whichever glyphs they like, what you see on your screen may well not agree with these rules. All the rules in this appendix relate to the glyphs as published in the JIS X 0208-1990 standard, and as appearing in font compilations based on them.<br />
<br />
==Kanji Dictionary Search Codes==<br />
<br />
===SKIP Codes===<br />
<br />
The System of Kanji Indexing by Patterns (SKIP) is a scheme for the classification and rapid retrieval of Chinese characters on the basis of geometrical patterns. Developed by Jack Halpern, it first appeared in the New Japanese-English Character Dictionary (Kenkyusha, Tokyo 1990; NTC, Chicago 1993), and in successor publications such as the "Kanji Learners Dictionary" (Kodansha 1999,2011) and the "Kodansha Kanji Dictionary" (2013). A description of the coding system is [http://www.edrdg.org/wwwjdic/SKIP.html available].<br />
<br />
As examples, 割 has a SKIP code of 1-10-2, indicating it is divided into left-right portions with 10 strokes at the left and 2 at the right. 度 has a SKIP code of 3-3-6, indicating it has a 3-stroke enclosure with 6 strokes inside it.<br />
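<br />
A SKIP code of the form "l-m-n" can be split into its parts as in the following sketch. It is an illustration under stated assumptions, not part of the KANJIDIC files: the function names are hypothetical, and the names for patterns 2 (up-down) and 4 (solid) come from Halpern's published description of SKIP rather than from the examples above. The code 3-3-6 used here matches the description of 度 as a 3-stroke enclosure with 6 strokes inside it.<br />

```python
# Pattern numbers 1 and 3 appear in the examples above; 2 and 4 are
# taken from Halpern's description of the system.
SKIP_PATTERNS = {1: "left-right", 2: "up-down", 3: "enclosure", 4: "solid"}


def parse_skip(code):
    """Split a SKIP code such as '1-10-2' into its pattern and two numbers."""
    pattern, first, second = (int(part) for part in code.split("-"))
    if pattern not in SKIP_PATTERNS:
        raise ValueError("unknown SKIP pattern in " + code)
    return {"pattern": SKIP_PATTERNS[pattern], "first": first, "second": second}


def skip_stroke_total(code):
    """Return the total stroke count implied by a SKIP code."""
    parts = parse_skip(code)
    if parts["pattern"] == "solid":
        # For solid kanji the middle number is already the total and the
        # last number is a sub-pattern, not a stroke count.
        return parts["first"]
    return parts["first"] + parts["second"]


for code in ("1-10-2", "3-3-6"):
    print(code, parse_skip(code)["pattern"], skip_stroke_total(code))
```

For the first three patterns the two stroke counts add up to the stroke count of the whole kanji, which gives a simple consistency check against the "S" field, subject to the counting rules described earlier.<br />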
<br />
===De Roo Codes===<br />
<br />
The De Roo codes were developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). They are based on the shapes observed at the top and bottom of the character. A [http://www.edrdg.org/wwwjdic/deroo.html detailed description] is available.<br />
<br />
As an example, 亜 has a code of 3273 indicating that the top of the kanji is pattern number 32 (兀) and the bottom pattern number 73 (horizontal line with two vertical strokes above it).<br />
<br />
===Four Corner Codes===<br />
The Four Corner coding system was invented by Wang Chen in 1928, and has since been widely used in dictionaries in China and Japan for classifying kanji and hanzi. In China it is losing popularity in favour of Pinyin ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten, have a Four Corner Index. An [http://www.edrdg.org/wwwjdic/FOURCORNER.html overview] of the coding system is available.<br />
In some cases a character may have two of these codes, as the system can be a little ambiguous, and Morohashi has some kanji coded differently from their traditional Chinese codes.<br />
The coding system indexes characters according to the shapes at the corners.<br />
<br />
==Proposing Changes==<br />
<br />
There is currently no online access to the database that holds the KANJIDIC contents (the information is mostly quite static). Anyone wishing to propose a change to the data for a kanji, e.g. to add or change a reading, will need to email Jim Breen at jimbreen@gmail.com.<br />
<br />
==Kanji Information Sites==<br />
''(Being expanded)''<br />
* Jim's [http://nihongo.monash.edu/kanjiinfo.html Kanji Information Page].<br />
* The [https://kanjialive.com/ Kanji alive] site at the University of Chicago.<br />
* The [https://www.kanjipedia.jp/ Kanjipedia] site (mostly in Japanese).<br />
<br />
==Legacy Documentation==<br />
<br />
The current Wiki page was compiled from several older documents, which are no longer being maintained. They are still available for historical purposes. They are:<br />
* a basic home page about [http://www.edrdg.org/kanjidic/kanjd2index_legacy.html KANJIDIC2];<br />
* an overview page about the [http://www.edrdg.org/kanjidic/kanjidic2_ov_legacy.html KANJIDIC2 structure];<br />
* an overview page about [http://www.edrdg.org/kanjidic/kanjidic_legacy.html KANJIDIC and KANJD212];<br />
* the original [http://www.edrdg.org/kanjidic/kanjidic_doc_legacy.html KANJIDIC] documentation;<br />
* the original [http://www.edrdg.org/kanjidic/kanjd212_doc_legacy.html KANJD212] documentation.<br />
<br />
==Copyright and Permissions==<br />
<br />
The KANJIDIC project files are released under a Creative Commons Attribution-ShareAlike Licence (V4.0). See the [https://www.edrdg.org/edrdg/licence.html EDRDG General Dictionary Licence Statement] for details.<br />
<br />
For the most part the information provided in the project's files is in the public domain. Information relating to the sequence numbers of kanji in published dictionaries is not considered to be subject to copyright. Descriptor and other search codes are considered to be the intellectual property of the developers. With regard to the codes included in the KANJIDIC files:<br />
* in 2014 the SKIP codes were placed by Jack Halpern under a CC-SA licence. See [http://www.kanji.org/kanji/dictionaries/skip_permission.htm this page] for his announcement. They are now under a [https://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence].<br />
* Fr De Roo provided written permission for the De Roo codes to be included in KANJIDIC.<br />
* the Spahn and Hadamitzky descriptor codes were kindly supplied by Mark Spahn for inclusion in KANJIDIC.<br />
<br />
==History==<br />
<br />
''(some comments by Jim Breen)''<br />
<br />
KANJIDIC began around 1991 as two files: jis1detl.lst and jis2detl.lst, which were later merged into a single file.<br />
<br />
The first file was compiled initially from the file "kinfo.dat" supplied by Stephen Chung, who in turn compiled his file from a file prepared by Mike Erickson. I originally added about 1900 "meanings" by James Heisig keyed in by Kevin Moore from the book "Remembering The Kanji". I later added the meanings from Rik Smoody's files, compiled when he was working for Sony in Japan. These appear to have been based on Nelson.<br />
<br />
The second file was compiled from a complete JIS2 list with Bushu and stroke counts kindly supplied to me by Jon Crossley, to which I added Nelson numbers, yomikata and meanings extracted from Rik Smoody's file.<br />
<br />
Theresa Martin was an early assister with this file, particularly with tracking down and correcting many mistranscribed yomikata (the old zu/dzu, oo/ou, ji/dji, etc. problems).<br />
<br />
Jeffrey Friedl did a major overhaul in September-October 1992, in which he added the original frequency rankings, Halpern codes, SKIP patterns, updated the grading ("G" fields) to reflect the modern Jouyou lists, corrected radical numbers, corrected stroke counts and readings to fall in line with modern usage.<br />
<br />
Magnus Halldorsson corrected some erroneous Halpern numbers, and provided them for a lot of the radicals. He provided the list of Heisig indices, which he originally compiled himself, then verified and expanded using lists from Richard Walters and Antti Karttunen. He also passed on to me the list of Gakken indices compiled by Antti Karttunen.<br />
<br />
Lee Collins provided the Unicode mappings.<br />
<br />
Iain Sinclair has provided the yomikata, meanings and S&H indices of many of the obscure JIS2 kanji.<br />
<br />
Christian Wittern, a Sinologist working at Kyoto University, sent me a monster file prepared by Dr Urs App from Hanazono College. From this I have extracted the Four Corner and Morohashi information. Christian also provided the original Pinyin details, which were later replaced. I am very grateful for these significant contributions.<br />
<br />
In March 1994 the Morohashi indices were proof-read and corrected by Christian.<br />
<br />
Alfredo Pinochet supplied all the Henshall numbers.<br />
<br />
Ingar Holst has provided considerable assistance in regularizing the Bnnn and Cnnn radical classifications to remove some errors that were in the original JIS2 file, and to make it all conform to Nelson's classification.<br />
<br />
In mid-1993 I withdrew the SKIP codes from the distributed file as it appeared that their presence violated Jack Halpern's copyright on these codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission from his publisher for the codes to be included subject (initially) to copyright and usage restrictions. In March 1994 the Halpern indices and SKIP codes were checked against an extract from Jack's files, and the "Z" mis-classification codes added, again from his files. Jack has also made a lot of useful comments and suggestions about the content and format of the file. I am most grateful to Jack for his permission and assistance, and also to Jeffrey for making the contact.<br />
<br />
In May 1995, a number of updates took place. Jeffrey Friedl established contact with James Heisig, and obtained a further set of his indices. I contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided most of the missing S&H descriptors, and Jack Halpern released to me the SKIP codes of the kanji not in the New Japanese-English Character Dictionary. For all this material I am most grateful.<br />
<br />
In August 1995, I added the O'Neill index numbers. These were compiled by Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny & David for their assistance. <br />
<br />
In January and February 1996 the Morohashi numbers were checked thoroughly against two important sources: a file of Unicode-Morohashi data (Uni2Dict) which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221 standard, and the review draft of the proposed revision of the JIS X 0208 standard, which was prepared by the INSTAC Committee, and made available in a text file, thus enabling comparisons. All the mismatches between the three files were examined against the Morohashi text, and extensive corrections made to all three files. I am grateful to Koichi Yasuoka and Masayuki Toyoshima for their considerable assistance in this task.<br />
<br />
In March 1996 the Korean readings were added. They were provided by Dr Charles Muller, then of Toyo Gakuen University, to whom I am most grateful. Chuck's compilation of Korean readings is extremely thorough and scholarly, and I am pleased to be able to incorporate them.<br />
<br />
In April 1996 the readings of all the kanji were compared with those in the JIS X 0208 draft, and a number of corrections and additions made.<br />
<br />
In May 1996 I carried out a "unification" of the readings of the KANJIDIC and KANJD212 files, wherein all the readings of the "itaiji" were brought into line. The identification of these itaiji was drawn from a file posted to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp), which was compiled at the ETL from the itaiji identification in the JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added some extra sets which were indicated in the JIS X 0208-1996 draft.<br />
<br />
In July 1996 the Pinyin details were completely replaced by a new set. The original Pinyin were from an earlier compilation by Christian Wittern, and contained many errors. Two more reliable sources had become available: the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5 hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC file is a combination of the two, following the order in the Uni2Pinyin file.<br />
<br />
In August 1996 I corrected a few more missing and erroneous Nelson numbers, using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged the kokuji, so I added these to the readings fields as "{(kokuji)}".<br />
<br />
Also in August 1996 I deleted the handful of former "XJxxxx" cross-references, and replaced them with a much more comprehensive set, so that they now represent all the recognized "itaiji". The file I used for this was the corrected itaiji file mentioned above.<br />
<br />
In April 1997 I corrected a large number of bushu codes. Many of these had been identified as errors by Jean-Luc Leger who analyzed and examined all the Nelson bushu. I also identified and added a large number of missing Cnnn codes.<br />
<br />
Also in April 1997 I added the S&H "Kanji & Kana" indices. These had been keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must be an outbreak of kanji interest in Nancy.)<br />
<br />
In February 1998, the long-awaited inclusion of the "New Nelson" numbers took place. I had been waiting for the editor of the New Nelson, John Haig, to supply a list (as he had agreed some years before), but in the meantime, Jean-Luc Leger keyed a list, so they are now available.<br />
<br />
Also between December 1997 and February 1998 a large number of Level 2 kanji had their stroke counts corrected to bring them into line with the counting principles used in the Level 1 kanji. This usually aligned the counts with those used in the New Nelson and in S&H. Appendix E of this document was amended to reflect this. The leg-work in tracking this material down was done by Wolfgang Cronrath. <br />
<br />
During December 1998 & Jan 1999 I updated the stroke counts of many of the Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath. I also added the De Roo codes, which had been keyed by Jasmin Blanchette, who also typed the explanatory material. I contacted Fr De Roo in Tokyo who readily agreed to the inclusion of the codes.<br />
<br />
The extension of the S&H Kanji & Kana numbers to the 2nd edition was done by Enrique Sanchez Rosa.<br />
<br />
The Hangul versions of the Korean readings (which only appear in the XML version) were provided by Francis Bond and Kyonghee Paik.<br />
<br />
I did the Tuttle card numbers myself.<br />
<br />
James Rose provided the numbers from Crowley's "The Kanji Way to Japanese Language Power", Sakade's "A Guide To Reading and Writing Japanese", and also for that book's 3rd Edition edited by Henshall, Seeley & De Groot.<br />
<br />
The "Kodansha's compact Kanji guide" codes were provided by Richard Fremmerlid.<br />
<br />
The "Kanji in Context" codes were provided by Randy Foreman.<br />
<br />
The Spanish kanji meanings (which appear in the XML format, and may also appear in special versions of KANJIDIC) were compiled by Francisco Gutierrez and provided by Gabriel Sanroman.<br />
<br />
Alain Thierion translated the meanings of the kanji into French, and also provided the Maniette numbers.<br />
<br />
Andrew Slater provided updates to the JLPT numbers, and additional numbers for the Japanese Flashcards series.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=KANJIDIC_Project&diff=953KANJIDIC Project2023-01-22T10:10:20Z<p>JimBreen: /* Content & Format */</p>
<hr />
<div>=The KANJIDIC Project=<br />
<br />
''(Note that this page is in the process of being rewritten, so be patient with any aspects that seem incomplete.)''<br />
<br />
==Introduction==<br />
<br />
The KANJIDIC project, which began in 1991, has the goal of compiling and distributing comprehensive information on the kanji used in Japanese text processing. It covers the 13,108 kanji in three main Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 952 additional kanji.<br />
<br />
Three data files are distributed by this project:<br />
* the KANJIDIC2 file, which is in XML format and [https://en.wikipedia.org/wiki/UTF-8 Unicode/UTF-8] coding, and contains information about all 13,108 kanji. ([http://www.edrdg.org/kanjidic/kanjidic2.xml.gz download])<br />
* the KANJIDIC file, which is in [https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP EUC-JP] coding and covers the 6,355 kanji in JIS X 0208. ([http://www.edrdg.org/kanjidic/kanjidic.gz download])<br />
* the KANJD212 file, which also is in EUC-JP coding and covers the 5,801 kanji in JIS X 0212. ([http://www.edrdg.org/kanjidic/kanjd212.gz download])<br />
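<br />
As a minimal sketch (not part of the project's own tooling), the two EUC-JP files could be read in Python roughly as follows once downloaded. The function name, the local file path and the skipping of lines beginning with "#" are assumptions made for illustration.<br />
<br />
```python
import gzip


def read_kanjidic_lines(path, encoding="euc_jp"):
    """Yield the non-empty, non-comment lines of a gzipped KANJIDIC-style file."""
    # Mode "rt" decompresses and decodes to text using the given encoding.
    with gzip.open(path, "rt", encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Assumption: a line starting with "#" is a comment, not a kanji entry.
            if line and not line.startswith("#"):
                yield line
```
<br />
The same function could be pointed at a local copy of either kanjidic.gz or kanjd212.gz, since both are described above as EUC-JP text.<br />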
<br />
==Content & Format==<br />
The database and distributed data files contain an entry for each of the kanji, with each entry containing a number of fields of data about the kanji. The fields are described in the table below. The format of the distributed files is as follows:<br />
* the KANJIDIC and KANJD212 files are text files with one line per kanji and the information fields separated by spaces. The format of each line is:<br />
** the kanji itself followed by the hexadecimal form of the JIS ''ku-ten'' coding, e.g. "亜 3021" (the decimal ''ku-ten'' code is 16-01);<br />
** information fields beginning with one or two-letter codes as per the table below. For example "S10" indicates a stroke count of 10;<br />
** the Japanese readings of the kanji. ON readings (音読み) are generally in ''katakana'' and KUN readings (訓読み) in ''hiragana''. An exception is the set of ''kokuji'' for measurements such as centimetres, where the reading is in ''katakana''. Hyphens are used to indicate prefixes/suffixes, and '.' indicates the portion of the reading that is ''okurigana''. There may be several classes of reading fields, with ordinary readings first, followed by members of the other classes, if any. The current other classes, and their tagging, are:<br />
***where the kanji has special ''nanori'' (i.e. name) readings, these are preceded by the marker "T1";<br />
***where the kanji is a radical, and the radical name is not already a reading, the radical name is preceded by the marker "T2".<br />
** the meanings (usually in English). Each field begins with an open brace '{' and ends at the next close brace '}'.<br />
* the KANJIDIC2 file is in XML and is structured according to its [http://www.edrdg.org/kanjidic/kanjidic2_dtdh.html DTD] (Document Type Definition). The DTD contains extensive annotations and is intended to be the primary documentation for the file. This [http://www.edrdg.org/kanjidic/kd2examph.html sample] illustrates the structure of a typical entry. Information fields are grouped by type within entities such as <dic_number> and <query_code>, with specific values indicated by an attribute code. For example the kanji 亜 has the number 43 in the original Nelson kanji dictionary and 81 in the New Nelson. This is recorded in the XML file as:<br/><br />
:<dic_number><br />
::<dic_ref dr_type="nelson_c">43</dic_ref><br />
::<dic_ref dr_type="nelson_n">81</dic_ref><br />
:: ....<br />
:</dic_number><br />
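<br />
The KANJIDIC line layout described above can be sketched in Python as follows. This is an illustration only: the sample line is hand-written and abbreviated rather than copied from the file, the function and dictionary key names are hypothetical, and the code list is transcribed from the table on this page. The JIS hex code is converted to ''ku-ten'' by subtracting hexadecimal 20 from each byte, which gives 16-01 for "3021".<br />
<br />
```python
import re

# Field codes transcribed from the table on this page.
KNOWN_CODES = (
    "ZPP", "ZSP", "ZBP", "ZRP", "XJ0", "XJ1", "XJ2", "XJD",
    "XH", "XI", "XN", "XO", "DP", "DK", "DL", "DN", "DO", "DA", "DS", "DF",
    "DH", "DT", "DC", "DJ", "DG", "DB", "DM", "DR", "MN", "MP", "IN",
    "U", "B", "C", "G", "S", "F", "J", "N", "V", "H", "L", "K", "O", "E",
    "P", "I", "Q", "Y", "W",
)
# Longer codes are tried first so that "IN1616" is read as IN, not I.
CODES_BY_LENGTH = sorted(KNOWN_CODES, key=len, reverse=True)


def jis_hex_to_kuten(jis_hex):
    """Convert a four-digit JIS hex code such as "3021" to ku-ten form "16-01"."""
    ku = int(jis_hex[:2], 16) - 0x20
    ten = int(jis_hex[2:], 16) - 0x20
    return f"{ku:02d}-{ten:02d}"


def parse_kanjidic_line(line):
    """Split one KANJIDIC line into kanji, code fields, readings and meanings."""
    # Meanings may contain spaces, so pull out the {...} fields first.
    meanings = re.findall(r"\{([^}]*)\}", line)
    tokens = re.sub(r"\{[^}]*\}", " ", line).split()
    entry = {
        "kanji": tokens[0],
        "jis_hex": tokens[1],
        "kuten": jis_hex_to_kuten(tokens[1]),
        "fields": {},
        "readings": [],
        "nanori": [],
        "radical_names": [],
        "unknown": [],
        "meanings": meanings,
    }
    target = "readings"  # ordinary readings come before any T1/T2 class
    for token in tokens[2:]:
        if token == "T1":
            target = "nanori"
        elif token == "T2":
            target = "radical_names"
        elif token[0].isascii() and token[0].isupper():
            code = next((c for c in CODES_BY_LENGTH if token.startswith(c)), None)
            if code is None:
                entry["unknown"].append(token)
            else:
                entry["fields"].setdefault(code, []).append(token[len(code):])
        else:
            # Kana readings, possibly with a leading hyphen or a '.' marker.
            entry[target].append(token)
    return entry


# Hand-written, abbreviated sample line (an assumption for illustration).
SAMPLE = "亜 3021 S7 N43 V81 DR3273 ア つ.ぐ T1 や {Asia} {rank next}"
entry = parse_kanjidic_line(SAMPLE)
print(entry["kanji"], entry["kuten"], entry["fields"], entry["meanings"])
```
<br />
Values are kept as strings and collected in lists because a code such as "S" may occur more than once on a line.<br />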
<br />
{| class="wikitable sortable"<br />
|+ Kanjidic Information Fields<br />
|-<br />
! Field<br />
! Kanjidic Code<br/>(if any)<br />
! Group Entity<br />
! Entity plus Attribute(s)<br/>(if any)<br />
! Comment<br />
|-<br />
| Kanji<br />
| none<br />
| literal<br />
| <br />
|<br />
|-<br />
| JIS code-point<br />
| none<br />
| codepoint<br />
| cp_value cp_type="jis208" (or "jis212" or "jis213")<br />
| e.g. 亜 is "3021" in KANJIDIC and<br/>"1-16-01" in KANJIDIC2<br />
|-<br />
| Unicode code-point<br />
| U<br />
|codepoint<br />
| cp_value cp_type="ucs"<br />
| <br />
|-<br />
| Radical (Classical) (See Note 1 below)<br />
| B/C<br />
| radical<br />
| rad_value rad_type="classical"<br />
| Where Nelson uses the classical radical this has a "B" code, otherwise it has a "C" code<br />
|-<br />
| Radical (Nelson)<br />
| B<br />
| radical<br />
| rad_value rad_type="nelson_c"<br />
| <br />
|-<br />
| Grade<br />
| G<br />
| misc<br />
| grade<br />
| The "grade" of the kanji. <br/>- G1 to G6 indicate the grade level as specified by the Japanese Ministry of Education for kanji that are to be taught in elementary school (1,026 kanji). These are sometimes called the ''kyōiku'' (education) kanji and are part of the set of ''jōyō'' (daily use) kanji;<br/>- G8 indicates the remaining ''jōyō'' kanji that are to be taught in secondary school (an additional 1,110 kanji);<br/>- G9 and G10 indicate ''jinmeiyō'' ("for use in names") kanji which in addition to the ''jōyō'' kanji are approved for use in family name registers and other official documents. G9 (649 kanji, of which 640 are in KANJIDIC) indicates the kanji is a "regular" name kanji, and G10 (212 kanji, of which 130 are in KANJIDIC) indicates the kanji is a variant of a ''jōyō'' kanji.<br />
|-<br />
| Stroke count<br />
| S<br />
| misc<br />
| stroke_count<br />
| The stroke count of the kanji. If more than one, the first is considered the accepted count, while subsequent ones are common miscounts. (See the section later in this document on counting strokes for some of the rules applied especially to radicals.)<br />
|-<br />
| Frequency-of-use ranking<br />
| F<br />
| misc<br />
| freq<br />
| The 2,501 most-used characters have a ranking which expresses the relative frequency of occurrence of a character in modern Japanese. The data is based on an analysis of word frequencies in the Mainichi Shimbun over 4 years by Alexandre Girardi. Note: (a) these frequencies are biased towards words and kanji used in newspaper articles, and (b) the relative frequencies for the last few hundred kanji so graded are quite imprecise.<br />
|-<br />
| Variant JIS 0208 kanji<br />
| XJ0<br />
| misc<br />
| variant var_type="jis208"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0212 kanji<br />
| XJ1<br />
| misc<br />
| variant var_type="jis212"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0213 kanji<br />
| XJ2<br />
| misc<br />
| variant var_type="jis213"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the plane number (P: 1 or 2) plus the JIS hex code is used, and in the XML file the equivalent "P-nn-nn" kuten code is used.)<br />
|-<br />
| Variant kanji (De Roo index)<br />
| XJD<br />
| misc<br />
| variant var_type="deroo"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (NJECD index)<br />
| XH<br />
| misc<br />
| variant var_type="halpern_njecd"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (S&H index)<br />
| XI<br />
| misc<br />
| variant var_type="s_h"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (Nelson index)<br />
| XN<br />
| misc<br />
| variant var_type="nelson_c"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (O'Neill index)<br />
| XO<br />
| misc<br />
| variant var_type="oneill"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Radical name(s)<br />
| none<br />
| misc<br />
| rad_name<br />
| The name of the radical in ''hiragana''. In the KANJIDIC edition these are placed after the readings and preceded by the "T2" tag.<br />
|-<br />
| JLPT Level<br />
| J<br />
| misc<br />
| jlpt<br />
| The pre-2010 level of the Japanese Language Proficiency Test (JLPT) in which the kanji occurs (1-4). Note that the JLPT test levels changed in 2010, with a new 5-level system (N1 to N5) being introduced. No official kanji lists are available for the new levels. The new levels are regarded as being similar to the old levels except that the old level 2 is now divided between N2 and N3, and the old levels 3 and 4 are now N4 and N5. <br />
|-<br />
| Nelson (Classic) number<br />
| N<br />
| dic_number<br />
| dic_ref dr_type="nelson_c"<br />
| The index number in "The Modern Reader's Japanese-English Character Dictionary", edited by Andrew Nelson. If not present, the character is not in Nelson, or is considered to be a non-standard version, in which case it may have a variant. Note that many kanji glyphs currently used are what Nelson described as "non-standard".<br />
|-<br />
| Nelson (New) number<br />
| V<br />
| dic_number<br />
| dic_ref dr_type="nelson_n"<br />
| The index number in "The New Nelson Japanese-English Character Dictionary", edited by John Haig.<br />
|-<br />
| NJECD number<br />
| H<br />
| dic_number<br />
| dic_ref dr_type="halpern_njecd"<br />
| The index number in the "New Japanese-English Character Dictionary" (1990), edited by Jack Halpern.<br />
|-<br />
| Kodansha Kanji Dictionary number<br />
| DP<br />
| dic_number<br />
| dic_ref dr_type="halpern_kkd"<br />
| The index numbers used by Jack Halpern in the "Kodansha Kanji Dictionary" (2013), which is the revised version of the "New Japanese-English Character Dictionary" of 1990.<br />
|-<br />
|Kanji Learners Dictionary number<br />
|DK<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld"<br />
|The index numbers used by Jack Halpern in the "Kanji Learners Dictionary", published by Kodansha in 1999.<br />
|-<br />
|Kanji Learners Dictionary number (2nd ed)<br />
|DL<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld_2ed"<br />
|The index numbers used by Jack Halpern in the 2nd edition of the "Kanji Learners Dictionary", published by Kodansha in 2013. <br />
|-<br />
|Remembering The Kanji number<br />
|L<br />
|dic_number <br />
|dic_ref dr_type="heisig"<br />
|The index number used in "Remembering The Kanji" by James Heisig.<br />
|-<br />
|Remembering The Kanji number (6th ed)<br />
|DN<br />
|dic_number <br />
|dic_ref dr_type="heisig6"<br />
|The index number used in "Remembering The Kanji, 6th Edition" by James Heisig. <br />
|-<br />
|Gakken number<br />
|K<br />
|dic_number <br />
|dic_ref dr_type="gakken"<br />
|The index number in the Gakken Kanji Dictionary ("A New Dictionary of Kanji Usage"). Some of the numbers relate to the list at the back of the book, jouyou kanji not contained in the dictionary, and various historical tables at the end.<br />
|-<br />
|O'Neill's Japanese Names number<br />
|O<br />
|dic_number <br />
|dic_ref dr_type="oneill_names"<br />
|The index number in "Japanese Names", by P.G. O'Neill. (Weatherhill, 1972) (Note: some of the numbers end with 'A'.)<br />
|-<br />
|O'Neill's Essential Kanji number<br />
|DO<br />
|dic_number <br />
|dic_ref dr_type="oneill_kk"<br />
|The index numbers used in P.G. O'Neill's "Essential Kanji".<br />
|-<br />
|Morohashi number<br />
|MN/MP<br />
|dic_number <br />
|dic_ref dr_type="moro" m_vol m_page<br />
|The index number and volume.page respectively of the kanji in the 13-volume Morohashi Daikanwajiten. A terminal `P` in the number, e.g. 4879P, indicates that it is 4879' in the original. In some 500 cases, the number is terminated with an `X`, to indicate that the kanji in Morohashi has a close, but not identical, glyph to the form in the JIS X 0208 standard.<br/>In the XML the volume and page are attribute values.<br />
|-<br />
|Henshall number<br />
|E<br />
|dic_number <br />
|dic_ref dr_type="henshall"<br />
|The index number used in "A Guide To Remembering Japanese Characters" by Kenneth G. Henshall.<br />
|-<br />
|Kanji & Kana number<br />
|IN<br />
|dic_number <br />
|dic_ref dr_type="sh_kk"<br />
|The index number used in Spahn & Hadamitzky's "Kanji & Kana", 2nd edition (Tuttle).<br />
|-<br />
|Kanji & Kana number (2011 ed)<br />
|DA<br />
|dic_number <br />
|dic_ref dr_type="sh_kk2"<br />
|The index number used in the 2011 edition of Spahn & Hadamitzky's "Kanji & Kana".<br />
|-<br />
|Sakade number<br />
|DS<br />
|dic_number <br />
|dic_ref dr_type="sakade"<br />
|The index numbers used in the early editions of "A Guide To Reading and Writing Japanese", edited by Florence Sakade.<br />
|-<br />
|Japanese Kanji Flashcards number<br />
|DF<br />
|dic_number <br />
|dic_ref dr_type="jf_cards"<br />
|The index numbers used in the "Japanese Kanji Flashcards", by Max Hodges and Tomoko Okazaki (White Rabbit Press). <br />
|-<br />
|Henshall Guide number<br />
|DH<br />
|dic_number <br />
|dic_ref dr_type="henshall3"<br />
|The index numbers used in the 3rd edition of "A Guide To Reading and Writing Japanese" edited by Ken Henshall et al. <br />
|-<br />
|Tuttle Kanji Cards number<br />
|DT<br />
|dic_number <br />
|dic_ref dr_type="tutt_cards"<br />
|The index numbers used in the Tuttle Kanji Cards, compiled by Alexander Kask.<br />
|-<br />
|Crowley number<br />
|DC<br />
|dic_number <br />
|dic_ref dr_type="crowley"<br />
|The index numbers used in "The Kanji Way to Japanese Language Power" by Dale Crowley. <br />
|-<br />
|Kanji in Context number<br />
|DJ<br />
|dic_number <br />
|dic_ref dr_type="kanji_in_context"<br />
|The index numbers used in "Kanji in Context" by Nishiguchi and Kono.<br />
|-<br />
|Kodansha Compact Kanji Guide number<br />
|DG<br />
|dic_number <br />
|dic_ref dr_type="kodansha_compact"<br />
|The index numbers used in the "Kodansha Compact Kanji Guide".<br />
|-<br />
|Japanese For Busy People number<br />
|DB<br />
|dic_number <br />
|dic_ref dr_type="busy_people"<br />
|The index numbers used in "Japanese For Busy People" vols I-III, published by the AJLT. The codes are the volume.chapter. <br />
|-<br />
|Maniette number<br />
|DM<br />
|dic_number <br />
|dic_ref dr_type="maniette"<br />
|The numbers in Yves Maniette's "Les Kanjis dans la tête", the French adaptation of Heisig's "Remembering The Kanji".<br />
|-<br />
|SKIP code<br />
|P<br />
|query_code <br />
|q_code qc_type="skip"<br />
|The SKIP (System of Kanji Indexing by Patterns) developed by Jack Halpern. The code is of the form "l-m-n". See [[#SKIP_Codes|SKIP Codes]] section for more information.<br />
|-<br />
|S&H descriptor<br />
|I<br />
|query_code <br />
|q_code qc_type="sh_desc"<br />
|The index code in "The Kanji Dictionary" (Tuttle 1996), by Spahn & Hadamitzky. It is of the form nxnn.n, e.g. 3k11.2, where the kanji has 3 strokes in the identifying radical, it is radical "k" in the S&H classification system, there are 11 other strokes, and it is the 2nd kanji in the 3k11 sequence. <br />
|-<br />
|Four Corner code<br />
|Q<br />
|query_code <br />
|q_code qc_type="four_corner"<br />
|The Four Corner code for the kanji. See the [[#Four_Corner_Codes|Four Corner codes]] section for more information.<br />
|-<br />
|De Roo code<br />
|DR<br />
|query_code <br />
|q_code qc_type="deroo"<br />
|The codes developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). See the [[#De_Roo_Codes|De Roo Codes]] section for more information.<br />
|-<br />
|Misclassification code<br />
|ZPP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="posn"<br />
|SKIP misclassification by position.<br />
|-<br />
|Misclassification code<br />
|ZSP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_count"<br />
|SKIP misclassification by stroke count.<br />
|-<br />
|Misclassification code<br />
|ZBP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_and_posn"<br />
|SKIP misclassification by both position and stroke count.<br />
|-<br />
|Misclassification code<br />
|ZRP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_diff"<br />
|SKIP misclassification by differing opinions on stroke counts.<br />
|-<br />
|Chinese reading<br />
|Y<br />
|rmgroup<br />
|reading r_type="pinyin"<br />
|The PinYin (Chinese) reading of the kanji.<br />
|-<br />
|Korean reading (romanized)<br />
|W<br />
|rmgroup<br />
|reading r_type="korean_r"<br />
|The Korean reading of the kanji in the (Republic of Korea) Ministry of Education style.<br />
|-<br />
|Korean reading (hangul)<br />
|not included<br />
|rmgroup<br />
|reading r_type="korean_h"<br />
|The Korean reading of the kanji in the hangul script.<br />
|-<br />
|Vietnamese reading (chữ quốc ngữ)<br />
|not included<br />
|rmgroup<br />
|reading r_type="vietnam"<br />
|The Vietnamese reading of the kanji in chữ quốc ngữ.<br />
|-<br />
|Japanese on reading (''katakana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_on"<br />
| In the KANJIDIC edition the readings are placed between the information fields and the meanings.<br />
|-<br />
|Japanese kun reading (''usu. hiragana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_kun"<br />
|<br />
|-<br />
| Meanings<br />
| none<br />
| rmgroup<br />
| meaning m_lang="xx"<br />
| The kanji meaning(s). For languages other than English the m_lang attribute is used with two-letter ISO 639-1 language codes. In the KANJIDIC edition the meanings are placed at the end of the line.<br />
|-<br />
| Name reading(s) (''hiragana'')<br />
| T1<br />
| <br />
| nanori<br />
| The readings only associated with named-entities. In the KANJIDIC edition the first of these is preceded by the "T1" tag.<br />
|}<br />
Note 1: For the sake of consistency the classical radical is the one indicated in the JIS漢字字典 (日本規格協会).<br />
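<br />
To show how the group elements and attribute codes in the table map onto KANJIDIC2, here is a minimal Python sketch using the standard library's ElementTree. The XML fragment is hand-written from the 亜 values quoted on this page and is not a complete entry; the enclosing <character> element name is taken from the DTD as I understand it and should be checked against it, and the helper names are hypothetical.<br />
<br />
```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration; a real entry has many more fields.
SAMPLE_XML = """
<character>
  <literal>亜</literal>
  <codepoint>
    <cp_value cp_type="jis208">1-16-01</cp_value>
  </codepoint>
  <dic_number>
    <dic_ref dr_type="nelson_c">43</dic_ref>
    <dic_ref dr_type="nelson_n">81</dic_ref>
  </dic_number>
</character>
"""


def dictionary_numbers(character):
    """Map each dr_type attribute to its index number for one entry."""
    refs = character.iterfind("dic_number/dic_ref")
    return {ref.get("dr_type"): ref.text for ref in refs}


def codepoint(character, cp_type):
    """Return the code-point of the given cp_type, or None if absent."""
    return character.findtext(f"codepoint/cp_value[@cp_type='{cp_type}']")


character = ET.fromstring(SAMPLE_XML)
print(character.findtext("literal"))
print(dictionary_numbers(character))
print(codepoint(character, "jis208"))
```
<br />
For the full file, ET.iterparse could be used instead of ET.fromstring so that entries are handled one at a time rather than loading everything at once.<br />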
<br />
==Radical and Stroke Counting Rules==<br />
<br />
These rules apply to:<br />
#the stroke-counts themselves;<br />
#the stroke counts in the SKIP codes. Where this results in a SKIP which differs from that in the NJECD, or in the non-NJECD SKIPs provided by Jack Halpern, the Jack Halpern version is included prefixed with "ZR".<br />
===Radicals===<br />
The radicals listed below are ones where the various references take differing approaches to counting their strokes. The stroke counting in this file does not strictly follow any reference, but tends to be more aligned to Halpern.<br />
#B54 ENNYOU - 廴. Traditionally counted as 3 strokes, but more recently often counted as 2. S&H count this as 2; Nelson, Halpern, Koujien, etc., count it as 3. I treat it as 3.<br />
#B97 URI - 瓜. Traditionally counted as 5 strokes, as the middle portion looks like a katakana ム. Modern glyphs invariably make it look like 6 strokes. Nelson says it is 5 strokes. Halpern does too, but then counts the shape as 6 in other kanji. Koujien says 6, as do S&H. I treat it as 6.<br />
#B113 SHIMESU e.g. 礼, is counted as 4 strokes in that form, and 5 strokes in its older form, 祀 [http://www.edrdg.org/~jwb/U7940old.png (image)]. 18 kanji are in the 4-stroke form and 20 are in the 5-stroke form. (Nelson and S&H count it as 4; Halpern counts it as 4 or 5. [See Note 1.])<br />
#B131 SHIN/KERAI 臣. Counted as 7 (Nelson counts it as 6, Halpern as 7 (in the book), and S&H as both for different kanji.)<br />
#B136 MAI ASHI 舛. Counted as 7 (traditionally counted as 6, in accordance with the older writing of `ヰ'. Nelson counts as 6, S&H as 7, and Halpern as 7 for 常用 and 人名用漢字 and 6 for the rest.) Note this is also applied to counting 絳 and for kanji with the 韋 pattern.<br />
#B140 KUSA-KANMURI e.g. 苛 always counted as 3 strokes (Halpern counts this 4 strokes for the (mostly level 2) kanji where the older form is often printed.) Note that this has been carried through to kanji where this element is not the indexing radical, such as 朦.<br />
#B162 SHIN-NYUU e.g. 遙 or 逢 counted as 3 or 4 strokes. (Nelson and S&H count it as 2 strokes, and Halpern as either 3 or 4.) [See Note 1 below.]<br />
#B163 OOZATOZUKIRI & B170 KOZATO-HEN 邦 and 阡 always counted as 3 strokes (Nelson and S&H count it as 2, Halpern as 3.) This also applies where it appears mid-kanji, such as in 橢.<br />
#B184 SHOKU HEN 食, 飢, etc. is counted as 8 strokes in the 飢 form, and as 9 strokes in the 飭 and 餐 forms. (Nelson and S&H count it as 8 strokes, and Halpern as 8 or 9.) [See Note 1 below.]<br />
#B199 MUGI 麦 always counted as 7 strokes, except for 麥 & 麩 where it is counted as 11. (Nelson and Halpern do the same, and S&H avoid treating it as a radical, but count it as 12 in the remainder.)<br />
#The ROO or OI radical (老) has a variant consisting of the top 4 strokes. For example, it is in 者. Traditionally, this variant had an extra dot, and was counted as 5 strokes. I'm counting it as 4 throughout.<br />
<br />
===Other Stroke Patterns===<br />
# While the pattern 臼 is a 6-stroke radical, the top half of 叟 is made up of three distinct parts totalling 8 strokes. Note that this also is the case with 嫂, 溲, 艘 and 痩 despite the simplification in the JIS glyphs.<br />
#牙 (KIBA HEN) is a problem. It is classically counted as 4 strokes, but these days has a flick that makes it effectively 5. Halpern, Nelson and S&H usually have it as 5 strokes, so I'm standardizing on that.<br />
#Another little horror is 旡 (MU or NASHI), which is classically counted as 4 strokes. The most common variant has 5 strokes, but looks like 6. Halpern, S&H and the Classical Nelson count this as 4 strokes, and the New Nelson as 5. I'm making it 5 too.<br />
#The JUU or ASHIATO radical is at the bottom of 禽 and 禺. It is traditionally counted as 5 strokes, although sometimes it looks like 4. I'm using 5 throughout.<br />
#A related shape is ム, as in 瓜, 孤, 弧, etc. This is sometimes counted as two strokes (both Nelsons) and sometimes as three strokes (Halpern, S&H). Classically it is regarded as two strokes. I am using 6 strokes for 瓜.<br />
#The pattern to the left of 敝, which appears in several kanji, e.g. 幣 and 瞥, has 8 strokes. (There are 3 strokes at the top as in 尚.)<br />
#The "east" pattern (東) has 8 strokes. There is an older form in which there are two strokes in the box (柬). It is counted as 8 strokes here in the 東 form (e.g. 諌) and 9 in the 柬 form, as in 諫.<br />
#The pattern at the bottom of 雋 is counted as 4 strokes in modern dictionaries, although traditionally it was 5.<br />
#The pattern 巻, which appears in several kanji, is counted as 9 strokes. Several dictionaries count it as either 8 or 9.<br />
#The pattern on the left of 収 is variously handled as 2 strokes or 3 strokes. As more recent dictionaries make it 4, I will do so too.<br />
#The 攵 pattern has 3 and 4-stroke versions, and sometimes the glyphs can be confusing as to which is used. In the 緻 kanji, for example, it is traditionally counted as 3, but Spahn & Hadamitzky count it as 4 and the Nelsons include both.<br />
<br />
Note: The JIS X 0208-1990 standard does not formally specify the precise glyphs used for kanji; however, the glyphs it uses in the published version have become de facto standards for many font compilations. In the published standard, for several kanji, e.g. 辿/迚, 礼/祀, 飢/飭, the JIS Level 1 kanji use the simpler form, and the Level 2 kanji use the older more complex form. Just to make matters worse, many fonts for JIS X 0208 kanji are based on the bit-maps specified in the JIS X 9051-1984 standard, which defines the 16x16 patterns for JIS X 0208-1983 characters. According to Ken Lunde: "This standard was not very good, and JSA is no longer supporting it." Anyway, JIS X 9051-1984 had the simpler form for all these bushu in both Levels 1 and 2, as well as having simplifications of kanji like 濾. Thus, as the font foundries have freedom to choose whichever glyphs they like, what you see on your screen may well not agree with these rules. All the rules in this section relate to the glyphs as published in the JIS X 0208-1990 standard, and as appearing in font compilations based on them.<br />
<br />
==Kanji Dictionary Search Codes==<br />
<br />
===SKIP Codes===<br />
<br />
The System of Kanji Indexing by Patterns (SKIP) is a scheme for the classification and rapid retrieval of Chinese characters on the basis of geometrical patterns. Developed by Jack Halpern, it first appeared in the New Japanese-English Character Dictionary (Kenkyusha, Tokyo 1990; NTC, Chicago 1993), and in successor publications such as the "Kanji Learners Dictionary" (Kodansha 1999,2011) and the "Kodansha Kanji Dictionary" (2013). A description of the coding system is [http://www.edrdg.org/wwwjdic/SKIP.html available].<br />
<br />
As examples, 割 has a SKIP code of 1-10-2, indicating it is divided into left-right portions with 10 strokes at the left and 2 at the right. 度 has a SKIP code of 3-3-6, indicating it has a 3-stroke enclosure with 6 strokes inside it.<br />
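<br />
The "l-m-n" structure of a SKIP code can be illustrated with a short Python sketch. The function name is hypothetical. The left-right and enclosure patterns follow the examples above (an enclosure of 3 strokes with 6 strokes inside corresponds to 3-3-6); the up-down and solid patterns are taken from the SKIP description linked above rather than from this page.<br />
<br />
```python
def describe_skip(code):
    """Return a short description of a SKIP code given as "l-m-n"."""
    pattern, m, n = (int(part) for part in code.split("-"))
    if pattern == 1:
        return f"left-right: {m} strokes left, {n} strokes right"
    if pattern == 2:
        return f"up-down: {m} strokes above, {n} strokes below"
    if pattern == 3:
        return f"enclosure: {m} strokes in the enclosure, {n} strokes inside"
    if pattern == 4:
        # For solid kanji the last number is a sub-pattern, not a stroke count.
        return f"solid: {m} strokes in total, sub-pattern {n}"
    raise ValueError(f"unknown SKIP pattern in {code!r}")


print(describe_skip("1-10-2"))
print(describe_skip("3-3-6"))
```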
<br />
===De Roo Codes===<br />
<br />
The De Roo codes were developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). They are based on the shapes observed at the top and bottom of the character. A [http://www.edrdg.org/wwwjdic/deroo.html detailed description] is available.<br />
<br />
As an example, 亜 has a code of 3273, indicating that the top of the kanji is pattern number 32 (兀) and the bottom is pattern number 73 (a horizontal line with two vertical strokes above it).<br />
<br />
===Four Corner Codes===<br />
The Four Corner coding system was invented by Wang Chen in 1928, and has since been widely used in dictionaries in China and Japan for classifying kanji and hanzi. In China it is losing popularity in favour of Pinyin ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten, have a Four Corner Index. An [http://www.edrdg.org/wwwjdic/FOURCORNER.html overview] of the coding system is available.<br />
In some cases a character may have two of these codes, as the system can be a little ambiguous, and Morohashi has some kanji coded differently from their traditional Chinese codes.<br />
The coding system indexes characters according to the shapes at the corners.<br />
<br />
==Proposing Changes==<br />
<br />
There is currently no online access to the database that holds the KANJIDIC contents (the information is mostly quite static). Anyone wishing to propose a change to the data for a kanji, e.g. to add or change a reading, will need to email Jim Breen at jimbreen@gmail.com.<br />
<br />
==Kanji Information Sites==<br />
''(Being expanded)''<br />
* Jim's [http://nihongo.monash.edu/kanjiinfo.html Kanji Information Page].<br />
* The [https://kanjialive.com/ Kanji alive] site at the University of Chicago.<br />
* The [https://www.kanjipedia.jp/ Kanjipedia] site (mostly in Japanese).<br />
<br />
==Legacy Documentation==<br />
<br />
The current Wiki page was compiled from several older documents, which are no longer being maintained. They are still available for historical purposes. They are:<br />
* a basic home page about [http://www.edrdg.org/kanjidic/kanjd2index_legacy.html KANJIDIC2];<br />
* an overview page about the [http://www.edrdg.org/kanjidic/kanjidic2_ov_legacy.html KANJIDIC2 structure];<br />
* an overview page about [http://www.edrdg.org/kanjidic/kanjidic_legacy.html KANJIDIC and KANJD212];<br />
* the original [http://www.edrdg.org/kanjidic/kanjidic_doc_legacy.html KANJIDIC] documentation;<br />
* the original [http://www.edrdg.org/kanjidic/kanjd212_doc_legacy.html KANJD212] documentation.<br />
<br />
==Copyright and Permissions==<br />
<br />
The KANJIDIC project files are released under a Creative Commons Attribution-ShareAlike Licence (V4.0). See the [https://www.edrdg.org/edrdg/licence.html EDRDG General Dictionary Licence Statement] for details.<br />
<br />
For the most part the information provided in the project's files is in the public domain. Information relating to the sequence numbers of kanji in published dictionaries is not considered to be subject to copyright. Descriptor and other search codes are considered to be the intellectual property of their developers. With regard to the codes included in the KANJIDIC files:<br />
* in 2014 the SKIP codes were placed by Jack Halpern under a CC-SA licence. See [http://www.kanji.org/kanji/dictionaries/skip_permission.htm this page] for his announcement. They are now under a [https://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-Noncommercial-Share Alike 4.0 Unported Licence].<br />
* Fr De Roo provided written permission for the De Roo codes to be included in KANJIDIC.<br />
* the Spahn and Hadamitzky descriptor codes were kindly supplied by Mark Spahn for inclusion in KANJIDIC.<br />
<br />
==History==<br />
<br />
''(some comments by Jim Breen)''<br />
<br />
KANJIDIC began around 1991 as two files: jis1detl.lst and jis2detl.lst, which were later merged into a single file.<br />
<br />
The first file was compiled initially from the file "kinfo.dat" supplied by Stephen Chung, who in turn compiled his file from a file prepared by Mike Erickson. I originally added about 1900 "meanings" by James Heisig keyed in by Kevin Moore from the book "Remembering The Kanji". I later added the meanings from Rik Smoody's files, compiled when he was working for Sony in Japan. These appear to have been based on Nelson.<br />
<br />
The second file was compiled from a complete JIS2 list with Bushu and stroke counts kindly supplied to me by Jon Crossley, to which I added Nelson numbers, yomikata and meanings extracted from Rik Smoody's file.<br />
<br />
Theresa Martin was an early assister with this file, particularly with tracking down and correcting many mistranscribed yomikata (the old zu/dzu, oo/ou, ji/dji, etc. problems).<br />
<br />
Jeffrey Friedl did a major overhaul in September-October 1992, in which he added the original frequency rankings, Halpern codes and SKIP patterns, updated the grading ("G" fields) to reflect the modern Jouyou lists, corrected radical numbers, and corrected stroke counts and readings to fall in line with modern usage.<br />
<br />
Magnus Halldorsson corrected some erroneous Halpern numbers, and provided them for a lot of the radicals. He provided the list of Heisig indices, which he originally compiled himself, then verified and expanded using lists from Richard Walters and Antti Karttunen. He also passed on to me the list of Gakken indices compiled by Antti Karttunen.<br />
<br />
Lee Collins provided the Unicode mappings.<br />
<br />
Iain Sinclair has provided the yomikata, meanings and S&H indices of many of the obscure JIS2 kanji.<br />
<br />
Christian Wittern, a Sinologist working at Kyoto University, sent me a monster file prepared by Dr Urs App from Hanazono College. From this I have extracted the Four Corner and Morohashi information. Christian also provided the original Pinyin details, which were later replaced. I am very grateful for these significant contributions.<br />
<br />
In March 1994 the Morohashi indices were proof-read and corrected by Christian.<br />
<br />
Alfredo Pinochet supplied all the Henshall numbers.<br />
<br />
Ingar Holst has provided considerable assistance in regularizing the Bnnn and Cnnn radical classifications to remove some errors that were in the original JIS2 file, and to make it all conform to Nelson's classification.<br />
<br />
In mid-1993 I withdrew the SKIP codes from the distributed file as it appeared that their presence violated Jack Halpern's copyright on these codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission from his publisher for the codes to be included subject (initially) to copyright and usage restrictions. In March 1994 the Halpern indices and SKIP codes were checked against an extract from Jack's files, and the "Z" mis-classification codes added, again from his files. Jack has also made a lot of useful comments and suggestions about the content and format of the file. I am most grateful to Jack for his permission and assistance, and also to Jeffrey for making the contact.<br />
<br />
In May 1995, a number of updates took place. Jeffrey Friedl established contact with James Heisig, and obtained a further set of his indices. I contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided most of the missing S&H descriptors, and Jack Halpern released to me the SKIP codes of the kanji not in the New Japanese-English Character Dictionary. For all this material I am most grateful.<br />
<br />
In August 1995, I added the O'Neill index numbers. These were compiled by Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny & David for their assistance. <br />
<br />
In January and February 1996 the Morohashi numbers were checked thoroughly against two important sources: a file of Unicode-Morohashi data (Uni2Dict) which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221 standard, and the review draft of the proposed revision of the JIS X 0208 standard, which was prepared by the INSTAC Committee, and made available in a text file, thus enabling comparisons. All the mismatches between the three files were examined against the Morohashi text, and extensive corrections made to all three files. I am grateful to Koichi Yasuoka and Masayuki Toyoshima for their considerable assistance in this task.<br />
<br />
In March 1996 the Korean readings were added. They were provided by Dr Charles Muller, then of Toyo Gakuen University, to whom I am most grateful. Chuck's compilation of Korean readings is extremely thorough and scholarly, and I am pleased to be able to incorporate them.<br />
<br />
In April 1996 the readings of all the kanji were compared with those in the JIS X 0208 draft, and a number of corrections and additions made.<br />
<br />
In May 1996 I carried out a "unification" of the readings of the KANJIDIC and KANJD212 files, wherein all the readings of the "itaiji" were brought into line. The identification of these itaiji was drawn from a file posted to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp), which was compiled at the ETL from the itaiji identification in the JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added some extra sets which were indicated in the JIS X 0208-1996 draft.<br />
<br />
In July 1996 the Pinyin details were completely replaced by a new set. The original Pinyin were from an earlier compilation by Christian Wittern, and contained many errors. Two more reliable sources had become available: the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5 hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC file is a combination of the two, following the order in the Uni2Pinyin file.<br />
<br />
In August 1996 I corrected a few more missing and erroneous Nelson numbers, using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged the kokuji, so I added these to the readings fields as "{(kokuji)}".<br />
<br />
Also in August 1996 I deleted the handful of former "XJxxxx" cross-references, and replaced them with a much more comprehensive set, so that they now represent all the recognized "itaiji". The file I used for this was the corrected itaiji file mentioned above.<br />
<br />
In April 1997 I corrected a large number of bushu codes. Many of these had been identified as errors by Jean-Luc Leger who analyzed and examined all the Nelson bushu. I also identified and added a large number of missing Cnnn codes.<br />
<br />
Also in April 1997 I added the S&H "Kanji & Kana" indices. These had been keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must be an outbreak of kanji interest in Nancy.)<br />
<br />
In February 1998, the long-awaited inclusion of the "New Nelson" numbers took place. I had been waiting for the editor of the New Nelson, John Haig, to supply a list (as he had agreed some years before), but in the meantime, Jean-Luc Leger keyed a list, so they are now available.<br />
<br />
Also between December 1997 and February 1998 a large number of Level 2 kanji had their stroke counts corrected to bring them into line with the counting principles used in the Level 1 kanji. This usually aligned the counts with those used in the New Nelson and in S&H. Appendix E of this document was amended to reflect this. The leg-work in tracking this material down was done by Wolfgang Cronrath. <br />
<br />
During December 1998 & Jan 1999 I updated the stroke counts of many of the Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath. I also added the De Roo codes, which had been keyed by Jasmin Blanchette, who also typed the explanatory material. I contacted Fr De Roo in Tokyo who readily agreed to the inclusion of the codes.<br />
<br />
The extension of the S&H Kana & Kanji numbers to the 2nd edition was done by Enrique Sanchez Rosa.<br />
<br />
The Hangul versions of the Korean readings (which only appear in the XML version) were provided by Francis Bond and Kyonghee Paik.<br />
<br />
I did the Tuttle card numbers myself.<br />
<br />
James Rose provided the numbers from Crowley's "The Kanji Way to Japanese Language Power", Sakade's "A Guide To Reading and Writing Japanese", and also for that book's 3rd Edition edited by Henshall, Seeley & De Groot.<br />
<br />
The "Kodansha's compact Kanji guide" codes were provided by Richard Fremmerlid.<br />
<br />
The "Kanji in Context" codes were provided by Randy Foreman.<br />
<br />
The Spanish kanji meanings (which appear in the XML format, and may also appear in special versions of KANJIDIC) were compiled by Francisco Gutierrez and provided by Gabriel Sanroman.<br />
<br />
Alain Thierion translated the meanings of the kanji into French, and also provided the Maniette numbers.<br />
<br />
Andrew Slater provided updates to the JLPT numbers, and additional numbers for the Japanese Flashcards series.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=952Editorial policy2023-01-10T04:52:42Z<p>JimBreen: /* Old and Rarely Used Terms */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. ＭＰ３プレーヤー, not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur:<br />
* where a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
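<br />
The middle-dot rule above can be checked mechanically. The following is a minimal sketch, assuming that the "JIS middle-dot" is U+30FB (KATAKANA MIDDLE DOT); the helper names and the list of look-alike dots are illustrative only and are not part of JMdictDB.<br />

```python
# Sketch only: these helpers are hypothetical, not JMdictDB functions.
KATAKANA_MIDDLE_DOT = "\u30fb"  # assumed to be the "JIS middle-dot"

# Some look-alike dots (an illustrative list, not an official one).
LOOKALIKE_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\u2022": "BULLET",
    "\u2027": "HYPHENATION POINT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
}


def find_bad_dots(reading: str) -> list[str]:
    """Return the names of any look-alike dots found in a reading."""
    return [name for ch, name in LOOKALIKE_DOTS.items() if ch in reading]


def reading_variants(dotted: str) -> str:
    """Build the "undotted;dotted" pair from a reading that uses U+30FB."""
    undotted = dotted.replace(KATAKANA_MIDDLE_DOT, "")
    return f"{undotted};{dotted}"
```

For example, <code>reading_variants("アームレス・チェア")</code> gives "アームレスチェア;アームレス・チェア", matching the form shown above.<br />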
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
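<br />
A few of the General rules above lend themselves to a simple automated check. The following is a rough sketch, assuming glosses are separated by ";" as described; the function name and the choice of rules are illustrative, and it is not a tool used by the editors. It flags leading articles, "colo(u)r"-style patterns and a comma after "e.g." or "i.e.". A flagged article is only a warning, since articles are allowed when really needed.<br />

```python
import re

# Hypothetical checks; each pairs a pattern with a short message.
CHECKS = [
    (re.compile(r"^(a|an|the)\s", re.IGNORECASE), "starts with an article"),
    (re.compile(r"\w\(\w+\)\w"), "optional-letter pattern such as colo(u)r"),
    (re.compile(r"\b(e\.g\.|i\.e\.),"), "comma after e.g. or i.e."),
]


def lint_glosses(meaning: str) -> list[tuple[str, str]]:
    """Split a meaning on ';' and report (gloss, message) for each problem."""
    problems = []
    for gloss in (part.strip() for part in meaning.split(";")):
        for pattern, message in CHECKS:
            if pattern.search(gloss):
                problems.append((gloss, message))
    return problems
```

With this sketch "rice field; rice paddy" passes cleanly, while "a rice field; full colo(u)r" produces one warning for each gloss.<br />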
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no or adj-f in JMdict). These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
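<br />
To make the shape of the source-language tag concrete, here is a small sketch that extracts the language code and source word from a sense line. It assumes only the two forms shown in the examples above (a bare word, or a word in double quotes); the real JMdictDB syntax may allow more, and the function name is hypothetical.<br />

```python
import re

# Matches [lsrc=xxx:word] or [lsrc=xxx:"two words"]; the word may be empty.
LSRC = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]"]*))\]')


def parse_lsrc(sense: str) -> list[tuple[str, str]]:
    """Return (language code, source word) pairs found in a sense line."""
    pairs = []
    for match in LSRC.finditer(sense):
        quoted, bare = match.group(2), match.group(3)
        pairs.append((match.group(1), quoted if quoted is not None else bare))
    return pairs
```

Applied to the two example lines above, this yields ("ger", "Arbeit") and ("fre", "art déco") respectively.<br />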
<br />
Don't do this for: (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
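<br />
The cross-reference patterns above can be read as "kind, target, optional reading, optional sense number". The sketch below parses exactly the forms quoted in this section; it is an illustration under that assumption, not the JMdictDB parser, and the names are hypothetical. Note that it treats the first ・ as the kanji/reading separator, so it would mis-split a kana target that itself contains a middle dot.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str               # "see" or "ant"
    target: str             # headword (kanji form where one exists)
    reading: Optional[str]  # reading after the middle dot, if given
    sense: Optional[int]    # sense number in [n], if given


# \u30fb is the katakana middle dot used between kanji and reading.
XREF = re.compile(
    r"\[(see|ant)=([^\[\]\u30fb]+)(?:\u30fb([^\[\]]+))?(?:\[(\d+)\])?\]"
)


def parse_xrefs(text: str) -> list[XRef]:
    """Return every [see=...] or [ant=...] reference found in text."""
    return [
        XRef(m.group(1), m.group(2), m.group(3),
             int(m.group(4)) if m.group(4) else None)
        for m in XREF.finditer(text)
    ]
```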
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where it is useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "rare". This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old-fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It marks a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
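<br />
The two spacing rules above can be expressed as a pair of regular expressions. This is only a sketch: the unit list is a small illustrative sample, the symbols are the three mentioned above, and the function name is hypothetical.<br />

```python
import re

# Illustrative sample of units; a real list would be much longer.
UNITS = ("km", "kg", "cm", "mm", "ml")

# A digit directly followed by a unit, e.g. "100km".
MISSING_SPACE = re.compile(r"\d(?:" + "|".join(UNITS) + r")\b")
# A digit, whitespace, then a symbol, e.g. "9 %" or "15 °C".
EXTRA_SPACE = re.compile(r"\d\s+(?:°C\b|%|c\b)")


def check_spacing(gloss: str) -> list[str]:
    """Return a list of spacing problems found in a gloss."""
    problems = []
    if MISSING_SPACE.search(gloss):
        problems.append("add a space between number and unit")
    if EXTRA_SPACE.search(gloss):
        problems.append("remove the space between number and symbol")
    return problems
```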
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
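<br />
As an illustration of the formats above, the following sketch produces each style from Python date and time values. The function names are hypothetical. Month names are spelled out in a tuple so the output does not depend on the system locale, and the life-span format follows the unpadded style of the Mishima example.<br />

```python
from datetime import date, time

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def gloss_date(d: date, with_year: bool = True) -> str:
    """Format as "March 17" or "March 17, 2019"."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(name: str, born: date, died: date) -> str:
    """Format as "Name (YYYY.M.D-YYYY.M.D)" without zero padding."""
    def dotted(d: date) -> str:
        return f"{d.year}.{d.month}.{d.day}"
    return f"{name} ({dotted(born)}-{dotted(died)})"


def gloss_time(t: time) -> str:
    """Format as "2am" or "12:30pm"."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    minutes = f":{t.minute:02d}" if t.minute else ""
    return f"{hour}{minutes}{suffix}"
```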
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
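<br />
As a rough sketch, the date and time formats in this section could be produced as follows. The function names are hypothetical, and the English month names are hard-coded so that the output does not depend on the system locale.<br />

```python
from datetime import date, time

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def gloss_date(d: date, with_year: bool = True) -> str:
    """'March 17' or 'March 17, 2019', as preferred for specific dates."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def life_dates(born: date, died: date) -> str:
    """Compact form for people's dates, following the Mishima example."""
    def fmt(d: date) -> str:
        # The example in the text does not zero-pad the month or day.
        return f"{d.year}.{d.month}.{d.day}"
    return f"({fmt(born)}-{fmt(died)})"


def gloss_time(t: time) -> str:
    """'2am' or '12:30pm' style; minutes are dropped when they are zero."""
    suffix = "am" if t.hour < 12 else "pm"
    hour = t.hour % 12 or 12
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"


print(gloss_date(date(2019, 3, 17)))
print(life_dates(date(1925, 1, 14), date(1970, 11, 25)))
print(gloss_time(time(12, 30)))
```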
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries, supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
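<br />
A contributor who wants to check in advance which characters would be lost could use something like the sketch below. It is not the conversion code used to build the EDICT/EDICT2 files. It assumes that Python's standard <code>iso2022_jp</code> codec (ASCII plus JIS X 0208) and <code>iso2022_jp_1</code> codec (the same plus JIS X 0212) are acceptable stand-ins for the two codesets, and the function names are hypothetical.<br />

```python
def charset_of(ch: str) -> str:
    """Classify one character by the smallest JIS set that can hold it.

    Assumption: the stdlib codecs are used as proxies for the codesets.
    Both codecs also accept ASCII, so plain ASCII is reported as "JIS X 0208".
    """
    for label, codec in (("JIS X 0208", "iso2022_jp"),
                         ("JIS X 0212", "iso2022_jp_1")):
        try:
            ch.encode(codec)
            return label
        except UnicodeEncodeError:
            continue
    return "other"


def lost_in(text: str, fmt: str) -> list:
    """List the characters of `text` that fall outside the given format."""
    allowed = {
        "EDICT": {"JIS X 0208"},
        "EDICT2": {"JIS X 0208", "JIS X 0212"},
    }[fmt]
    return [ch for ch in text if charset_of(ch) not in allowed]


print(lost_in("café 한", "EDICT"))
print(lost_in("café 한", "EDICT2"))
```

Note that the sketch only reports characters; it does not reproduce the replacement of diacritics (e.g. ö to oe) described above.<br />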
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
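<br />
The core of the two-out-of-three test can be written down in a few lines. This is a deliberately naive sketch: the <code>Entry</code> class and <code>may_merge</code> function are hypothetical, each field is compared as a single string, and the caveats above (rare or archaic forms, related kana variants, editorial judgement) are not modelled.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kanji: str     # main kanji headword
    reading: str   # main reading
    meaning: str   # main gloss


def may_merge(a: Entry, b: Entry) -> bool:
    """True if at least two of kanji, reading and meaning are the same."""
    matches = sum((a.kanji == b.kanji,
                   a.reading == b.reading,
                   a.meaning == b.meaning))
    return matches >= 2


# Same reading and meaning, different kanji form: merge candidates.
print(may_merge(Entry("合気道", "あいきどう", "aikido"),
                Entry("合氣道", "あいきどう", "aikido")))

# Same kanji only: must stay separate entries.
print(may_merge(Entry("家", "いえ", "house"),
                Entry("家", "か", "-ist (suffix)")))
```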
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "rare" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
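<br />
For developers, the suggested handling might look roughly like the sketch below: every form is indexed for lookup, but forms tagged "sK" or "sk" are dropped before display. The <code>Form</code> class and the function names are hypothetical and do not reflect how WWWJDIC or any other application is actually implemented.<br />

```python
from dataclasses import dataclass, field

SEARCH_ONLY_TAGS = {"sK", "sk"}


@dataclass
class Form:
    text: str
    tags: set = field(default_factory=set)


def search_keys(forms: list) -> list:
    """Every surface form, including search-only ones, is a lookup key."""
    return [f.text for f in forms]


def display_forms(forms: list) -> list:
    """Forms shown to the user: search-only forms are filtered out."""
    return [f.text for f in forms if not (f.tags & SEARCH_ONLY_TAGS)]


entry = [Form("3密"), Form("三密"), Form("三蜜", {"sK"})]
print(search_keys(entry))    # includes 三蜜, so a search for it finds the entry
print(display_forms(entry))  # 三蜜 is not shown as part of the entry
```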
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict however they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+002d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
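<br />
A simple check for stray bar characters could be sketched as below. Note the assumption: the other six characters are not listed here, so the sketch flags anything in the Unicode "dash punctuation" category (Pd) apart from the ASCII hyphen. That is a broader proxy, not the list from the JMdict issue, and the function name is hypothetical.<br />

```python
import unicodedata

# The three characters permitted by the guideline above.
PERMITTED = {"\u002d", "\u30fc", "\u2212"}


def disallowed_bars(text: str) -> list:
    """Return (index, code point) for dash-like characters to replace.

    Assumption: Unicode category "Pd" stands in for "some sort of mid-line
    bar"; the long vowel mark (Lm) and minus sign (Sm) are not in Pd.
    """
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Pd" and ch not in PERMITTED]


print(disallowed_bars("CD-ROM"))        # ASCII hyphen: nothing flagged
print(disallowed_bars("CD\u2013ROM"))   # en dash is flagged
```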
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=951Editorial policy2022-12-13T03:31:33Z<p>JimBreen: /* Date and Time Formats */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (i.e. not as the half-width MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
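<br />
As a minimal sketch of the middle-dot rule above, the following Python builds the "with and without dot" pair and flags look-alike dots. It assumes the "JIS middle-dot" is U+30FB (KATAKANA MIDDLE DOT); the function names and the list of look-alike characters are illustrative only and are not part of the JMdict tooling.<br />

```python
# Assumption: the "JIS middle-dot" referred to in the text is U+30FB.
JIS_MIDDLE_DOT = "\u30fb"

# Look-alike code points that are easy to type by mistake
# (illustrative list, not exhaustive).
LOOKALIKE_DOTS = {"\u00b7", "\uff65", "\u2022", "\u2027"}


def find_wrong_dots(reading):
    """Return any look-alike dot characters found in a reading."""
    return [ch for ch in reading if ch in LOOKALIKE_DOTS]


def without_dots(reading):
    """Return the undotted variant of a dotted reading."""
    return reading.replace(JIS_MIDDLE_DOT, "")


def reading_field(dotted):
    """Build 'undotted;dotted', following the example in the text."""
    return f"{without_dots(dotted)};{dotted}"


if __name__ == "__main__":
    print(reading_field("アームレス\u30fbチェア"))
    print(find_wrong_dots("アームレス\u00b7チェア"))
```

A check of this kind is only a convenience before submitting; the entry form remains the authority on which characters are accepted.<br />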
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails.<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
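<br />
A few of the points in the list above are mechanical enough to check automatically. The following Python is a rough sketch of such a check, covering only the ";" separator, leading articles, "colo(u)r"-style patterns and a comma after "e.g."/"i.e.". The function names and warning texts are hypothetical, and a warning is only a prompt for a human decision (an article is sometimes necessary, for example).<br />

```python
import re


def split_glosses(meaning):
    """Split the meaning text of one sense into glosses on ';'."""
    return [g.strip() for g in meaning.split(";") if g.strip()]


def lint_gloss(gloss):
    """Return a list of warnings for a single gloss."""
    warnings = []
    # Leading "a", "an" or "the" (allowed only when really needed).
    if re.match(r"(?i)(a|an|the)\s", gloss):
        warnings.append("starts with an article")
    # Optional letters inside a word cannot be searched for.
    if re.search(r"\w\(\w+\)\w", gloss):
        warnings.append("optional-letter pattern such as 'colo(u)r'")
    # No comma directly after e.g. or i.e.
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        warnings.append("comma after e.g./i.e.")
    return warnings


if __name__ == "__main__":
    for gloss in split_glosses("the moon; colo(u)r; full colour (color)"):
        print(gloss, "->", lint_gloss(gloss))
```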
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary, (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
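<br />
The [lsrc=lng:word] pattern described in this section can be picked apart with a short regular expression. The Python below is a sketch only: it assumes a three-letter lowercase code, an optional source word, and optional double quotes around a multi-word source term, as in the two examples above. It is not the parser used by the JMdictDB system.<br />

```python
import re

# Assumed shape: [lsrc=xxx:word], [lsrc=xxx:"two words"] or [lsrc=xxx:]
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]"]*))\]')


def parse_lsrc(text):
    """Return (language_code, source_word) pairs found in a sense string."""
    pairs = []
    for m in LSRC_RE.finditer(text):
        # Group 2 is the quoted form, group 3 the unquoted form.
        word = m.group(2) if m.group(2) is not None else m.group(3)
        pairs.append((m.group(1), word))
    return pairs


if __name__ == "__main__":
    print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
    print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```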
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word, use the format [see=漢字[2]].<br />
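<br />
To make the three cross-reference shapes concrete (plain target, kanji・reading, and a sense number), here is a small Python sketch that splits a tag into its parts. It assumes the separator is U+30FB and that the sense number, when present, comes last; the function and key names are invented for illustration, and the linked instructions remain the authoritative description of the syntax. A headword that itself contains a middle dot would be split incorrectly by this sketch.<br />

```python
import re

# Assumed shape: [see=HEADWORD], [see=KANJI・READING], [see=HEADWORD[n]]
XREF_RE = re.compile(
    r"\[(see|ant)=([^\[\]\u30fb]+)(?:\u30fb([^\[\]]+))?(?:\[(\d+)\])?\]"
)


def parse_xref(tag):
    """Split one cross-reference tag into its parts, or return None."""
    m = XREF_RE.fullmatch(tag)
    if m is None:
        return None
    kind, headword, reading, sense = m.groups()
    return {
        "type": kind,
        "headword": headword,
        "reading": reading,
        "sense": int(sense) if sense else None,
    }


if __name__ == "__main__":
    print(parse_xref("[see=金本位\u30fbかねほんい]"))
    print(parse_xref("[see=漢字[2]]"))
```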
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where it is useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It marks a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), where the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
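<br />
The two spacing rules above can be sketched as a pair of regular-expression substitutions. The Python below is illustrative only: the unit and symbol lists are small hypothetical samples, and cases such as "5c" are left to the editor's judgement.<br />

```python
import re

UNITS = ("km", "kg", "cm", "mm", "ml", "m", "g")  # illustrative subset
SYMBOLS = ("°C", "%")                              # illustrative subset


def fix_spacing(text):
    """Insert a space before units and remove it before symbols."""
    units = "|".join(UNITS)
    # "100km" -> "100 km"; the \b stops "3months" from matching "m".
    text = re.sub(rf"(\d)({units})\b", r"\1 \2", text)
    symbols = "|".join(re.escape(s) for s in SYMBOLS)
    # "15 °C" -> "15°C", "9 %" -> "9%"
    text = re.sub(rf"(\d)\s+({symbols})", r"\1\2", text)
    return text


if __name__ == "__main__":
    print(fix_spacing("about 100km at 15 °C"))
```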
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
For classifying years in dates, use the secular BCE (Before Common Era) and CE (Common Era). In dates after 1,000 CE the "CE" is usually omitted.<br />
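<br />
For anyone generating glosses from structured data, the date formats above can be produced as in the following Python sketch. The function names are hypothetical, and the month names are hard-coded so that the output does not depend on the system locale.<br />

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def gloss_date(d, with_year=True):
    """Format a date as 'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan_date(d):
    """Format a date as YYYY.MM.DD without zero padding, e.g. 1925.1.14."""
    return f"{d.year}.{d.month}.{d.day}"


if __name__ == "__main__":
    print(gloss_date(date(2019, 3, 17)))
    print(lifespan_date(date(1925, 1, 14)) + "-" + lifespan_date(date(1970, 11, 25)))
```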
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
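<br />
A contributor can get a rough idea of which distributions a piece of text will survive in by trying to encode it. The Python sketch below uses the standard-library codecs "iso2022_jp" (ASCII plus JIS X 0208) and "euc_jp" (which also covers JIS X 0212) as stand-ins for the two codesets. This is an approximation under those assumptions: "euc_jp" also accepts half-width katakana, and the actual EDICT/EDICT2 generation process may differ in detail.<br />

```python
def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def narrowest_format(text):
    """Roughly name the most constrained format that could carry the text."""
    if encodable(text, "iso2022_jp"):   # ASCII + JIS X 0208
        return "EDICT"
    if encodable(text, "euc_jp"):       # adds JIS X 0212
        return "EDICT2"
    return "JMdict only"


if __name__ == "__main__":
    for sample in ("漢字", "é", "한글"):
        print(sample, "->", narrowest_format(sample))
```

Text reported as "JMdict only", such as hangul in a note, is the case where the guidance above asks for a romanized version to be included as well.<br />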
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
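<br />
The basic "two-out-of-three" test can be written down in a few lines. The Python below is a deliberately naive sketch: it compares only the main kanji form, main reading and a single meaning string by exact equality, and it assumes that two empty kanji fields do not count as a match. The class and function names are hypothetical. Judgements about related kana forms, partial kanji/reading restrictions and "common sense" are not modelled and remain with the editors.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str     # main kanji headword, "" if the entry has none
    reading: str   # main reading
    meaning: str   # main meaning, compared as plain text here


def may_merge(a, b):
    """Return True if at least two of the three fields are the same."""
    matches = sum([
        a.kanji != "" and a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ])
    return matches >= 2


if __name__ == "__main__":
    x = Candidate("日本", "にほん", "Japan")
    y = Candidate("日本", "にっぽん", "Japan")
    print(may_merge(x, y))
```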
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
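<br />
One way an app might follow this suggestion is sketched below. The data layout (a list of form/tag pairs) and the function name <tt>split_forms</tt> are assumptions made for the example, not part of any JMdict tooling; only the "sK" and "sk" tag names come from the text above.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}


def split_forms(forms):
    """Split surface forms into those to display and those to index for search.

    `forms` is a list of (text, tags) pairs, a hypothetical in-memory layout.
    Every form is indexed, but search-only forms are left out of the display list.
    """
    display = [text for text, tags in forms if not SEARCH_ONLY_TAGS & set(tags)]
    search_keys = [text for text, _ in forms]
    return display, search_keys


# The 3密/三密 entry with its search-only form 三蜜, as in the WWWJDIC example.
display, search_keys = split_forms([("3密", []), ("三密", []), ("三蜜", ["sK"])])
```

With this split, a lookup of 三蜜 can find the entry through <tt>search_keys</tt> while only 3密 and 三密 are shown to the user.<br />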
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
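<br />
A simple check for stray bar characters could look like the sketch below. The three permitted code points are those listed above. The <tt>OTHER_BARS</tt> set is an assumption: it holds a few dash-like Unicode characters chosen for illustration and is not the list of other characters from the JMdict issue.<br />

```python
ASCII_HYPHEN = "\u002d"  # "-"  for the Meanings field, and Kanji forms such as CD-ROM
LONG_VOWEL = "\u30fc"    # "ー" for Japanese text such as ローマ字
MINUS_SIGN = "\u2212"    # "−"  for algebraic or arithmetic contexts
ALLOWED_BARS = {ASCII_HYPHEN, LONG_VOWEL, MINUS_SIGN}

# Assumption: an illustrative sample of other mid-line bars (hyphen, en dash,
# em dash, fullwidth hyphen-minus); not the set named in the JMdict issue.
OTHER_BARS = {"\u2010", "\u2013", "\u2014", "\uff0d"}


def disallowed_bars(text):
    """Return the code points of any sampled non-permitted bars found in text."""
    return sorted({f"U+{ord(ch):04X}" for ch in text if ch in OTHER_BARS})
```

For example, <tt>disallowed_bars("CD-ROM")</tt> returns an empty list, while the same string typed with an en dash is reported as containing U+2013.<br />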
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=JMdict-EDICT_Dictionary_Project&diff=950JMdict-EDICT Dictionary Project2022-11-08T22:43:06Z<p>JimBreen: /* OTHER LANGUAGES */</p>
<hr />
<div>= JMdict/EDICT JAPANESE/ENGLISH DICTIONARY PROJECT =<br />
<br />
== INTRODUCTION ==<br />
<br />
The JMdict/EDICT project has as its goal the production of a comprehensive freely-available Japanese/English Dictionary database in machine-readable form which can be used by a variety of applications and servers.<br />
<br />
The project began in 1991 with the expansion of the EDICT simple Japanese-English dictionary file. (See below under History)<br />
<br />
At present the project has the following dictionary files available:<br />
<br />
* the full Japanese-Multilingual Dictionary (JMdict) file which is distributed in XML format. The JMdict file is aimed at being a multilingual lexical database with Japanese as the pivot language and also includes translations of words and phrases in a number of languages other than English. It has been designed to support the requirements of Japanese lexicography, including multiple surface forms, orthographical variants, okurigana variants, multiple readings, etc.<br />
* the EDICT2 file, which is in a relatively simple one-line-per-entry text format based on the original EDICT format, and which contains almost all the information in the JMdict edition;<br />
* the EDICT file, which follows the original format of one kanji form and reading per entry, and contains a reduced amount of information. It is provided to maintain support for software which uses the original EDICT file format;<br />
* the EDICT_SUB file, which contains about 20% of the most common entries in the EDICT file.<br />
<br />
The dictionary data is maintained in an online database under the oversight of an editorial board, and the JMdict and EDICT versions are generated and released daily.<br />
<br />
The dictionary files are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group] who are the owners of the copyright.<br />
<br />
An earlier version of this page can be found [http://www.edrdg.org/jmdict/edict_doc_depr.html here.] Note that it contains many out-of-date links.<br />
<br />
== CURRENT VERSION &amp; DOWNLOAD ==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server], (formerly at Monash University) which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file. (Only to be used in legacy apps, etc.)<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
<br />
== PROJECT FORUM ==<br />
<br />
There are several forums where this project is actively discussed.<br />
<br />
The original forum was the <tt> sci.lang.japan</tt> [http://groups.google.com/group/sci.lang.japan Usenet newsgroup. ] More recently a [https://groups.google.com/g/edict-jmdict mailing list ] specifically for project discussion has begun. (Go to the "About" link on that page to initiate joining the discussion.)<br />
<br />
== Next Generation ==<br />
<br />
A [[JMdict:_Next_Generation|major revision]] of the JMdict structure is planned as a way of dealing with a number of issues which have emerged during the life of the project.<br />
<br />
== DATABASE and UPDATING ==<br />
<br />
The dictionary data is all held in a PostgreSQL database and maintained using the [http://www.edrdg.org/wiki/index.php/JMdictDB_Project JMdictDB online system]. The JMdict version is generated directly from the database. From this the EDICT/EDICT2 versions are generated using utility software. You can explore the database and propose edits and new entries via its [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search Form].<br />
<br />
The [http://www.edrdg.org/wiki/index.php/Main_Page#The_JMdict.2FEDICT_Project EDRDG Wiki] has a wealth of information about the dictionary database, including suggestions about [http://www.edrdg.org/wiki/index.php/JMdict:_Getting_Started getting started, ] the detailed [http://www.edrdg.org/wiki/index.php/Editorial_policy editorial policy and guidelines], etc. etc.<br />
<br />
== FORMAT ==<br />
<br />
The basic format of the entries in the dictionary files can be seen in detail by examining the [http://www.edrdg.org/jmdict/jmdict_dtd_h.html DTD] (Document Type Declaration) of the XML-format JMdict file. The DTD is heavily annotated with content and structural information.[http://www.edrdg.org/jmdict/dtd-jmdict.xml (download)]<br />
<br />
In summary, each dictionary entry is independent, although there may be cross-reference fields pointing to other entries. Each entry consists of<br />
<br />
* kanji elements, i.e. headwords containing at least one kanji character, plus associated tags indicating some status or characteristic of the headword. Where there are multiple headwords, they have been ordered according to frequency of usage, as far as this can be determined;<br />
* reading elements, containing either the reading in kana of the headword, or the headword itself in the case of headwords only in kana. The elements also include tags indicating some status or characteristics. As with the kanji headwords, where there are multiple readings they have been ordered according to frequency of usage, as far as this can be determined;<br />
* general coded information relating to the entry as a whole, such as original language, date-of-creation, etc.<br />
* sense elements, containing the translational equivalents or glosses of the headword(s). As Japanese is not highly polysemous, there is often only one sense. Associated with the sense elements is other coded data indicating the part-of-speech, field of application, miscellaneous information, etc. As with headwords and readings, the glosses are ordered with the most common appearing first.<br />
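<br />
The entry structure summarised above can be read with a standard XML parser. The following is a minimal sketch using Python's <tt>xml.etree.ElementTree</tt>. The element names (<tt>k_ele</tt>/<tt>keb</tt>, <tt>r_ele</tt>/<tt>reb</tt>, <tt>sense</tt>/<tt>gloss</tt>) are taken from the JMdict DTD rather than from this page. The sample is a cut-down, hand-written version of the 収集 entry: its sequence number is a made-up placeholder, and tag elements such as part-of-speech are omitted.<br />

```python
import xml.etree.ElementTree as ET

# Hand-written, reduced sample; the ent_seq value is a placeholder, not the real one.
SAMPLE = """<entry>
<ent_seq>0000000</ent_seq>
<k_ele><keb>収集</keb></k_ele>
<k_ele><keb>蒐集</keb></k_ele>
<r_ele><reb>しゅうしゅう</reb></r_ele>
<sense>
<gloss>gathering up</gloss>
<gloss>collection</gloss>
<gloss>accumulation</gloss>
</sense>
</entry>"""


def summarise(entry):
    """Collect headwords, readings and per-sense glosses from one <entry> element."""
    return {
        "seq": entry.findtext("ent_seq"),
        "kanji": [k.findtext("keb") for k in entry.findall("k_ele")],
        "readings": [r.findtext("reb") for r in entry.findall("r_ele")],
        "senses": [[g.text for g in s.findall("gloss")] for s in entry.findall("sense")],
    }


summary = summarise(ET.fromstring(SAMPLE))
```

The lists keep document order, so the first kanji form, reading and gloss are the most common ones, as described above.<br />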
<br />
The format and coding of the distributed files is as follows:<br />
<br />
* the JMdict file contains the complete dictionary information in XML format as per the DTD. This file is in Unicode/ISO-10646 coding using UTF-8 encapsulation. [http://www.edrdg.org/jmdict/jmdict_sample.html (Sample Entry)]<br />
* the EDICT file is in the original relatively simple format based on the text data file of the SKK input-method. Each entry is in the form:<br />
: KANJI [KANA] /(general information) gloss/gloss/.../<br />
:: or<br />
: KANA /(general information) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT format:<br />
:: 収集 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/<br />
:: (in addition to equivalent entries with the 蒐集, 拾集 and 収輯 kanji compounds.)<br />
: Where there are multiple senses, these are indicated by (1), (2), etc. before the first gloss in each sense. As this format only allows a single kanji headword and reading, entries are generated for each possible headword/reading combination. As the format restricts Japanese characters to the kanji and kana fields, any cross-reference data and other informational fields are omitted.<br />
:The EDICT file is distributed in JIS X 0208 coding in EUC-JP encapsulation. (Please note that this original format is now provided only for legacy systems and apps. New systems <b>must</b> use the EDICT2 edition described below);<br />
* the EDICT2 file is in an expanded form of the original EDICT format. The main differences are the inclusion of multiple kanji headwords and readings, and the inclusion of cross-reference and other information fields, e.g.:<br />
: KANJI-1;KANJI-2 [KANA-1;KANA-2] /(general information) (see xxxx) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT2 format:<br />
:: 収集(P);蒐集;拾集;収輯 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/(P)/<br />
: In addition, the EDICT2 has as its last field the sequence number of the entry. This matches the "ent_seq" entity value in the XML edition. The field has the format: EntLnnnnnnnnX. The EntL is a unique string to help identify the field. The "X", if present, indicates that an audio clip of the entry reading is available from the JapanesePod101.com site.<br />
: The EDICT2 file is distributed in JIS X 0208 and JIS X 0212 codings in EUC-JP encapsulation;<br />
* the EDICT_SUB file is in the same format as the EDICT file.<br />
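<br />
The one-line-per-entry formats described above can be split with a couple of regular expressions. The sketch below is an assumption-laden illustration, not the project's own parser: the function names are hypothetical, glosses are assumed not to contain "/", any parenthesised suffix on a headword or reading (such as "(P)") is simply discarded, and the digits in the sample EntL field are a placeholder. The input is assumed to be already decoded from EUC-JP.<br />

```python
import re

# headwords, optional [readings], then everything between the first and last "/"
LINE_RE = re.compile(r"^(?P<heads>\S+)(?: \[(?P<readings>[^\]]+)\])? /(?P<body>.*)/$")
ENTL_RE = re.compile(r"^EntL(\d+)(X?)$")


def strip_markers(form):
    """Drop trailing parenthesised markers such as "(P)" from a headword or reading."""
    return re.sub(r"(\([^)]*\))+$", "", form)


def parse_edict2(line):
    """Split one EDICT2-style (or EDICT-style) line into its main parts."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("not an EDICT/EDICT2-style line")
    fields = m.group("body").split("/")
    seq, audio = None, False
    entl = ENTL_RE.match(fields[-1])
    if entl:  # the final field, if present, is the sequence number
        fields.pop()
        seq, audio = entl.group(1), entl.group(2) == "X"
    return {
        "headwords": [strip_markers(h) for h in m.group("heads").split(";")],
        "readings": [strip_markers(r) for r in (m.group("readings") or "").split(";") if r],
        "fields": fields,  # glosses, with POS and "(P)" markers left in place
        "seq": seq,
        "audio": audio,
    }


parsed = parse_edict2(
    "収集(P);蒐集;拾集;収輯 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/(P)/EntL0000000X/"
)
```

The same function accepts a kana-only EDICT line, in which case the kana form is returned under <tt>headwords</tt> and <tt>readings</tt> is empty.<br />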
<br />
None of the files have the entries in any particular order.<br />
<br />
== PROJECT HISTORY ==<br />
<br />
The project was begun in 1991 by [http://nihongo.monash.edu/ Jim Breen] when an early DOS-based Japanese word-processor (MOKE - Mark's Own Kanji Editor) was released, containing an initial small version of the EDICT file. This was progressively expanded and edited over the following years. In 1999 the EDICT file, which by this time contained about 60,000 entries, was converted into an expanded format and the first XML-format JMdict file released. From that point both JMdict and the EDICT2/EDICT versions have been generated from the same source data.<br />
<br />
The EDICT2 format was created in 2003, primarily for use with the [http://nihongo.monash.edu/cgi-bin/wwwjdic.cgi?1C WWWJDIC] dictionary server, however it is now also used by other servers and applications.<br />
<br />
The growth in entries in the file is largely due to the efforts of the many people who have contributed entries to it over the years and who have participated in the editorial role. The increase in entry numbers has slowed as the file has achieved coverage of a large proportion of the Japanese lexicon. Much of the editorial work in recent years has concentrated on amendments and expansion to existing entries.<br />
<br />
A more expanded explanation of the early developments in the EDICT file can be found in the [http://www.edrdg.org/jmdict/edict_doc_old.html original documentation].<br />
<br />
== COPYRIGHT ==<br />
<br />
Dictionary copyright is a difficult point, because clearly the first lexicographer who published "inu means dog" could not claim a copyright violation over all subsequent Japanese dictionaries. While it is usual to consult other dictionaries for "accurate lexicographic information", as Nelson put it, wholesale copying is, of course, not permissible, and contributors have been advised to avoid direct copying from other sources. What makes each dictionary unique (and copyright-able) is the particular selection of words, the phrasing of the meanings, the presentation of the contents (a very important point in the case of this project), and the means of publication.<br />
<br />
The files of the project are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group ] who are the current owners of the copyright. As explained in the licence, the files are available for use for most purposes provided acknowledgement and distribution of the documentation is made.<br />
<br />
== LEXICOGRAPHICAL DETAILS ==<br />
<br />
===Inflections, etc.===<br />
In general no inflections of verbs or adjectives have been included, except in idiomatic expressions. Adverbs formed from adjectives (e.g., -ku or -ni) are generally not included. Verbs are, of course, in the plain or "dictionary" form.<br />Composed forms, such as adverbs taking the "to" particle, keiyoudoushi adjectives, etc. are only included in their root form, however the part-of-speech (POS) marker is used to indicate their status. <br />Nouns which can form a verb with the auxiliary verb "suru" only appear in their noun form, but have a POS marker: "vs", to indicate the existence of a verbal form. In general the gloss only relates to the noun itself, but entries are being progressively expanded to include the verbal glosses as well.<br />
===Part of Speech Marking===<br />
The dictionary includes one or more Part of Speech (POS) markings on almost every entry. Examples include: "adj-i" (adjective - 形容詞), "n" (noun - 名詞), "prt" (particle - 助詞), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_pos (Full POS list)]<br />
===Field of Application===<br />
A number of entries are marked with a specific field of application, e.g. "chem" (chemistry), "math" (mathematics), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld (Full field list)]<br />
===Miscellaneous Markings===<br />
A number of miscellaneous tags are included in entries to provide additional information in a standardized form, e.g. "col" (colloquialism), "sl" (slang), "uk" (term usually in kana), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_misc (Full list) ]<br />
===Word Priority Marking===<br />
The ke_pri and equivalent re_pri fields in the JMdict file are provided to record information about the relative commonness or priority of the entry, and consist of codes indicating the word appears in various references which can be taken as an indication of the frequency with which the word is used. This field is intended for use either by applications which want to concentrate on entries of a particular priority, or to generate subset files. The current values in this field are:<br />
* news1/2: appears in the "wordfreq" file compiled by Alexandre Girardi from the Mainichi Shimbun. (See the ftp archive for a copy.) Words in the first 12,000 in that file are marked "news1" and words in the second 12,000 are marked "news2".<br />
* ichi1/2: appears in the "Ichimango goi bunruishuu", Senmon Kyouiku Publishing, Tokyo, 1998. (The entries marked "ichi2" were demoted from ichi1 because they were observed to have low frequencies in the WWW and newspapers.)<br />
* spec1 and spec2: a small number of words use this marker when they are detected as being common, but are not included in other lists.<br />
* gai1/2: common loanwords, also based on the wordfreq file.<br />
* nfxx: this is an indicator of frequency-of-use ranking in the wordfreq file. "xx" is the number of the set of 500 words in which the entry can be found, with "01" assigned to the first 500, "02" to the second, and so on.<br />
<br />
Entries with news1, ichi1, spec1/2 and gai1 values are marked with a "(P)" in the EDICT and EDICT2 files.<br />
<br />
While the priority markings accurately reflect the status of entries with regard to the various sources, they must be seen as only providing a crude indication of how common a word or expression actually is in Japanese. The "(P)" markings in the EDICT and EDICT2 files appear to identify a useful subset of "common" words, but there are clearly some marked entries which are not very common, and there are clearly unmarked entries which are in common use, particularly in the spoken language.<br />
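<br />
The "(P)" rule and the nfxx numbering can be expressed in a few lines. This is a sketch only: the function names are hypothetical, and the rank passed to <tt>nf_tag</tt> is assumed to be a 1-based position in the wordfreq file.<br />

```python
# Priority codes that lead to a "(P)" marker in EDICT/EDICT2, as listed above.
P_TAGS = {"news1", "ichi1", "spec1", "spec2", "gai1"}


def has_p_marker(pri_tags):
    """True if any ke_pri/re_pri value of a form qualifies it for "(P)"."""
    return any(tag in P_TAGS for tag in pri_tags)


def nf_tag(rank):
    """Map a 1-based wordfreq rank to its nfxx code (sets of 500 words)."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return f"nf{(rank - 1) // 500 + 1:02d}"
```

Under these assumptions ranks 1 to 500 map to "nf01" and rank 501 to "nf02", while a form tagged only "news2" or "ichi2" gets no "(P)" marker.<br />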
===Okurigana Variants===<br />
Okurigana variants in headwords are handled by including each variant form as a headword. This is to enable software to match with variant forms.<br />
===Spellings===<br />
As far as possible variants of English translation and spelling are included. Where appropriate different translations are included for national variants (e.g. autumn/fall, tap/faucet, etc.). Common spelling variations such as -our/-or and -ize/-ise are handled either by repeating the gloss in both spellings or appending spelling variants in parentheses. No attempt is made to tag English spellings according to country of usage.<br />
===Loanwords and Regional Words===<br />
For loanwords (gairaigo) which have not been derived from English words, the source language and the word in that language are included. Languages have been coded in the three-letter codes from the ISO 639-2:1998 "Codes for the representation of names of languages" standard, e.g. "(fre: avec)" in the EDICT/EDICT2 files and <lsource xml:lang="fre">avec</lsource> in the JMdict file. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_lang (Full list ] of language tags) In the case of gairaigo which have a meaning which is not apparent from the original (usually English) words, the words in the source language are included as: "lang: original words", e.g.<br />
: コンクール /(n) competition (fre: concours)/contest/ <br />
In some cases the entries are pseudo-loanwords that have been constructed in Japan from foreign (usually English) words or word fragments (e.g. 和製英語 - waseieigo). These are tagged with "wasei" in EDICT/EDICT2 entries, e.g.<br />
: アゲンストウィンド /(n) head wind (wasei: against wind)/adverse wind/ <br />
and in JMdict with the "ls_wasei" attribute, e.g. <lsource ls_wasei="y">against wind</lsource>.<br />
<br />
A number of tags are used to indicate that a word or phrase is associated with a particular regional language variant within Japan, e.g. "ksb" (Kansai-ben). [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_dial (Full list)]<br />
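<br />
The source-language annotations in EDICT-style entries, such as "(fre: concours)" and "(wasei: against wind)", can be picked out with a regular expression. The Python sketch below is only an illustration: the names <code>LSOURCE</code> and <code>find_sources</code> are hypothetical, and it assumes the annotation always takes the exact "(code: words)" shape shown in the examples above.<br />

```python
import re

# "(xxx: words)" where xxx is a three-letter language code or "wasei".
# Assumption: no nested parentheses inside the annotation.
LSOURCE = re.compile(r"\((?P<lang>[a-z]{3}|wasei):\s*(?P<words>[^)]*)\)")


def find_sources(edict_line: str) -> list[tuple[str, str]]:
    """Return (language tag, source words) pairs found in an EDICT-style line."""
    return [(m.group("lang"), m.group("words").strip())
            for m in LSOURCE.finditer(edict_line)]


print(find_sources("コンクール /(n) competition (fre: concours)/contest/"))
```

Part-of-speech markers such as "(n)" contain no colon and so are not matched.<br />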
<br />
== OTHER LANGUAGES ==<br />
<br />
The JMdict file has the capacity to record glosses for Japanese headwords in many languages. JMdict is currently distributed in two versions: a basic version in which there are only English glosses, and a full version in which there are glosses included in German (133,000 entries), Russian (80,000), Hungarian (51,000), Spanish (39,000), Dutch (29,000), Swedish (16,000), French (15,000) and Slovenian (9,000). Details of the dictionary files used for the non-English glosses in JMdict can be found in the [http://www.edrdg.org/wwwjdic/wwwjdicinf.html#dicfilf_tag WWWJDIC documentation].<br />
<br />
As part of the daily build of the full JMdict file, the Japanese headwords are matched against the dictionary files for the other languages, and glosses are included where there is a match. The non-English glosses are added as separate sets of senses, and as far as possible are broken into individual senses using tags within those files (typically (1) .... (2) ....., etc.) At present there is no attempt to align senses between the languages as there is no consistency between the dictionaries as to the sense splitting. (There is some [[more information]] on the background to the current sense breakup.)<br />
<br />
== ROMAJI VERSIONS? ==<br />
<br />
None of the files in the JMdict/EDICT project use ローマ字 (romanized Japanese), except for proper names such as "Suzuki", "Fuji", etc. or in cases such as "ikebana" where the romanized Japanese has been adopted as an English term. See the [[Editorial_policy#Romanized_Japanese|Editorial Policy]] for more information on this.<br />
<br />
== RELATED PROJECTS ==<br />
<br />
A number of other Japanese dictionary projects are closely related to this one. Among them are:<br />
<br />
* the [http://www.edrdg.org/enamdict/enamdict_doc.html ENAMDICT/JMnedict] Japanese Proper Names Dictionary project, which currently has nearly 740,000 named entities. The files are available in EDICT or XML formats.<br />
* the [[KANJIDIC_Project| KANJIDIC]] project, which maintains and distributes databases of information about kanji.<br />
* the [http://www.edrdg.org/jmdict/compdic_doc.html COMPDIC] file in EDICT format of computing and telecomms terminology. In 2008 the COMPDIC material was included in the main EDICT/JMdict database with tagging indicating that the entries relate to ICT. A separate "COMPDIC" file is extracted for distribution.<br />
* the [http://www.edrdg.org/krad/kradinf.html RADKFILE/KRADFILE] file of visual elements in kanji, which can be used for finding kanji in dictionaries.<br />
<br />
== SERVERS & PACKAGES ==<br />
<br />
A large number of [[JMdictEDICT_software|WWW servers and software packages]] use the JMdict/EDICT files. <br />
== ACKNOWLEDGEMENTS ==<br />
<br />
Since 1991 a large number of people have contributed to this project; far too many to list here. All their contributions have been most welcome, indeed without the assistance of speakers and students of Japanese this project would not have achieved as much.<br />
<br />
The EDICT/JMdict has been granted approval to use material from the [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]. This approval is most welcome.<br />
<br />
== PUBLICATIONS ==<br />
<br />
Some publications by Jim Breen about the EDICT/JMdict project. A more complete and up-to-date list can be found in [http://www.edrdg.org/~jwb/papers.html Jim's publications page].<br />
<br />
* paper about JMdict presented at the COLING Multilingual Linguistic Resources Workshop in Geneva in August 2004. [http://www.edrdg.org/~jwb/paperdir/jmdictart.html (html)] [http://www.edrdg.org/~jwb/paperdir/jmdictart.pdf (pdf)]<br />''(This paper should be referenced when citing the dictionary in a publication.)''<br />
* an earlier [http://www.edrdg.org/~jwb/paperdir/ws2002_paper.html JMdict paper] about some of the practical issues, presented at the Papillon Project workshop in Tokyo in July 2002.<br />
* a paper presented to the Papillon Project workshop in 2003 in Sapporo on the [http://www.edrdg.org/~jwb/paperdir/dicexamples.html linking of example sentences in the Tanaka corpus to EDICT entries in WWWJDIC].<br />
* a 1999 workshop paper about WWWJDIC; [http://www.edrdg.org/~jwb/paperdir/wwwjdic_article2.html (updated 2003 version)] [http://nihongo.monash.edu/wwwjdic_article/wwwjdic_article.html (1999 version)].<br />
* an overview paper about EDICT presented at the JSAA conference in 1995; [http://www.edrdg.org/~jwb/paperdir/hpaper.html (html)]<br />
* An early technical report from 1993; [http://www.edrdg.org/~jwb/paperdir/ejdic_report1.pdf (pdf)]</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=KANJIDIC_Project&diff=949KANJIDIC Project2022-10-21T06:25:14Z<p>JimBreen: /* Copyright and Permissions */</p>
<hr />
<div>=The KANJIDIC Project=<br />
<br />
''(Note that this page is in the process of being rewritten, so be patient with any aspects that seem incomplete.)''<br />
<br />
==Introduction==<br />
<br />
The KANJIDIC project, which began in 1991, has the goal of compiling and distributing comprehensive information on the kanji used in Japanese text processing. It covers the 13,108 kanji in three main Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an additional 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 952 additional kanji.<br />
<br />
Three data files are distributed by this project:<br />
* the KANJIDIC2 file, which is in XML format and [https://en.wikipedia.org/wiki/UTF-8 Unicode/UTF-8] coding, and contains information about all 13,108 kanji. ([http://www.edrdg.org/kanjidic/kanjidic2.xml.gz download])<br />
* the KANJIDIC file, which is in [https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP EUC-JP] coding and covers the 6,355 kanji in JIS X 0208. ([http://www.edrdg.org/kanjidic/kanjidic.gz download])<br />
* the KANJD212 file, which also is in EUC-JP coding and covers the 5,801 kanji in JIS X 0212. ([http://www.edrdg.org/kanjidic/kanjd212.gz download])<br />
<br />
==Content & Format==<br />
The database and distributed data files contain an entry for each of the kanji, with each entry containing a number of fields of data about the kanji. The data is described in the following table. The format of the distributed files is as follows:<br />
* the KANJIDIC and KANJD212 files are text files with one line per kanji and the information fields separated by spaces. The format of each line is:<br />
** the kanji itself followed by the hexadecimal form of the JIS ''ku-ten'' coding, e.g. "亜 3021" (the decimal ''ku-ten'' code is 16-01);<br />
** information fields beginning with one or two-letter codes as per the table below. For example "S10" indicates a stroke count of 10;<br />
** the Japanese readings of the kanji. ON readings (音読み) are generally in ''katakana'' and KUN readings (訓読み) in ''hiragana''. An exception is the set of ''kokuji'' for measurements such as centimetres, where the reading is in ''katakana''. Hyphens are used to indicate prefixes/suffixes, and '.' indicates the portion of the reading that is ''okurigana''. There may be several classes of reading fields, with ordinary readings first, followed by members of the other classes, if any. The current other classes, and their tagging, are:<br />
***where the kanji has special ''nanori'' (i.e. name) readings, these are preceded by the marker "T1";<br />
***where the kanji is a radical, and the radical name is not already a reading, the radical name is preceded by the marker "T2".<br />
** the meanings (usually in English). Each field begins with an open brace '{' and ends at the next close brace '}'.<br />
* the KANJIDIC2 file is in XML and is structured according to its [http://www.edrdg.org/kanjidic/kanjidic2_dtdh.html DTD] (Document Type Definition). The DTD contains extensive annotations and is intended to be the primary documentation for the file. This [http://www.edrdg.org/kanjidic/kd2examph.html sample] illustrates the structure of a typical entry. Information fields are grouped by type within entities such as <dic_number> and <query_code>, with specific values indicated by an attribute code. For example the kanji 亜 has the number 43 in the original Nelson kanji dictionary and 81 in the New Nelson. This is recorded in the XML file as:<br/><br />
:<dic_number><br />
::<dic_ref dr_type="nelson_c">43</dic_ref><br />
::<dic_ref dr_type="nelson_n">81</dic_ref><br />
:: ....<br />
:</dic_number><br />
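<br />
For the text-format KANJIDIC and KANJD212 files, the line layout described above (kanji, JIS hex code, information fields, readings with optional "T1"/"T2" sections, then brace-delimited meanings) can be split mechanically. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, information fields are recognised only by their leading ASCII capital letter and are left unsplit, and the sample line is a hand-abbreviated entry rather than a verbatim extract from the file.<br />

```python
import re


def jis_hex_to_kuten(jis_hex: str) -> str:
    """Convert a 4-digit JIS hex code such as "3021" to ku-ten form "16-01"."""
    ku = int(jis_hex[:2], 16) - 0x20
    ten = int(jis_hex[2:], 16) - 0x20
    return f"{ku:02d}-{ten:02d}"


def parse_kanjidic_line(line: str) -> dict:
    """Split one KANJIDIC-style line into its main parts."""
    # Meanings may contain spaces, so pull them out before tokenising.
    meanings = re.findall(r"\{([^}]*)\}", line)
    tokens = re.sub(r"\{[^}]*\}", " ", line).split()
    entry = {
        "kanji": tokens[0],
        "jis_hex": tokens[1],
        "kuten": jis_hex_to_kuten(tokens[1]),
        "info": [],           # raw information fields, e.g. "S7"
        "readings": [],       # ordinary ON/KUN readings
        "nanori": [],         # readings following the "T1" marker
        "radical_names": [],  # readings following the "T2" marker
        "meanings": meanings,
    }
    target = "readings"
    for tok in tokens[2:]:
        if tok == "T1":
            target = "nanori"
        elif tok == "T2":
            target = "radical_names"
        elif tok[0].isascii() and tok[0].isupper():
            entry["info"].append(tok)
        else:
            entry[target].append(tok)
    return entry


# Abbreviated, hand-written sample line (not a verbatim file extract).
sample = "亜 3021 S7 N43 V81 DR3273 ア つ.ぐ T1 や {Asia} {rank next}"
print(parse_kanjidic_line(sample))
```

Splitting each information field into code and value would need the list of codes from the table below, since the codes vary in length and some values begin with letters.<br />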
<br />
{| class="wikitable sortable"<br />
|+ Kanjidic Information Fields<br />
|-<br />
! Field<br />
! Kanjidic Code<br/>(if any)<br />
! Group Entity<br />
! Entity plus Attribute(s)<br/>(if any)<br />
! Comment<br />
|-<br />
| Kanji<br />
| none<br />
| literal<br />
| <br />
|<br />
|-<br />
| JIS code-point<br />
| none<br />
| codepoint<br />
| cp_value cp_type="jis208" (or "jis212" or "jis213")<br />
| e.g. 亜 is "3021" in KANJIDIC and<br/>"1-16-01" in KANJIDIC2<br />
|-<br />
| Unicode code-point<br />
| U<br />
|codepoint<br />
| cp_value cp_type="ucs"<br />
| <br />
|-<br />
| Radical (Classical) (See Note 1 below)<br />
| B/C<br />
| radical<br />
| rad_value rad_type="classical"<br />
| Where Nelson uses the classical radical this has a "B" code, otherwise it has a "C" code<br />
|-<br />
| Radical (Nelson)<br />
| B<br />
| radical<br />
| rad_value rad_type="nelson_c"<br />
| <br />
|-<br />
| Grade<br />
| G<br />
| misc<br />
| grade<br />
| The "grade" of the kanji. <br/>- G1 to G6 indicates the grade level as specified by the Japanese Ministry of Education for kanji that are to be taught in elementary school (1026 Kanji). These are sometimes called the ''kyōiku'' (education) kanji and are part of the set of ''jōyō'' (daily use) kanji;<br/>- G8 indicates the remaining ''jōyō'' kanji that are to be taught in secondary school (additional 1130 Kanji);<br/>- G9 and G10 indicate ''jinmeiyō'' ("for use in names") kanji which in addition to the ''jōyō'' kanji are approved for use in family name registers and other official documents. G9 (649 kanji, of which 640 are in KANJIDIC) indicates the kanji is a "regular" name kanji, and G10 (212 kanji of which 130 are in KANJIDIC) indicates the kanji is a variant of a ''jōyō'' kanji.<br />
|-<br />
| Stroke count<br />
| S<br />
| misc<br />
| stroke_count<br />
| The stroke count of the kanji. If more than one, the first is considered the accepted count, while subsequent ones are common miscounts. (See the section later in this document on counting strokes for some of the rules applied especially to radicals.)<br />
|-<br />
| Frequency-of-use ranking<br />
| F<br />
| misc<br />
| freq<br />
| The 2,501 most-used characters have a ranking which expresses the relative frequency of occurrence of a character in modern Japanese. The data is based on an analysis of word frequencies in the Mainichi Shimbun over 4 years by Alexandre Girardi. Note: (a) these frequencies are biased towards words and kanji used in newspaper articles, and (b) the relative frequencies for the last few hundred kanji so graded are quite imprecise.<br />
|-<br />
| Variant JIS 0208 kanji<br />
| XJ0<br />
| misc<br />
| variant var_type="jis208"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0212 kanji<br />
| XJ1<br />
| misc<br />
| variant var_type="jis212"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0213 kanji<br />
| XJ2<br />
| misc<br />
| variant var_type="jis213"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the plane number (P: 1 or 2) plus the JIS hex code is used, and in the XML file the equivalent "P-nn-nn" kuten code is used.)<br />
|-<br />
| Variant kanji (De Roo index)<br />
| XJD<br />
| misc<br />
| variant var_type="deroo"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (NJECD index)<br />
| XH<br />
| misc<br />
| variant var_type="halpern_njecd"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (S&H index)<br />
| XI<br />
| misc<br />
| variant var_type="s_h"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (Nelson index)<br />
| XN<br />
| misc<br />
| variant var_type="nelson_c"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (O'Neill index)<br />
| XO<br />
| misc<br />
| variant var_type="oneill"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Radical name(s)<br />
| none<br />
| misc<br />
| rad_name<br />
| The name of the radical in ''hiragana''. In the KANJIDIC edition these are placed after the readings and preceded by the "T2" tag.<br />
|-<br />
| JLPT Level<br />
| J<br />
| misc<br />
| jlpt<br />
| The pre-2010 level of the Japanese Language Proficiency Test (JLPT) in which the kanji occurs (1-4). Note that the JLPT test levels changed in 2010, with a new 5-level system (N1 to N5) being introduced. No official kanji lists are available for the new levels. The new levels are regarded as being similar to the old levels except that the old level 2 is now divided between N2 and N3, and the old levels 3 and 4 are now N4 and N5. <br />
|-<br />
| Nelson (Classic) number<br />
| N<br />
| dic_number<br />
| dic_ref dr_type="nelson_c"<br />
| The index number in "The Modern Reader's Japanese-English Character Dictionary", edited by Andrew Nelson. If not present, the character is not in Nelson, or is considered to be a non-standard version, in which case it may have a variant. Note that many kanji glyphs currently used are what Nelson described as "non-standard".<br />
|-<br />
| Nelson (New) number<br />
| V<br />
| dic_number<br />
| dic_ref dr_type="nelson_n"<br />
| The index number in "The New Nelson Japanese-English Character Dictionary", edited by John Haig.<br />
|-<br />
| NJECD number<br />
| H<br />
| dic_number<br />
| dic_ref dr_type="halpern_njecd"<br />
| The index number in the "New Japanese-English Character Dictionary" (1990), edited by Jack Halpern.<br />
|-<br />
| Kodansha Kanji Dictionary number<br />
| DP<br />
| dic_number<br />
| dic_ref dr_type="halpern_kkd"<br />
| The index numbers used by Jack Halpern in the "Kodansha Kanji Dictionary" (2013), which is the revised version of the "New Japanese-English Character Dictionary" of 1990.<br />
|-<br />
|Kanji Learners Dictionary number<br />
|DK<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld"<br />
|The index numbers used by Jack Halpern in the "Kanji Learners Dictionary", published by Kodansha in 1999.<br />
|-<br />
|Kanji Learners Dictionary number (2nd ed)<br />
|DL<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld_2ed"<br />
|The index numbers used by Jack Halpern in the 2nd edition of the "Kanji Learners Dictionary", published by Kodansha in 2013. <br />
|-<br />
|Remembering The Kanji number<br />
|L<br />
|dic_number <br />
|dic_ref dr_type="heisig"<br />
|The index number used in "Remembering The Kanji" by James Heisig.<br />
|-<br />
|Remembering The Kanji number (6th ed)<br />
|DN<br />
|dic_number <br />
|dic_ref dr_type="heisig6"<br />
|The index number used in "Remembering The Kanji, 6th Edition" by James Heisig. <br />
|-<br />
|Gakken number<br />
|K<br />
|dic_number <br />
|dic_ref dr_type="gakken"<br />
|The index number in the Gakken Kanji Dictionary ("A New Dictionary of Kanji Usage"). Some of the numbers relate to the list at the back of the book, jouyou kanji not contained in the dictionary, and various historical tables at the end.<br />
|-<br />
|O'Neill's Japanese Names number<br />
|O<br />
|dic_number <br />
|dic_ref dr_type="oneill_names"<br />
|The index number in "Japanese Names", by P.G. O'Neill. (Weatherhill, 1972) (Note: some of the numbers end with 'A'.)<br />
|-<br />
|O'Neill's Essential Kanji number<br />
|DO<br />
|dic_number <br />
|dic_ref dr_type="oneill_kk"<br />
|The index numbers used in P.G. O'Neill's "Essential Kanji".<br />
|-<br />
|Morohashi number<br />
|MN/MP<br />
|dic_number <br />
|dic_ref dr_type="moro" m_vol m_page<br />
|The index number and volume.page respectively of the kanji in the 13-volume Morohashi Daikanwajiten. A terminal `P` in the number, e.g. 4879P, indicates that it is 4879' in the original. In some 500 cases, the number is terminated with an `X`, to indicate that the kanji in Morohashi has a close, but not identical, glyph to the form in the JIS X 0208 standard.<br/>In the XML the volume and page are attribute values.<br />
|-<br />
|Henshall number<br />
|E<br />
|dic_number <br />
|dic_ref dr_type="henshall"<br />
|The index number used in "A Guide To Remembering Japanese Characters" by Kenneth G. Henshall.<br />
|-<br />
|Kanji & Kana number<br />
|IN<br />
|dic_number <br />
|dic_ref dr_type="sh_kk"<br />
|The index number used in Spahn & Hadamitzky's "Kanji & Kana", 2nd edition (Tuttle).<br />
|-<br />
|Kanji & Kana number (2011 ed)<br />
|DA<br />
|dic_number <br />
|dic_ref dr_type="sh_kk2"<br />
|The index number used in the 2011 edition of Spahn & Hadamitzky's "Kanji & Kana".<br />
|-<br />
|Sakade number<br />
|DS<br />
|dic_number <br />
|dic_ref dr_type="sakade"<br />
|The index numbers used in the early editions of "A Guide To Reading and Writing Japanese", edited by Florence Sakade.<br />
|-<br />
|Japanese Kanji Flashcards number<br />
|DF<br />
|dic_number <br />
|dic_ref dr_type="jf_cards"<br />
|The index numbers used in the "Japanese Kanji Flashcards", by Max Hodges and Tomoko Okazaki (White Rabbit Press). <br />
|-<br />
|Henshall Guide number<br />
|DH<br />
|dic_number <br />
|dic_ref dr_type="henshall3"<br />
|The index numbers used in the 3rd edition of "A Guide To Reading and Writing Japanese" edited by Ken Henshall et al. <br />
|-<br />
|Tuttle Kanji Cards number<br />
|DT<br />
|dic_number <br />
|dic_ref dr_type="tutt_cards"<br />
|The index numbers used in the Tuttle Kanji Cards, compiled by Alexander Kask.<br />
|-<br />
|Crowley number<br />
|DC<br />
|dic_number <br />
|dic_ref dr_type="crowley"<br />
|The index numbers used in "The Kanji Way to Japanese Language Power" by Dale Crowley. <br />
|-<br />
|Kanji in Context number<br />
|DJ<br />
|dic_number <br />
|dic_ref dr_type="kanji_in_context"<br />
|The index numbers used in "Kanji in Context" by Nishiguchi and Kono.<br />
|-<br />
|Kodansha Compact Kanji Guide number<br />
|DG<br />
|dic_number <br />
|dic_ref dr_type="kodansha_compact"<br />
|The index numbers used in the "Kodansha Compact Kanji Guide".<br />
|-<br />
|Japanese For Busy People number<br />
|DB<br />
|dic_number <br />
|dic_ref dr_type="busy_people"<br />
|The index numbers used in "Japanese For Busy People" vols I-III, published by the AJLT. The codes are the volume.chapter. <br />
|-<br />
|Maniette number<br />
|DM<br />
|dic_number <br />
|dic_ref dr_type="maniette"<br />
|The numbers in Yves Maniette's "Les Kanjis dans la tête", the French adaptation of Heisig's "Remembering The Kanji".<br />
|-<br />
|SKIP code<br />
|P<br />
|query_code <br />
|q_code qc_type="skip"<br />
|The SKIP (System of Kanji Indexing by Patterns) developed by Jack Halpern. The code is of the form "l-m-n". See [[#SKIP_Codes|SKIP Codes]] section for more information.<br />
|-<br />
|S&H descriptor<br />
|I<br />
|query_code <br />
|q_code qc_type="sh_desc"<br />
|The index code in "The Kanji Dictionary" (Tuttle 1996), by Spahn & Hadamitzky. It is of the form nxnn.n, e.g. 3k11.2, where the kanji has 3 strokes in the identifying radical, it is radical "k" in the S&H classification system, there are 11 other strokes, and it is the 2nd kanji in the 3k11 sequence. <br />
|-<br />
|Four Corner code<br />
|Q<br />
|query_code <br />
|q_code qc_type="four_corner"<br />
|The Four Corner code for the kanji. See the [[#Four_Corner_Codes|Four Corner codes]] section for more information.<br />
|-<br />
|De Roo code<br />
|DR<br />
|query_code <br />
|q_code qc_type="deroo"<br />
|The codes developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). See the [[#De_Roo_Codes|De Roo Codes]] section for more information.<br />
|-<br />
|Misclassification code<br />
|ZPP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="posn"<br />
|SKIP misclassification by position.<br />
|-<br />
|Misclassification code<br />
|ZSP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_count"<br />
|SKIP misclassification by stroke count.<br />
|-<br />
|Misclassification code<br />
|ZBP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_and_posn"<br />
|SKIP misclassification by both position and stroke count.<br />
|-<br />
|Misclassification code<br />
|ZRP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_diff"<br />
|SKIP misclassification by differing opinions on stroke counts.<br />
|-<br />
|Chinese reading<br />
|Y<br />
|rmgroup<br />
|reading r_type="pinyin"<br />
|The PinYin (Chinese) reading of the kanji.<br />
|-<br />
|Korean reading (romanized)<br />
|W<br />
|rmgroup<br />
|reading r_type="korean_r"<br />
|The Korean reading of the kanji in the (Republic of Korea) Ministry of Education style.<br />
|-<br />
|Korean reading (hangul)<br />
|not included<br />
|rmgroup<br />
|reading r_type="korean_h"<br />
|The Korean reading of the kanji in the hangul script.<br />
|-<br />
|Vietnamese reading (chữ quốc ngữ)<br />
|not included<br />
|rmgroup<br />
|reading r_type="vietnam"<br />
|The Vietnamese reading of the kanji in chữ quốc ngữ.<br />
|-<br />
|Japanese on reading (''katakana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_on"<br />
| In the KANJIDIC edition the readings are placed between the information fields and the meanings.<br />
|-<br />
|Japanese kun reading (''usu. hiragana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_kun"<br />
|<br />
|-<br />
| Meanings<br />
| none<br />
| rmgroup<br />
| meaning m_lang="xx"<br />
| The kanji meaning(s). For languages other than English the m_lang attribute is used with two-letter ISO 639-1 language codes. In the KANJIDIC edition the meanings are placed at the end of the line.<br />
|-<br />
| Name reading(s) (''hiragana'')<br />
| T1<br />
| <br />
| nanori<br />
| The readings only associated with named-entities. In the KANJIDIC edition the first of these is preceded by the "T1" tag.<br />
|}<br />
Note 1: For the sake of consistency the classical radical is the one indicated in the JIS漢字字典 (日本規格協会).<br />
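<br />
The entity and attribute names in the table above are enough to pull fields out of a KANJIDIC2 entry with a standard XML parser. The Python sketch below uses <code>xml.etree.ElementTree</code> from the standard library; the <code>summarize</code> function is hypothetical, and the sample entry is hand-written and heavily abbreviated for illustration (the <code>reading_meaning</code> wrapper around <code>rmgroup</code> follows the DTD).<br />

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated entry for illustration only;
# real KANJIDIC2 entries carry many more fields.
SAMPLE = """
<character>
  <literal>亜</literal>
  <codepoint>
    <cp_value cp_type="ucs">4e9c</cp_value>
    <cp_value cp_type="jis208">1-16-01</cp_value>
  </codepoint>
  <dic_number>
    <dic_ref dr_type="nelson_c">43</dic_ref>
    <dic_ref dr_type="nelson_n">81</dic_ref>
  </dic_number>
  <query_code>
    <q_code qc_type="deroo">3273</q_code>
  </query_code>
  <reading_meaning>
    <rmgroup>
      <reading r_type="ja_on">ア</reading>
      <meaning>Asia</meaning>
    </rmgroup>
  </reading_meaning>
</character>
"""


def summarize(character: ET.Element) -> dict:
    """Collect a few fields from one <character> element, keyed by attribute code."""
    return {
        "literal": character.findtext("literal"),
        "codepoints": {cp.get("cp_type"): cp.text
                       for cp in character.iter("cp_value")},
        "dic_refs": {ref.get("dr_type"): ref.text
                     for ref in character.iter("dic_ref")},
        # Skip the SKIP misclassification codes, which carry skip_misclass.
        "query_codes": {q.get("qc_type"): q.text
                        for q in character.iter("q_code")
                        if q.get("skip_misclass") is None},
        "readings": [(r.get("r_type"), r.text)
                     for r in character.iter("reading")],
        # English meanings have no m_lang attribute.
        "meanings_en": [m.text for m in character.iter("meaning")
                        if m.get("m_lang") is None],
    }


info = summarize(ET.fromstring(SAMPLE.strip()))
print(info["dic_refs"])
```

For the distributed file, the same function could be applied to each <code>character</code> element found by iterating over the parsed document, after decompressing the download.<br />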
<br />
==Radical and Stroke Counting Rules==<br />
<br />
These rules apply to:<br />
#the stroke-counts themselves;<br />
#the stroke counts in the SKIP codes. Where this results in a SKIP which differs from that in the NJECD, or in the non-NJECD SKIPs provided by Jack Halpern, the Jack Halpern version is included prefixed with "ZR".<br />
===Radicals===<br />
The radicals listed below are ones where there are differing approaches to the counting of radicals in the various references. The stroke counting in this file does not strictly follow any reference, but tends to be more aligned to Halpern.<br />
#B54 ENNYOU - 廴. Traditionally counted as 3 strokes, but more recently often counted as 2. S&H count this as 2; Nelson, Halpern, Koujien, etc, count it as 3. I treat it as 3.<br />
#B97 URI - 瓜. Traditionally counted as 5 strokes, as the middle portion looks like a katakana ム. Modern glyphs invariably make it look like 6 strokes. Nelson says it is 5 strokes. Halpern does too, but then counts the shape as 6 in other kanji. Koujien says 6, as do S&H. I treat it as 6.<br />
#B113 SHIMESU e.g. 礼, is counted as 4 strokes in that form, and 5 strokes in its older form, 祀 [http://www.edrdg.org/~jwb/U7940old.png (image)]. 18 kanji are in the 4-stroke form and 20 are in the 5-stroke form. (Nelson and S&H count it as 4; Halpern counts it as 4 or 5. [See Note 1.])<br />
#B131 SHIN/KERAI 臣. Counted as 7 (Nelson counts it as 6, Halpern as 7 (in the book), and S&H as both for different kanji.)<br />
#B136 MAI ASHI 舛. Counted as 7 (traditionally counted as 6, in accordance with the older writing of `ヰ'. Nelson counts as 6, S&H as 7, and Halpern as 7 for 常用 and 人名用漢字 and 6 for the rest.) Note this is also applied to counting 絳 and for kanji with the 韋 pattern.<br />
#B140 KUSA-KANMURI e.g. 苛 always counted as 3 strokes (Halpern counts this as 4 strokes for the (mostly level 2) kanji where the older form is often printed.) Note that this has been carried through to kanji where this element is not the indexing radical, such as 朦.<br />
#B162 SHIN-NYUU e.g. 遙 or 逢 counted as 3 or 4 strokes. (Nelson and S&H count it as 2 strokes, and Halpern as either 3 or 4.) [See Note 1 below.]<br />
#B163 OOZATOZUKIRI & B170 KOZATO-HEN 邦 and 阡 always counted as 3 strokes (Nelson and S&H count it as 2, Halpern as 3.) This also applies where it appears mid-kanji, such as in 橢.<br />
#B184 SHOKU HEN 食, 飢, etc. is counted as 8 strokes in the 飢 form, and as 9 strokes in the 飭 and 餐 forms. (Nelson and S&H count it as 8 strokes, and Halpern as 8 or 9.) [See Note 1 below.]<br />
#B199 MUGI 麦 always counted as 7 strokes, except for 麥 & 麩 where it is counted as 11. (Nelson and Halpern do the same, and S&H avoid treating it as a radical, but count it as 12 in the remainder.)<br />
#The ROO or OI radical (老) has a variant consisting of the top 4 strokes. For example, it is in 者. Traditionally, this variant had an extra dot, and was counted as 5 strokes. I'm counting it as 4 throughout.<br />
<br />
===Other Stroke Patterns===<br />
# While the pattern 臼 is a 6-stroke radical, the top half of 叟 is made up of three distinct parts totalling 8 strokes. Note that this also is the case with 嫂, 溲, 艘 and 痩 despite the simplification in the JIS glyphs.<br />
#牙 (KIBA HEN) is a problem. It is classically counted as 4 strokes, but these days has a flick that makes it effectively 5. Halpern, Nelson and S&H usually have it as 5 strokes, so I'm standardizing on that.<br />
#Another little horror is 旡 (MU or NASHI), which is classically counted as 4 strokes. The most common variant has 5 strokes, but looks like 6. Halpern, S&H and the Classical Nelson count this as 4 strokes, and the New Nelson as 5. I'm making it 5 too.<br />
#The JUU or ASHIATO radical is at the bottom of 禽 and 禺. It is traditionally counted as 5 strokes, although sometimes it looks like 4. I'm using 5 throughout.<br />
#A related shape is ム, as in 瓜, 孤, 弧, etc. This is sometimes counted as two strokes (both Nelsons) and sometimes as three strokes (Halpern, S&H). Classically it is regarded as two strokes. I am using 6 strokes for 瓜.<br />
#The pattern to the left of 敝, which appears in several kanji, e.g. 幣 and 瞥, has 8 strokes. (There are 3 strokes at the top as in 尚.)<br />
#The "east" pattern (東) has 8 strokes. There is an older form in which there are two strokes in the box (柬). It is counted as 8 strokes here in the 東 form (e.g. 諌) and 9 in the 柬 form, as in 諫.<br />
#The pattern at the bottom of 雋 is counted as 4 strokes in modern dictionaries, although traditionally it was 5.<br />
#The pattern 巻, which appears in several kanji, is counted as 9 strokes. Several dictionaries count it as either 8 or 9.<br />
#The pattern on the left of 収 is variously handled as 2 strokes or 3 strokes. As more recent dictionaries make it 4, I will do so too.<br />
#The 攵 pattern has 3 and 4-stroke versions, and sometimes the glyphs can be confusing as to which is used. In the 緻 kanji, for example, it is traditionally counted as 3, but Spahn & Hadamitzky count it as 4 and the Nelsons include both.<br />
<br />
Note: The JIS X 0208-1990 standard does not formally specify the precise glyphs used for kanji, however the glyphs it uses in the published version have become de facto standards for many font compilations. In the published standard, for several kanji, e.g. 辿/迚, 礼/祀, 飢/飭, the JIS level one kanji use the simpler form, and the Level 2 kanji use the older more complex form. Just to make matters worse, many fonts for JIS X 0208 kanji are based on the bit-maps specified in JIS X 9051-1984 standard, which defines the 16x16 patterns for JIS X 0208-1983 characters. According to Ken Lunde: "This standard was not very good, and JSA is no longer supporting it." Anyway, JIS X 9051-1984 had the simpler form for all these bushu in both Levels 1 and 2, as well as having simplifications of kanji like 濾. Thus, as the font foundries have freedom to choose whichever glyphs they like, what you see on your screen may well not agree with these rules. All the rules in this appendix relate to the glyphs as published in the JIS X 0208-1990 standard, and as appearing in font compilations based on them.<br />
<br />
==Kanji Dictionary Search Codes==<br />
<br />
===SKIP Codes===<br />
<br />
The System of Kanji Indexing by Patterns (SKIP) is a scheme for the classification and rapid retrieval of Chinese characters on the basis of geometrical patterns. Developed by Jack Halpern, it first appeared in the New Japanese-English Character Dictionary (Kenkyusha, Tokyo 1990; NTC, Chicago 1993), and in successor publications such as the "Kanji Learners Dictionary" (Kodansha 1999,2011) and the "Kodansha Kanji Dictionary" (2013). A description of the coding system is [http://www.edrdg.org/wwwjdic/SKIP.html available].<br />
<br />
As examples, 割 has a SKIP code of 1-10-2, indicating it is divided into left-right portions with 10 strokes at the left and 2 at the right. 度 has a SKIP code of 3-3-6, indicating it has a 3-stroke enclosure with 6 strokes inside it.<br />
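<br />
A SKIP code of the form "l-m-n" can be taken apart with a few lines of code. The Python sketch below is illustrative only: <code>parse_skip</code> and <code>SKIP_PATTERNS</code> are hypothetical names, the pattern names for types 2 and 4 are taken from the linked SKIP description rather than from the examples above, and the meaning of the two trailing numbers for each pattern type is left to that description.<br />

```python
# Pattern type (first number) -> name, per the SKIP description.
SKIP_PATTERNS = {1: "left-right", 2: "up-down", 3: "enclosure", 4: "solid"}


def parse_skip(code: str) -> dict:
    """Split a SKIP code such as "1-10-2" into pattern type and its two numbers."""
    parts = code.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a SKIP code: {code!r}")
    pattern, second, third = (int(p) for p in parts)
    if pattern not in SKIP_PATTERNS:
        raise ValueError(f"unknown SKIP pattern type: {pattern}")
    return {
        "pattern": pattern,
        "pattern_name": SKIP_PATTERNS[pattern],
        "numbers": (second, third),
    }


print(parse_skip("1-10-2"))
```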
<br />
===De Roo Codes===<br />
<br />
The De Roo codes were developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). They are based on the shapes observed at the top and bottom of the character. A [http://www.edrdg.org/wwwjdic/deroo.html detailed description] is available.<br />
<br />
As an example, 亜 has a code of 3273, indicating that the top of the kanji is pattern number 32 (兀) and the bottom is pattern number 73 (horizontal line with two vertical strokes above it).<br />
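<br />
The split of a De Roo code into its top and bottom pattern numbers can be sketched as below. This Python fragment only handles four-digit codes like the 3273 example; <code>split_deroo</code> is a hypothetical name, and how codes of any other length should be divided is not covered here and is rejected rather than guessed.<br />

```python
def split_deroo(code: str) -> tuple[int, int]:
    """Split a four-digit De Roo code into (top pattern, bottom pattern)."""
    # Assumption: only four-digit codes are handled; see the detailed
    # description for the full coding rules.
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a four-digit code, got {code!r}")
    return int(code[:2]), int(code[2:])


print(split_deroo("3273"))
```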
<br />
===Four Corner Codes===<br />
The Four Corner coding system was invented by Wang Chen in 1928, and has since been widely used in dictionaries in China and Japan for classifying kanji and hanzi. In China it is losing popularity in favour of Pinyin ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten, have a Four Corner Index. An [http://www.edrdg.org/wwwjdic/FOURCORNER.html overview] of the coding system is available.<br />
In some cases a character may have two of these codes, as the system can be a little ambiguous, and Morohashi has some kanji coded differently from their traditional Chinese codes.<br />
The coding system indexes characters according to the shapes at the corners.<br />
<br />
==Proposing Changes==<br />
<br />
There is currently no online access to the database that holds the KANJIDIC contents (the information is mostly quite static.) Anyone wishing to propose a change to the data for a kanji, e.g. to add or change a reading, will need to email Jim Breen at jimbreen@gmail.com.<br />
<br />
==Kanji Information Sites==<br />
''(Being expanded)''<br />
* Jim's [http://nihongo.monash.edu/kanjiinfo.html Kanji Information Page].<br />
* The [https://kanjialive.com/ Kanji alive] site at the University of Chicago.<br />
* The [https://www.kanjipedia.jp/ Kanjipedia] site (mostly in Japanese).<br />
<br />
==Legacy Documentation==<br />
<br />
The current Wiki page was compiled from several older documents, which are no longer being maintained. They are still available for historical purposes. They are:<br />
* a basic home page about [http://www.edrdg.org/kanjidic/kanjd2index_legacy.html KANJIDIC2];<br />
* an overview page about the [http://www.edrdg.org/kanjidic/kanjidic2_ov_legacy.html KANJIDIC2 structure];<br />
* an overview page about [http://www.edrdg.org/kanjidic/kanjidic_legacy.html KANJIDIC and KANJD212];<br />
* the original [http://www.edrdg.org/kanjidic/kanjidic_doc_legacy.html KANJIDIC] documentation;<br />
* the original [http://www.edrdg.org/kanjidic/kanjd212_doc_legacy.html KANJD212] documentation.<br />
<br />
==Copyright and Permissions==<br />
<br />
The KANJIDIC project files are released under a Creative Commons Attribution-ShareAlike Licence (V4.0). See the [https://www.edrdg.org/edrdg/licence.html EDRDG General Dictionary Licence Statement] for details.<br />
<br />
For the most part the information provided in the project's files is in the public domain. Information relating to the sequence numbers of kanji in published dictionaries is not considered to be subject to copyright. Descriptor and other search codes are considered to be the intellectual property of the developers. With regard to the codes included in the KANJIDIC files:<br />
* in 2014 the SKIP codes were placed by Jack Halpern under a CC-SA licence. See [http://www.kanji.org/kanji/dictionaries/skip_permission.htm this page] for his announcement. They are now under a [https://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-Noncommercial-Share Alike 4.0 Unported Licence].<br />
* Fr De Roo provided written permission for the De Roo codes to be included in KANJIDIC.<br />
* the Spahn and Hadamitzky descriptor codes were kindly supplied by Mark Spahn for inclusion in KANJIDIC.<br />
<br />
==History==<br />
<br />
''(some comments by Jim Breen)''<br />
<br />
KANJIDIC began around 1991 as two files: jis1detl.lst and jis2detl.lst, which were later merged into a single file.<br />
<br />
The first file was compiled initially from the file "kinfo.dat" supplied by Stephen Chung, who in turn compiled his file from a file prepared by Mike Erickson. I originally added about 1900 "meanings" by James Heisig keyed in by Kevin Moore from the book "Remembering The Kanji". I later added the meanings from Rik Smoody's files, compiled when he was working for Sony in Japan. These appear to have been based on Nelson.<br />
<br />
The second file was compiled from a complete JIS2 list with Bushu and stroke counts kindly supplied to me by Jon Crossley, to which I added Nelson numbers, yomikata and meanings extracted from Rik Smoody's file.<br />
<br />
Theresa Martin was an early assister with this file, particularly with tracking down and correcting many mistranscribed yomikata (the old zu/dzu, oo/ou, ji/dji, etc. problems).<br />
<br />
Jeffrey Friedl did a major overhaul in September-October 1992, in which he added the original frequency rankings, Halpern codes and SKIP patterns; updated the grading ("G" fields) to reflect the modern Jouyou lists; corrected radical numbers; and corrected stroke counts and readings to fall in line with modern usage.<br />
<br />
Magnus Halldorsson corrected some erroneous Halpern numbers, and provided them for a lot of the radicals. He provided the list of Heisig indices, which he originally compiled himself, then verified and expanded using lists from Richard Walters and Antti Karttunen. He also passed on to me the list of Gakken indices compiled by Antti Karttunen.<br />
<br />
Lee Collins provided the Unicode mappings.<br />
<br />
Iain Sinclair has provided the yomikata, meanings and S&H indices of many of the obscure JIS2 kanji.<br />
<br />
Christian Wittern, a Sinologist working at Kyoto University, sent me a monster file prepared by Dr Urs App from Hanazono College. From this I have extracted the Four Corner and Morohashi information. Christian also provided the original Pinyin details, which were later replaced. I am very grateful for these significant contributions.<br />
<br />
In March 1994 the Morohashi indices were proof-read and corrected by Christian.<br />
<br />
Alfredo Pinochet supplied all the Henshall numbers.<br />
<br />
Ingar Holst has provided considerable assistance in regularizing the Bnnn and Cnnn radical classifications to remove some errors that were in the original JIS2 file, and to make it all conform to Nelson's classification.<br />
<br />
In mid-1993 I withdrew the SKIP codes from the distributed file as it appeared that their presence violated Jack Halpern's copyright on these codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission from his publisher for the codes to be included subject (initially) to copyright and usage restrictions. In March 1994 the Halpern indices and SKIP codes were checked against an extract from Jack's files, and the "Z" mis-classification codes added, again from his files. Jack has also made a lot of useful comments and suggestions about the content and format of the file. I am most grateful to Jack for his permission and assistance, and also to Jeffrey for making the contact.<br />
<br />
In May 1995, a number of updates took place. Jeffrey Friedl established contact with James Heisig, and obtained a further set of his indices. I contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided most of the missing S&H descriptors, and Jack Halpern released to me the SKIP codes of the kanji not in the New Japanese-English Character Dictionary. For all this material I am most grateful.<br />
<br />
In August 1995, I added the O'Neill index numbers. These were compiled by Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny & David for their assistance. <br />
<br />
In January and February 1996 the Morohashi numbers were checked thoroughly against two important sources: a file of Unicode-Morohashi data (Uni2Dict) which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221 standard, and the review draft of the proposed revision of the JIS X 0208 standard, which was prepared by the INSTAC Committee, and made available in a text file, thus enabling comparisons. All the mismatches between the three files were examined against the Morohashi text, and extensive corrections made to all three files. I am grateful to Koichi Yasuoka and Masayuki Toyoshima for their considerable assistance in this task.<br />
<br />
In March 1996 the Korean readings were added. They were provided by Dr Charles Muller, then of Toyo Gakuen University, to whom I am most grateful. Chuck's compilation of Korean readings is extremely thorough and scholarly, and I am pleased to be able to incorporate them.<br />
<br />
In April 1996 the readings of all the kanji were compared with those in the JIS X 0208 draft, and a number of corrections and additions made.<br />
<br />
In May 1996 I carried out a "unification" of the readings of the KANJIDIC and KANJD212 files, wherein all the readings of the "itaiji" were brought into line. The identification of these itaiji was drawn from a file posted to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp), which was compiled at the ETL from the itaiji identification in the JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added some extra sets which were indicated in the JIS X 0208-1996 draft.<br />
<br />
In July 1996 the Pinyin details were completely replaced by a new set. The original Pinyin were from an earlier compilation by Christian Wittern, and contained many errors. Two more reliable sources had become available: the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5 hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC file is a combination of the two, following the order in the Uni2Pinyin file.<br />
<br />
In August 1996 I corrected a few more missing and erroneous Nelson numbers, using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged the kokuji, so I added these to the readings fields as "{(kokuji)}".<br />
<br />
Also in August 1996 I deleted the handful of former "XJxxxx" cross-references, and replaced them with a much more comprehensive set, so that they now represent all the recognized "itaiji". The file I used for this was the corrected itaiji file mentioned above.<br />
<br />
In April 1997 I corrected a large number of bushu codes. Many of these had been identified as errors by Jean-Luc Leger who analyzed and examined all the Nelson bushu. I also identified and added a large number of missing Cnnn codes.<br />
<br />
Also in April 1997 I added the S&H "Kanji & Kana" indices. These had been keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must be an outbreak of kanji interest in Nancy.)<br />
<br />
In February 1998, the long-awaited inclusion of the "New Nelson" numbers took place. I had been waiting for the editor of the New Nelson, John Haig, to supply a list (as he had agreed some years before), but in the meantime, Jean-Luc Leger keyed a list, so they are now available.<br />
<br />
Also between December 1997 and February 1998 a large number of Level 2 kanji had their stroke counts corrected to bring them into line with the counting principles used in the Level 1 kanji. This usually aligned the counts with those used in the New Nelson and in S&H. Appendix E of this document was amended to reflect this. The leg-work in tracking this material down was done by Wolfgang Cronrath. <br />
<br />
During December 1998 & Jan 1999 I updated the stroke counts of many of the Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath. I also added the De Roo codes, which had been keyed by Jasmin Blanchette, who also typed the explanatory material. I contacted Fr De Roo in Tokyo who readily agreed to the inclusion of the codes.<br />
<br />
The extension of the S&H "Kanji & Kana" numbers to the 2nd edition was done by Enrique Sanchez Rosa.<br />
<br />
The Hangul versions of the Korean readings (which only appear in the XML version) were provided by Francis Bond and Kyonghee Paik.<br />
<br />
I did the Tuttle card numbers myself.<br />
<br />
James Rose provided the numbers from Crowley's "The Kanji Way to Japanese Language Power", Sakade's "A Guide To Reading and Writing Japanese", and also for that book's 3rd Edition edited by Henshall, Seeley & De Groot.<br />
<br />
The "Kodansha's compact Kanji guide" codes were provided by Richard Fremmerlid.<br />
<br />
The "Kanji in Context" codes were provided by Randy Foreman.<br />
<br />
The Spanish kanji meanings (which appear in the XML format, and may also appear in special versions of KANJIDIC) were compiled by Francisco Gutierrez and provided by Gabriel Sanroman.<br />
<br />
Alain Thierion translated the meanings of the kanji into French, and also provided the Maniette numbers.<br />
<br />
Andrew Slater provided updates to the JLPT numbers, and additional numbers for the Japanese Flashcards series.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=KANJIDIC_Project&diff=948KANJIDIC Project2022-10-19T21:08:42Z<p>JimBreen: /* Content & Format */</p>
<hr />
<div>=The KANJIDIC Project=<br />
<br />
''(Note that this page is in the process of being rewritten, so be patient with any aspects that seem incomplete.)''<br />
<br />
==Introduction==<br />
<br />
The KANJIDIC project, which began in 1991, has the goal of compiling and distributing comprehensive information on the kanji used in Japanese text processing. It covers the 13,108 kanji in three main Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 952 additional kanji.<br />
<br />
Three data files are distributed by this project:<br />
* the KANJIDIC2 file, which is in XML format and [https://en.wikipedia.org/wiki/UTF-8 Unicode/UTF-8] coding, and contains information about all 13,108 kanji. ([http://www.edrdg.org/kanjidic/kanjidic2.xml.gz download])<br />
* the KANJIDIC file, which is in [https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP EUC-JP] coding and covers the 6,355 kanji in JIS X 0208. ([http://www.edrdg.org/kanjidic/kanjidic.gz download])<br />
* the KANJD212 file, which also is in EUC-JP coding and covers the 5,801 kanji in JIS X 0212. ([http://www.edrdg.org/kanjidic/kanjd212.gz download])<br />
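<br />
As a minimal sketch of reading one of the gzipped EUC-JP files listed above, the following Python function decompresses and decodes the file in one pass. The function name <code>iter_kanjidic_lines</code> and the local file path are hypothetical, and the skipping of blank lines and lines beginning with "#" is an assumption about the file layout rather than something stated on this page.<br />

```python
import gzip


def iter_kanjidic_lines(path, encoding="euc_jp"):
    """Yield the entry lines of a gzipped KANJIDIC or KANJD212 file."""
    # gzip.open in text mode decompresses and decodes EUC-JP as it reads.
    with gzip.open(path, "rt", encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Assumption: blank lines and "#" comment lines are not entries.
            if line and not line.startswith("#"):
                yield line


# Example use, assuming kanjidic.gz has been downloaded to the current directory:
# for line in iter_kanjidic_lines("kanjidic.gz"):
#     print(line.split()[0])
```

The KANJIDIC2 file is UTF-8 XML, so it would be read with an XML parser rather than line by line.<br />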
<br />
==Content & Format==<br />
The database and distributed data files contain an entry for each of the kanji, with each entry containing a number of fields of data about the kanji. The data is described in the following table. The format of the distributed files is as follows:<br />
* the KANJIDIC and KANJD212 files are text files with one line per kanji and the information fields separated by spaces. The format of each line is:<br />
** the kanji itself followed by the hexadecimal form of the JIS ''ku-ten'' coding, e.g. "亜 3021" (the decimal ''ku-ten'' code is 16-01);<br />
** information fields beginning with one or two-letter codes as per the table below. For example "S10" indicates a stroke count of 10;<br />
** the Japanese readings of the kanji. ON readings (音読み) are generally in ''katakana'' and KUN readings (訓読み) in ''hiragana''. An exception is the set of ''kokuji'' for measurements such as centimetres, where the reading is in ''katakana''. Hyphens are used to indicate prefixes/suffixes, and '.' indicates the portion of the reading that is ''okurigana''. There may be several classes of reading fields, with ordinary readings first, followed by members of the other classes, if any. The current other classes, and their tagging, are:<br />
***where the kanji has special ''nanori'' (i.e. name) readings, these are preceded by the marker "T1";<br />
***where the kanji is a radical, and the radical name is not already a reading, the radical name is preceded by the marker "T2".<br />
** the meanings (usually in English). Each field begins with an open brace '{' and ends at the next close brace '}'.<br />
* the KANJIDIC2 file is in XML and is structured according to its [http://www.edrdg.org/kanjidic/kanjidic2_dtdh.html DTD] (Document Type Definition). The DTD contains extensive annotations and is intended to be the primary documentation for the file. This [http://www.edrdg.org/kanjidic/kd2examph.html sample] illustrates the structure of a typical entry. Information fields are grouped by type within entities such as <dic_number> and <query_code>, with specific values indicated by an attribute code. For example the kanji 亜 has the number 43 in the original Nelson kanji dictionary and 81 in the New Nelson. This is recorded in the XML file as:<br/><br />
:<dic_number><br />
::<dic_ref dr_type="nelson_c">43</dic_ref><br />
::<dic_ref dr_type="nelson_n">81</dic_ref><br />
:: ....<br />
:</dic_number><br />
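<br />
A fragment like the one above can be read with any XML parser. The following Python sketch uses the standard library's <code>xml.etree.ElementTree</code>; the helper name <code>dic_refs</code> is hypothetical, and the snippet is just the two <code>dic_ref</code> lines from the example rather than a complete entry.<br />

```python
import xml.etree.ElementTree as ET

# The two dic_ref values quoted above for the kanji 亜.
SNIPPET = """
<dic_number>
  <dic_ref dr_type="nelson_c">43</dic_ref>
  <dic_ref dr_type="nelson_n">81</dic_ref>
</dic_number>
"""


def dic_refs(dic_number_xml):
    """Map each dr_type attribute to the text of its dic_ref element."""
    root = ET.fromstring(dic_number_xml.strip())
    return {ref.get("dr_type"): ref.text for ref in root.iter("dic_ref")}


refs = dic_refs(SNIPPET)
print(refs["nelson_c"], refs["nelson_n"])
```

A plain dictionary keeps only one value per <code>dr_type</code>, which is enough for this fragment; code that also needs extra attributes such as the Morohashi volume and page would keep the elements themselves.<br />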
<br />
{| class="wikitable sortable"<br />
|+ Kanjidic Information Fields<br />
|-<br />
! Field<br />
! Kanjidic Code<br/>(if any)<br />
! Group Entity<br />
! Entity plus Attribute(s)<br/>(if any)<br />
! Comment<br />
|-<br />
| Kanji<br />
| none<br />
| literal<br />
| <br />
|<br />
|-<br />
| JIS code-point<br />
| none<br />
| codepoint<br />
| cp_value cp_type="jis208" (or "jis212" or "jis213")<br />
| e.g. 亜 is "3021" in KANJIDIC and<br/>"1-16-01" in KANJIDIC2<br />
|-<br />
| Unicode code-point<br />
| U<br />
|codepoint<br />
| cp_value cp_type="ucs"<br />
| <br />
|-<br />
| Radical (Classical) (See Note 1 below)<br />
| B/C<br />
| radical<br />
| rad_value rad_type="classical"<br />
| Where Nelson uses the classical radical this has a "B" code, otherwise it has a "C" code<br />
|-<br />
| Radical (Nelson)<br />
| B<br />
| radical<br />
| rad_value rad_type="nelson_c"<br />
| <br />
|-<br />
| Grade<br />
| G<br />
| misc<br />
| grade<br />
| The "grade" of the kanji. <br/>- G1 to G6 indicates the grade level as specified by the Japanese Ministry of Education for kanji that are to be taught in elementary school (1026 Kanji). These are sometimes called the ''kyōiku'' (education) kanji and are part of the set of ''jōyō'' (daily use) kanji;<br/>- G8 indicates the remaining ''jōyō'' kanji that are to be taught in secondary school (additional 1130 Kanji);<br/>- G9 and G10 indicate ''jinmeiyō'' ("for use in names") kanji which in addition to the ''jōyō'' kanji are approved for use in family name registers and other official documents. G9 (649 kanji, of which 640 are in KANJIDIC) indicates the kanji is a "regular" name kanji, and G10 (212 kanji of which 130 are in KANJIDIC) indicates the kanji is a variant of a ''jōyō'' kanji.<br />
|-<br />
| Stroke count<br />
| S<br />
| misc<br />
| stroke_count<br />
| The stroke count of the kanji. If more than one, the first is considered the accepted count, while subsequent ones are common miscounts. (See the section later in this document on counting strokes for some of the rules applied especially to radicals.)<br />
|-<br />
| Frequency-of-use ranking<br />
| F<br />
| misc<br />
| freq<br />
| The 2,501 most-used characters have a ranking which expresses the relative frequency of occurrence of a character in modern Japanese. The data is based on an analysis of word frequencies in the Mainichi Shimbun over 4 years by Alexandre Girardi. Note: (a) these frequencies are biased towards words and kanji used in newspaper articles, and (b) the relative frequencies for the last few hundred kanji so graded are quite imprecise.<br />
|-<br />
| Variant JIS 0208 kanji<br />
| XJ0<br />
| misc<br />
| variant var_type="jis208"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0212 kanji<br />
| XJ1<br />
| misc<br />
| variant var_type="jis212"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0213 kanji<br />
| XJ2<br />
| misc<br />
| variant var_type="jis213"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the plane number (P: 1 or 2) plus the JIS hex code is used, and in the XML file the equivalent "P-nn-nn" kuten code is used.)<br />
|-<br />
| Variant kanji (De Roo index)<br />
| XJD<br />
| misc<br />
| variant var_type="deroo"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (NJECD index)<br />
| XH<br />
| misc<br />
| variant var_type="halpern_njecd"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (S&H index)<br />
| XI<br />
| misc<br />
| variant var_type="s_h"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (Nelson index)<br />
| XN<br />
| misc<br />
| variant var_type="nelson_c"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (O'Neill index)<br />
| XO<br />
| misc<br />
| variant var_type="oneill"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Radical name(s)<br />
| none<br />
| misc<br />
| rad_name<br />
| The name of the radical in ''hiragana''. In the KANJIDIC edition these are placed after the readings and preceded by the "T2" tag.<br />
|-<br />
| JLPT Level<br />
| J<br />
| misc<br />
| jlpt<br />
| The pre-2010 level of the Japanese Language Proficiency Test (JLPT) in which the kanji occurs (1-4). Note that the JLPT test levels changed in 2010, with a new 5-level system (N1 to N5) being introduced. No official kanji lists are available for the new levels. The new levels are regarded as being similar to the old levels except that the old level 2 is now divided between N2 and N3, and the old levels 3 and 4 are now N4 and N5. <br />
|-<br />
| Nelson (Classic) number<br />
| N<br />
| dic_number<br />
| dic_ref dr_type="nelson_c"<br />
| The index number in "The Modern Reader's Japanese-English Character Dictionary", edited by Andrew Nelson. If not present, the character is not in Nelson, or is considered to be a non-standard version, in which case it may have a variant. Note that many kanji glyphs currently used are what Nelson described as "non-standard".<br />
|-<br />
| Nelson (New) number<br />
| V<br />
| dic_number<br />
| dic_ref dr_type="nelson_n"<br />
| The index number in "The New Nelson Japanese-English Character Dictionary", edited by John Haig.<br />
|-<br />
| NJECD number<br />
| H<br />
| dic_number<br />
| dic_ref dr_type="halpern_njecd"<br />
| The index number in the "New Japanese-English Character Dictionary" (1990), edited by Jack Halpern.<br />
|-<br />
| Kodansha Kanji Dictionary number<br />
| DP<br />
| dic_number<br />
| dic_ref dr_type="halpern_kkd"<br />
| The index numbers used by Jack Halpern in the "Kodansha Kanji Dictionary" (2013), which is the revised version of the "New Japanese-English Character Dictionary" of 1990.<br />
|-<br />
|Kanji Learners Dictionary number<br />
|DK<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld"<br />
|The index numbers used by Jack Halpern in the "Kanji Learners Dictionary", published by Kodansha in 1999.<br />
|-<br />
|Kanji Learners Dictionary number (2nd ed)<br />
|DL<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld_2ed"<br />
|The index numbers used by Jack Halpern in the 2nd edition of the "Kanji Learners Dictionary", published by Kodansha in 2013. <br />
|-<br />
|Remembering The Kanji number<br />
|L<br />
|dic_number <br />
|dic_ref dr_type="heisig"<br />
|The index number used in "Remembering The Kanji" by James Heisig.<br />
|-<br />
|Remembering The Kanji number (6th ed)<br />
|DN<br />
|dic_number <br />
|dic_ref dr_type="heisig6"<br />
|The index number used in "Remembering The Kanji, 6th Edition" by James Heisig. <br />
|-<br />
|Gakken number<br />
|K<br />
|dic_number <br />
|dic_ref dr_type="gakken"<br />
|The index number in the Gakken Kanji Dictionary ("A New Dictionary of Kanji Usage"). Some of the numbers relate to the list at the back of the book, jouyou kanji not contained in the dictionary, and various historical tables at the end.<br />
|-<br />
|O'Neill's Japanese Names number<br />
|O<br />
|dic_number <br />
|dic_ref dr_type="oneill_names"<br />
|The index number in "Japanese Names", by P.G. O'Neill. (Weatherhill, 1972) (Note: some of the numbers end with 'A'.)<br />
|-<br />
|O'Neill's Essential Kanji number<br />
|DO<br />
|dic_number <br />
|dic_ref dr_type="oneill_kk"<br />
|The index numbers used in P.G. O'Neill's "Essential Kanji".<br />
|-<br />
|Morohashi number<br />
|MN/MP<br />
|dic_number <br />
|dic_ref dr_type="moro" m_vol m_page<br />
|The index number and volume.page respectively of the kanji in the 13-volume Morohashi Daikanwajiten. A terminal `P` in the number, e.g. 4879P, indicates that it is 4879' in the original. In some 500 cases, the number is terminated with an `X`, to indicate that the kanji in Morohashi has a close, but not identical, glyph to the form in the JIS X 0208 standard.<br/>In the XML the volume and page are attribute values.<br />
|-<br />
|Henshall number<br />
|E<br />
|dic_number <br />
|dic_ref dr_type="henshall"<br />
|The index number used in "A Guide To Remembering Japanese Characters" by Kenneth G. Henshall.<br />
|-<br />
|Kanji & Kana number<br />
|IN<br />
|dic_number <br />
|dic_ref dr_type="sh_kk"<br />
|The index number used in Spahn & Hadamitzky's "Kanji & Kana", 2nd edition (Tuttle).<br />
|-<br />
|Kanji & Kana number (2011 ed)<br />
|DA<br />
|dic_number <br />
|dic_ref dr_type="sh_kk2"<br />
|The index number used in the 2011 edition of Spahn & Hadamitzky's "Kanji & Kana".<br />
|-<br />
|Sakade number<br />
|DS<br />
|dic_number <br />
|dic_ref dr_type="sakade"<br />
|The index numbers used in the early editions of "A Guide To Reading and Writing Japanese", edited by Florence Sakade.<br />
|-<br />
|Japanese Kanji Flashcards number<br />
|DF<br />
|dic_number <br />
|dic_ref dr_type="jf_cards"<br />
|The index numbers used in the "Japanese Kanji Flashcards", by Max Hodges and Tomoko Okazaki (White Rabbit Press). <br />
|-<br />
|Henshall Guide number<br />
|DH<br />
|dic_number <br />
|dic_ref dr_type="henshall3"<br />
|The index numbers used in the 3rd edition of "A Guide To Reading and Writing Japanese" edited by Ken Henshall et al. <br />
|-<br />
|Tuttle Kanji Cards number<br />
|DT<br />
|dic_number <br />
|dic_ref dr_type="tutt_cards"<br />
|The index numbers used in the Tuttle Kanji Cards, compiled by Alexander Kask.<br />
|-<br />
|Crowley number<br />
|DC<br />
|dic_number <br />
|dic_ref dr_type="crowley"<br />
|The index numbers used in "The Kanji Way to Japanese Language Power" by Dale Crowley. <br />
|-<br />
|Kanji in Context number<br />
|DJ<br />
|dic_number <br />
|dic_ref dr_type="kanji_in_context"<br />
|The index numbers used in "Kanji in Context" by Nishiguchi and Kono.<br />
|-<br />
|Kodansha Compact Kanji Guide number<br />
|DG<br />
|dic_number <br />
|dic_ref dr_type="kodansha_compact"<br />
|The index numbers used in the "Kodansha Compact Kanji Guide".<br />
|-<br />
|Japanese For Busy People number<br />
|DB<br />
|dic_number <br />
|dic_ref dr_type="busy_people"<br />
|The index numbers used in "Japanese For Busy People" vols I-III, published by the AJLT. The codes are the volume.chapter. <br />
|-<br />
|Maniette number<br />
|DM<br />
|dic_number <br />
|dic_ref dr_type="maniette"<br />
|The numbers in Yves Maniette's "Les Kanjis dans la tête", the French adaptation of Heisig's "Remembering The Kanji".<br />
|-<br />
|SKIP code<br />
|P<br />
|query_code <br />
|q_code qc_type="skip"<br />
|The SKIP (System of Kanji Indexing by Patterns) code developed by Jack Halpern. The code is of the form "l-m-n". See the [[#SKIP_Codes|SKIP Codes]] section for more information.<br />
|-<br />
|S&H descriptor<br />
|I<br />
|query_code <br />
|q_code qc_type="sh_desc"<br />
|The index code in "The Kanji Dictionary" (Tuttle 1996), by Spahn & Hadamitzky. It is of the form nxnn.n, e.g. 3k11.2, where the kanji has 3 strokes in the identifying radical, it is radical "k" in the S&H classification system, there are 11 other strokes, and it is the 2nd kanji in the 3k11 sequence. <br />
|-<br />
|Four Corner code<br />
|Q<br />
|query_code <br />
|q_code qc_type="four_corner"<br />
|The Four Corner code for the kanji. See the [[#Four_Corner_Codes|Four Corner codes]] section for more information.<br />
|-<br />
|De Roo code<br />
|DR<br />
|query_code <br />
|q_code qc_type="deroo"<br />
|The codes developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). See the [[#De_Roo_Codes|De Roo Codes]] section for more information.<br />
|-<br />
|Misclassification code<br />
|ZPP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="posn"<br />
|SKIP misclassification by position.<br />
|-<br />
|Misclassification code<br />
|ZSP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_count"<br />
|SKIP misclassification by stroke count.<br />
|-<br />
|Misclassification code<br />
|ZBP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_and_posn"<br />
|SKIP misclassification by both position and stroke count.<br />
|-<br />
|Misclassification code<br />
|ZRP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_diff"<br />
|SKIP misclassification by differing opinions on stroke counts.<br />
|-<br />
|Chinese reading<br />
|Y<br />
|rmgroup<br />
|reading r_type="pinyin"<br />
|The PinYin (Chinese) reading of the kanji.<br />
|-<br />
|Korean reading (romanized)<br />
|W<br />
|rmgroup<br />
|reading r_type="korean_r"<br />
|The Korean reading of the kanji in the (Republic of Korea) Ministry of Education style.<br />
|-<br />
|Korean reading (hangul)<br />
|not included<br />
|rmgroup<br />
|reading r_type="korean_h"<br />
|The Korean reading of the kanji in the hangul script.<br />
|-<br />
|Vietnamese reading (chữ quốc ngữ)<br />
|not included<br />
|rmgroup<br />
|reading r_type="vietnam"<br />
|The Vietnamese reading of the kanji in chữ quốc ngữ.<br />
|-<br />
|Japanese on reading (''katakana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_on"<br />
| In the KANJIDIC edition the readings are placed between the information fields and the meanings.<br />
|-<br />
|Japanese kun reading (''usu. hiragana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_kun"<br />
|<br />
|-<br />
| Meanings<br />
| none<br />
| rmgroup<br />
| meaning m_lang="xx"<br />
| The kanji meaning(s). For languages other than English the m_lang attribute is used with two-letter ISO 639-1 language codes. In the KANJIDIC edition the meanings are placed at the end of the line.<br />
|-<br />
| Name reading(s) (''hiragana'')<br />
| T1<br />
| <br />
| nanori<br />
| The readings only associated with named-entities. In the KANJIDIC edition the first of these is preceded by the "T1" tag.<br />
|}<br />
Note 1: For the sake of consistency the classical radical is the one indicated in the JIS漢字字典 (日本規格協会).<br />
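<br />
The text-file layout described at the start of this section, together with the field codes in the table, can be pulled apart roughly as follows. This is only a sketch: the function names are hypothetical, the sample line is an abbreviated illustration built from values quoted on this page rather than a line copied from the distributed file, and the rule "a token starting with an ASCII capital letter is an information field" is an assumption that happens to fit the codes listed in the table.<br />

```python
import re


def jis_hex_to_kuten(jis_hex):
    """Convert a four-digit JIS hex code such as "3021" to ku-ten "16-01"."""
    ku = int(jis_hex[:2], 16) - 0x20   # row: first byte minus 0x20
    ten = int(jis_hex[2:], 16) - 0x20  # cell: second byte minus 0x20
    return f"{ku:02d}-{ten:02d}"


def parse_kanjidic_line(line):
    """Split one KANJIDIC/KANJD212 text line into its main parts."""
    # Meanings are brace-delimited and may contain spaces, so take them first.
    meanings = re.findall(r"\{([^}]*)\}", line)
    tokens = re.sub(r"\{[^}]*\}", " ", line).split()
    kanji, jis_hex = tokens[0], tokens[1]
    info = []
    readings = {"ordinary": [], "nanori": [], "radical_name": []}
    current = "ordinary"
    for token in tokens[2:]:
        if token == "T1":            # marker: name readings follow
            current = "nanori"
        elif token == "T2":          # marker: radical names follow
            current = "radical_name"
        elif token[0].isascii() and token[0].isupper():
            info.append(token)       # assumed to be a coded information field
        else:
            readings[current].append(token)
    return {
        "kanji": kanji,
        "jis_hex": jis_hex,
        "kuten": jis_hex_to_kuten(jis_hex),
        "info": info,
        "readings": readings,
        "meanings": meanings,
    }


# Abbreviated, illustrative line (not copied verbatim from the file).
SAMPLE = "亜 3021 N43 V81 DR3273 ア つ.ぐ T1 や {Asia} {rank next}"
entry = parse_kanjidic_line(SAMPLE)
print(entry["kuten"], entry["info"], entry["readings"], entry["meanings"])
```

The ku-ten conversion reproduces the "3021" to 16-01 correspondence mentioned for 亜; splitting each information field into its code and value (for example "S10" into "S" and 10) is left out to keep the sketch short.<br />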
<br />
==Radical and Stroke Counting Rules==<br />
<br />
These rules apply to:<br />
#the stroke-counts themselves;<br />
#the stroke counts in the SKIP codes. Where this results in a SKIP which differs from that in the NJECD, or in the non-NJECD SKIPs provided by Jack Halpern, the Jack Halpern version is included prefixed with "ZR".<br />
===Radicals===<br />
The radicals listed below are ones where there are differing approaches to the counting of strokes in the various references. The stroke counting in this file does not strictly follow any reference, but tends to be more aligned to Halpern.<br />
#B54 ENNYOU - 廴. Traditionally counted as 3 strokes, but more recently often counted as 2. S&H count this as 2; Nelson, Halpern, Koujien, etc, count it as 3. I treat it as 3.<br />
#B97 URI - 瓜. Traditionally counted as 5 strokes, as the middle portion looks like a katakana ム. Modern glyphs invariably make it look like 6 strokes. Nelson says it is 5 strokes. Halpern does too, but then counts the shape as 6 in other kanji. Koujien says 6, as do S&H. I treat it as 6.<br />
#B113 SHIMESU e.g. 礼, is counted as 4 strokes in that form, and 5 strokes in its older form, 祀 [http://www.edrdg.org/~jwb/U7940old.png (image)]. 18 kanji are in the 4-stroke form and 20 are in the 5-stroke form. (Nelson and S&H count it as 4; Halpern counts it as 4 or 5. [See Note 1.])<br />
#B131 SHIN/KERAI 臣. Counted as 7 (Nelson counts it as 6, Halpern as 7 (in the book), and S&H as both for different kanji.)<br />
#B136 MAI ASHI 舛. Counted as 7 (traditionally counted as 6, in accordance with the older writing of `ヰ'. Nelson counts as 6, S&H as 7, and Halpern as 7 for 常用 and 人名用漢字 and 6 for the rest.) Note this is also applied to counting 絳 and for kanji with the 韋 pattern.<br />
#B140 KUSA-KANMURI e.g. 苛 always counted as 3 strokes (Halpern counts this as 4 strokes for the (mostly level 2) kanji where the older form is often printed.) Note that this has been carried through to kanji where this element is not the indexing radical, such as 朦.<br />
#B162 SHIN-NYUU e.g. 遙 or 逢 counted as 3 or 4 strokes. (Nelson and S&H count it as 2 strokes, and Halpern as either 3 or 4.) [See Note 1 below.]<br />
#B163 OOZATOZUKIRI & B170 KOZATO-HEN 邦 and 阡 always counted as 3 strokes (Nelson and S&H count it as 2, Halpern as 3.) This also applies where it appears mid-kanji, such as in 橢.<br />
#B184 SHOKU HEN 食, 飢, etc. is counted as 8 strokes in the 飢 form, and as 9 strokes in the 飭 and 餐 forms. (Nelson and S&H count it as 8 strokes, and Halpern as 8 or 9.) [See Note 1 below.]<br />
#B199 MUGI 麦 always counted as 7 strokes, except for 麥 & 麩 where it is counted as 11. (Nelson and Halpern do the same, and S&H avoid treating it as a radical, but count it as 12 in the remainder.)<br />
#The ROO or OI radical (老) has a variant consisting of the top 4 strokes. For example, it is in 者. Traditionally, this variant had an extra dot, and was counted as 5 strokes. I'm counting it as 4 throughout.<br />
<br />
===Other Stroke Patterns===<br />
# While the pattern 臼 is a 6-stroke radical, the top half of 叟 is made up of three distinct parts totalling 8 strokes. Note that this also is the case with 嫂, 溲, 艘 and 痩 despite the simplification in the JIS glyphs.<br />
#牙 (KIBA HEN) is a problem. It is classically counted as 4 strokes, but these days has a flick that makes it effectively 5. Halpern, Nelson and S&H usually have it as 5 strokes, so I'm standardizing on that.<br />
#Another little horror is 旡 (MU or NASHI), which is classically counted as 4 strokes. The most common variant has 5 strokes, but looks like 6. Halpern, S&H and the Classical Nelson count this as 4 strokes, and the New Nelson as 5. I'm making it 5 too.<br />
#The JUU or ASHIATO radical is at the bottom of 禽 and 禺. It is traditionally counted as 5 strokes, although sometimes it looks like 4. I'm using 5 throughout.<br />
#A related shape is ム, as in 瓜, 孤, 弧, etc. This is sometimes counted as two strokes (both Nelsons) and sometimes as three strokes (Halpern, S&H). Classically it is regarded as two strokes. I am using 6 strokes for 瓜.<br />
#The pattern to the left of 敝, which appears in several kanji, e.g. 幣 and 瞥, has 8 strokes. (There are 3 strokes at the top as in 尚.)<br />
#The "east" pattern (東) has 8 strokes. There is an older form in which there are two strokes in the box (柬). It is counted as 8 strokes here in the 東 form (e.g. 諌) and 9 in the 柬 form, as in 諫.<br />
#The pattern at the bottom of 雋 is counted as 4 strokes in modern dictionaries, although traditionally it was 5.<br />
#The pattern 巻, which appears in several kanji, is counted as 9 strokes. Several dictionaries count it as either 8 or 9.<br />
#The pattern on the left of 収 is variously handled as 2 strokes or 3 strokes. As more recent dictionaries make it 4, I will do so too.<br />
#The 攵 pattern has 3 and 4-stroke versions, and sometimes the glyphs can be confusing as to which is used. In the 緻 kanji, for example, it is traditionally counted as 3, but Spahn & Hadamitzky count it as 4 and the Nelsons include both.<br />
<br />
Note: The JIS X 0208-1990 standard does not formally specify the precise glyphs used for kanji; however, the glyphs it uses in the published version have become de facto standards for many font compilations. In the published standard, for several kanji, e.g. 辿/迚, 礼/祀, 飢/飭, the JIS Level 1 kanji use the simpler form, and the Level 2 kanji use the older, more complex form. Just to make matters worse, many fonts for JIS X 0208 kanji are based on the bit-maps specified in the JIS X 9051-1984 standard, which defines the 16x16 patterns for JIS X 0208-1983 characters. According to Ken Lunde: "This standard was not very good, and JSA is no longer supporting it." Anyway, JIS X 9051-1984 had the simpler form for all these bushu in both Levels 1 and 2, as well as having simplifications of kanji like 濾. Thus, as the font foundries have freedom to choose whichever glyphs they like, what you see on your screen may well not agree with these rules. All the rules in this appendix relate to the glyphs as published in the JIS X 0208-1990 standard, and as appearing in font compilations based on them.<br />
<br />
==Kanji Dictionary Search Codes==<br />
<br />
===SKIP Codes===<br />
<br />
The System of Kanji Indexing by Patterns (SKIP) is a scheme for the classification and rapid retrieval of Chinese characters on the basis of geometrical patterns. Developed by Jack Halpern, it first appeared in the New Japanese-English Character Dictionary (Kenkyusha, Tokyo 1990; NTC, Chicago 1993), and in successor publications such as the "Kanji Learners Dictionary" (Kodansha 1999,2011) and the "Kodansha Kanji Dictionary" (2013). A description of the coding system is [http://www.edrdg.org/wwwjdic/SKIP.html available].<br />
<br />
As examples, 割 has a SKIP code of 1-10-2, indicating it is divided into left-right portions with 10 strokes at the left and 2 at the right. 度 has a SKIP code of 3-3-6, indicating it has a 3-stroke enclosure with 6 strokes inside it.<br />
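<br />
The structure of a SKIP code can be illustrated with a minimal Python sketch. The names <code>SkipCode</code>, <code>parse_skip</code> and <code>stroke_total</code> are hypothetical and are not part of any KANJIDIC tool. The pattern names for 1 and 3 come from the examples above; those for 2 and 4 follow the general SKIP scheme, in which pattern 4 ("solid") carries the total stroke count followed by a sub-pattern number rather than two stroke counts.<br />

```python
from dataclasses import dataclass

# Pattern numbers used as the first element of a SKIP code.
SKIP_PATTERNS = {
    1: "left-right",
    2: "up-down",
    3: "enclosure",
    4: "solid",
}


@dataclass(frozen=True)
class SkipCode:
    pattern: int  # 1-4, see SKIP_PATTERNS
    first: int    # patterns 1-3: strokes in the first part; pattern 4: total strokes
    second: int   # patterns 1-3: strokes in the second part; pattern 4: sub-pattern

    @property
    def pattern_name(self) -> str:
        return SKIP_PATTERNS[self.pattern]

    @property
    def stroke_total(self) -> int:
        """Total stroke count implied by the code."""
        if self.pattern == 4:
            return self.first
        return self.first + self.second


def parse_skip(code: str) -> SkipCode:
    """Parse a string such as '1-10-2' into its three numeric elements."""
    parts = code.strip().split("-")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"not a SKIP code: {code!r}")
    pattern, first, second = (int(p) for p in parts)
    if pattern not in SKIP_PATTERNS:
        raise ValueError(f"unknown SKIP pattern {pattern} in {code!r}")
    return SkipCode(pattern, first, second)
```

With this sketch, <code>parse_skip("1-10-2")</code> describes a left-right kanji of 12 strokes, and <code>parse_skip("3-3-6")</code> an enclosure kanji of 9 strokes.<br />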
<br />
===De Roo Codes===<br />
<br />
The De Roo codes were developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). They are based on the shapes observed at the top and bottom of the character. A [http://www.edrdg.org/wwwjdic/deroo.html detailed description] is available.<br />
<br />
As an example, 亜 has a code of 3273, indicating that the top of the kanji is pattern number 32 (兀) and the bottom is pattern number 73 (a horizontal line with two vertical strokes above it).<br />
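<br />
As a small illustration, the split into top and bottom pattern numbers might be sketched in Python as below. The function name <code>split_de_roo</code> is hypothetical, and the sketch assumes a four-digit code like the one in the example; it does not attempt to handle codes of any other length.<br />

```python
def split_de_roo(code: str) -> tuple[int, int]:
    """Split a four-digit De Roo code into (top pattern, bottom pattern).

    Assumption: the code has exactly four digits, two for the top
    pattern and two for the bottom pattern, as in the 3273 example.
    """
    if len(code) != 4 or not code.isdecimal():
        raise ValueError(f"expected a four-digit De Roo code, got {code!r}")
    return int(code[:2]), int(code[2:])
```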
<br />
===Four Corner Codes===<br />
The Four Corner coding system was invented by Wang Chen in 1928, and has since been widely used in dictionaries in China and Japan for classifying kanji and hanzi. In China it is losing popularity in favour of Pinyin ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten, have a Four Corner Index. An [http://www.edrdg.org/wwwjdic/FOURCORNER.html overview] of the coding system is available.<br />
In some cases a character may have two of these codes, as the coding can be a little ambiguous, and Morohashi has some kanji coded differently from their traditional Chinese codes.<br />
The coding system indexes characters according to the shapes at the corners.<br />
<br />
==Proposing Changes==<br />
<br />
There is currently no online access to the database that holds the KANJIDIC contents (the information is mostly quite static.) Anyone wishing to propose a change to the data for a kanji, e.g. to add or change a reading, will need to email Jim Breen at jimbreen@gmail.com.<br />
<br />
==Kanji Information Sites==<br />
''(Being expanded)''<br />
* Jim's [http://nihongo.monash.edu/kanjiinfo.html Kanji Information Page].<br />
* The [https://kanjialive.com/ Kanji alive] site at the University of Chicago.<br />
* The [https://www.kanjipedia.jp/ Kanjipedia] site (mostly in Japanese).<br />
<br />
==Legacy Documentation==<br />
<br />
The current Wiki page was compiled from several older documents, which are no longer being maintained. They are still available for historical purposes. They are:<br />
* a basic home page about [http://www.edrdg.org/kanjidic/kanjd2index_legacy.html KANJIDIC2];<br />
* an overview page about the [http://www.edrdg.org/kanjidic/kanjidic2_ov_legacy.html KANJIDIC2 structure];<br />
* an overview page about [http://www.edrdg.org/kanjidic/kanjidic_legacy.html KANJIDIC and KANJD212];<br />
* the original [http://www.edrdg.org/kanjidic/kanjidic_doc_legacy.html KANJIDIC] documentation;<br />
* the original [http://www.edrdg.org/kanjidic/kanjd212_doc_legacy.html KANJD212] documentation.<br />
<br />
==Copyright and Permissions==<br />
<br />
The KANJIDIC project files are released under a Creative Commons Attribution-ShareAlike Licence (V3.0). See the [http://www.edrdg.org/edrdg/licence.html EDRDG General Dictionary Licence Statement] for details.<br />
<br />
For the most part the information provided in the project's files is in the public domain. Information relating to the sequence numbers of kanji in published dictionaries is not considered to be subject to copyright. Descriptor and other search codes are considered to be the intellectual property of their developers. With regard to the codes included in the KANJIDIC files:<br />
* in 2014 Jack Halpern placed the SKIP codes under a CC-SA licence. See [http://www.kanji.org/kanji/dictionaries/skip_permission.htm this page] for his announcement. They are now under a [https://creativecommons.org/licenses/by-nc-sa/3.0/ Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported Licence].<br />
* Fr De Roo provided written permission for the De Roo codes to be included in KANJIDIC.<br />
* the Spahn and Hadamitzky descriptor codes were kindly supplied by Mark Spahn for inclusion in KANJIDIC.<br />
<br />
==History==<br />
<br />
''(some comments by Jim Breen)''<br />
<br />
KANJIDIC began around 1991 as two files: jis1detl.lst and jis2detl.lst, which were later merged into a single file.<br />
<br />
The first file was compiled initially from the file "kinfo.dat" supplied by Stephen Chung, who in turn compiled his file from a file prepared by Mike Erickson. I originally added about 1900 "meanings" by James Heisig keyed in by Kevin Moore from the book "Remembering The Kanji". I later added the meanings from Rik Smoody's files, compiled when he was working for Sony in Japan. These appear to have been based on Nelson.<br />
<br />
The second file was compiled from a complete JIS2 list with Bushu and stroke counts kindly supplied to me by Jon Crossley, to which I added Nelson numbers, yomikata and meanings extracted from Rik Smoody's file.<br />
<br />
Theresa Martin provided early assistance with this file, particularly with tracking down and correcting many mistranscribed yomikata (the old zu/dzu, oo/ou, ji/dji, etc. problems).<br />
<br />
Jeffrey Friedl did a major overhaul in September-October 1992, in which he added the original frequency rankings, Halpern codes and SKIP patterns, updated the grading ("G" fields) to reflect the modern Jouyou lists, corrected radical numbers, and corrected stroke counts and readings to fall in line with modern usage.<br />
<br />
Magnus Halldorsson corrected some erroneous Halpern numbers, and provided them for a lot of the radicals. He provided the list of Heisig indices, which he originally compiled himself, then verified and expanded using lists from Richard Walters and Antti Karttunen. He also passed on to me the list of Gakken indices compiled by Antti Karttunen.<br />
<br />
Lee Collins provided the Unicode mappings.<br />
<br />
Iain Sinclair has provided the yomikata, meanings and S&H indices of many of the obscure JIS2 kanji.<br />
<br />
Christian Wittern, a Sinologist working at Kyoto University, sent me a monster file prepared by Dr Urs App from Hanazono College. From this I have extracted the Four Corner and Morohashi information. Christian also provided the original Pinyin details, which were later replaced. I am very grateful for these significant contributions.<br />
<br />
In March 1994 the Morohashi indices were proof-read and corrected by Christian.<br />
<br />
Alfredo Pinochet supplied all the Henshall numbers.<br />
<br />
Ingar Holst has provided considerable assistance in regularizing the Bnnn and Cnnn radical classifications to remove some errors that were in the original JIS2 file, and to make it all conform to Nelson's classification.<br />
<br />
In mid-1993 I withdrew the SKIP codes from the distributed file as it appeared that their presence violated Jack Halpern's copyright on these codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission from his publisher for the codes to be included subject (initially) to copyright and usage restrictions. In March 1994 the Halpern indices and SKIP codes were checked against an extract from Jack's files, and the "Z" mis-classification codes added, again from his files. Jack has also made a lot of useful comments and suggestions about the content and format of the file. I am most grateful to Jack for his permission and assistance, and also to Jeffrey for making the contact.<br />
<br />
In May 1995, a number of updates took place. Jeffrey Friedl established contact with James Heisig, and obtained a further set of his indices. I contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided most of the missing S&H descriptors, and Jack Halpern released to me the SKIP codes of the kanji not in the New Japanese-English Character Dictionary. For all this material I am most grateful.<br />
<br />
In August 1995, I added the O'Neill index numbers. These were compiled by Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny & David for their assistance. <br />
<br />
In January and February 1996 the Morohashi numbers were checked thoroughly against two important sources: a file of Unicode-Morohashi data (Uni2Dict) which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221 standard, and the review draft of the proposed revision of the JIS X 0208 standard, which was prepared by the INSTAC Committee, and made available in a text file, thus enabling comparisons. All the mismatches between the three files were examined against the Morohashi text, and extensive corrections made to all three files. I am grateful to Koichi Yasuoka and Masayuki Toyoshima for their considerable assistance in this task.<br />
<br />
In March 1996 the Korean readings were added. They were provided by Dr Charles Muller, then of Toyo Gakuen University, to whom I am most grateful. Chuck's compilation of Korean readings is extremely thorough and scholarly, and I am pleased to be able to incorporate them.<br />
<br />
In April 1996 the readings of all the kanji were compared with those in the JIS X 0208 draft, and a number of corrections and additions made.<br />
<br />
In May 1996 I carried out a "unification" of the readings of the KANJIDIC and KANJD212 files, wherein all the readings of the "itaiji" were brought into line. The identification of these itaiji was drawn from a file posted to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp), which was compiled at the ETL from the itaiji identification in the JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added some extra sets which were indicated in the JIS X 0208-1996 draft.<br />
<br />
In July 1996 the Pinyin details were completely replaced by a new set. The original Pinyin were from an earlier compilation by Christian Wittern, and contained many errors. Two more reliable sources had become available: the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5 hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC file is a combination of the two, following the order in the Uni2Pinyin file.<br />
<br />
In August 1996 I corrected a few more missing and erroneous Nelson numbers, using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged the kokuji, so I added these to the readings fields as "{(kokuji)}".<br />
<br />
Also in August 1996 I deleted the handful of former "XJxxxx" cross-references, and replaced them with a much more comprehensive set, so that they now represent all the recognized "itaiji". The file I used for this was the corrected itaiji file mentioned above.<br />
<br />
In April 1997 I corrected a large number of bushu codes. Many of these had been identified as errors by Jean-Luc Leger who analyzed and examined all the Nelson bushu. I also identified and added a large number of missing Cnnn codes.<br />
<br />
Also in April 1997 I added the S&H "Kanji & Kana" indices. These had been keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must be an outbreak of kanji interest in Nancy.)<br />
<br />
In February 1998, the long-awaited inclusion of the "New Nelson" numbers took place. I had been waiting for the editor of the New Nelson, John Haig, to supply a list (as he had agreed some years before), but in the meantime, Jean-Luc Leger keyed a list, so they are now available.<br />
<br />
Also between December 1997 and February 1998 a large number of Level 2 kanji had their stroke counts corrected to bring them into line with the counting principles used in the Level 1 kanji. This usually aligned the counts with those used in the New Nelson and in S&H. Appendix E of this document was amended to reflect this. The leg-work in tracking this material down was done by Wolfgang Cronrath. <br />
<br />
During December 1998 & Jan 1999 I updated the stroke counts of many of the Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath. I also added the De Roo codes, which had been keyed by Jasmin Blanchette, who also typed the explanatory material. I contacted Fr De Roo in Tokyo who readily agreed to the inclusion of the codes.<br />
<br />
The extension of the S&H "Kanji & Kana" numbers to the 2nd edition was done by Enrique Sanchez Rosa.<br />
<br />
The Hangul versions of the Korean readings (which only appear in the XML version) were provided by Francis Bond and Kyonghee Paik.<br />
<br />
I did the Tuttle card numbers myself.<br />
<br />
James Rose provided the numbers from Crowley's "The Kanji Way to Japanese Language Power", Sakade's "A Guide To Reading and Writing Japanese", and also for that book's 3rd Edition edited by Henshall, Seeley & De Groot.<br />
<br />
The "Kodansha's Compact Kanji Guide" codes were provided by Richard Fremmerlid.<br />
<br />
The "Kanji in Context" codes were provided by Randy Foreman.<br />
<br />
The Spanish kanji meanings (which appear in the XML format, and may also appear in special versions of KANJIDIC) were compiled by Francisco Gutierrez and provided by Gabriel Sanroman.<br />
<br />
Alain Thierion translated the meanings of the kanji into French, and also provided the Maniette numbers.<br />
<br />
Andrew Slater provided updates to the JLPT numbers, and additional numbers for the Japanese Flashcards series.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=947Editorial policy2022-09-10T23:42:10Z<p>JimBreen: /* Meanings */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer that people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. ＭＰ３プレーヤー, not MP3プレーヤー).<br />
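<br />
The full-width rule above can be illustrated with a minimal Python sketch. The helper name <code>to_fullwidth</code> is hypothetical and is not part of JMdictDB. It relies on the fact that the printable ASCII characters U+0021 to U+007E have full-width counterparts at a fixed offset (U+FF01 to U+FF5E). Converting ASCII punctuation as well as letters and digits is an assumption of the sketch, not something the guidelines state.<br />

```python
# Offset between "!" (U+0021) and FULLWIDTH EXCLAMATION MARK (U+FF01).
_FULLWIDTH_OFFSET = 0xFF01 - 0x21

# Translation table for printable ASCII other than the space character.
_TO_FULLWIDTH = {cp: cp + _FULLWIDTH_OFFSET for cp in range(0x21, 0x7F)}


def to_fullwidth(text: str) -> str:
    """Replace half-width ASCII letters, digits and punctuation with
    their full-width forms; all other characters are left unchanged."""
    return text.translate(_TO_FULLWIDTH)
```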
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
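<br />
The ordering described above might be sketched in Python as follows. This is only an illustration: the <code>Form</code> class, its field names and the idea of a single numeric <code>count</code> per form are all assumptions, and "equivalent" frequencies are simplified here to exactly equal counts.<br />

```python
from dataclasses import dataclass, field

# Tags named in the guidelines as marking irregular or incorrect forms.
IRREGULAR_TAGS = {"iK", "io", "ik"}


@dataclass
class Form:
    text: str
    count: int               # hypothetical frequency count for this form
    all_jouyou: bool = True  # True if the form uses only 常用漢字
    tags: set[str] = field(default_factory=set)


def order_forms(forms: list[Form]) -> list[Form]:
    """Regular forms first, by decreasing count; on equal counts prefer
    forms using 常用漢字; irregular forms go to the rear."""
    return sorted(
        forms,
        key=lambda f: (bool(f.tags & IRREGULAR_TAGS), -f.count, not f.all_jouyou),
    )
```

Note that an irregular form is placed last even when its count is the highest, which matches the advice about forms that are common on WWW pages.<br />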
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
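<br />
A minimal Python sketch of the middle-dot convention follows. It assumes that "the JIS middle-dot" corresponds to U+30FB (KATAKANA MIDDLE DOT); the function names and the list of look-alike characters are illustrative only, and the list is not exhaustive.<br />

```python
KATAKANA_MIDDLE_DOT = "\u30fb"  # assumed to be the accepted "JIS middle-dot"

# Look-alike characters that are easy to enter by mistake (not exhaustive).
LOOKALIKE_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
    "\u2022": "BULLET",
    "\u2219": "BULLET OPERATOR",
    "\u22c5": "DOT OPERATOR",
}


def reading_variants(parts: list[str]) -> list[str]:
    """Return the reading without and with a separating middle dot."""
    return ["".join(parts), KATAKANA_MIDDLE_DOT.join(parts)]


def find_lookalike_dots(reading: str) -> list[tuple[int, str]]:
    """Return (position, character name) for each unaccepted dot found."""
    return [
        (i, LOOKALIKE_DOTS[ch])
        for i, ch in enumerate(reading)
        if ch in LOOKALIKE_DOTS
    ]
```

Joining the result of <code>reading_variants(["アームレス", "チェア"])</code> with ";" gives the two-variant form shown in the example above.<br />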
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails.<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and as to which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary, (ii) loan-words from English where the source word is among the translations, or (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
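<br />
As an illustration only, the [lsrc=lng:word] pattern described above could be read by a tool along the following lines. This is a minimal sketch: the function name <code>parse_lsrc</code> and the regular expression are assumptions made for this example and are not part of the JMdictDB software.<br />

```python
import re

# [lsrc=lng:word]; the word part may be empty or wrapped in double quotes.
# The three-letter language code is assumed to be lower-case ISO 639-2.
LSRC_RE = re.compile(r'\[lsrc=(?P<lang>[a-z]{3}):(?P<word>[^\]]*)\]')


def parse_lsrc(text):
    """Return (language_code, source_word_or_None) pairs found in text."""
    results = []
    for m in LSRC_RE.finditer(text):
        # Drop surrounding whitespace and optional double quotes.
        word = m.group("word").strip().strip('"')
        results.append((m.group("lang"), word or None))
    return results


print(parse_lsrc('アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job'))
print(parse_lsrc('アールデコ [1][n] [lsrc=fre:"art déco"] art deco'))
```

An empty word part, which is what results when the source word is identical to the translation, is returned as <code>None</code>.<br />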
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
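<br />
The cross-reference patterns above can be sketched as a small parser. This is a hedged example rather than the JMdictDB implementation: the name <code>parse_xrefs</code> and the regular expression are assumptions, and the sketch treats every "・" as the kanji/reading separator, so it would mis-read targets that themselves contain that character.<br />

```python
import re

# [see=target], [ant=target], [see=kanji・reading], [see=target[2]]
XREF_RE = re.compile(
    r"\[(?P<kind>see|ant)="
    r"(?P<target>[^\[\]・]+)"          # headword (kanji form where available)
    r"(?:・(?P<reading>[^\[\]・]+))?"  # optional reading after the middle dot
    r"(?:\[(?P<sense>\d+)\])?"         # optional sense number
    r"\]"
)


def parse_xrefs(text):
    """Return a list of dicts describing each cross-reference in text."""
    return [
        {
            "kind": m.group("kind"),
            "target": m.group("target"),
            "reading": m.group("reading"),
            "sense": int(m.group("sense")) if m.group("sense") else None,
        }
        for m in XREF_RE.finditer(text)
    ]


for sample in ("[see=言葉]", "[see=金本位・かねほんい]", "[see=漢字[2]]"):
    print(parse_xrefs(sample))
```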
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It is used where the entry refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
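<br />
For tools that generate glosses, the preferred date and time styles above could be produced as follows. This is a minimal sketch; the function names are assumptions made for this example.<br />

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def gloss_date(month, day, year=None):
    """'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[month - 1]} {day}"
    return text if year is None else f"{text}, {year}"


def lifespan(born, died):
    """YYYY.MM.DD-YYYY.MM.DD without zero padding, for named-entity entries."""
    return f"{born.year}.{born.month}.{born.day}-{died.year}.{died.month}.{died.day}"


def gloss_time(hour, minute=0):
    """'2am' or '12:30pm' from a 24-hour clock time."""
    suffix = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12
    return f"{hour12}{suffix}" if minute == 0 else f"{hour12}:{minute:02d}{suffix}"


print(gloss_date(3, 17, 2019))
print(lifespan(date(1925, 1, 14), date(1970, 11, 25)))
print(gloss_time(12, 30))
```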
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
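<br />
A contributor who wants a rough idea of whether some text will survive into the EDICT/EDICT2 distributions could test it against a suitable codec. The sketch below is not the conversion used to build the distributions. It assumes that Python's <code>iso2022_jp</code> codec is a reasonable stand-in for the JIS X 0208 repertoire (plus ASCII) and that <code>euc_jp</code> is a reasonable stand-in for JIS X 0208 together with JIS X 0212; the function names are hypothetical.<br />

```python
def encodable(text, codec):
    """True if every character of text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def distribution_check(text):
    """Rough guide to which legacy distributions can carry text unchanged."""
    return {
        # Assumption: iso2022_jp approximates ASCII + JIS X 0208 (EDICT).
        "EDICT": encodable(text, "iso2022_jp"),
        # Assumption: euc_jp approximates JIS X 0208 + JIS X 0212 (EDICT2).
        "EDICT2": encodable(text, "euc_jp"),
    }


for sample in ("家", "é", "한"):
    print(sample, distribution_check(sample))
```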
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
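<br />
The basic "two-out-of-three" test can be sketched as a simple count of matching fields. This is only an illustration: the dictionary keys and the name <code>may_merge</code> are assumptions, the comparison uses a single main form per field, and the sample values are placeholders. It does not capture the editorial judgement described above, and entries with no kanji form are deliberately left to human review.<br />

```python
FIELDS = ("kanji", "reading", "meaning")


def may_merge(a, b):
    """True if at least two of kanji, reading and meaning are the same.

    a and b are dicts holding the main/common form for each field.
    A missing or empty field never counts as a match.
    """
    def same(field):
        return bool(a.get(field)) and a.get(field) == b.get(field)

    return sum(same(f) for f in FIELDS) >= 2


# Placeholder entries, not real dictionary data.
e1 = {"kanji": "K1", "reading": "r1", "meaning": "m1"}
e2 = {"kanji": "K2", "reading": "r1", "meaning": "m1"}
e3 = {"kanji": "K1", "reading": "r2", "meaning": "m2"}

print(may_merge(e1, e2))  # same reading and meaning
print(may_merge(e1, e3))  # only the kanji form matches
```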
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
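<br />
For developers, the suggested handling could look something like the following sketch. The data layout (a list of form/tag-set pairs) and the name <code>split_forms</code> are assumptions made for this example; they do not reflect the JMdict XML structure or any particular application.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}


def split_forms(forms):
    """forms: list of (text, set_of_tags) pairs for one entry.

    Returns (display_forms, search_keys): search-only forms are kept
    as search keys but left out of the forms shown to the user.
    """
    display = [text for text, tags in forms if not (tags & SEARCH_ONLY_TAGS)]
    search = [text for text, _ in forms]
    return display, search


# The 3密/三密 entry mentioned above, with 三蜜 as a search-only form.
forms = [("3密", set()), ("三密", set()), ("三蜜", {"sK"})]
display, search = split_forms(forms)
print(display)
print(search)
```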
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether these can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict however they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it can often be determined from the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess.<br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
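<br />
The formatting rules above can be sketched as a small helper. This is only an illustration, not part of any JMdict tooling: the function name and its parameters are hypothetical, and the wiki italic markup is left out.<br />

```python
def species_gloss(common_name, genus, species, infra_rank=None, infra_epithet=None):
    """Build a gloss of the form 'common name (Genus species ...)'.

    Hypothetical helper: infra_rank is 'subsp.', 'var.' or 'f.' for plants
    and is left as None for animal subspecies.
    """
    # Generic names are capitalized; specific epithets never are.
    scientific = f"{genus.capitalize()} {species.lower()}"
    if infra_epithet:
        if infra_rank:
            # Plants: the rank abbreviation precedes the epithet.
            scientific += f" {infra_rank} {infra_epithet.lower()}"
        else:
            # Animals: the subspecific epithet is simply appended.
            scientific += f" {infra_epithet.lower()}"
    # No author name (e.g. "L.") is ever appended.
    return f"{common_name} ({scientific})"


print(species_gloss("cinnamon bear", "Ursus", "americanus", infra_epithet="cinnamomum"))
print(species_gloss("raspberry", "rubus", "Idaeus"))
```

The helper only normalizes capitalization and ordering; it cannot tell whether a name is a junior synonym, which still has to be checked by hand.<br />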
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that "derog" or "vulg" tags be used where appropriate;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
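<br />
A check for stray bar-like characters might look like the following sketch. The three permitted code points come from the list above; the set of look-alikes is an illustrative assumption and is not necessarily the same set of characters discussed in the issue.<br />

```python
# Code points permitted by the policy above.
ALLOWED = {"\u002d", "\u30fc", "\u2212"}  # hyphen-minus, long vowel mark, minus sign

# Assumed examples of other bar-like characters (not an authoritative list).
LOOKALIKES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2015", "\uff0d"}


def find_disallowed_bars(text):
    """Return (index, code point) pairs for look-alike bars found in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in LOOKALIKES and ch not in ALLOWED
    ]


print(find_disallowed_bars("CD-ROM"))        # ASCII hyphen: nothing reported
print(find_disallowed_bars("CD\u2013ROM"))   # en dash is reported
```

This only detects characters; deciding which of the three permitted characters should replace a hit depends on the field and context, as described above.<br />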
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=946Editorial policy2022-08-15T02:23:22Z<p>JimBreen: /* Search-only Forms */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL; just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, you should include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (i.e. not as MP3プレーヤー).<br />
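<br />
As a rough sketch of the full-width requirement, the helper below converts ASCII letters and digits to their full-width counterparts. It assumes that "full-width" refers to the Unicode fullwidth forms (ASCII code point plus 0xFEE0), and it deliberately leaves punctuation such as the ASCII hyphen untouched; the function name is hypothetical.<br />

```python
def to_fullwidth_alnum(text):
    """Convert ASCII letters and digits to Unicode fullwidth forms."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            # Fullwidth forms sit at a fixed offset from their ASCII originals.
            out.append(chr(ord(ch) + 0xFEE0))
        else:
            # Kana, kanji and punctuation are passed through unchanged.
            out.append(ch)
    return "".join(out)


print(to_fullwidth_alnum("MP3プレーヤー"))
```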
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
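<br />
The ordering rule can be illustrated with a sort key. This is a simplified sketch: the dictionary keys ("text", "count", "tags", "joyo") are hypothetical, and frequencies are treated as equivalent only when the counts are exactly equal, which is stricter than an editor's judgment would be.<br />

```python
IRREGULAR_TAGS = {"iK", "io", "ik"}


def order_forms(forms):
    """Order surface forms: regular before irregular, then by falling count,
    then forms written with joyo kanji first among equal counts."""
    def key(form):
        irregular = bool(IRREGULAR_TAGS & set(form.get("tags", ())))
        return (irregular, -form["count"], not form.get("joyo", False))
    return sorted(forms, key=key)


# Placeholder data, not real frequency counts.
forms = [
    {"text": "form-A", "count": 10, "tags": ["iK"]},
    {"text": "form-B", "count": 5},
    {"text": "form-C", "count": 5, "joyo": True},
]
print([f["text"] for f in order_forms(forms)])
```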
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjective, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
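<br />
Generating both variants from the component words can be sketched as follows. It assumes that the "JIS middle-dot" corresponds to U+30FB (KATAKANA MIDDLE DOT) and that readings are separated by ";" as in the example above; the function name is hypothetical.<br />

```python
MIDDLE_DOT = "\u30fb"  # assumed to be the JIS middle-dot


def gairaigo_readings(parts):
    """Return the joined reading, plus the middle-dot variant if there is
    more than one component."""
    joined = "".join(parts)
    if len(parts) < 2:
        return joined
    return joined + ";" + MIDDLE_DOT.join(parts)


print(gairaigo_readings(["アームレス", "チェア"]))
```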
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
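<br />
A few of the mechanical gloss conventions above lend themselves to an automated check. The sketch below is illustrative only (the function names are hypothetical); it flags candidates for review rather than errors, since, for example, a leading article is allowed when it is needed to make the meaning clear.<br />

```python
import re


def lint_gloss(gloss):
    """Return a list of possible style problems in a single gloss."""
    problems = []
    # Leading "a", "an" or "the" is normally omitted.
    if re.match(r"(a|an|the)\s", gloss, re.IGNORECASE):
        problems.append("leading article")
    # Patterns such as "colo(u)r" cannot be searched for successfully.
    if re.search(r"\w\(\w+\)\w", gloss):
        problems.append("embedded spelling variant")
    # No comma should follow "e.g." or "i.e.".
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        problems.append("comma after e.g./i.e.")
    return problems


def lint_sense(text):
    """Split a sense on ';' and lint each gloss separately."""
    glosses = [g.strip() for g in text.split(";") if g.strip()]
    return {g: lint_gloss(g) for g in glosses}


print(lint_sense("rice field; rice paddy"))
print(lint_gloss("full colo(u)r"))
```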
<br />
====Which Reference Is Best?====<br />
<br />
On occasion, references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and as to which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs", in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
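<br />
For tools that consume text in this form, the source-language tag can be pulled out with a regular expression. The sketch below assumes only the syntax shown in the two examples above (a three-letter code, a colon, and an optionally quoted source word); it is not the parser used by JMdictDB.<br />

```python
import re

# [lsrc=lng:word] or [lsrc=lng:"two words"]; the word part may be empty.
LSRC = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')


def parse_lsrc(text):
    """Return (language code, source word) pairs found in text."""
    return [
        (m.group(1), m.group(2) if m.group(2) is not None else m.group(3))
        for m in LSRC.finditer(text)
    ]


print(parse_lsrc('[1][n,vs][lsrc=ger:Arbeit] part-time job'))
print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```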
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
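<br />
The three cross-reference shapes described above (headword only, headword・reading, and headword with a sense number) can be recognized with a short pattern. This sketch assumes the separator is U+30FB and, as a simplification, that the headword itself contains no middle-dot; the function name and the keys of the returned dictionary are hypothetical.<br />

```python
import re

DOT = "\u30fb"  # assumed separator between headword and reading

XREF = re.compile(
    r"\[(see|ant)=([^\[\]" + DOT + r"]+)"   # type and headword
    r"(?:" + DOT + r"([^\[\]]+))?"          # optional reading
    r"(?:\[(\d+)\])?\]"                     # optional sense number
)


def parse_xref(tag):
    """Parse one cross-reference tag, or return None if it does not match."""
    m = XREF.fullmatch(tag)
    if not m:
        return None
    kind, headword, reading, sense = m.groups()
    return {
        "type": kind,
        "headword": headword,
        "reading": reading,
        "sense": int(sense) if sense else None,
    }


print(parse_xref("[see=漢字[2]]"))
```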
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
A cross-reference back from the full form to the abbreviation may also be appropriate.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
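<br />
The preference for macrons over circumflexes can be checked mechanically. The following is a minimal sketch in Python, not part of any JMdict tooling; the function name is hypothetical, and it should only be applied to romanized Japanese, since it would also alter legitimate circumflexes in other languages (e.g. French).<br />

```python
import unicodedata

# Circumflexed vowels and their macron equivalents (revised Hepburn style).
_CIRCUMFLEX = "âîûêôÂÎÛÊÔ"
_MACRON = "āīūēōĀĪŪĒŌ"
_TABLE = str.maketrans(_CIRCUMFLEX, _MACRON)


def circumflex_to_macron(text):
    """Replace circumflexed long vowels with macrons; other text is untouched."""
    # Normalize to NFC first so that a plain vowel followed by a combining
    # circumflex is composed into a single character and caught by the table.
    return unicodedata.normalize("NFC", text).translate(_TABLE)


print(circumflex_to_macron("Yôrô"))        # Yōrō
print(circumflex_to_macron("man'yōgana"))  # unchanged: man'yōgana
```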
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It marks a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), where the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
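<br />
The two spacing rules above can be sketched as a small normalizer. This is an illustration only: the lists of units and symbols are assumptions chosen for the example, not an official list, and the function name is hypothetical.<br />

```python
import re

# Illustrative, assumed lists; extend as needed.
UNITS = ("km", "kg", "cm", "mm", "ml", "m", "g", "l")
SYMBOLS = ("°C", "%")

# A digit immediately followed by a unit, e.g. "100km".
_UNIT_RE = re.compile(r"(\d)(" + "|".join(UNITS) + r")\b")
# A digit, whitespace, then a symbol, e.g. "15 °C".
_SYMBOL_RE = re.compile(r"(\d)\s+(" + "|".join(re.escape(s) for s in SYMBOLS) + r")")


def normalize_spacing(gloss):
    """Insert a space before units and remove any space before symbols."""
    gloss = _UNIT_RE.sub(r"\1 \2", gloss)
    gloss = _SYMBOL_RE.sub(r"\1\2", gloss)
    return gloss


print(normalize_spacing("100km"))   # 100 km
print(normalize_spacing("15 °C"))   # 15°C
print(normalize_spacing("5c"))      # 5c ("c" is treated as a symbol)
```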
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
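<br />
As an illustration of the date and time styles described above, the following Python sketch formats values in each style. The function names are hypothetical; month names are spelled out explicitly rather than taken from the locale so that the output does not depend on system settings.<br />

```python
import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def gloss_date(d, with_year=True):
    """'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(born, died):
    """Compact YYYY.MM.DD range for people, without zero padding."""
    def ymd(d):
        return f"{d.year}.{d.month}.{d.day}"
    return f"{ymd(born)}-{ymd(died)}"


def gloss_time(t):
    """'2am' or '12:30pm'; minutes are omitted when they are zero."""
    hour12 = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute:
        return f"{hour12}:{t.minute:02d}{suffix}"
    return f"{hour12}{suffix}"


print(gloss_date(datetime.date(2019, 3, 17)))                            # March 17, 2019
print(lifespan(datetime.date(1925, 1, 14), datetime.date(1970, 11, 25)))  # 1925.1.14-1970.11.25
print(gloss_time(datetime.time(12, 30)))                                 # 12:30pm
```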
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
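<br />
A rough way to see which distribution a string would survive in is to test whether it can be encoded with Python's standard Japanese codecs. This is only an approximation and not the project's actual conversion code: it assumes that the "iso2022_jp" codec (ASCII, JIS X 0201 Roman and JIS X 0208) stands in for the EDICT repertoire and that "iso2022_jp_1" (which adds JIS X 0212) stands in for EDICT2. The function names are hypothetical.<br />

```python
def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def widest_distribution(text):
    """Classify text by the most constrained format that can hold it."""
    if encodable(text, "iso2022_jp"):      # JIS X 0208 only
        return "EDICT and EDICT2"
    if encodable(text, "iso2022_jp_1"):    # JIS X 0208 plus JIS X 0212
        return "EDICT2 only"
    return "JMdict only"


print(widest_distribution("学生割引"))
print(widest_distribution("한"))   # hangul is in neither JIS set
```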
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
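<br />
The basic two-out-of-three test can be sketched as follows. This is a simplified illustration: it assumes each entry is reduced to its single major kanji form, reading and meaning, and it uses exact string equality, whereas editors judge "the same meaning" by hand. The class and function names are hypothetical.<br />

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kanji: str     # major kanji headword
    reading: str   # major reading
    meaning: str   # summary of the meaning


def may_merge(a, b):
    """True if at least two of kanji, reading and meaning are the same."""
    matches = sum([
        a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ])
    return matches >= 2


# Same reading and meaning, different kanji: candidates for merging.
print(may_merge(Entry("合気道", "あいきどう", "aikido"),
                Entry("合氣道", "あいきどう", "aikido")))   # True
# Only the kanji matches: must remain separate entries.
print(may_merge(Entry("辛い", "からい", "spicy"),
                Entry("辛い", "つらい", "painful")))        # False
```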
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. A proposed entry generally needs to pass at least one of these criteria.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries, for example the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
Examples of the types of forms that are in this category include:<br />
* kanji typos (変換ミス);<br />
* uncommon 混ぜ書き forms;<br />
* uncommon itaiji and kyūjitai;<br />
* uncommon variant 外来語 forms;<br />
* uncommon irregular okurigana forms;<br />
* uncommon irregular readings.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is <b>strongly</b> suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. The forms do not participate in the restriction structure and there is nothing to indicate whether they are irregular (e.g. 変換ミス) or just uncommon (e.g. itaiji). Displaying them alongside the non-sk/sK forms would be confusing and unhelpful for users. (This concealment approach is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry but does not show 三蜜 as part of the entry.)<br />
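<br />
For developers, the suggested behaviour might look like the sketch below: search-only forms are indexed for lookup but filtered out before display. The dictionary layout used here is a hypothetical in-memory structure for illustration, not the JMdict XML schema or the WWWJDIC implementation.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}

# Hypothetical representation of one entry's surface forms.
entry = {
    "forms": [
        {"text": "3密", "tags": []},
        {"text": "三密", "tags": []},
        {"text": "三蜜", "tags": ["sK"]},   # search-only form
    ],
}


def display_forms(entry):
    """Forms to show to the user: everything except sK/sk forms."""
    return [f["text"] for f in entry["forms"]
            if not SEARCH_ONLY_TAGS & set(f["tags"])]


def matches(entry, key):
    """Lookup considers every form, including the search-only ones."""
    return any(f["text"] == key for f in entry["forms"])


print(matches(entry, "三蜜"))   # True: the entry is found
print(display_forms(entry))     # ['3密', '三密']: 三蜜 is not shown
```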
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
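<br />
A simple check for stray dash-like characters could be written as below. Note that the set of look-alike characters is an assumption made for this sketch (common Unicode dashes); it is not the list from the JMdict issue, which should be consulted for the authoritative set. The function name is hypothetical.<br />

```python
import unicodedata

# The three permitted characters, for reference:
#   U+002D hyphen-minus, U+30FC long vowel mark, U+2212 minus sign.
# Assumed, illustrative set of look-alikes that should not appear.
LOOKALIKES = {
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2015",  # horizontal bar
    "\uff0d",  # fullwidth hyphen-minus
}


def find_disallowed_dashes(text):
    """Return (index, Unicode name) for each look-alike dash in text."""
    return [(i, unicodedata.name(ch))
            for i, ch in enumerate(text) if ch in LOOKALIKES]


print(find_disallowed_dashes("CD-ROM"))        # []
print(find_disallowed_dashes("CD\u2013ROM"))   # [(2, 'EN DASH')]
```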
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=945Editorial policy2022-08-14T06:03:50Z<p>JimBreen: /* Search-only Forms */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. ＭＰ３プレーヤー, not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
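<br />
The two-form convention above can be sketched in a few lines of Python. This is only an illustration: the function names and the list of look-alike dots are assumptions made for this example and are not part of the JMdict tooling. It takes the JIS middle dot to be U+30FB (KATAKANA MIDDLE DOT).<br />

```python
KATAKANA_MIDDLE_DOT = "\u30fb"  # the middle dot found in JIS X 0208

# Look-alike dots that should not be used (illustrative, not exhaustive).
LOOKALIKE_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
    "\u2022": "BULLET",
    "\u2027": "HYPHENATION POINT",
}


def reading_variants(parts):
    """Build the 'joined;dotted' reading text for a multi-word loanword."""
    joined = "".join(parts)
    if len(parts) < 2:
        return joined  # a single source word needs no dotted form
    return joined + ";" + KATAKANA_MIDDLE_DOT.join(parts)


def find_lookalike_dots(text):
    """Return (position, name) for each dot that is not the JIS middle dot."""
    return [(i, LOOKALIKE_DOTS[ch]) for i, ch in enumerate(text)
            if ch in LOOKALIKE_DOTS]
```

For example, <code>reading_variants(["アームレス", "チェア"])</code> gives the text shown in the paragraph above.<br />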
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
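<br />
A few of the mechanical rules above (";" between translations, no leading article, no "colo(u)r" patterns, no comma after "e.g." or "i.e.") lend themselves to a quick self-check before submitting. The following Python sketch is hypothetical and is not part of the JMdict submission system; the function names are invented for this example, and splitting on ";" is a simplification that assumes no semicolons occur inside parentheses.<br />

```python
import re


def split_glosses(meaning):
    """Split a meaning string into its separate translations."""
    return [g.strip() for g in meaning.split(";") if g.strip()]


def lint_gloss(gloss):
    """Return a list of possible style problems in a single gloss."""
    problems = []
    # A leading article is only acceptable when the meaning needs it.
    if re.match(r"(a|an|the)\s", gloss, flags=re.IGNORECASE):
        problems.append("starts with an article")
    # Patterns like colo(u)r cannot be searched for successfully.
    if re.search(r"\w\(\w+\)\w", gloss):
        problems.append("optional-letter pattern such as colo(u)r")
    # No comma should follow e.g. or i.e.
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        problems.append("comma after e.g. or i.e.")
    return problems
```

A warning from such a check is only a prompt to look again; the guidelines allow exceptions (e.g. an article that is needed to make the meaning clear).<br />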
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word, use the format [see=漢字[2]].<br />
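<br />
For anyone processing entry text by program, the three cross-reference shapes shown above can be recognised with a single regular expression. The Python sketch below is an assumption-laden illustration: the <code>XRef</code> type and <code>parse_xrefs</code> function are invented for this example, and it covers only the shapes given on this page, not the full syntax described in the detailed instructions.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str               # "see" or "ant"
    headword: str           # kanji form (or kana if there is no kanji)
    reading: Optional[str]  # present for the kanji・reading form
    sense: Optional[int]    # present for the [see=漢字[2]] form


# [see=X], [ant=X], [see=X・Y] and [see=X[n]] only.
XREF_RE = re.compile(
    r"\[(see|ant)=([^\[\]・]+)(?:・([^\[\]・]+))?(?:\[(\d+)\])?\]"
)


def parse_xrefs(text):
    """Extract every cross-reference tag found in a piece of entry text."""
    refs = []
    for m in XREF_RE.finditer(text):
        kind, headword, reading, sense = m.groups()
        refs.append(XRef(kind, headword, reading,
                         int(sense) if sense else None))
    return refs
```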
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, also add a cross-reference back from the full form to the abbreviation.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
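<br />
The two spacing rules can be checked mechanically. The Python sketch below is only an illustration: the unit and symbol lists are short samples chosen for this example (they are not an official list), and the function name is hypothetical.<br />

```python
import re

# Sample units that need a space after the number (illustrative only).
UNITS = ("km", "kg", "cm", "mm", "m", "g", "l")

NUMBER = r"\d+(?:\.\d+)?"
MISSING_SPACE = re.compile(NUMBER + r"(?:" + "|".join(UNITS) + r")\b")
# Symbols such as % and the degree sign attach directly to the number.
UNWANTED_SPACE = re.compile(NUMBER + r"\s+(?:%|°)")


def spacing_problems(gloss):
    """Return (problem, matched text) pairs for number/unit spacing."""
    problems = []
    for m in MISSING_SPACE.finditer(gloss):
        problems.append(("missing space", m.group(0)))
    for m in UNWANTED_SPACE.finditer(gloss):
        problems.append(("unwanted space", m.group(0)))
    return problems
```

Because "c" (cents) is treated as a symbol, it is deliberately left out of the unit list, so "5c" passes.<br />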
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
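<br />
The codeset constraints above can be checked mechanically before a character is added to an entry. The following is a minimal sketch, not the conversion code actually used to build EDICT/EDICT2: it assumes Python's standard <code>iso2022_jp</code> codec (ASCII plus JIS X 0208) and <code>euc_jp</code> codec (which encodes JIS X 0212 characters with a 0x8F lead byte). The function names <code>jis_coverage</code> and <code>edict2_safe</code> are hypothetical.<br />

```python
def jis_coverage(ch: str) -> str:
    """Classify one character as 'JIS X 0208', 'JIS X 0212' or 'other'.

    Assumption: plain ASCII is reported as 'JIS X 0208' here, because the
    iso2022_jp codec accepts both and ASCII is harmless in EDICT/EDICT2.
    """
    try:
        # iso2022_jp covers ASCII, JIS X 0201 Roman and JIS X 0208 only.
        ch.encode("iso2022_jp")
        return "JIS X 0208"
    except UnicodeEncodeError:
        pass
    try:
        # In EUC-JP a JIS X 0212 character is a 3-byte sequence led by 0x8F.
        encoded = ch.encode("euc_jp")
        if encoded[0] == 0x8F:
            return "JIS X 0212"
    except UnicodeEncodeError:
        pass
    return "other"


def edict2_safe(text: str) -> bool:
    """True if every character would survive in the EDICT2 distribution."""
    return all(jis_coverage(c) != "other" for c in text)


if __name__ == "__main__":
    for sample in ["漢", "é", "한"]:
        print(sample, jis_coverage(sample))
```

A character reported as "other" is one for which a romanized or alternative form should also be supplied, as described above.<br />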
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
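<br />
As a rough illustration of the "two-out-of-three" rule, the sketch below counts how many of the three fields agree between two candidate entries. It is only an aid to thinking about the rule: the <code>Candidate</code> class and <code>may_merge</code> function are hypothetical, only the main form of each field is compared, and deciding whether two meanings are "the same" remains an editorial judgement that a string comparison cannot replace. The kana-only case described above (related kana variants) is deliberately left outside the sketch.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str    # main kanji headword, "" if the entry has none
    reading: str  # main reading
    meaning: str  # meaning, as judged equivalent or not by an editor


def may_merge(a: Candidate, b: Candidate) -> bool:
    """Return True if at least two of kanji, reading and meaning agree."""
    matches = sum([
        # Two missing kanji fields are not counted as agreement.
        a.kanji != "" and a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ])
    return matches >= 2


if __name__ == "__main__":
    ikebana1 = Candidate("生け花", "いけばな", "flower arrangement")
    ikebana2 = Candidate("生花", "いけばな", "flower arrangement")
    print(may_merge(ikebana1, ikebana2))  # reading and meaning agree
```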
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
From August 2022 a number of surface forms are being included in the dictionary database purely for the purpose of enabling them to be used as search keys. The editors have identified a number of these forms which, although not considered appropriate for inclusion and display in dictionary entries, are used "in the wild" enough for them to be useful for looking up entries. This practice of having search-only forms can be seen in several online dictionaries; for example, the Kenkyusha site allows the 手古摺る entry to be found using 手こずる as a search key although that form does not appear in the entry.<br />
<br />
These search-only forms are being added at the ends of the sets of surface forms and are being given the tags/attributes of "sK" for forms containing kanji and "sk" for kana-only forms.<br />
<br />
It is suggested that developers of dictionary apps and sites use these forms for searching purposes, but not show them as part of the full entry. This is currently implemented in the WWWJDIC server, where a search for 三蜜 will retrieve the 3密/三密 entry.<br />
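<br />
For app developers, the suggestion above amounts to indexing every surface form but filtering the "sK"/"sk" forms out when an entry is displayed. The following is a minimal sketch under assumed data structures: the entry layout (a dict with a <code>forms</code> list of form/tag pairs) and the function names are hypothetical and are not the JMdict XML schema or the WWWJDIC implementation.<br />

```python
SEARCH_ONLY_TAGS = {"sK", "sk"}


def build_index(entries):
    """Map every surface form, including search-only ones, to its entries."""
    index = {}
    for entry in entries:
        for form, _tags in entry["forms"]:
            index.setdefault(form, []).append(entry)
    return index


def display_forms(entry):
    """Return the surface forms to show, leaving out search-only forms."""
    return [form for form, tags in entry["forms"]
            if not SEARCH_ONLY_TAGS & set(tags)]


if __name__ == "__main__":
    # Hypothetical in-memory entry modelled on the 3密/三密 example.
    entry = {"forms": [("3密", []), ("三密", []), ("三蜜", ["sK"])]}
    index = build_index([entry])
    hit = index["三蜜"][0]          # the search-only form finds the entry
    print(display_forms(hit))       # but it is not shown in the entry
```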
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
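<br />
The capitalisation and layout rules for species glosses listed above can be partly checked by a script. The sketch below is an assumption-laden illustration: the function name <code>format_species_gloss</code> and the regular expression are hypothetical, and the pattern only covers a plain binomial or an animal trinomial (it does not handle "subsp.", "var.", "f.", cultivar epithets or hyphenated epithets).<br />

```python
import re

# Capitalized generic name followed by one or two lower-case epithets.
SCIENTIFIC_NAME = re.compile(r"^[A-Z][a-z]+( [a-z]+){1,2}$")


def format_species_gloss(common_name: str, scientific_name: str) -> str:
    """Build a gloss in the preferred 'common_name (scientific_name)' form."""
    if not SCIENTIFIC_NAME.match(scientific_name):
        raise ValueError(f"unexpected scientific name: {scientific_name!r}")
    return f"{common_name} ({scientific_name})"


if __name__ == "__main__":
    print(format_species_gloss("European magpie", "Pica pica"))
    print(format_species_gloss("cinnamon bear", "Ursus americanus cinnamomum"))
```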
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
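<br />
A submission can be scanned for bar-like characters other than the three permitted above. The following is a minimal sketch: the three allowed code points come from the list above, but the <code>LOOKALIKE_BARS</code> set is an assumed and possibly incomplete selection of look-alikes chosen for illustration, not the list from the JMdict issue, and the function name is hypothetical.<br />

```python
# The three characters permitted in the JMdict database.
ALLOWED_BARS = {
    "\u002d",  # ASCII hyphen
    "\u30fc",  # katakana-hiragana long vowel symbol
    "\u2212",  # minus sign
}

# Assumption: a sample of other mid-line bars that should be replaced.
LOOKALIKE_BARS = {
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2015",  # horizontal bar
}


def find_disallowed_bars(text: str):
    """Return (position, code point) pairs for look-alike bar characters."""
    return [(i, f"U+{ord(c):04X}")
            for i, c in enumerate(text) if c in LOOKALIKE_BARS]


if __name__ == "__main__":
    print(find_disallowed_bars("CD-ROM"))       # ASCII hyphen is allowed
    print(find_disallowed_bars("CD\u2013ROM"))  # en dash is flagged
```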
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=944Editorial policy2022-08-14T05:30:06Z<p>JimBreen: /* Other Issues/Policies */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
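<br />
Half-width letters and digits can be converted to their full-width equivalents before a form is entered. This is a minimal sketch with a hypothetical function name; it relies on the fact that the full-width forms sit at a fixed offset of 0xFEE0 from ASCII, and it assumes that only letters and digits should be converted, leaving characters such as the ASCII hyphen untouched.<br />

```python
FULLWIDTH_OFFSET = 0xFEE0  # distance from ASCII to the full-width block


def to_fullwidth(text: str) -> str:
    """Convert ASCII letters and digits to full-width; leave the rest as is."""
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if c.isascii() and c.isalnum() else c
        for c in text
    )


if __name__ == "__main__":
    print(to_fullwidth("MP3プレーヤー"))
```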
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
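<br />
The ordering advice above can be expressed as a sort key. The sketch below is illustrative only: the dict layout, the <code>joyo</code> flag and the counts in the example are hypothetical, and "equivalent" frequencies are simplified to exactly equal counts, whereas an editor would treat roughly similar counts as equivalent.<br />

```python
IRREGULAR_TAGS = {"iK", "io", "ik"}


def order_forms(forms):
    """Sort surface forms: regular before irregular, then by falling count,
    then (for equal counts) forms using only joyo kanji first."""
    return sorted(
        forms,
        key=lambda f: (
            bool(IRREGULAR_TAGS & set(f["tags"])),  # irregular forms last
            -f["count"],                            # most frequent first
            not f["joyo"],                          # joyo kanji preferred
        ),
    )


if __name__ == "__main__":
    # Made-up forms and counts, for illustration only.
    forms = [
        {"text": "C", "count": 500, "tags": ["iK"], "joyo": True},
        {"text": "B", "count": 100, "tags": [], "joyo": False},
        {"text": "A", "count": 100, "tags": [], "joyo": True},
        {"text": "D", "count": 300, "tags": [], "joyo": True},
    ]
    print([f["text"] for f in order_forms(forms)])
```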
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
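<br />
The two required versions of a multi-word 外来語 can be generated from its component words, and a reading can be checked for the wrong kind of middle dot. This sketch assumes that the "JIS middle-dot" is U+30FB (KATAKANA MIDDLE DOT); the set of rejected look-alikes is an assumed sample, and the function names are hypothetical.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # katakana middle dot

# Assumption: some other dots that are easily confused with it.
OTHER_DOTS = {
    "\u00b7",  # middle dot (Latin-1)
    "\uff65",  # half-width katakana middle dot
    "\u2022",  # bullet
    "\u2027",  # hyphenation point
}


def middle_dot_variants(parts):
    """Return the joined form and the middle-dot separated form."""
    return ["".join(parts), JIS_MIDDLE_DOT.join(parts)]


def has_wrong_middle_dot(reading: str) -> bool:
    """True if the reading contains a dot other than the JIS middle dot."""
    return any(c in OTHER_DOTS for c in reading)


if __name__ == "__main__":
    print(";".join(middle_dot_variants(["アームレス", "チェア"])))
```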
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
<br />
====Which Reference Is Best?====<br />
<br />
On occasion, references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and as to which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
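<br />
As an illustration only, the cross-reference patterns above can be recognised with a short regular expression. The following is a minimal Python sketch, not the parser used by the JMdictDB system: the names <code>XRef</code> and <code>parse_xrefs</code> are hypothetical, and the sketch assumes that a middle-dot always separates the headword from the reading, so targets that themselves contain a middle-dot are not handled.<br />

```python
import re
from typing import List, NamedTuple, Optional


class XRef(NamedTuple):
    """One cross-reference tag (hypothetical structure, for illustration)."""
    kind: str                # "see" or "ant"
    headword: str            # target headword, kanji form where one exists
    reading: Optional[str]   # reading given after the middle-dot, if any
    sense: Optional[int]     # target sense number, if any


# Assumption: "・" separates headword and reading, and a trailing "[n]"
# inside the tag selects a particular sense of the target entry.
_XREF_RE = re.compile(
    r"\[(see|ant)="          # tag name
    r"([^\[\]・]+)"          # headword
    r"(?:・([^\[\]]+))?"     # optional reading
    r"(?:\[(\d+)\])?"        # optional sense number
    r"\]"
)


def parse_xrefs(text: str) -> List[XRef]:
    """Return every [see=...] / [ant=...] tag found in the text."""
    return [
        XRef(kind, headword, reading or None, int(sense) if sense else None)
        for kind, headword, reading, sense in _XREF_RE.findall(text)
    ]


if __name__ == "__main__":
    for tag in ("[see=言葉]", "[ant=何等]", "[see=金本位・かねほんい]", "[see=漢字[2]]"):
        print(tag, parse_xrefs(tag))
```

A checker of this kind could be used to spot malformed tags before submission; whether a target entry actually exists still has to be confirmed against the database.<br />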
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It is used where the term refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
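<br />
The date and time styles above can be sketched in a few lines of Python. This is only an illustration of the preferred formats; the function names are hypothetical, and the lifespan format is assumed to use unpadded month and day numbers, following the "1925.1.14" example above.<br />

```python
from datetime import date, time

# English month names are listed explicitly so the output does not
# depend on the locale of the machine running the code.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_date(d: date, with_year: bool = True) -> str:
    """'March 17, 2019', or 'March 17' when the year is not wanted."""
    base = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{base}, {d.year}" if with_year else base


def format_lifespan(born: date, died: date) -> str:
    """Compact dotted style for the dates of individual people."""
    def dotted(d: date) -> str:
        return f"{d.year}.{d.month}.{d.day}"
    return f"{dotted(born)}-{dotted(died)}"


def format_time(t: time) -> str:
    """'2am' or '12:30pm': 12-hour clock, minutes shown only if non-zero."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    minutes = f":{t.minute:02d}" if t.minute else ""
    return f"{hour}{minutes}{suffix}"


if __name__ == "__main__":
    print(format_date(date(2019, 3, 17)))
    print(format_date(date(2019, 3, 17), with_year=False))
    print(format_lifespan(date(1925, 1, 14), date(1970, 11, 25)))
    print(format_time(time(2, 0)), format_time(time(12, 30)))
```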
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contributions when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
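<br />
Contributors who want to check in advance whether a character is likely to survive in the EDICT or EDICT2 distributions can approximate the test with Python's standard codecs. The sketch below is an approximation under stated assumptions, not the conversion code used to build the distributions: it treats anything encodable as <code>iso2022_jp</code> as being within JIS X 0208 (that codec also accepts ASCII), and anything additionally encodable as <code>euc_jp</code> as being within JIS X 0212 (that codec also accepts half-width katakana, which are not part of JIS X 0212). The function names are hypothetical.<br />

```python
from typing import Dict, List


def narrowest_codeset(ch: str) -> str:
    """Classify one character by the narrowest codeset that contains it.

    Approximation: relies on Python's 'iso2022_jp' and 'euc_jp' codecs
    rather than on the JIS code tables themselves.
    """
    try:
        ch.encode("iso2022_jp")      # ASCII + JIS X 0208
        return "JIS X 0208"
    except UnicodeEncodeError:
        pass
    try:
        ch.encode("euc_jp")          # adds JIS X 0212 (and half-width kana)
        return "JIS X 0212"
    except UnicodeEncodeError:
        return "Unicode only"


def problem_characters(text: str) -> Dict[str, List[str]]:
    """Group the characters of a text that fall outside JIS X 0208."""
    found: Dict[str, List[str]] = {"JIS X 0212": [], "Unicode only": []}
    for ch in text:
        codeset = narrowest_codeset(ch)
        if codeset != "JIS X 0208" and ch not in found[codeset]:
            found[codeset].append(ch)
    return found


if __name__ == "__main__":
    print(problem_characters("art déco / 한글 / 漢字"))
```

Characters reported as "Unicode only" are the ones that would be dropped from both EDICT and EDICT2, so a romanized equivalent should accompany them.<br />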
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
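<br />
The core of the two-out-of-three rule can be expressed as a small function. The following Python sketch is a simplification for illustration: the <code>Candidate</code> class and <code>may_merge</code> function are hypothetical, each entry is reduced to its single major kanji form, reading and meaning, and fields are compared by exact string equality. The kanji field is only counted as a match when both entries actually have a kanji form, so the kana-only case described above (e.g. ダイアモンド and ダイヤモンド) is deliberately left to editorial judgement. The sample entries are illustrative data, not quotations from the dictionary.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Simplified view of an entry: major forms only (an assumption)."""
    kanji: str      # main kanji headword, "" if the entry has none
    reading: str    # main reading
    meaning: str    # main meaning


def may_merge(a: Candidate, b: Candidate) -> bool:
    """True if at least two of kanji, reading and meaning are the same."""
    matches = 0
    if a.kanji and b.kanji and a.kanji == b.kanji:
        matches += 1
    if a.reading == b.reading:
        matches += 1
    if a.meaning == b.meaning:
        matches += 1
    return matches >= 2


if __name__ == "__main__":
    # Same reading and meaning, different kanji form: merge candidate.
    print(may_merge(Candidate("取り扱い", "とりあつかい", "handling"),
                    Candidate("取扱い", "とりあつかい", "handling")))
    # Same kanji only: must remain separate entries.
    print(may_merge(Candidate("金", "きん", "gold"),
                    Candidate("金", "かね", "money")))
```

A result of <code>True</code> only indicates that a merger is permissible under the rule; as noted above, rare or archaic forms should not be used to justify one.<br />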
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly, we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included; ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Search-only Forms===<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*Where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
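<br />
The capitalization rules above can be applied mechanically when preparing a gloss. The following is a minimal sketch only: the function name <code>format_species_gloss</code> is hypothetical, it does not remove author names or check synonymy, and it leaves quoted cultivar epithets untouched.<br />

```python
def format_species_gloss(common_name: str, scientific_name: str) -> str:
    """Return 'common_name (Genus epithet ...)' with the genus capitalized
    and all following epithets lowercased."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError("expected at least a genus and a specific epithet")
    genus = parts[0].capitalize()
    # Cultivar epithets are written in single quotes and stay capitalized;
    # every other token (epithets, "subsp.", "var.", "f.") is lowercased.
    rest = [p if p.startswith("'") else p.lower() for p in parts[1:]]
    return f"{common_name} ({' '.join([genus] + rest)})"


print(format_species_gloss("cinnamon bear", "Ursus Americanus Cinnamomum"))
```

The common name is passed through unchanged, since deciding which words are proper nouns needs human judgement.<br />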
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term that contains 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, yet foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag; however, each term needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
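<br />
A submission can be screened for look-alike bar characters before it is entered. The sketch below is illustrative only: the three permitted code points come from the list above, but the set of look-alikes is an assumption made for this example (some well-known Unicode dashes), not the list from the JMdict issue, and the function name is hypothetical.<br />

```python
# Permitted mid-line bars, as listed above.
ALLOWED_BARS = {"\u002d", "\u30fc", "\u2212"}

# Assumption: a sample of look-alike characters, chosen for illustration.
SUSPECT_BARS = {
    "\u2010": "HYPHEN",
    "\u2011": "NON-BREAKING HYPHEN",
    "\u2012": "FIGURE DASH",
    "\u2013": "EN DASH",
    "\u2014": "EM DASH",
    "\u2015": "HORIZONTAL BAR",
    "\uff0d": "FULLWIDTH HYPHEN-MINUS",
}


def find_suspect_bars(text: str):
    """Return (index, character, name) for every look-alike bar in text."""
    return [(i, ch, SUSPECT_BARS[ch]) for i, ch in enumerate(text) if ch in SUSPECT_BARS]


print(find_suspect_bars("CD\u2013ROM"))
```

This only reports the characters; choosing which of the three permitted characters should replace each one depends on the field and context described above.<br />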
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=JMdict:_Next_Generation&diff=943JMdict: Next Generation2022-01-24T06:52:57Z<p>JimBreen: /* Element Changes */</p>
<hr />
<div>=The Next Generation of JMdict=<br />
==Introduction==<br />
<br />
This page has been set up to record proposed changes to the JMdict microstructure, i.e. the way the information in the dictionary is recorded and laid out. The file as distributed is in XML format and the structure is defined in the JMdict DTD (document type definition). The current DTD can be viewed [https://www.edrdg.org/jmdict/jmdict_dtd_h.html here], and a sample of an entry [https://www.edrdg.org/jmdict/jmdict_sample.html here].<br />
<br />
The changes are likely to involve:<br />
* additional or changed XML elements. These are the data items or groups of items that carry the information. For example the kanji forms of a Japanese term are located in "keb" elements within a "k_ele" element.<br />
* additional or changed attributes. These attributes and their values provide information about the element, for example the "gloss" element uses the "xml:lang" attribute to carry the language code.<br />
* additional or changed entity values. These are standardized codes covering such things as part-of-speech, dialects, etc.<br />
Note that the changes to the XML elements involve modifying the database structure and support software. The attribute and entity value changes are more straightforward and do not require database or software changes.<br />
<br />
== Element Changes ==<br />
<br />
=== Creation Date and Version ===<br />
<br />
It is proposed to include the creation date and DTD version as attributes in the JMdict element. For example:<br />
<JMdict created="2022-01-24" version="1.10"><br />
<br />
=== Entry-wide Information Elements ===<br />
<br />
It is proposed to introduce between the reading (<r_ele>) and sense (<sense>) elements an information element for carrying relevant information about the lexical item as a whole. At present such information can only be recorded about senses. Thus the top level of the DTD would change from:<br />
* <!ELEMENT entry (ent_seq, k_ele*, r_ele+, sense+)><br />
to<br />
* <!ELEMENT entry (ent_seq, k_ele*, r_ele+, info*, sense+)><br />
<br />
There could be zero, one or more <info> elements. The contents would be unstructured text, and an attribute (inf_type) would be used to indicate the type of information, e.g. literal translation, derivation, etc. The DTD description would be:<br />
* <!ELEMENT info (#PCDATA)><br />
* <!ATTLIST info inf_type CDATA #IMPLIED><br />
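<br />
To make the proposed element order concrete, here is a minimal sketch that builds an entry with an entry-wide <info> element using Python's standard library. The sequence number, reading, gloss and the inf_type value "deriv" are invented for illustration; the proposal does not yet define the permitted inf_type values.<br />

```python
import xml.etree.ElementTree as ET


def build_entry(seq, reading, glosses, infos=()):
    """Build an <entry> following the proposed order:
    ent_seq, r_ele+, info*, sense+ (k_ele omitted for brevity)."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "ent_seq").text = str(seq)
    r_ele = ET.SubElement(entry, "r_ele")
    ET.SubElement(r_ele, "reb").text = reading
    # Proposed <info> elements go after the readings and before the senses.
    for inf_type, text in infos:
        info = ET.SubElement(entry, "info", {"inf_type": inf_type})
        info.text = text
    sense = ET.SubElement(entry, "sense")
    for gloss in glosses:
        ET.SubElement(sense, "gloss").text = gloss
    return entry


# Hypothetical data, used only to show the resulting structure.
entry = build_entry(9999999, "てすと", ["test"], [("deriv", "hypothetical derivation note")])
print(ET.tostring(entry, encoding="unicode"))
```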
<br />
=== Entry-wide Language Source Elements ===<br />
<br />
It is proposed to move the current <lsource> element from within the <sense> element so that it becomes entry-wide. The <lsource> element would retain its current attributes (xml:lang, ls_type, ls_wasei). As an example of this, the current アンジョ entry would simply see <lsource xml:lang="por">anjo</lsource> move from the first (and only) sense to be entry-wide.<br />
<br />
Implicit in this change is that entries, such as パン, which record loanwords from several source languages, will need to be split into an entry for each source language term.<br />
<br />
(An earlier suggestion that the <dial> elements would also become entry wide as an attribute of <lsource> has been withdrawn.)<br />
<br />
=== Entry-wide Inflection Pattern Elements ===<br />
<br />
It is proposed to include an entry-wide <infl> element containing information about conjugation or inflection patterns of the entry. This element would typically only be used for entries which comprise or end with a verb or adjective, and would indicate the appropriate inflections for tense, mood, aspect, etc. It would supplement and partially replace the present system where such information is embedded in the part-of-speech coding at the sense level (v1, v5m, adj-i, etc.). The format of the element has yet to be decided.<br />
<br />
=== Pitch Accent Elements ===<br />
<br />
It is proposed to provide for pitch accent information to be included with each reading of a Japanese term. This will be an additional element associated with each reading, and the proposed change to the DTD would be from:<br />
* <!ELEMENT r_ele (reb, re_nokanji?, re_restr*, re_inf*, re_pri*)><br />
to<br />
* <!ELEMENT r_ele (reb, re_pa*, re_nokanji?, re_restr*, re_inf*, re_pri*)><br />
* <!ELEMENT re_pa (#PCDATA)><br />
<br />
There could be zero, one or several <re_pa> elements per reading. The actual format of the content of the <re_pa> has yet to be decided, however there should be the potential to support multiple systems for describing pitch accent information. A possible approach would be to have an attribute value such as:<br />
* <!ATTLIST re_pa pa_type CDATA #IMPLIED><br />
An example for entry 1584660 (明日/あした) might be "<re_pa pa_type="am">3</re_pa>" with the あした reading indicating an accent on the 3rd mora.<br />
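<br />
A client could read the proposed element as sketched below. This assumes the <re_pa> content and the pa_type attribute end up as in the example above, which the proposal explicitly leaves undecided; the function name is hypothetical.<br />

```python
import xml.etree.ElementTree as ET

# Sample reading element using the proposed (not yet adopted) <re_pa> element.
SAMPLE = '<r_ele><reb>あした</reb><re_pa pa_type="am">3</re_pa></r_ele>'


def pitch_accents(r_ele):
    """Return (pa_type, value) pairs for every <re_pa> child of a reading."""
    return [(pa.get("pa_type"), pa.text) for pa in r_ele.findall("re_pa")]


reading = ET.fromstring(SAMPLE)
print(reading.findtext("reb"), pitch_accents(reading))
```

Because pa_type is #IMPLIED, a reading may carry <re_pa> elements without it, in which case the first item of the pair is None.<br />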
<br />
The Wikipedia page on [https://en.wikipedia.org/wiki/Japanese_pitch_accent Japanese pitch accent] contains some useful information.<br />
<br />
===Cross-References===<br />
<br />
At present the <xref> element within <sense> simply states a target surface form and if specified a sense number, e.g. "<xref>スライド・1</xref>". It is proposed to expand this by including the target entry sequence number and sense number as attributes, and also to allow for clearer identification of the preferred surface forms used in apps, etc. Examples:<br />
* <xref type="see" seq="1073760" sno="1">スライド[1]</xref><br />
* <xref type="see" seq="1375820" xr="なるほど">なるほど</xref><br />
* <xref type="see" seq="1585480" sno="2" xk="傀儡" xr="くぐつ">傀儡(くぐつ)[2]</xref><br />
<br />
The attributes would be:<br />
<br>- type. Either "see" or "ant".<br />
<br>- seq. The sequence number of the target entry.<br />
<br>- sno. The sense within the target entry to which the cross-reference refers. If absent it will refer to the whole entry.<br />
<br>- xk. The kanji surface form in the target entry to be associated with the cross-reference. The default will be the first form in the kanji field of the target entry, however it can be set during the creation or editing of the entry.<br />
<br>- xr. The reading surface form in the target entry to be associated with the cross-reference. Would only be used if there was no kanji field or if a specific reading is the target.<br />
<br />
The text portion would be retained in a modified form as this makes it easier to generate legacy versions such as EDICT. The "・" (nakaguro) character would no longer be used to separate the parts of the target surface form as this character is used within some entry terms.<br />
<br />
In addition an optional "dict" attribute would be available to indicate a cross-reference to a related dictionary, e.g.<br />
* <xref type="see" dict="jmnedict" seq="5524869">朝日新聞</xref><br />
<br />
The current <ant> element would be removed and instead "<xref type="ant" ......>" would be used.<br />
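<br />
The proposed attributes could be read into a small record as in the sketch below. The attribute names follow the proposal above; the class and function names are hypothetical, and no validation of the type value is attempted.<br />

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class XRef:
    kind: str                   # the "type" attribute: "see" or "ant"
    seq: int                    # sequence number of the target entry
    sno: Optional[int]          # target sense, or None for the whole entry
    xk: Optional[str]           # preferred kanji surface form, if given
    xr: Optional[str]           # preferred reading surface form, if given
    dictionary: Optional[str]   # the optional "dict" attribute
    text: str                   # legacy text portion


def parse_xref(xml: str) -> XRef:
    """Parse one proposed-format <xref> element from a string."""
    el = ET.fromstring(xml)
    sno = el.get("sno")
    return XRef(
        kind=el.get("type"),
        seq=int(el.get("seq")),
        sno=int(sno) if sno is not None else None,
        xk=el.get("xk"),
        xr=el.get("xr"),
        dictionary=el.get("dict"),
        text=el.text or "",
    )


print(parse_xref('<xref type="see" seq="1073760" sno="1">スライド[1]</xref>'))
```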
<br />
== Part-of-Speech Separation ==<br />
<br />
At present the <pos> element within the <sense> element records both actual parts of speech, e.g. "n", "v5s", "adj-i", etc., as well as supplementary information that is not actually a POS, e.g. "adj-no", and general information which is not usually regarded as a POS at all, e.g. "exp", "int", etc. <br />
<br />
It was proposed that an additional element <pos_sup> be introduced to record the information which is not an actual POS. After some discussion that proposal has been withdrawn. A better approach may be to simply document which elements are actually parts of speech and which are supplementary information.<br />
<br />
== Additional Attribute Values ==<br />
<br />
The main use of XML attributes in JMdict is to identify different types of <gloss> elements using the "g_type" attribute. At present the values used are "lit", "fig" and "expl". It is proposed to add the "descr" value to indicate a gloss which is a description of the Japanese term rather than a translation or an explanation of the meaning.<br />
<br />
== Additional Entity Values ==<br />
<br />
JMdict uses an extensive set of standard entity values for such things as part-of-speech tags, dialect names, fields, etc. It is proposed to add a number of additional values. Some which have been added recently are:<br />
<br />
* Christn - term associated with Christianity, as with the current "Buddh" and "Shinto" values<br />
* net-sl - Internet slang<br />
* dated - dated term<br />
* hist - historical term<br />
* litf - literary or formal term<br />
<br />
Note that these entity values are not really part of the "new generation" as they were part of the original structure.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=942Editorial policy2022-01-24T01:15:25Z<p>JimBreen: /* Proper Names */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
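<br />
Half-width letters and digits can be converted to their full-width counterparts with a fixed code-point offset. This is a minimal sketch with a hypothetical function name; it deliberately converts only ASCII letters and digits and leaves punctuation alone, which is an assumption made for this example rather than a stated rule.<br />

```python
FULLWIDTH_OFFSET = 0xFEE0  # distance from ASCII "A" (U+0041) to full-width "A" (U+FF21)


def to_fullwidth(text: str) -> str:
    """Convert ASCII letters and digits to full-width forms; leave the rest."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if ch.isascii() and ch.isalnum() else ch
        for ch in text
    )


print(to_fullwidth("MP3プレーヤー"))
```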
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
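<br />
The bracketed tags that follow a reading can be separated from the kana with a short parser. This sketch assumes readings are separated by ";" (as in the 外来語 example in this section) and that a tag has the form [name] or [name=value]; it is an illustration with hypothetical names, not the parser used by the database.<br />

```python
import re

# A tag is "[name]" or "[name=value]".
TAG = re.compile(r"\[([^\]=]+)(?:=([^\]]*))?\]")
# Split on ";" only when it is outside square brackets.
SEPARATOR = re.compile(r";(?![^\[]*\])")


def parse_readings(field: str):
    """Split a readings field into kana plus nokanji/restr information."""
    readings = []
    for item in SEPARATOR.split(field):
        item = item.strip()
        if not item:
            continue
        tags = TAG.findall(item)
        readings.append({
            "kana": TAG.sub("", item).strip(),
            "nokanji": any(name == "nokanji" for name, _ in tags),
            "restr": [value for name, value in tags if name == "restr"],
        })
    return readings


for reading in parse_readings("ぜにがたあざらし;ゼニガタアザラシ[nokanji]"):
    print(reading)
```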
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
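<br />
Given the dotted form, the undotted variant can be generated and the dot character checked at the same time. A minimal sketch follows; the set of rejected dots is an assumption chosen for illustration, and the function name is hypothetical.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT, the one accepted form

# Assumption: some other dot characters that are easily confused with it.
OTHER_DOTS = {"\u00b7", "\uff65", "\u2022", "\u2219"}


def middle_dot_variants(reading: str):
    """Return [undotted, dotted] for a reading containing JIS middle dots,
    or [reading] if it contains none."""
    wrong = sorted(ch for ch in set(reading) if ch in OTHER_DOTS)
    if wrong:
        raise ValueError("unsupported middle-dot character(s): " + ", ".join(f"U+{ord(c):04X}" for c in wrong))
    if JIS_MIDDLE_DOT not in reading:
        return [reading]
    return [reading.replace(JIS_MIDDLE_DOT, ""), reading]


print(";".join(middle_dot_variants("アームレス" + JIS_MIDDLE_DOT + "チェア")))
```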
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
*it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
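<br />
A few of the mechanical points in the list above lend themselves to an automated check before submission. The sketch below covers only three of them (leading articles, patterns such as "colo(u)r", and a comma after "e.g." or "i.e."); the function name and messages are hypothetical, and a warning is only a prompt for a human decision.<br />

```python
import re


def lint_gloss(gloss: str):
    """Return a list of warnings for one gloss (one ';'-separated item)."""
    warnings = []
    # A leading article is discouraged unless needed for clarity.
    if re.match(r"(a|an|the)\s", gloss, re.IGNORECASE):
        warnings.append("leading article")
    # Optional letters in parentheses inside a word cannot be searched for.
    if re.search(r"\w\(\w+\)\w", gloss):
        warnings.append("optional-letter pattern")
    # No comma should follow "e.g." or "i.e.".
    if re.search(r"\b(?:e\.g\.|i\.e\.),", gloss):
        warnings.append("comma after e.g./i.e.")
    return warnings


for gloss in "full colo(u)r; the rice field; hand game (e.g. rock, paper, scissors)".split(";"):
    print(repr(gloss.strip()), lint_gloss(gloss.strip()))
```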
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
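<br />
The [lsrc=lng:word] pattern shown above can be recognised mechanically. The following is a minimal Python sketch, not part of the JMdict tooling: the function name <code>parse_lsrc</code> and the regular expression are illustrative assumptions, and it only handles the three shapes described here (bare word, double-quoted phrase, and an empty source word).<br />

```python
import re

# Hypothetical pattern: [lsrc=xxx:word], where xxx is a three-letter
# ISO 639-2 code and the word may be empty or wrapped in double quotes.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')


def parse_lsrc(text):
    """Return a list of (language_code, source_word) pairs found in text."""
    results = []
    for match in LSRC_RE.finditer(text):
        lang = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        # Exactly one of the two alternatives matched; prefer the quoted form.
        word = quoted if quoted is not None else bare.strip()
        results.append((lang, word))
    return results


if __name__ == "__main__":
    print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
    print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```

An empty source word, as in [lsrc=lng:], comes back as an empty string, which matches the rule that the source word is not repeated when it is identical to the translation.<br />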
<br />
Don't do this for (i) common Sino-Japanese vocabulary, (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]].<br />
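<br />
The three cross-reference shapes described above (headword only, headword・reading, and headword with a sense number) can be sketched as a small parser. This is an illustrative assumption rather than the JMdictDB implementation: the names <code>XRef</code> and <code>parse_xrefs</code> are hypothetical, and the sketch treats the first ・ as the headword/reading separator, so it would misread a headword that itself contains a middle-dot.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str                # "see" or "ant"
    headword: str
    reading: Optional[str]   # present only for the kanji・reading form
    sense: Optional[int]     # present only for the [n] suffix form


# Hypothetical pattern covering [see=X], [ant=X], [see=X・Y] and [see=X[2]].
XREF_RE = re.compile(
    r"\[(see|ant)=([^\[\]・]+)(?:・([^\[\]・]+))?(?:\[(\d+)\])?\]"
)


def parse_xrefs(text):
    """Return every cross-reference tag found in text as an XRef tuple."""
    refs = []
    for match in XREF_RE.finditer(text):
        kind, headword, reading, sense = match.groups()
        refs.append(XRef(kind, headword, reading, int(sense) if sense else None))
    return refs


if __name__ == "__main__":
    for tag in ("[see=金本位・かねほんい]", "[see=漢字[2]]", "[ant=何等]"):
        print(parse_xrefs(tag))
```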
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where it is useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
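The apostrophe rule in the last point can be sketched in Python. This is a minimal illustration under stated assumptions: the function name <code>join_morae</code> is hypothetical, and the caller is assumed to supply the word already split into romanized kana units, with ん given as a standalone "n".<br />

```python
# Letters that make a preceding syllabic "n" ambiguous (vowels, macron
# vowels, and "y").
VOWELS_AND_Y = set("aiueoyāīūēō")


def join_morae(morae):
    """Join romanized kana units, writing ん before a vowel or y as n'."""
    parts = []
    for index, mora in enumerate(morae):
        parts.append(mora)
        next_mora = morae[index + 1] if index + 1 < len(morae) else ""
        # A standalone "n" is ん; mark it when the next unit starts with
        # a vowel or "y" so the kana boundary stays recoverable.
        if mora == "n" and next_mora and next_mora[0] in VOWELS_AND_Y:
            parts.append("'")
    return "".join(parts)


if __name__ == "__main__":
    print(join_morae(["ho", "n", "ya", "ku"]))   # ほんやく
    print(join_morae(["shi", "n", "i", "chi"]))  # しんいち
```

Without the apostrophe, しんいち and the kana sequence しにち would both come out as "shinichi", which is the ambiguity the rule is intended to avoid.<br />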
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) term does not fall into this category. It refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), but the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
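The two spacing rules above could be checked with a short script along the following lines. It is only a sketch: the function name <code>normalize_spacing</code> is hypothetical, and the list of units is a small illustrative assumption, not a list taken from these guidelines.<br />

```python
import re

# Illustrative unit list (an assumption); longer units come first so
# that "km" is tried before "m".
UNITS = ("km", "cm", "mm", "kg", "mg", "ml", "m", "g", "l")
_UNIT_RE = re.compile(r"(\d)(" + "|".join(UNITS) + r")\b")
# Symbols that attach directly to the number.
_SYMBOL_RE = re.compile(r"(\d)\s+([°%])")


def normalize_spacing(gloss):
    """Insert a space before units and remove any space before symbols."""
    gloss = _UNIT_RE.sub(r"\1 \2", gloss)
    return _SYMBOL_RE.sub(r"\1\2", gloss)


if __name__ == "__main__":
    for sample in ("100km", "15 °C", "9 %", "5c"):
        print(sample, "->", normalize_spacing(sample))
```

Forms such as "5c" are left untouched because "c" is not in the unit list, in line with its treatment as a symbol.<br />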
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
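<br />
The preferred date and time styles can be produced programmatically. The sketch below is illustrative only; the function names are hypothetical, and month names are listed explicitly so that the output does not depend on the system locale.<br />

```python
import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_date(d, include_year=True):
    """'March 17, 2019', or 'March 17' when the year is omitted."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if include_year else text


def format_lifespan(born, died):
    """Compact YYYY.MM.DD range for the dates of individual people."""
    def compact(d):
        return f"{d.year}.{d.month}.{d.day}"
    return f"({compact(born)}-{compact(died)})"


def format_time(hour, minute=0):
    """'2am' / '12:30pm' style from a 24-hour clock time."""
    suffix = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12
    if minute == 0:
        return f"{hour12}{suffix}"
    return f"{hour12}:{minute:02d}{suffix}"


if __name__ == "__main__":
    print(format_date(datetime.date(2019, 3, 17)))
    print("Yukio Mishima", format_lifespan(datetime.date(1925, 1, 14),
                                            datetime.date(1970, 11, 25)))
    print(format_time(2), format_time(12, 30))
```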
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
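<br />
Before submitting text, it can be useful to check which characters would survive in the EDICT and EDICT2 distributions. The following Python sketch approximates this with the standard <code>shift_jis</code> and <code>euc_jp</code> codecs; this is an assumption made for illustration, not the conversion the project actually runs. Both codecs also accept ASCII and half-width katakana, so the result is a guide rather than an exact JIS X 0208/0212 membership test. The function names are hypothetical.<br />

```python
def charset_of(ch):
    """Roughly classify one character by the most constrained format
    that can hold it: 'EDICT', 'EDICT2' or 'JMdict only'."""
    try:
        ch.encode("shift_jis")   # approximates JIS X 0208 coverage
        return "EDICT"
    except UnicodeEncodeError:
        pass
    try:
        ch.encode("euc_jp")      # adds JIS X 0212 coverage
        return "EDICT2"
    except UnicodeEncodeError:
        return "JMdict only"


def problem_characters(text):
    """Map each character that is not safe for EDICT to its classification."""
    return {ch: charset_of(ch) for ch in text if charset_of(ch) != "EDICT"}


if __name__ == "__main__":
    print(problem_characters("art déco 한 漢字"))
```

Characters reported as "JMdict only", such as hangul, are the ones for which a romanized version should be supplied as well.<br />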
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
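<br />
The basic "two-out-of-three" test can be written down in a few lines. The sketch below is a simplification under stated assumptions: each entry is represented by a hypothetical dictionary holding only its main kanji headword, reading and meaning, and an empty field never counts as a match. It therefore does not cover the kana-only case just described, nor the judgement about rare or archaic forms, both of which remain editorial decisions.<br />

```python
FIELDS = ("kanji", "reading", "meaning")


def may_merge(entry_a, entry_b):
    """True when at least two of kanji, reading and meaning agree."""
    matches = sum(
        1
        for field in FIELDS
        # A missing or empty field is not treated as agreement.
        if entry_a.get(field) and entry_a.get(field) == entry_b.get(field)
    )
    return matches >= 2


if __name__ == "__main__":
    # Placeholder values, used purely for illustration.
    a = {"kanji": "K1", "reading": "r1", "meaning": "m1"}
    b = {"kanji": "K2", "reading": "r1", "meaning": "m1"}
    c = {"kanji": "K2", "reading": "r2", "meaning": "m1"}
    print(may_merge(a, b))  # reading and meaning agree
    print(may_merge(a, c))  # only the meaning agrees
```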
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities (if significant)<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether these can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded bindweed, a subspecies of hedge bindweed, should be submitted as: "occluded bindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamas'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
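<br />
The capitalization and ordering rules for simple species names can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the output is plain text without italic markup, and only plain binomials and animal trinomials are handled ("subsp.", "var.", "f." and cultivar epithets would need extra handling).<br />

```python
def normalize_binomial(name):
    """Capitalize the generic name and lowercase the epithets."""
    genus, *epithets = name.split()
    return " ".join([genus.capitalize()] + [e.lower() for e in epithets])


def species_gloss(scientific_name, common_name=None, description=None):
    """Build 'common name (Scientific name)' or, when no common name is
    known, 'Scientific name (description)'."""
    scientific = normalize_binomial(scientific_name)
    if common_name:
        return f"{common_name} ({scientific})"
    if description:
        return f"{scientific} ({description})"
    raise ValueError("either a common name or a description is required")


if __name__ == "__main__":
    print(species_gloss("pica Pica", common_name="European magpie"))
    print(species_gloss("Mola mola", description="a species of sunfish"))
```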
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
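The rule above can be checked mechanically. The following is a minimal sketch only: the function name <tt>find_suspect_bars</tt> and the set of look-alike characters are illustrative assumptions, not the definitive list from the JMdict issue linked below.<br />

```python
# The three bar characters permitted in JMdict, as listed above.
ALLOWED_BARS = {
    "\u002d",  # ASCII hyphen-minus
    "\u30fc",  # katakana-hiragana prolonged sound mark
    "\u2212",  # minus sign
}

# Illustrative look-alikes (an assumption, not the project's official list).
LOOKALIKE_BARS = {
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2015",  # horizontal bar
    "\uff0d",  # fullwidth hyphen-minus
}


def find_suspect_bars(text):
    """Return (position, code point) pairs for look-alike bar characters."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in LOOKALIKE_BARS
    ]
```

A field that uses only the permitted characters yields an empty list, so the check could be run on a submission before it is saved.<br />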
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=JMdict-EDICT_Dictionary_Project&diff=941JMdict-EDICT Dictionary Project2022-01-17T05:13:28Z<p>JimBreen: /* PROJECT FORUM */</p>
<hr />
<div>= JMdict/EDICT JAPANESE/ENGLISH DICTIONARY PROJECT =<br />
<br />
== INTRODUCTION ==<br />
<br />
The JMdict/EDICT project has as its goal the production of a comprehensive freely-available Japanese/English Dictionary database in machine-readable form which can be used by a variety of applications and servers.<br />
<br />
The project began in 1991 with the expansion of the EDICT simple Japanese-English dictionary file. (See below under History)<br />
<br />
At present the project has the following dictionary files available:<br />
<br />
* the full Japanese-Multilingual Dictionary (JMdict) file which is distributed in XML format. The JMdict file is aimed at being a multilingual lexical database with Japanese as the pivot language and also includes translations of words and phrases in a number of languages other than English. It has been designed to support the requirements of Japanese lexicography, including multiple surface forms, orthographical variants, okurigana variants, multiple readings, etc.<br />
* the EDICT2 file, which is in a relatively simple one-line-per-entry text format based on the original EDICT format, and which contains almost all the information in the JMdict edition;<br />
* the EDICT file, which follows the original format of one kanji form and reading per entry, and contains a reduced amount of information. It is provided to maintain support for software which uses the original EDICT file format;<br />
* the EDICT_SUB file, which contains about 20% of the most common entries in the EDICT file.<br />
<br />
The dictionary data is maintained in an online database under the oversight of an editorial board, and the JMdict and EDICT versions are generated and released daily.<br />
<br />
The dictionary files are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group] who are the owners of the copyright.<br />
<br />
An earlier version of this page can be found [http://www.edrdg.org/jmdict/edict_doc_depr.html here.] Note that it contains many out-of-date links.<br />
<br />
== CURRENT VERSION &amp; DOWNLOAD ==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server], (formerly at Monash University) which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file. (Only to be used in legacy apps, etc.)<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
<br />
== PROJECT FORUM ==<br />
<br />
There are several forums where this project is actively discussed.<br />
<br />
The original forum was the <tt> sci.lang.japan</tt> [http://groups.google.com/group/sci.lang.japan Usenet newsgroup. ] More recently a [https://groups.google.com/g/edict-jmdict mailing list ] specifically for project discussion has begun. (Go to the "About" link on that page to initiate joining the discussion.)<br />
<br />
== Next Generation ==<br />
<br />
A [[JMdict:_Next_Generation|major revision]] of the JMdict structure is planned as a way of dealing with a number of issues which have emerged during the life of the project.<br />
<br />
== DATABASE and UPDATING ==<br />
<br />
The dictionary data is all held in a PostgreSQL database and maintained using the [http://www.edrdg.org/wiki/index.php/JMdictDB_Project JMdictDB online system]. The JMdict version is generated directly from the database. From this the EDICT/EDICT2 versions are generated using utility software. You can explore the database and propose edits and new entries via its [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search Form].<br />
<br />
The [http://www.edrdg.org/wiki/index.php/Main_Page#The_JMdict.2FEDICT_Project EDRDG Wiki] has a wealth of information about the dictionary database, including suggestions about [http://www.edrdg.org/wiki/index.php/JMdict:_Getting_Started getting started, ] the detailed [http://www.edrdg.org/wiki/index.php/Editorial_policy editorial policy and guidelines], etc. etc.<br />
<br />
== FORMAT ==<br />
<br />
The basic format of the entries in the dictionary files can be seen in detail by examining the [http://www.edrdg.org/jmdict/jmdict_dtd_h.html DTD] (Document Type Declaration) of the XML-format JMdict file. The DTD is heavily annotated with content and structural information. [http://www.edrdg.org/jmdict/dtd-jmdict.xml (download)]<br />
<br />
In summary, each dictionary entry is independent, although there may be cross-reference fields pointing to other entries. Each entry consists of<br />
<br />
* kanji elements, i.e. headwords containing at least one kanji character, plus associated tags indicating some status or characteristic of the headword. Where there are multiple headwords, they have been ordered according to frequency of usage, as far as this can be determined;<br />
* reading elements, containing either the reading in kana of the headword, or the headword itself in the case of headwords only in kana. The elements also include tags indicating some status or characteristics. As with the kanji headwords, where there are multiple readings they have been ordered according to frequency of usage, as far as this can be determined;<br />
* general coded information relating to the entry as a whole, such as original language, date-of-creation, etc.<br />
* sense elements, containing the translational equivalents or glosses of the headword(s). As Japanese is not highly polysemous, there is often only one sense. Associated with the sense elements is other coded data indicating the part-of-speech, field of application, miscellaneous information, etc. As with headwords and readings, the glosses are ordered with the most common appearing first.<br />
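The entry structure described above can be read with a standard XML parser. The following is a sketch under stated assumptions: the element names (<tt>ent_seq</tt>, <tt>k_ele</tt>/<tt>keb</tt>, <tt>r_ele</tt>/<tt>reb</tt>, <tt>sense</tt>/<tt>gloss</tt>) are taken from the JMdict DTD, while the sample entry, its sequence number and the function name are made up for illustration. The sample deliberately omits tag fields, since in the real file those use entities declared in the DTD, which a parser given only a fragment would not resolve.<br />

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry; the sequence number is invented.
SAMPLE_ENTRY = (
    "<entry><ent_seq>1234567</ent_seq>"
    "<k_ele><keb>収集</keb></k_ele>"
    "<k_ele><keb>蒐集</keb></k_ele>"
    "<r_ele><reb>しゅうしゅう</reb></r_ele>"
    "<sense><gloss>gathering up</gloss><gloss>collection</gloss></sense>"
    "</entry>"
)


def summarize_entry(entry_xml):
    """Collect headwords, readings and glosses from one <entry> element."""
    entry = ET.fromstring(entry_xml)
    return {
        "seq": int(entry.findtext("ent_seq")),
        # Kanji and reading elements keep their file order (most frequent first).
        "kanji": [k.findtext("keb") for k in entry.findall("k_ele")],
        "readings": [r.findtext("reb") for r in entry.findall("r_ele")],
        # One list of glosses per sense.
        "senses": [
            [g.text for g in s.findall("gloss")]
            for s in entry.findall("sense")
        ],
    }
```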
<br />
The format and coding of the distributed files is as follows:<br />
<br />
* the JMdict file contains the complete dictionary information in XML format as per the DTD. This file is in Unicode/ISO-10646 coding using UTF-8 encapsulation. [http://www.edrdg.org/jmdict/jmdict_sample.html (Sample Entry)]<br />
* the EDICT file is in the original relatively simple format based on the text data file of the SKK input-method. Each entry is in the form:<br />
: KANJI [KANA] /(general information) gloss/gloss/.../<br />
:: or<br />
: KANA /(general information) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT format:<br />
:: 収集 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/<br />
:: (in addition to equivalent entries with the 蒐集, 拾集 and 収輯 kanji compounds.)<br />
: Where there are multiple senses, these are indicated by (1), (2), etc. before the first gloss in each sense. As this format only allows a single kanji headword and reading, entries are generated for each possible headword/reading combination. As the format restricts Japanese characters to the kanji and kana fields, any cross-reference data and other informational fields are omitted.<br />
:The EDICT file is distributed in JIS X 0208 coding in EUC-JP encapsulation. (Please note that this original format is only now provided for legacy systems and apps. New systems <b>must</b> use the EDICT2 edition described below);<br />
* the EDICT2 file is in an expanded form of the original EDICT format. The main differences are the inclusion of multiple kanji headwords and readings, and the inclusion of cross-reference and other information fields, e.g.:<br />
: KANJI-1;KANJI-2 [KANA-1;KANA-2] /(general information) (see xxxx) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT2 format:<br />
:: 収集(P);蒐集;拾集;収輯 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/(P)/<br />
: In addition, the EDICT2 has as its last field the sequence number of the entry. This matches the "ent_seq" element value in the XML edition. The field has the format: EntLnnnnnnnnX. The EntL is a unique string to help identify the field. The "X", if present, indicates that an audio clip of the entry reading is available from the JapanesePod101.com site.<br />
: The EDICT2 file is distributed in JIS X 0208 and JIS X 0212 codings in EUC-JP encapsulation;<br />
* the EDICT_SUB file is in the same format as the EDICT file.<br />
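The EDICT2 line layout described above can be split with a few string operations. This is a minimal sketch, not the project's own parser: the function name and the returned dictionary keys are assumptions, and the sequence number in the usage example is a made-up placeholder. It does not handle glosses that themselves contain a "/" character.<br />

```python
import re


def parse_edict2_line(line):
    """Split one EDICT2 line into headwords, readings, glosses and flags."""
    head, _, rest = line.partition("/")
    match = re.match(r"^(\S+)(?:\s+\[([^\]]*)\])?$", head.strip())
    if match is None:
        raise ValueError("not an EDICT2 entry line")
    kanji_part, kana_part = match.group(1), match.group(2)

    glosses, seq, audio, common = [], None, False, False
    for field in rest.strip().split("/"):
        if not field:
            continue
        ent = re.fullmatch(r"EntL(\d+)(X?)", field)
        if ent:
            # Trailing sequence-number field, with optional audio marker.
            seq, audio = int(ent.group(1)), ent.group(2) == "X"
        elif field == "(P)":
            # Entry-level priority marker.
            common = True
        else:
            glosses.append(field)

    return {
        # Headwords are returned as written, including any "(P)" suffix.
        "headwords": kanji_part.split(";"),
        "readings": kana_part.split(";") if kana_part else [],
        "glosses": glosses,
        "seq": seq,
        "audio": audio,
        "common": common,
    }
```

For example, passing the sample line with a hypothetical <tt>EntL1234567X</tt> field appended yields four headwords, one reading, three glosses, <tt>seq</tt> 1234567, and both <tt>audio</tt> and <tt>common</tt> set to true. Kana-only entries give an empty <tt>readings</tt> list.<br />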
<br />
None of the files have the entries in any particular order.<br />
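Because the entries are unordered, an application will normally build its own lookup index when loading a file. The sketch below assumes a local gzip-compressed EDICT or EDICT2 file and uses Python's <tt>euc_jp</tt> codec; the function name and the choice of line numbers as index values are illustrative assumptions.<br />

```python
import gzip
from collections import defaultdict


def build_headword_index(path, encoding="euc_jp"):
    """Map each headword to the line numbers (0-based) where it occurs."""
    index = defaultdict(list)
    with gzip.open(path, "rt", encoding=encoding) as handle:
        for lineno, line in enumerate(handle):
            # Everything before the first "/" is the headword/reading part.
            head = line.split("/", 1)[0]
            # Drop the bracketed reading field, if any.
            keys_part = head.split("[", 1)[0].strip()
            for key in keys_part.split(";"):
                # Strip trailing tags such as "(P)" from the headword.
                key = key.split("(", 1)[0]
                if key:
                    index[key].append(lineno)
    return dict(index)
```

Note that the header line of the file is indexed like any other line; a caller may prefer to skip it.<br />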
<br />
== PROJECT HISTORY ==<br />
<br />
The project was begun in 1991 by [http://nihongo.monash.edu/ Jim Breen] when an early DOS-based Japanese word-processor (MOKE - Mark's Own Kanji Editor) was released, containing an initial small version of the EDICT file. This was progressively expanded and edited over the following years. In 1999 the EDICT file, which by this time contained about 60,000 entries, was converted into an expanded format and the first XML-format JMdict file released. From that point both JMdict and the EDICT2/EDICT versions have been generated from the same source data.<br />
<br />
The EDICT2 format was created in 2003, primarily for use with the [http://nihongo.monash.edu/cgi-bin/wwwjdic.cgi?1C WWWJDIC] dictionary server, however it is now also used by other servers and applications.<br />
<br />
The growth in entries in the file is largely due to the efforts of the many people who have contributed entries to it over the years and who have participated in the editorial role. The increase in entry numbers has slowed as the file has achieved coverage of a large proportion of the Japanese lexicon. Much of the editorial work in recent years has concentrated on amendments and expansion to existing entries.<br />
<br />
A more expanded explanation of the early developments in the EDICT file can be found in the [http://www.edrdg.org/jmdict/edict_doc_old.html original documentation].<br />
<br />
== COPYRIGHT ==<br />
<br />
Dictionary copyright is a difficult point, because clearly the first lexicographer who published "inu means dog" could not claim a copyright violation over all subsequent Japanese dictionaries. While it is usual to consult other dictionaries for "accurate lexicographic information", as Nelson put it, wholesale copying is, of course, not permissible, and contributors have been advised to avoid direct copying from other sources. What makes each dictionary unique (and copyright-able) is the particular selection of words, the phrasing of the meanings, the presentation of the contents (a very important point in the case of this project), and the means of publication.<br />
<br />
The files of the project are copyright, and distributed in accordance with the Licence Statement, which can found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group ] who are the current owners of the copyright. As explained in the licence, the files are available for use for most purposes provided acknowledgement and distribution of the documentation is made.<br />
<br />
== LEXICOGRAPHICAL DETAILS ==<br />
<br />
===Inflections, etc.===<br />
In general no inflections of verbs or adjectives have been included, except in idiomatic expressions. Adverbs formed from adjectives (e.g., -ku or -ni) are generally not included. Verbs are, of course, in the plain or "dictionary" form.<br />Composed forms, such as adverbs taking the "to" particle, keiyoudoushi adjectives, etc. are only included in their root form; however, the part-of-speech (POS) marker is used to indicate their status. <br />Nouns which can form a verb with the auxiliary verb "suru" only appear in their noun form, but have a POS marker: "vs", to indicate the existence of a verbal form. In general the gloss only relates to the noun itself, but entries are being progressively expanded to include the verbal glosses as well.<br />
===Part of Speech Marking===<br />
The dictionary includes one or more Part of Speech (POS) markings on almost every entry. Examples include: "adj-i" (adjective - 形容詞), "n" (noun - 名詞), "prt" (particle - 助詞), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_pos (Full POS list)]<br />
===Field of Application===<br />
A number of entries are marked with a specific field of application, e.g. "chem" (chemistry), "math" (mathematics), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld (Full field list)]<br />
===Miscellaneous Markings===<br />
A number of miscellaneous tags are included in entries to provide additional information in a standardized form, e.g. "col" (colloquialism), "sl" (slang), "uk" (term usually in kana), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_misc (Full list) ]<br />
===Word Priority Marking===<br />
The ke_pri and equivalent re_pri fields in the JMdict file are provided to record information about the relative commonness or priority of the entry, and consist of codes indicating the word appears in various references which can be taken as an indication of the frequency with which the word is used. This field is intended for use either by applications which want to concentrate on entries of a particular priority, or to generate subset files. The current values in this field are:<br />
* news1/2: appears in the "wordfreq" file compiled by Alexandre Girardi from the Mainichi Shimbun. (See the ftp archive for a copy.) Words in the first 12,000 in that file are marked "news1" and words in the second 12,000 are marked "news2".<br />
* ichi1/2: appears in the "Ichimango goi bunruishuu", Senmon Kyouiku Publishing, Tokyo, 1998. (The entries marked "ichi2" were demoted from ichi1 because they were observed to have low frequencies in the WWW and newspapers.)<br />
* spec1 and spec2: a small number of words use this marker when they are detected as being common, but are not included in other lists.<br />
* gai1/2: common loanwords, also based on the wordfreq file.<br />
* nfxx: this is an indicator of frequency-of-use ranking in the wordfreq file. "xx" is the number of the set of 500 words in which the entry can be found, with "01" assigned to the first 500, "02" to the second, and so on.<br />
Entries with news1, ichi1, spec1/2 and gai1 values are marked with a "(P)" in the EDICT and EDICT2 files. While the priority markings accurately reflect the status of entries with regard to the various sources, they must be seen as only providing a crude indication of how common a word or expression actually is in Japanese. The "(P)" markings in the EDICT and EDICT2 files appear to identify a useful subset of "common" words, but there are clearly some marked entries which are not very common, and there are clearly unmarked entries which are in common use, particularly in the spoken language.<br />
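The two rules above (which codes lead to a "(P)" marking, and how an "nfxx" code maps to a rank range) can be sketched as follows. The function names are assumptions for illustration; the code sets come from the description above.<br />

```python
# Priority codes that result in a "(P)" marking, per the text above.
P_CODES = {"news1", "ichi1", "spec1", "spec2", "gai1"}


def is_marked_common(pri_codes):
    """True if any priority code on the element earns a "(P)" marking."""
    return any(code in P_CODES for code in pri_codes)


def nf_rank_range(code):
    """Convert an "nfxx" code to its (first, last) rank in the wordfreq file."""
    if not (code.startswith("nf") and code[2:].isdigit()):
        raise ValueError(f"not an nfxx code: {code!r}")
    block = int(code[2:])
    # Each block covers 500 words: nf01 is ranks 1-500, nf02 is 501-1000.
    return ((block - 1) * 500 + 1, block * 500)
```

An application that wants a finer ranking than the "(P)" flag could sort by the lower bound returned by <tt>nf_rank_range</tt>, bearing in mind that it reflects newspaper frequency only.<br />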
===Okurigana Variants===<br />
Okurigana variants in headwords are handled by including each variant form as a headword. This is to enable software to match with variant forms.<br />
===Spellings===<br />
As far as possible variants of English translation and spelling are included. Where appropriate different translations are included for national variants (e.g. autumn/fall, tap/faucet, etc.). Common spelling variations such as -our/-or and -ize/-ise are handled either by repeating the gloss in both spellings or appending spelling variants in parentheses. No attempt is made to tag English spellings according to country of usage.<br />
===Loanwords and Regional Words===<br />
For loanwords (gairaigo) which have not been derived from English words, the source language and the word in that language are included. Languages have been coded in the three-letter codes from the ISO 639-2:1998 "Codes for the representation of names of languages" standard, e.g. "(fre: avec)" in the EDICT/EDICT2 files and <lsource xml:lang="fre">avec</lsource> in the JMdict file. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_lang (Full list ] of language tags)<br />In the case of gairaigo which have a meaning which is not apparent from the original (usually English) words, the words in the source language are included as: "lang: original words", e.g.<br />
: コンクール /(n) competition (fre: concours)/contest/ <br />
In some cases the entries are pseudo-loanwords that have been constructed in Japan from foreign (usually English) words or word fragments (e.g. 和製英語 - waseieigo). These are tagged with "wasei" in EDICT/EDICT2 entries, e.g.<br />
: アゲンストウィンド /(n) head wind (wasei: against wind)/adverse wind/ <br />
and in JMdict with the "ls_wasei" attribute e.g. <lsource ls_wasei="y">against wind</lsource><br />A number of tags are used to indicate that a word or phrase is associated with a particular regional language variant within Japan, e.g. "ksb" (Kansai-ben). [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_dial (Full list) ]<br />
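The source-language markup shown above can be read from the XML as in the following sketch. The function name is hypothetical, and treating a missing <tt>xml:lang</tt> attribute as "eng" is an assumption based on the default declared in the DTD; check the DTD before relying on it.<br />

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_lsources(sense_xml):
    """List the source-language records found in one <sense> element."""
    sense = ET.fromstring(sense_xml)
    return [
        {
            # Assumed default when the attribute is absent.
            "lang": ls.get(XML_LANG, "eng"),
            "wasei": ls.get("ls_wasei") == "y",
            "text": ls.text or "",
        }
        for ls in sense.iter("lsource")
    ]
```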
<br />
== OTHER LANGUAGES ==<br />
<br />
The JMdict file has the capacity to record glosses for Japanese headwords in many languages. JMdict is currently distributed in two versions: a basic version in which there are only English glosses, and a full version in which there are glosses included in German (133,000 entries), Russian (80,000), Hungarian (51,000), Spanish (39,000), Italian (38,000), Dutch (29,000), Swedish (16,000), French (15,000) and Slovenian (9,000). Details of the dictionary files used for the non-English glosses in JMdict can be found in the [http://www.edrdg.org/wwwjdic/wwwjdicinf.html#dicfilf_tag WWWJDIC documentation].<br />
<br />
As part of the daily build of the full JMdict file, the Japanese headwords are matched against the dictionary files for the other languages, and glosses are included where there is a match. The non-English glosses are added as separate sets of senses, and as far as possible are broken into individual senses using tags within those files (typically (1) .... (2) ....., etc.) At present there is no attempt to align senses between the languages as there is no consistency between the dictionaries as to the sense splitting. (There is some [[more information]] on the background to the current sense breakup.)<br />
<br />
== ROMAJI VERSIONS? ==<br />
<br />
None of the files in the JMdict/EDICT project use ローマ字 (romanized Japanese), except for proper names such as "Suzuki", "Fuji", etc. or in cases such as "ikebana" where the romanized Japanese has been adopted as an English term. See the [[Editorial_policy#Romanized_Japanese|Editorial Policy]] for more information on this.<br />
<br />
== RELATED PROJECTS ==<br />
<br />
A number of other Japanese dictionary projects are closely related to this one. Among them are:<br />
<br />
* the [http://www.edrdg.org/enamdict/enamdict_doc.html ENAMDICT/JMnedict] Japanese Proper Names Dictionary project, which currently has nearly 740,000 named entities. The files are available in EDICT or XML formats.<br />
* the [[KANJIDIC_Project| KANJIDIC]] project, which maintains and distributes databases of information about kanji.<br />
* the [http://www.edrdg.org/jmdict/compdic_doc.html COMPDIC] file in EDICT format of computing and telecomms terminology. In 2008 the COMPDIC material was included in the main EDICT/JMdict database with tagging indicating that the entries relate to ICT. A separate "COMPDIC" file is extracted for distribution.<br />
* the [http://www.edrdg.org/krad/kradinf.html RADKFILE/KRADFILE] file of visual elements in kanji, which can be used for finding kanji in dictionaries.<br />
<br />
== SERVERS & PACKAGES ==<br />
<br />
A large number of [[JMdictEDICT_software|WWW servers and software packages]] use the JMdict/EDICT files. <br />
== ACKNOWLEDGEMENTS ==<br />
<br />
Since 1991 a large number of people have contributed to this project; far too many to list here. All their contributions have been most welcome, indeed without the assistance of speakers and students of Japanese this project would not have achieved as much.<br />
<br />
The EDICT/JMdict has been granted approval to use material from the [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]. This approval is most welcome.<br />
<br />
== PUBLICATIONS ==<br />
<br />
Some publications by Jim Breen about the EDICT/JMdict project. A more complete and up-to-date list can be found in [http://www.edrdg.org/~jwb/papers.html Jim's publications page].<br />
<br />
* paper about JMdict presented at the COLING Multilingual Linguistic Resources Workshop in Geneva in August 2004. [http://www.edrdg.org/~jwb/paperdir/jmdictart.html (html)] [http://www.edrdg.org/~jwb/paperdir/jmdictart.pdf (pdf)]<br />''(This paper should be referenced when citing the dictionary in a publication.)''<br />
* an earlier [http://www.edrdg.org/~jwb/paperdir/ws2002_paper.html JMdict paper] about some of the practical issues, presented at the Papillon Project workshop in Tokyo in July 2002.<br />
* a paper presented to the Papillon Project workshop in 2003 in Sapporo on the [http://www.edrdg.org/~jwb/paperdir/dicexamples.html linking of example sentences in the Tanaka corpus to EDICT entries in WWWJDIC].<br />
* a 1999 workshop paper about WWWJDIC; [http://www.edrdg.org/~jwb/paperdir/wwwjdic_article2.html (updated 2003 version)] [http://nihongo.monash.edu/wwwjdic_article/wwwjdic_article.html (1999 version)].<br />
* an overview paper about EDICT presented at the JSAA conference in 1995; [http://www.edrdg.org/~jwb/paperdir/hpaper.html (html)]<br />
* An early technical report from 1993; [http://www.edrdg.org/~jwb/paperdir/ejdic_report1.pdf (pdf)]</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=JMdict-EDICT_Dictionary_Project&diff=940JMdict-EDICT Dictionary Project2022-01-17T05:04:31Z<p>JimBreen: /* INTRODUCTION */</p>
<hr />
<div>= JMdict/EDICT JAPANESE/ENGLISH DICTIONARY PROJECT =<br />
<br />
== INTRODUCTION ==<br />
<br />
The JMdict/EDICT project has as its goal the production of a comprehensive freely-available Japanese/English Dictionary database in machine-readable form which can be used by a variety of applications and servers.<br />
<br />
The project began in 1991 with the expansion of the EDICT simple Japanese-English dictionary file. (See below under History)<br />
<br />
At present the project has the following dictionary files available:<br />
<br />
* the full Japanese-Multilingual Dictionary (JMdict) file which is distributed in XML format. The JMdict file is aimed at being a multilingual lexical database with Japanese as the pivot language and also includes translations of words and phrases in a number of languages other than English. It has been designed to support the requirements of Japanese lexicography, including multiple surface forms, orthographical variants, okurigana variants, multiple readings, etc.<br />
* the EDICT2 file, which is in a relatively simple one-line-per-entry text format based on the original EDICT format, and which contains almost all the information in the JMdict edition;<br />
* the EDICT file, which follows the original format of one kanji form and reading per entry, and contains a reduced amount of information. It is provided to maintain support for software which uses the original EDICT file format;<br />
* the EDICT_SUB file, which contains about 20% of the most common entries in the EDICT file.<br />
<br />
The dictionary data is maintained in an online database under the oversight of an editorial board, and the JMdict and EDICT versions are generated and released daily.<br />
<br />
The dictionary files are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group] who are the owners of the copyright.<br />
<br />
An earlier version of this page can be found [http://www.edrdg.org/jmdict/edict_doc_depr.html here.] Note that it contains many out-of-date links.<br />
<br />
== CURRENT VERSION &amp; DOWNLOAD ==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server], (formerly at Monash University) which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file. (Only to be used in legacy apps, etc.)<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
<br />
== PROJECT FORUM ==<br />
<br />
There are several forums where this project is actively discussed.<br />
<br />
The original forum was the <tt> sci.lang.japan</tt> [http://groups.google.com/group/sci.lang.japan Usenet newsgroup. ] More recently a [http://groups.yahoo.com/group/edict-jmdict/ mailing list ] specifically for project discussion has begun. (Mail to <tt> edict-jmdict-subscribe@yahoogroups.com</tt> to initiate subscription.)<br />
<br />
== Next Generation ==<br />
<br />
A [[JMdict:_Next_Generation|major revision]] of the JMdict structure is planned as a way of dealing with a number of issues which have emerged during the life of the project.<br />
<br />
== DATABASE and UPDATING ==<br />
<br />
The dictionary data is all held in a PostgreSQL database and maintained using the [http://www.edrdg.org/wiki/index.php/JMdictDB_Project JMdictDB online system]. The JMdict version is generated directly from the database. From this the EDICT/EDICT2 versions are generated using utility software. You can explore the database and propose edits and new entries via its [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search Form].<br />
<br />
The [http://www.edrdg.org/wiki/index.php/Main_Page#The_JMdict.2FEDICT_Project EDRDG Wiki] has a wealth of information about the dictionary database, including suggestions about [http://www.edrdg.org/wiki/index.php/JMdict:_Getting_Started getting started, ] the detailed [http://www.edrdg.org/wiki/index.php/Editorial_policy editorial policy and guidelines], etc. etc.<br />
<br />
== FORMAT ==<br />
<br />
The basic format of the entries in the dictionary files can be seen in detail by examining the [http://www.edrdg.org/jmdict/jmdict_dtd_h.html DTD] (Document Type Declaration) of the XML-format JMdict file. The DTD is heavily annotated with content and structural information. [http://www.edrdg.org/jmdict/dtd-jmdict.xml (download)]<br />
<br />
In summary, each dictionary entry is independent, although there may be cross-reference fields pointing to other entries. Each entry consists of<br />
<br />
* kanji elements, i.e. headwords containing at least one kanji character, plus associated tags indicating some status or characteristic of the headword. Where there are multiple headwords, they have been ordered according to frequency of usage, as far as this can be determined;<br />
* reading elements, containing either the reading in kana of the headword, or the headword itself in the case of headwords only in kana. The elements also include tags indicating some status or characteristics. As with the kanji headwords, where there are multiple readings they have been ordered according to frequency of usage, as far as this can be determined;<br />
* general coded information relating to the entry as a whole, such as original language, date-of-creation, etc.<br />
* sense elements, containing the translational equivalents or glosses of the headword(s). As Japanese is not highly polysemous, there is often only one sense. Associated with the sense elements is other coded data indicating the part-of-speech, field of application, miscellaneous information, etc. As with headwords and readings, the glosses are ordered with the most common appearing first.<br />
<br />
The format and coding of the distributed files is as follows:<br />
<br />
* the JMdict file contains the complete dictionary information in XML format as per the DTD. This file is in Unicode/ISO-10646 coding using UTF-8 encapsulation. [http://www.edrdg.org/jmdict/jmdict_sample.html (Sample Entry)]<br />
* the EDICT file is in the original relatively simple format based on the text data file of the SKK input-method. Each entry is in the form:<br />
: KANJI [KANA] /(general information) gloss/gloss/.../<br />
:: or<br />
: KANA /(general information) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT format:<br />
:: 収集 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/<br />
:: (in addition to equivalent entries with the 蒐集, 拾集 and 収輯 kanji compounds.)<br />
: Where there are multiple senses, these are indicated by (1), (2), etc. before the first gloss in each sense. As this format only allows a single kanji headword and reading, entries are generated for each possible headword/reading combination. As the format restricts Japanese characters to the kanji and kana fields, any cross-reference data and other informational fields are omitted.<br />
:The EDICT file is distributed in JIS X 0208 coding in EUC-JP encapsulation. (Please note that this original format is only now provided for legacy systems and apps. New systems <b>must</b> use the EDICT2 edition described below);<br />
* the EDICT2 file is in an expanded form of the original EDICT format. The main differences are the inclusion of multiple kanji headwords and readings, and the inclusion of cross-reference and other information fields, e.g.:<br />
: KANJI-1;KANJI-2 [KANA-1;KANA-2] /(general information) (see xxxx) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT2 format:<br />
:: 収集(P);蒐集;拾集;収輯 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/(P)/<br />
: In addition, the EDICT2 has as its last field the sequence number of the entry. This matches the "ent_seq" entity value in the XML edition. The field has the format: EntLnnnnnnnnX. The EntL is a unique string to help identify the field. The "X", if present, indicates that an audio clip of the entry reading is available from the JapanesePod101.com site.<br />
: The EDICT2 file is distributed in JIS X 0208 and JIS X 0212 codings in EUC-JP encapsulation;<br />
* the EDICT_SUB file is in the same format as the EDICT file.<br />
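<br />
The EDICT2 layout described above can be split mechanically into its fields. The following is a minimal sketch in Python, not part of the project's own tooling: the names <code>Edict2Entry</code> and <code>parse_edict2_line</code> are hypothetical, glosses are assumed not to contain a "/" character, and the "(general information)" prefix is left attached to the first gloss.<br />

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# headwords, optional [readings], then the slash-delimited gloss area
_LINE = re.compile(r"^(?P<heads>\S+)(?:\s+\[(?P<reads>[^\]]+)\])?\s+/(?P<body>.*)/\s*$")
# sequence-number field: EntL, digits, optional X (audio clip available)
_ENTL = re.compile(r"^EntL(\d+)(X?)$")
# trailing markers such as "(P)" attached to a headword or reading
_TRAILING_TAGS = re.compile(r"(?:\([^)]*\))+$")


@dataclass
class Edict2Entry:  # hypothetical container, for illustration only
    headwords: List[str]
    readings: List[str]
    glosses: List[str] = field(default_factory=list)
    common: bool = False
    seq: Optional[str] = None
    has_audio: bool = False


def parse_edict2_line(line: str) -> Edict2Entry:
    """Split one EDICT or EDICT2 line into headwords, readings and glosses."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not an EDICT/EDICT2 entry: {line!r}")
    entry = Edict2Entry(
        headwords=[_TRAILING_TAGS.sub("", h) for h in m.group("heads").split(";")],
        # kana-only entries have no [readings] field, so this list is empty
        readings=[_TRAILING_TAGS.sub("", r)
                  for r in (m.group("reads") or "").split(";") if r],
    )
    for part in m.group("body").split("/"):
        if not part:
            continue
        seq = _ENTL.match(part)
        if part == "(P)":
            entry.common = True
        elif seq:
            entry.seq = seq.group(1)
            entry.has_audio = seq.group(2) == "X"
        else:
            entry.glosses.append(part)
    return entry
```

The same function accepts legacy EDICT lines, since they use the same field layout with a single headword and reading.<br />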
<br />
None of the files have the entries in any particular order.<br />
<br />
== PROJECT HISTORY ==<br />
<br />
The project was begun in 1991 by [http://nihongo.monash.edu/ Jim Breen] when an early DOS-based Japanese word-processor (MOKE - Mark's Own Kanji Editor) was released, containing an initial small version of the EDICT file. This was progressively expanded and edited over the following years. In 1999 the EDICT file, which by this time contained about 60,000 entries, was converted into an expanded format and the first XML-format JMdict file released. From that point both JMdict and the EDICT2/EDICT versions have been generated from the same source data.<br />
<br />
The EDICT2 format was created in 2003, primarily for use with the [http://nihongo.monash.edu/cgi-bin/wwwjdic.cgi?1C WWWJDIC] dictionary server, however it is now also used by other servers and applications.<br />
<br />
The growth in entries in the file is largely due to the efforts of the many people who have contributed entries to it over the years and who have participated in the editorial role. The increase in entry numbers has slowed as the file has achieved coverage of a large proportion of the Japanese lexicon. Much of the editorial work in recent years has concentrated on amendments and expansion to existing entries.<br />
<br />
A more expanded explanation of the early developments in the EDICT file can be found in the [http://www.edrdg.org/jmdict/edict_doc_old.html original documentation].<br />
<br />
== COPYRIGHT ==<br />
<br />
Dictionary copyright is a difficult point, because clearly the first lexicographer who published "inu means dog" could not claim a copyright violation over all subsequent Japanese dictionaries. While it is usual to consult other dictionaries for "accurate lexicographic information", as Nelson put it, wholesale copying is, of course, not permissible, and contributors have been advised to avoid direct copying from other sources. What makes each dictionary unique (and copyright-able) is the particular selection of words, the phrasing of the meanings, the presentation of the contents (a very important point in the case of this project), and the means of publication.<br />
<br />
The files of the project are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group ] who are the current owners of the copyright. As explained in the licence, the files are available for use for most purposes provided acknowledgement and distribution of the documentation is made.<br />
<br />
== LEXICOGRAPHICAL DETAILS ==<br />
<br />
===Inflections, etc.===<br />
In general no inflections of verbs or adjectives have been included, except in idiomatic expressions. Adverbs formed from adjectives (e.g., -ku or -ni) are generally not included. Verbs are, of course, in the plain or "dictionary" form.<br />Composed forms, such as adverbs taking the "to" particle, keiyoudoushi adjectives, etc. are only included in their root form; however, the part-of-speech (POS) marker is used to indicate their status. <br />Nouns which can form a verb with the auxiliary verb "suru" only appear in their noun form, but have a POS marker: "vs", to indicate the existence of a verbal form. In general the gloss only relates to the noun itself, but entries are being progressively expanded to include the verbal glosses as well.<br />
===Part of Speech Marking===<br />
The dictionary includes one or more Part of Speech (POS) markings on almost every entry. Examples include: "adj-i" (adjective - 形容詞), "n" (noun - 名詞), "prt" (particle - 助詞), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_pos (Full POS list)]<br />
===Field of Application===<br />
A number of entries are marked with a specific field of application, e.g. "chem" (chemistry), "math" (mathematics), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld (Full field list)]<br />
===Miscellaneous Markings===<br />
A number of miscellaneous tags are included in entries to provide additional information in a standardized form, e.g. "col" (colloquialism), "sl" (slang), "uk" (term usually in kana), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_misc (Full list) ]<br />
===Word Priority Marking===<br />
The ke_pri and equivalent re_pri fields in the JMdict file are provided to record information about the relative commonness or priority of the entry, and consist of codes indicating the word appears in various references which can be taken as an indication of the frequency with which the word is used. This field is intended for use either by applications which want to concentrate on entries of a particular priority, or to generate subset files. The current values in this field are:<br />
* news1/2: appears in the "wordfreq" file compiled by Alexandre Girardi from the Mainichi Shimbun. (See the ftp archive for a copy.) Words in the first 12,000 in that file are marked "news1" and words in the second 12,000 are marked "news2".<br />
* ichi1/2: appears in the "Ichimango goi bunruishuu", Senmon Kyouiku Publishing, Tokyo, 1998. (The entries marked "ichi2" were demoted from ichi1 because they were observed to have low frequencies in the WWW and newspapers.)<br />
* spec1 and spec2: a small number of words use this marker when they are detected as being common, but are not included in other lists.<br />
* gai1/2: common loanwords, also based on the wordfreq file.<br />
* nfxx: this is an indicator of frequency-of-use ranking in the wordfreq file. "xx" is the number of the set of 500 words in which the entry can be found, with "01" assigned to the first 500, "02" to the second, and so on.<br />
<br />
Entries with news1, ichi1, spec1/2 and gai1 values are marked with a "(P)" in the EDICT and EDICT2 files.<br />
<br />
While the priority markings accurately reflect the status of entries with regard to the various sources, they must be seen as only providing a crude indication of how common a word or expression actually is in Japanese. The "(P)" markings in the EDICT and EDICT2 files appear to identify a useful subset of "common" words, but there are clearly some marked entries which are not very common, and there are clearly unmarked entries which are in common use, particularly in the spoken language.<br />
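<br />
As an illustration of how an application might use these codes, the sketch below (in Python) applies the "(P)" rule and the nfxx ranking arithmetic stated above. The function names are hypothetical, and the codes are assumed to have already been extracted from the ke_pri/re_pri fields as plain strings.<br />

```python
import re

# Codes that earn a "(P)" marker in EDICT/EDICT2, per the rule stated above.
P_CODES = {"news1", "ichi1", "spec1", "spec2", "gai1"}
_NF = re.compile(r"^nf(\d{2})$")


def is_common(pri_codes):
    """True if any of the entry's priority codes would earn a "(P)" marker."""
    return any(code in P_CODES for code in pri_codes)


def nf_rank_range(code):
    """Map an nfxx code to its (first, last) rank in the wordfreq file.

    "nf01" covers ranks 1-500, "nf02" covers 501-1000, and so on.
    Returns None for anything that is not a valid nfxx code.
    """
    m = _NF.match(code)
    if m is None or int(m.group(1)) < 1:
        return None
    n = int(m.group(1))
    return ((n - 1) * 500 + 1, n * 500)
```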
===Okurigana Variants===<br />
Okurigana variants in headwords are handled by including each variant form as a headword. This is to enable software to match with variant forms.<br />
===Spellings===<br />
As far as possible variants of English translation and spelling are included. Where appropriate different translations are included for national variants (e.g. autumn/fall, tap/faucet, etc.). Common spelling variations such as -our/-or and -ize/-ise are handled either by repeating the gloss in both spellings or appending spelling variants in parentheses. No attempt is made to tag English spellings according to country of usage.<br />
===Loanwords and Regional Words===<br />
For loanwords (gairaigo) which have not been derived from English words, the source language and the word in that language are included. Languages have been coded in the three-letter codes from the ISO 639-2:1998 "Codes for the representation of names of languages" standard, e.g. "(fre: avec)" in the EDICT/EDICT2 files and <lsource xml:lang="fre">avec</lsource> in the JMdict file. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_lang (Full list ] of language tags) In the case of gairaigo which have a meaning which is not apparent from the original (usually English) words, the words in the source language are included as: "lang: original words", e.g.<br />
: コンクール /(n) competition (fre: concours)/contest/ <br />
In some cases the entries are pseudo-loanwords that have been constructed in Japan from foreign (usually English) words or word fragments (e.g. 和製英語 - waseieigo). These are tagged with "wasei" in EDICT/EDICT2 entries, e.g.<br />
: アゲンストウィンド /(n) head wind (wasei: against wind)/adverse wind/ <br />
and in JMdict with the "ls_wasei" attribute, e.g. <lsource ls_wasei="y">against wind</lsource><br />
<br />
A number of tags are used to indicate that a word or phrase is associated with a particular regional language variant within Japan, e.g. "ksb" (Kansai-ben). [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_dial (Full list) ]<br />
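<br />
The "lang: original words" and "wasei:" conventions in the EDICT/EDICT2 glosses are regular enough to be picked out with a pattern match. The following Python sketch is one possible approach; the function name is hypothetical, and it assumes that the source annotation always appears inside a single pair of parentheses with a three-letter lowercase language code or the word "wasei".<br />

```python
import re

# "(fre: concours)" or "(wasei: against wind)" inside a gloss field
_SRC = re.compile(r"\((?P<lang>[a-z]{3}|wasei):\s*(?P<words>[^)]+)\)")


def extract_sources(gloss_field):
    """Return (language-or-wasei, source words) pairs found in EDICT glosses."""
    return [(m.group("lang"), m.group("words").strip())
            for m in _SRC.finditer(gloss_field)]
```

Part-of-speech markers such as "(n)" are ignored because they contain no colon.<br />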
<br />
== OTHER LANGUAGES ==<br />
<br />
The JMdict file has the capacity to record glosses for Japanese headwords in many languages. JMdict is currently distributed in two versions: a basic version in which there are only English glosses, and a full version in which there are glosses included in German (133,000 entries), Russian (80,000), Hungarian (51,000), Spanish (39,000), Italian (38,000), Dutch (29,000), Swedish (16,000), French (15,000) and Slovenian (9,000). Details of the dictionary files used for the non-English glosses in JMdict can be found in the [http://www.edrdg.org/wwwjdic/wwwjdicinf.html#dicfilf_tag WWWJDIC documentation].<br />
<br />
As part of the daily build of the full JMdict file, the Japanese headwords are matched against the dictionary files for the other languages, and glosses are included where there is a match. The non-English glosses are added as separate sets of senses, and as far as possible are broken into individual senses using tags within those files (typically (1) .... (2) ....., etc.) At present there is no attempt to align senses between the languages as there is no consistency between the dictionaries as to the sense splitting. (There is some [[more information]] on the background to the current sense breakup.)<br />
<br />
== ROMAJI VERSIONS? ==<br />
<br />
None of the files in the JMdict/EDICT project use ローマ字 (romanized Japanese), except for proper names such as "Suzuki", "Fuji", etc. or in cases such as "ikebana" where the romanized Japanese has been adopted as an English term. See the [[Editorial_policy#Romanized_Japanese|Editorial Policy]] for more information on this.<br />
<br />
== RELATED PROJECTS ==<br />
<br />
A number of other Japanese dictionary projects are closely related to this one. Among them are:<br />
<br />
* the [http://www.edrdg.org/enamdict/enamdict_doc.html ENAMDICT/JMnedict] Japanese Proper Names Dictionary project, which currently has nearly 740,000 named entities. The files are available in EDICT or XML formats.<br />
* the [[KANJIDIC_Project| KANJIDIC]] project, which maintains and distributes databases of information about kanji.<br />
* the [http://www.edrdg.org/jmdict/compdic_doc.html COMPDIC] file in EDICT format of computing and telecomms terminology. In 2008 the COMPDIC material was included in the main EDICT/JMdict database with tagging indicating that the entries relate to ICT. A separate "COMPDIC" file is extracted for distribution.<br />
* the [http://www.edrdg.org/krad/kradinf.html RADKFILE/KRADFILE] file of visual elements in kanji, which can be used for finding kanji in dictionaries.<br />
<br />
== SERVERS & PACKAGES ==<br />
<br />
A large number of [[JMdictEDICT_software|WWW servers and software packages]] use the JMdict/EDICT files. <br />
== ACKNOWLEDGEMENTS ==<br />
<br />
Since 1991 a large number of people have contributed to this project; far too many to list here. All their contributions have been most welcome, indeed without the assistance of speakers and students of Japanese this project would not have achieved as much.<br />
<br />
The EDICT/JMdict has been granted approval to use material from the [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]. This approval is most welcome.<br />
<br />
== PUBLICATIONS ==<br />
<br />
Some publications by Jim Breen about the EDICT/JMdict project. A more complete and up-to-date list can be found in [http://www.edrdg.org/~jwb/papers.html Jim's publications page].<br />
<br />
* paper about JMdict presented at the COLING Multilingual Linguistic Resources Workshop in Geneva in August 2004. [http://www.edrdg.org/~jwb/paperdir/jmdictart.html (html)] [http://www.edrdg.org/~jwb/paperdir/jmdictart.pdf (pdf)]<br />''(This paper should be referenced when citing the dictionary in a publication.)''<br />
* an earlier [http://www.edrdg.org/~jwb/paperdir/ws2002_paper.html JMdict paper] about some of the practical issues, presented at the Papillon Project workshop in Tokyo in July 2002.<br />
* a paper presented to the Papillon Project workshop in 2003 in Sapporo on the [http://www.edrdg.org/~jwb/paperdir/dicexamples.html linking of example sentences in the Tanaka corpus to EDICT entries in WWWJDIC].<br />
* a 1999 workshop paper about WWWJDIC; [http://www.edrdg.org/~jwb/paperdir/wwwjdic_article2.html (updated 2003 version)] [http://nihongo.monash.edu/wwwjdic_article/wwwjdic_article.html (1999 version)].<br />
* an overview paper about EDICT presented at the JSAA conference in 1995; [http://www.edrdg.org/~jwb/paperdir/hpaper.html (html)]<br />
* An early technical report from 1993; [http://www.edrdg.org/~jwb/paperdir/ejdic_report1.pdf (pdf)]</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=939Editorial policy2021-11-17T02:12:37Z<p>JimBreen: /* Old and Rarely Used Terms */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. MP3プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
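<br />
The middle-dot rule above can be checked before submission. The sketch below, in Python, builds the two required variants and flags look-alike dots. It assumes that the "JIS middle-dot" corresponds to Unicode U+30FB (KATAKANA MIDDLE DOT); the function names and the list of look-alike characters are illustrative, not exhaustive.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT (assumed to be the accepted form)

# Look-alike characters that an input method or copy-paste may produce instead.
OTHER_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
    "\u2022": "BULLET",
}


def middle_dot_variants(parts):
    """Return the joined form and the middle-dot-separated form of a 外来語."""
    return ["".join(parts), JIS_MIDDLE_DOT.join(parts)]


def find_bad_dots(reading):
    """List any look-alike dot characters found in a reading."""
    return [ch for ch in reading if ch in OTHER_DOTS]
```

Joining the two variants with ";" gives the form shown in the example above.<br />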
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
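<br />
A simple consistency check for this rule can be sketched in Python as follows. The function name is hypothetical, and the check only confirms that each katakana run in the kanji form reappears, in order, in the reading; it does not cover entries where a single hiragana reading is accepted for mixed kanji/katakana forms.<br />

```python
import re

# Katakana letters (U+30A1-U+30FA) plus the prolonged sound mark (U+30FC).
_KATAKANA_RUN = re.compile(r"[\u30a1-\u30fa\u30fc]+")


def katakana_preserved(kanji_form, reading):
    """True if every katakana run in the kanji form appears, in order, in the reading."""
    pos = 0
    for run in _KATAKANA_RUN.findall(kanji_form):
        idx = reading.find(run, pos)
        if idx < 0:
            return False
        pos = idx + len(run)
    return True
```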
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
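<br />
Several of the points above are mechanical enough to check automatically before an entry is submitted. The following Python sketch covers three of them: leading articles, in-word spelling patterns such as "colo(u)r", and a comma after "e.g." or "i.e.". It is an illustration only; the function name and warning labels are hypothetical, and a flagged gloss may still be acceptable (for example where an article is needed to make the meaning clear).<br />

```python
import re

_ARTICLE = re.compile(r"^(?:a|an|the)\s", re.IGNORECASE)
_INWORD_PAREN = re.compile(r"\w\(\w+\)\w")           # e.g. "colo(u)r"
_COMMA_AFTER = re.compile(r"\b(?:e\.g\.|i\.e\.),")   # e.g. "(e.g., rock)"


def lint_glosses(meaning):
    """Return (gloss, warning) pairs for a ";"-separated list of glosses."""
    warnings = []
    for gloss in (g.strip() for g in meaning.split(";")):
        if not gloss:
            continue
        if _ARTICLE.match(gloss):
            warnings.append((gloss, "leading article"))
        if _INWORD_PAREN.search(gloss):
            warnings.append((gloss, "in-word parenthesised spelling"))
        if _COMMA_AFTER.search(gloss):
            warnings.append((gloss, "comma after e.g./i.e."))
    return warnings
```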
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary, (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
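<br />
The [lsrc=lng:word] notation described above can be read mechanically. The following is a minimal sketch only, assuming the tag appears inline in the sense text exactly as in the two examples; the names <code>LSRC_RE</code> and <code>parse_lsrc</code> are illustrative and are not part of any JMdict tool.<br />

```python
import re

# Matches tags such as [lsrc=ger:Arbeit] or [lsrc=fre:"art déco"].
# Group 1 is the three-letter ISO 639-2 code; the source word may be
# quoted (group 2), unquoted (group 3), or empty.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):\s*(?:"([^"]*)"|([^\]]*))\]')


def parse_lsrc(sense_text):
    """Return a list of (language_code, source_word) pairs found in sense_text."""
    results = []
    for match in LSRC_RE.finditer(sense_text):
        code = match.group(1)
        # Prefer the quoted form when present, otherwise the bare form.
        word = match.group(2) if match.group(2) is not None else match.group(3)
        results.append((code, word.strip()))
    return results


if __name__ == "__main__":
    print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
    print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```

An empty source word (e.g. [lsrc=ger:]) is returned as an empty string, which corresponds to the case where the source word is identical to the translation and is therefore not repeated.<br />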
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
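<br />
The cross-reference patterns above (plain target, kanji・reading, and a trailing sense number) can be sketched as a small parser. This is an illustration under assumptions, not the parser used by the JMdictDB system: the names <code>XRef</code> and <code>parse_xrefs</code> are hypothetical, and a target that itself contains a middle-dot (e.g. a katakana compound) would be split incorrectly by this simple pattern.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str               # "see" or "ant"
    target: str             # headword (the kanji form where one exists)
    reading: Optional[str]  # reading after the middle-dot, if given
    sense: Optional[int]    # sense number of the target, if given


# [see=言葉], [ant=何等], [see=金本位・かねほんい], [see=漢字[2]]
XREF_RE = re.compile(
    r"\[(see|ant)=([^\[\]・]+)(?:・([^\[\]]+))?(?:\[(\d+)\])?\]"
)


def parse_xrefs(text):
    """Return every cross-reference tag found in text as an XRef tuple."""
    refs = []
    for m in XREF_RE.finditer(text):
        sense = int(m.group(4)) if m.group(4) else None
        refs.append(XRef(m.group(1), m.group(2), m.group(3), sense))
    return refs


if __name__ == "__main__":
    for ref in parse_xrefs("[see=金本位・かねほんい] [see=漢字[2]] [ant=何等]"):
        print(ref)
```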
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
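Two of the points above (macrons rather than circumflexes, and the "somen (sōmen)" style for terms commonly used in English) lend themselves to a small helper. The following is a rough sketch, assuming romanized text that uses precomposed accented letters; the function names are illustrative only. Deciding whether a term counts as "commonly used in English" remains an editorial judgement that code cannot make.<br />

```python
import unicodedata

# Translation tables between circumflexed, macron and plain vowels.
CIRCUMFLEX_TO_MACRON = str.maketrans("âîûêôÂÎÛÊÔ", "āīūēōĀĪŪĒŌ")
MACRON_TO_PLAIN = str.maketrans("āīūēōĀĪŪĒŌ", "aiueoAIUEO")


def circumflex_to_macron(text):
    """Replace circumflexed long vowels (e.g. ô) with macron forms (ō)."""
    # Normalize first so that combining accents become precomposed letters.
    return unicodedata.normalize("NFC", text).translate(CIRCUMFLEX_TO_MACRON)


def english_form_with_macron(romaji):
    """Build a gloss in the 'somen (sōmen)' style for a macron-bearing term.

    Terms without macrons are returned unchanged.
    """
    romaji = unicodedata.normalize("NFC", romaji)
    plain = romaji.translate(MACRON_TO_PLAIN)
    return romaji if plain == romaji else f"{plain} ({romaji})"


if __name__ == "__main__":
    print(circumflex_to_macron("Yôrô"))
    print(english_form_with_macron("sōmen"))
```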
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
The "hist" (historical) tag does not fall into this category. It is applied to a term that refers to a past event (e.g. battle, ceremony) or concept (e.g. an art-form common in the 18th century), where the term itself is still in current use.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
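The two spacing rules above can be checked with a pair of regular expressions. This is a minimal sketch under assumptions: the unit list is a short illustrative selection rather than an agreed list, only "°" and "%" are treated as symbols, and the name <code>spacing_problems</code> is hypothetical.<br />

```python
import re

# Illustrative unit abbreviations only; extend as needed.
UNITS = ("km", "cm", "mm", "kg", "mg", "ml", "m", "g", "l")

# A digit immediately followed by a unit, e.g. "100km".
UNIT_NO_SPACE_RE = re.compile(r"\d(?:%s)\b" % "|".join(UNITS))
# A digit, whitespace, then a symbol, e.g. "9 %".
SYMBOL_WITH_SPACE_RE = re.compile(r"\d\s+(?:°|%)")


def spacing_problems(gloss):
    """Return a list of spacing problems found in a gloss (empty if none)."""
    problems = []
    if UNIT_NO_SPACE_RE.search(gloss):
        problems.append("missing space before unit")
    if SYMBOL_WITH_SPACE_RE.search(gloss):
        problems.append("unexpected space before symbol")
    return problems


if __name__ == "__main__":
    for sample in ("100km", "100 km", "15°C", "9 %", "5c"):
        print(sample, spacing_problems(sample))
```

Because "c" (cents) is treated as a symbol rather than a unit, "5c" is accepted without a space.<br />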
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
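<br />
The preferred date and time styles above can be produced as follows. This is a sketch only; the function names are illustrative, and month names are spelled out in a list so that the output does not depend on the system locale.<br />

```python
import datetime

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_date(d, include_year=True):
    """'March 17, 2019', or 'March 17' when the year is omitted."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if include_year else text


def format_person_date(d):
    """Compact YYYY.MM.DD style used for the dates of individual people."""
    # No zero padding, matching the example "1925.1.14".
    return f"{d.year}.{d.month}.{d.day}"


def format_time(t):
    """'2am' or '12:30pm' style; minutes are dropped when they are zero."""
    hour12 = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute == 0:
        return f"{hour12}{suffix}"
    return f"{hour12}:{t.minute:02d}{suffix}"


if __name__ == "__main__":
    born = datetime.date(1925, 1, 14)
    died = datetime.date(1970, 11, 25)
    print(f"Yukio Mishima ({format_person_date(born)}-{format_person_date(died)})")
    print(format_date(datetime.date(2019, 3, 17)))
    print(format_time(datetime.time(12, 30)))
```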
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
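<br />
Before adding unusual characters it can help to check which distributions they will survive in. The sketch below approximates the rules above using Python's standard codecs: "iso2022_jp" (ASCII plus JIS X 0208) stands in for the EDICT repertoire and "iso2022_jp_1" (which adds JIS X 0212) for EDICT2. This is an assumption made for illustration; the actual conversion scripts may treat some characters differently, and the function names are hypothetical.<br />

```python
def jis_coverage(ch):
    """Classify a single character by the most constrained format it fits.

    Returns "EDICT" (ASCII or JIS X 0208), "EDICT2" (needs JIS X 0212),
    or "JMdict only" (outside both codesets).
    """
    try:
        ch.encode("iso2022_jp")
        return "EDICT"
    except UnicodeEncodeError:
        pass
    try:
        ch.encode("iso2022_jp_1")
        return "EDICT2"
    except UnicodeEncodeError:
        return "JMdict only"


def characters_needing_attention(text):
    """Return (character, coverage) pairs for characters that do not fit EDICT."""
    return [(ch, jis_coverage(ch)) for ch in text if jis_coverage(ch) != "EDICT"]


if __name__ == "__main__":
    # The hangul syllable is outside both JIS codesets.
    print(characters_needing_attention("漢字 and 한"))
```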
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
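<br />
The core of the two-out-of-three rule can be expressed in a few lines. The following is a simplified sketch: it assumes each entry has been reduced to its single most common kanji form, reading and meaning, and it compares meanings by plain string equality, whereas in practice deciding that two meanings are "the same" is an editorial judgement. The names <code>Entry</code> and <code>may_merge</code> are illustrative.<br />

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kanji: str    # primary kanji headword
    reading: str  # primary reading
    meaning: str  # primary meaning


def may_merge(a, b):
    """True if at least two of kanji, reading and meaning are the same."""
    matches = sum(
        getattr(a, field) == getattr(b, field)
        for field in ("kanji", "reading", "meaning")
    )
    return matches >= 2


if __name__ == "__main__":
    # Same kanji and meaning, different reading: merge candidates.
    print(may_merge(Entry("日本", "にほん", "Japan"), Entry("日本", "にっぽん", "Japan")))
    # Only the reading is shared: must stay separate.
    print(may_merge(Entry("橋", "はし", "bridge"), Entry("箸", "はし", "chopsticks")))
```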
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether these can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict however they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
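<br />
The formatting rules for species glosses given in the list above (capitalized genus, lower-case epithets, "subsp." for plant subspecies only, no author name) can be sketched as a small helper. This is an illustration only: the function name and parameters are hypothetical, and wiki italics markup is deliberately left out of the output.<br />

```python
def species_gloss(common_name, genus, species, subspecies=None, is_plant=False):
    """Build a gloss in the 'common_name (scientific_name)' format.

    The genus is capitalized and the epithets are lower-cased. Animal
    subspecies are written as a bare third epithet; plant subspecies
    are preceded by "subsp.".
    """
    scientific = f"{genus.capitalize()} {species.lower()}"
    if subspecies:
        if is_plant:
            scientific += f" subsp. {subspecies.lower()}"
        else:
            scientific += f" {subspecies.lower()}"
    return f"{common_name} ({scientific})"


if __name__ == "__main__":
    print(species_gloss("European magpie", "Pica", "pica"))
    print(species_gloss("cinnamon bear", "Ursus", "americanus", "cinnamomum"))
    print(species_gloss("occluded blindweed", "Calystegia", "sepium",
                        "erratica", is_plant=True))
```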
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that "derog" or "vulg" tags be used where appropriate;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
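
A check for stray bar-like characters can be sketched as follows. This is only an illustration: the <tt>SUSPECT</tt> table is an assumed sample of look-alike characters, not the project's official list of the other six, and the function name is hypothetical.

```python
# Assumed sample of bar-like characters that the guideline above does NOT allow.
# This table is illustrative only; it is not the project's official list.
SUSPECT = {
    "\u2010": "HYPHEN",
    "\u2011": "NON-BREAKING HYPHEN",
    "\u2012": "FIGURE DASH",
    "\u2013": "EN DASH",
    "\u2014": "EM DASH",
    "\u2015": "HORIZONTAL BAR",
    "\uff0d": "FULLWIDTH HYPHEN-MINUS",
}


def find_suspect_bars(text):
    """Return (index, code point, name) for each suspect character in text."""
    return [
        (i, f"U+{ord(ch):04X}", SUSPECT[ch])
        for i, ch in enumerate(text)
        if ch in SUSPECT
    ]


# The three permitted characters (U+002D, U+30FC, U+2212) are never reported.
print(find_suspect_bars("CD-ROM"))
print(find_suspect_bars("ロ\u2015マ字"))
```

Reporting the position and code point, rather than silently replacing the character, leaves the choice of the correct replacement (hyphen, long vowel symbol or minus sign) to the editor.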
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=JMdict-EDICT_Dictionary_Project&diff=938JMdict-EDICT Dictionary Project2021-10-25T22:46:29Z<p>JimBreen: /* FORMAT */</p>
<hr />
<div>= JMdict/EDICT JAPANESE/ENGLISH DICTIONARY PROJECT =<br />
<br />
== INTRODUCTION ==<br />
<br />
The JMdict/EDICT project has as its goal the production of a comprehensive freely-available Japanese/English Dictionary database in machine-readable form which can be used by a variety of applications and servers.<br />
<br />
The project began in 1991 with the expansion of the EDICT simple Japanese-English dictionary file. (See below under History)<br />
<br />
At present the project has the following dictionary files available:<br />
<br />
* the full Japanese-Multilingual Dictionary (JMdict) file which is distributed in XML format. The JMdict file is aimed at being a multilingual lexical database with Japanese as the pivot language and also includes translations of words and phrases in a number of languages other than English. It has been designed to support the requirements of Japanese lexicography, including multiple surface forms, orthographical variants, okurigana variants, multiple readings, etc.<br />
* the EDICT2 file, which is in a relatively simple one-line-per-entry text format based on the original EDICT format, and which contains almost all the information in the JMdict edition;<br />
* the EDICT file, which follows the original format of one kanji form and reading per entry, and contains a reduced amount of information. It is provided to maintain support for software which uses the original EDICT file format;<br />
* the EDICT_SUB file, which contains about 20% of the most common entries in the EDICT file.<br />
<br />
The dictionary data is maintained in an online database under the oversight of an editorial board, and the JMdict and EDICT versions are generated and released daily.<br />
<br />
The dictionary files are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group] who are the owners of the copyright.<br />
<br />
== CURRENT VERSION &amp; DOWNLOAD ==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server], (formerly at Monash University) which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file. (Only to be used in legacy apps, etc.)<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
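
Since the EDICT and EDICT2 files are gzip-compressed and, as described under FORMAT, use EUC-JP, reading one might look like the sketch below. The function name is hypothetical, and the sketch assumes Python's standard <tt>euc_jp</tt> codec is adequate for the file being read.

```python
import gzip
import io


def iter_edict_lines(gz_bytes, encoding="euc_jp"):
    """Yield decoded text lines from the bytes of a gzipped EDICT-style file."""
    # gzip.open accepts a file object as well as a file name.
    with gzip.open(io.BytesIO(gz_bytes), "rt", encoding=encoding) as fh:
        for line in fh:
            yield line.rstrip("\n")


# Usage with an in-memory stand-in for a downloaded file:
sample = gzip.compress(
    "収集 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/\n".encode("euc_jp")
)
for text in iter_edict_lines(sample):
    print(text)
```

For a file on disk, the same approach works by passing the file name directly to <tt>gzip.open</tt>. The JMdict XML files are in UTF-8, so they need no special codec.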
<br />
== PROJECT FORUM ==<br />
<br />
There are several forums where this project is actively discussed.<br />
<br />
The original forum was the <tt> sci.lang.japan</tt> [http://groups.google.com/group/sci.lang.japan Usenet newsgroup. ] More recently a [http://groups.yahoo.com/group/edict-jmdict/ mailing list ] specifically for project discussion has begun. (Mail to <tt> edict-jmdict-subscribe@yahoogroups.com</tt> to initiate subscription.)<br />
<br />
== Next Generation ==<br />
<br />
A [[JMdict:_Next_Generation|major revision]] of the JMdict structure is planned as a way of dealing with a number of issues which have emerged during the life of the project.<br />
<br />
== DATABASE and UPDATING ==<br />
<br />
The dictionary data is all held in a PostgreSQL database and maintained using the [http://www.edrdg.org/wiki/index.php/JMdictDB_Project JMdictDB online system]. The JMdict version is generated directly from the database. From this the EDICT/EDICT2 versions are generated using utility software. You can explore the database and propose edits and new entries via its [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search Form].<br />
<br />
The [http://www.edrdg.org/wiki/index.php/Main_Page#The_JMdict.2FEDICT_Project EDRDG Wiki] has a wealth of information about the dictionary database, including suggestions about [http://www.edrdg.org/wiki/index.php/JMdict:_Getting_Started getting started, ] the detailed [http://www.edrdg.org/wiki/index.php/Editorial_policy editorial policy and guidelines], etc. etc.<br />
<br />
== FORMAT ==<br />
<br />
The basic format of the entries in the dictionary files can be seen in detail by examining the [http://www.edrdg.org/jmdict/jmdict_dtd_h.html DTD] (Document Type Declaration) of the XML-format JMdict file. The DTD is heavily annotated with content and structural information. [http://www.edrdg.org/jmdict/dtd-jmdict.xml (download)]<br />
<br />
In summary, each dictionary entry is independent, although there may be cross-reference fields pointing to other entries. Each entry consists of<br />
<br />
* kanji elements, i.e. headwords containing at least one kanji character, plus associated tags indicating some status or characteristic of the headword. Where there are multiple headwords, they have been ordered according to frequency of usage, as far as this can be determined;<br />
* reading elements, containing either the reading in kana of the headword, or the headword itself in the case of headwords only in kana. The elements also include tags indicating some status or characteristics. As with the kanji headwords, where there are multiple readings they have been ordered according to frequency of usage, as far as this can be determined;<br />
* general coded information relating to the entry as a whole, such as original language, date-of-creation, etc.<br />
* sense elements, containing the translational equivalents or glosses of the headword(s). As Japanese is not highly polysemous, there is often only one sense. Associated with the sense elements is other coded data indicating the part-of-speech, field of application, miscellaneous information, etc. As with headwords and readings, the glosses are ordered with the most common appearing first.<br />
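
The entry structure summarized above can be illustrated with a short sketch. The sample below is hand-written and simplified: it omits the sequence number, tags and part-of-speech entities that a real entry carries, and the function name is hypothetical. The element names (<tt>k_ele</tt>/<tt>keb</tt>, <tt>r_ele</tt>/<tt>reb</tt>, <tt>sense</tt>/<tt>gloss</tt>) follow the JMdict DTD.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified entry; not copied from the distributed file.
SAMPLE = """
<entry>
  <k_ele><keb>収集</keb></k_ele>
  <k_ele><keb>蒐集</keb></k_ele>
  <r_ele><reb>しゅうしゅう</reb></r_ele>
  <sense>
    <gloss>gathering up</gloss>
    <gloss>collection</gloss>
    <gloss>accumulation</gloss>
  </sense>
</entry>
"""


def summarize_entry(xml_text):
    """Collect kanji headwords, readings and per-sense glosses from one entry."""
    entry = ET.fromstring(xml_text)
    return {
        "kanji": [e.text for e in entry.findall("k_ele/keb")],
        "readings": [e.text for e in entry.findall("r_ele/reb")],
        "senses": [
            [g.text for g in sense.findall("gloss")]
            for sense in entry.findall("sense")
        ],
    }


print(summarize_entry(SAMPLE))
```

Because the elements are kept in document order, the first item in each list is the one the file ranks as most common.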
<br />
The format and coding of the distributed files is as follows:<br />
<br />
* the JMdict file contains the complete dictionary information in XML format as per the DTD. This file is in Unicode/ISO-10646 coding using UTF-8 encapsulation. [http://www.edrdg.org/jmdict/jmdict_sample.html (Sample Entry)]<br />
* the EDICT file is in the original relatively simple format based on the text data file of the SKK input-method. Each entry is in the form:<br />
: KANJI [KANA] /(general information) gloss/gloss/.../<br />
:: or<br />
: KANA /(general information) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT format:<br />
:: 収集 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/<br />
:: (in addition to equivalent entries with the 蒐集, 拾集 and 収輯 kanji compounds.)<br />
: Where there are multiple senses, these are indicated by (1), (2), etc. before the first gloss in each sense. As this format only allows a single kanji headword and reading, entries are generated for each possible headword/reading combination. As the format restricts Japanese characters to the kanji and kana fields, any cross-reference data and other informational fields are omitted.<br />
:The EDICT file is distributed in JIS X 0208 coding in EUC-JP encapsulation. (Please note that this original format is now provided only for legacy systems and apps. New systems <b>must</b> use the EDICT2 edition described below);<br />
* the EDICT2 file is in an expanded form of the original EDICT format. The main differences are the inclusion of multiple kanji headwords and readings, and the inclusion of cross-reference and other information fields, e.g.:<br />
: KANJI-1;KANJI-2 [KANA-1;KANA-2] /(general information) (see xxxx) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT2 format:<br />
:: 収集(P);蒐集;拾集;収輯 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/(P)/<br />
: In addition, the EDICT2 has as its last field the sequence number of the entry. This matches the "ent_seq" entity value in the XML edition. The field has the format: EntLnnnnnnnnX. The EntL is a unique string to help identify the field. The "X", if present, indicates that an audio clip of the entry reading is available from the JapanesePod101.com site.<br />
: The EDICT2 file is distributed in JIS X 0208 and JIS X 0212 codings in EUC-JP encapsulation;<br />
* the EDICT_SUB file is in the same format as the EDICT file.<br />
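
As a rough illustration of the one-line formats described above, the following sketch splits an EDICT or EDICT2 line into its parts. It is not the project's own parser: the function name is hypothetical, the <tt>EntL0000000X</tt> value is a placeholder rather than a real sequence number, and glosses that themselves contain a "/" are not handled.

```python
import re

# "HEADWORDS [READINGS]" or just "HEADWORDS" for kana-only entries.
HEAD = re.compile(r"^(\S+)(?: \[(.+?)\])?$")
# Final EDICT2 field: EntL + sequence number + optional "X" (audio available).
ENTL = re.compile(r"^EntL(\d+)(X?)$")


def parse_edict_line(line):
    """Split one EDICT/EDICT2 line into headwords, readings, glosses and flags."""
    head, _, rest = line.strip().partition(" /")
    m = HEAD.match(head)
    if m is None:
        raise ValueError(f"unrecognized entry head: {head!r}")
    # Drop per-form tags such as "(P)" that follow a headword or reading.
    headwords = [h.split("(")[0] for h in m.group(1).split(";")]
    readings = [r.split("(")[0] for r in m.group(2).split(";")] if m.group(2) else []
    fields = [f for f in rest.split("/") if f]
    seq, audio = None, False
    if fields:
        e = ENTL.match(fields[-1])
        if e:
            seq, audio = e.group(1), e.group(2) == "X"
            fields.pop()
    return {
        "headwords": headwords,
        "readings": readings,
        # The first gloss keeps its "(n,vs)"-style general information prefix.
        "glosses": [f for f in fields if f != "(P)"],
        "common": "(P)" in fields,
        "seq": seq,
        "audio": audio,
    }


line = ("収集(P);蒐集;拾集;収輯 [しゅうしゅう] "
        "/(n,vs) gathering up/collection/accumulation/(P)/EntL0000000X/")
print(parse_edict_line(line))
```

The same function accepts original EDICT lines, for which <tt>seq</tt> is simply <tt>None</tt>.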
<br />
None of the files have the entries in any particular order.<br />
<br />
== PROJECT HISTORY ==<br />
<br />
The project was begun in 1991 by [http://nihongo.monash.edu/ Jim Breen] when an early DOS-based Japanese word-processor (MOKE - Mark's Own Kanji Editor) was released, containing an initial small version of the EDICT file. This was progressively expanded and edited over the following years. In 1999 the EDICT file, which by this time contained about 60,000 entries, was converted into an expanded format and the first XML-format JMdict file released. From that point both JMdict and the EDICT2/EDICT versions have been generated from the same source data.<br />
<br />
The EDICT2 format was created in 2003, primarily for use with the [http://nihongo.monash.edu/cgi-bin/wwwjdic.cgi?1C WWWJDIC] dictionary server, however it is now also used by other servers and applications.<br />
<br />
The growth in entries in the file is largely due to the efforts of the many people who have contributed entries to it over the years and who have participated in the editorial role. The increase in entry numbers has slowed as the file has achieved coverage of a large proportion of the Japanese lexicon. Much of the editorial work in recent years has concentrated on amendments and expansion to existing entries.<br />
<br />
A more expanded explanation of the early developments in the EDICT file can be found in the [http://www.edrdg.org/jmdict/edict_doc_old.html original documentation].<br />
<br />
== COPYRIGHT ==<br />
<br />
Dictionary copyright is a difficult point, because clearly the first lexicographer who published "inu means dog" could not claim a copyright violation over all subsequent Japanese dictionaries. While it is usual to consult other dictionaries for "accurate lexicographic information", as Nelson put it, wholesale copying is, of course, not permissible, and contributors have been advised to avoid direct copying from other sources. What makes each dictionary unique (and copyright-able) is the particular selection of words, the phrasing of the meanings, the presentation of the contents (a very important point in the case of this project), and the means of publication.<br />
<br />
The files of the project are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group ] who are the current owners of the copyright. As explained in the licence, the files are available for use for most purposes provided acknowledgement and distribution of the documentation is made.<br />
<br />
== LEXICOGRAPHICAL DETAILS ==<br />
<br />
===Inflections, etc.===<br />
In general no inflections of verbs or adjectives have been included, except in idiomatic expressions. Adverbs formed from adjectives (e.g., -ku or -ni) are generally not included. Verbs are, of course, in the plain or "dictionary" form.<br />Composed forms, such as adverbs taking the "to" particle, keiyoudoushi adjectives, etc. are only included in their root form, however the part-of-speech (POS) marker is used to indicate their status. <br />Nouns which can form a verb with the auxiliary verb "suru" only appear in their noun form, but have a POS marker: "vs", to indicate the existence of a verbal form. In general the gloss only relates to the noun itself, but entries are being progressively expanded to include the verbal glosses as well.<br />
===Part of Speech Marking===<br />
The dictionary includes one or more Part of Speech (POS) markings on almost every entry. Examples include: "adj-i" (adjective - 形容詞), "n" (noun - 名詞), "prt" (particle - 助詞), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_pos (Full POS list)]<br />
===Field of Application===<br />
A number of entries are marked with a specific field of application, e.g. "chem" (chemistry), "math" (mathematics), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld (Full field list)]<br />
===Miscellaneous Markings===<br />
A number of miscellaneous tags are included in entries to provide additional information in a standardized form, e.g. "col" (colloquialism), "sl" (slang), "uk" (term usually in kana), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_misc (Full list) ]<br />
===Word Priority Marking===<br />
The ke_pri and equivalent re_pri fields in the JMdict file are provided to record information about the relative commonness or priority of the entry, and consist of codes indicating the word appears in various references which can be taken as an indication of the frequency with which the word is used. This field is intended for use either by applications which want to concentrate on entries of a particular priority, or to generate subset files. The current values in this field are:<br />
* news1/2: appears in the "wordfreq" file compiled by Alexandre Girardi from the Mainichi Shimbun. (See the ftp archive for a copy.) Words in the first 12,000 in that file are marked "news1" and words in the second 12,000 are marked "news2".<br />
* ichi1/2: appears in the "Ichimango goi bunruishuu", Senmon Kyouiku Publishing, Tokyo, 1998. (The entries marked "ichi2" were demoted from ichi1 because they were observed to have low frequencies in the WWW and newspapers.)<br />
* spec1 and spec2: a small number of words use this marker when they are detected as being common, but are not included in other lists.<br />
* gai1/2: common loanwords, also based on the wordfreq file.<br />
* nfxx: this is an indicator of frequency-of-use ranking in the wordfreq file. "xx" is the number of the set of 500 words in which the entry can be found, with "01" assigned to the first 500, "02" to the second, and so on.<br />
<br />
Entries with news1, ichi1, spec1/2 and gai1 values are marked with a "(P)" in the EDICT and EDICT2 files. While the priority markings accurately reflect the status of entries with regard to the various sources, they must be seen as only providing a crude indication of how common a word or expression actually is in Japanese. The "(P)" markings in the EDICT and EDICT2 files appear to identify a useful subset of "common" words, but there are clearly some marked entries which are not very common, and there are clearly unmarked entries which are in common use, particularly in the spoken language.<br />
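
The "(P)" rule and the nfxx arithmetic described in this section can be sketched as follows. The function names are hypothetical, and the input is assumed to be the list of ke_pri or re_pri codes already extracted from an entry.

```python
import re

# Codes which, per the rule stated in this section, lead to a "(P)" marking.
P_CODES = {"news1", "ichi1", "spec1", "spec2", "gai1"}


def is_marked_common(pri_codes):
    """True if any priority code would earn a "(P)" in EDICT/EDICT2."""
    return any(code in P_CODES for code in pri_codes)


def nf_rank_range(code):
    """Map an "nfxx" code to the (first, last) wordfreq ranks it covers."""
    m = re.fullmatch(r"nf(\d\d)", code)
    if m is None:
        raise ValueError(f"not an nfxx code: {code!r}")
    n = int(m.group(1))
    # Each set holds 500 words: nf01 covers ranks 1-500, nf02 covers 501-1000.
    return ((n - 1) * 500 + 1, n * 500)


print(is_marked_common(["news2", "nf30"]))   # False
print(is_marked_common(["ichi1", "news2"]))  # True
print(nf_rank_range("nf02"))                 # (501, 1000)
```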
===Okurigana Variants===<br />
Okurigana variants in headwords are handled by including each variant form as a headword. This is to enable software to match with variant forms.<br />
===Spellings===<br />
As far as possible variants of English translation and spelling are included. Where appropriate different translations are included for national variants (e.g. autumn/fall, tap/faucet, etc.). Common spelling variations such as -our/-or and -ize/-ise are handled either by repeating the gloss in both spellings or appending spelling variants in parentheses. No attempt is made to tag English spellings according to country of usage.<br />
===Loanwords and Regional Words===<br />
For loanwords (gairaigo) which have not been derived from English words, the source language and the word in that language are included. Languages have been coded in the three-letter codes from the ISO 639-2:1998 "Codes for the representation of names of languages" standard, e.g. "(fre: avec)" in the EDICT/EDICT2 files and <lsource xml:lang="fre">avec</lsource> in the JMdict file. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_lang (Full list of language tags)]<br />
<br />
In the case of gairaigo which have a meaning which is not apparent from the original (usually English) words, the words in the source language are included as: "lang: original words", e.g.<br />
: コンクール /(n) competition (fre: concours)/contest/ <br />
In some cases the entries are pseudo-loanwords that have been constructed in Japan from foreign (usually English) words or word fragments (e.g. 和製英語 - waseieigo). These are tagged with "wasei" in EDICT/EDICT2 entries, e.g.<br />
: アゲンストウィンド /(n) head wind (wasei: against wind)/adverse wind/ <br />
and in JMdict with the "ls_wasei" attribute, e.g. <lsource ls_wasei="y">against wind</lsource><br />
<br />
A number of tags are used to indicate that a word or phrase is associated with a particular regional language variant within Japan, e.g. "ksb" (Kansai-ben). [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_dial (Full list) ]<br />
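
Reading the loanword markup from the JMdict file might look like the sketch below. The sample fragments are hand-written, the function name is hypothetical, and treating a missing <tt>xml:lang</tt> as "eng" is an assumption made here to mirror the convention that English sources are not labelled; check the DTD before relying on it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_lsources(sense_xml):
    """Return language, source text and wasei flag for each lsource in a sense."""
    sense = ET.fromstring(sense_xml)
    return [
        {
            # Assumption: an absent xml:lang attribute means English.
            "lang": ls.get(XML_LANG, "eng"),
            "text": ls.text,
            "wasei": ls.get("ls_wasei") == "y",
        }
        for ls in sense.findall("lsource")
    ]


# Hand-written fragments based on the examples in this section.
print(read_lsources('<sense><lsource xml:lang="fre">concours</lsource></sense>'))
print(read_lsources('<sense><lsource ls_wasei="y">against wind</lsource></sense>'))
```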
<br />
== OTHER LANGUAGES ==<br />
<br />
The JMdict file has the capacity to record glosses for Japanese headwords in many languages. JMdict is currently distributed in two versions: a basic version in which there are only English glosses, and a full version in which there are glosses included in German (133,000 entries), Russian (80,000), Hungarian (51,000), Spanish (39,000), Italian (38,000), Dutch (29,000), Swedish (16,000), French (15,000) and Slovenian (9,000). Details of the dictionary files used for the non-English glosses in JMdict can be found in the [http://www.edrdg.org/wwwjdic/wwwjdicinf.html#dicfilf_tag WWWJDIC documentation].<br />
<br />
As part of the daily build of the full JMdict file, the Japanese headwords are matched against the dictionary files for the other languages, and glosses are included where there is a match. The non-English glosses are added as separate sets of senses, and as far as possible are broken into individual senses using tags within those files (typically (1) .... (2) ....., etc.) At present there is no attempt to align senses between the languages as there is no consistency between the dictionaries as to the sense splitting. (There is some [[more information]] on the background to the current sense breakup.)<br />
<br />
== ROMAJI VERSIONS? ==<br />
<br />
None of the files in the JMdict/EDICT project use ローマ字 (romanized Japanese), except for proper names such as "Suzuki", "Fuji", etc. or in cases such as "ikebana" where the romanized Japanese has been adopted as an English term. See the [[Editorial_policy#Romanized_Japanese|Editorial Policy]] for more information on this.<br />
<br />
== RELATED PROJECTS ==<br />
<br />
A number of other Japanese dictionary projects are closely related to this one. Among them are:<br />
<br />
* the [http://www.edrdg.org/enamdict/enamdict_doc.html ENAMDICT/JMnedict] Japanese Proper Names Dictionary project, which currently has nearly 740,000 named entities. The files are available in EDICT or XML formats.<br />
* the [[KANJIDIC_Project| KANJIDIC]] project, which maintains and distributes databases of information about kanji.<br />
* the [http://www.edrdg.org/jmdict/compdic_doc.html COMPDIC] file in EDICT format of computing and telecomms terminology. In 2008 the COMPDIC material was included in the main EDICT/JMdict database with tagging indicating that the entries relate to ICT. A separate "COMPDIC" file is extracted for distribution.<br />
* the [http://www.edrdg.org/krad/kradinf.html RADKFILE/KRADFILE] file of visual elements in kanji, which can be used for finding kanji in dictionaries.<br />
<br />
== SERVERS & PACKAGES ==<br />
<br />
A large number of [[JMdictEDICT_software|WWW servers and software packages]] use the JMdict/EDICT files. <br />
== ACKNOWLEDGEMENTS ==<br />
<br />
Since 1991 a large number of people have contributed to this project; far too many to list here. All their contributions have been most welcome, indeed without the assistance of speakers and students of Japanese this project would not have achieved as much.<br />
<br />
The EDICT/JMdict has been granted approval to use material from the [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]. This approval is most welcome.<br />
<br />
== PUBLICATIONS ==<br />
<br />
Some publications by Jim Breen about the EDICT/JMdict project. A more complete and up-to-date list can be found in [http://www.edrdg.org/~jwb/papers.html Jim's publications page].<br />
<br />
* paper about JMdict presented at the COLING Multilingual Linguistic Resources Workshop in Geneva in August 2004. [http://www.edrdg.org/~jwb/paperdir/jmdictart.html (html)] [http://www.edrdg.org/~jwb/paperdir/jmdictart.pdf (pdf)]<br />''(This paper should be referenced when citing the dictionary in a publication.)''<br />
* an earlier [http://www.edrdg.org/~jwb/paperdir/ws2002_paper.html JMdict paper] about some of the practical issues, presented at the Papillon Project workshop in Tokyo in July 2002.<br />
* a paper presented to the Papillon Project workshop in 2003 in Sapporo on the [http://www.edrdg.org/~jwb/paperdir/dicexamples.html linking of example sentences in the Tanaka corpus to EDICT entries in WWWJDIC].<br />
* a 1999 workshop paper about WWWJDIC; [http://www.edrdg.org/~jwb/paperdir/wwwjdic_article2.html (updated 2003 version)] [http://nihongo.monash.edu/wwwjdic_article/wwwjdic_article.html (1999 version)].<br />
* an overview paper about EDICT presented at the JSAA conference in 1995; [http://www.edrdg.org/~jwb/paperdir/hpaper.html (html)]<br />
* An early technical report from 1993; [http://www.edrdg.org/~jwb/paperdir/ejdic_report1.pdf (pdf)]</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=JMdict-EDICT_Dictionary_Project&diff=937JMdict-EDICT Dictionary Project2021-10-25T22:40:27Z<p>JimBreen: /* CURRENT VERSION &amp; DOWNLOAD */</p>
<hr />
<div>= JMdict/EDICT JAPANESE/ENGLISH DICTIONARY PROJECT =<br />
<br />
== INTRODUCTION ==<br />
<br />
The JMdict/EDICT project has as its goal the production of a comprehensive freely-available Japanese/English Dictionary database in machine-readable form which can be used by a variety of applications and servers.<br />
<br />
The project began in 1991 with the expansion of the EDICT simple Japanese-English dictionary file. (See below under History)<br />
<br />
At present the project has the following dictionary files available:<br />
<br />
* the full Japanese-Multilingual Dictionary (JMdict) file which is distributed in XML format. The JMdict file is aimed at being a multilingual lexical database with Japanese as the pivot language and also includes translations of words and phrases in a number of languages other than English. It has been designed to support the requirements of Japanese lexicography, including multiple surface forms, orthographical variants, okurigana variants, multiple readings, etc.<br />
* the EDICT2 file, which is in a relatively simple one-line-per-entry text format based on the original EDICT format, and which contains almost all the information in the JMdict edition;<br />
* the EDICT file, which follows the original format of one kanji form and reading per entry, and contains a reduced amount of information. It is provided to maintain support for software which uses the original EDICT file format;<br />
* the EDICT_SUB file, which contains about 20% of the most common entries in the EDICT file.<br />
<br />
The dictionary data is maintained in an online database under the oversight of an editorial board, and the JMdict and EDICT versions are generated and released daily.<br />
<br />
The dictionary files are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group] who are the owners of the copyright.<br />
<br />
== CURRENT VERSION &amp; DOWNLOAD ==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server], (formerly at Monash University) which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file. (Only to be used in legacy apps, etc.)<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
<br />
== PROJECT FORUM ==<br />
<br />
There are several forums where this project is actively discussed.<br />
<br />
The original forum was the <tt> sci.lang.japan</tt> [http://groups.google.com/group/sci.lang.japan Usenet newsgroup. ] More recently a [http://groups.yahoo.com/group/edict-jmdict/ mailing list ] specifically for project discussion has begun. (Mail to <tt> edict-jmdict-subscribe@yahoogroups.com</tt> to initiate subscription.)<br />
<br />
== Next Generation ==<br />
<br />
A [[JMdict:_Next_Generation|major revision]] of the JMdict structure is planned as a way of dealing with a number of issues which have emerged during the life of the project.<br />
<br />
== DATABASE and UPDATING ==<br />
<br />
The dictionary data is all held in a PostgreSQL database and maintained using the [http://www.edrdg.org/wiki/index.php/JMdictDB_Project JMdictDB online system]. The JMdict version is generated directly from the database. From this the EDICT/EDICT2 versions are generated using utility software. You can explore the database and propose edits and new entries via its [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search Form].<br />
<br />
The [http://www.edrdg.org/wiki/index.php/Main_Page#The_JMdict.2FEDICT_Project EDRDG Wiki] has a wealth of information about the dictionary database, including suggestions about [http://www.edrdg.org/wiki/index.php/JMdict:_Getting_Started getting started, ] the detailed [http://www.edrdg.org/wiki/index.php/Editorial_policy editorial policy and guidelines], etc. etc.<br />
<br />
== FORMAT ==<br />
<br />
The basic format of the entries in the dictionary files can be seen in detail by examining the [http://www.edrdg.org/jmdict/jmdict_dtd_h.html DTD] (Document Type Declaration) of the XML-format JMdict file. The DTD is heavily annotated with content and structural information.[http://www.edrdg.org/jmdict/dtd-jmdict.xml (download)]<br />
<br />
In summary, each dictionary entry is independent, although there may be cross-reference fields pointing to other entries. Each entry consists of<br />
<br />
* kanji elements, i.e. headwords containing at least one kanji character, plus associated tags indicating some status or characteristic of the headword. Where there are multiple headwords, they have been ordered according to frequency of usage, as far as this can be determined;<br />
* reading elements, containing either the reading in kana of the headword, or the headword itself in the case of headwords only in kana. The elements also include tags indicating some status or characteristics. As with the kanji headwords, where there are multiple readings they have been ordered according to frequency of usage, as far as this can be determined;<br />
* general coded information relating to the entry as a whole, such as original language, date-of-creation, etc.<br />
* sense elements, containing the translational equivalents or glosses of the headword(s). As Japanese is not highly polysemous, there is often only one sense. Associated with the sense elements is other coded data indicating the part-of-speech, field of application, miscellaneous information, etc. As with headwords and readings, the glosses are ordered with the most common appearing first.<br />
<br />
The format and coding of the distributed files is as follows:<br />
<br />
* the JMdict file contains the complete dictionary information in XML format as per the DTD. This file is in Unicode/ISO-10646 coding using UTF-8 encapsulation. [http://www.edrdg.org/jmdict/jmdict_sample.html (Sample Entry)]<br />
* the EDICT file is in a relatively simple format based on the text data file of the SKK input-method. Each entry is in the form:<br />
: KANJI [KANA] /(general information) gloss/gloss/.../<br />
:: or<br />
: KANA /(general information) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT format:<br />
:: 収集 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/<br />
:: (in addition to equivalent entries with the 蒐集, 拾集 and 収輯 kanji compounds.)<br />
: Where there are multiple senses, these are indicated by (1), (2), etc. before the first gloss in each sense. As this format only allows a single kanji headword and reading, entries are generated for each possible headword/reading combination. As the format restricts Japanese characters to the kanji and kana fields, any cross-reference data and other informational fields are omitted.<br />
:The EDICT file is distributed in JIS X 0208 coding in EUC-JP encapsulation;<br />
* the EDICT2 file is in an expanded form of the original EDICT format. The main differences are the inclusion of multiple kanji headwords and readings, and the inclusion of cross-reference and other information fields, e.g.:<br />
: KANJI-1;KANJI-2 [KANA-1;KANA-2] /(general information) (see xxxx) gloss/gloss/.../<br />
: The sample entry (linked above) appears as follows in the EDICT2 format:<br />
:: 収集(P);蒐集;拾集;収輯 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/(P)/<br />
: In addition, the EDICT2 format has as its last field the sequence number of the entry. This matches the "ent_seq" element value in the XML edition. The field has the format: EntLnnnnnnnnX. The "EntL" is a unique string to help identify the field. The "X", if present, indicates that an audio clip of the entry reading is available from the JapanesePod101.com site.<br />
: The EDICT2 file is distributed in JIS X 0208 and JIS X 0212 codings in EUC-JP encapsulation;<br />
* the EDICT_SUB file is in the same format as the EDICT file.<br />
<br />
None of the files have the entries in any particular order.<br />
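<br />
As an illustration of the EDICT2 layout described above, the following is a minimal Python sketch that splits one line into its fields. The function name, the returned keys and the sequence number in the sample are assumptions made for the example; it simply discards any parenthesised tags attached to headwords and readings, and it expects text already decoded from EUC-JP.<br />
<br />
```python
import re

# Illustrative only: parse_edict2_line and its dict keys are hypothetical names,
# not part of any distributed JMdict/EDICT tool.
LINE = re.compile(
    r"^(?P<head>[^\[/]+?)\s*(?:\[(?P<kana>[^\]]*)\]\s*)?/(?P<body>.*)/\s*$"
)
ENTL = re.compile(r"^EntL(\d+)(X?)$")
PAREN_TAG = re.compile(r"\([^)]*\)")


def split_forms(field):
    """Split a ';'-separated field and drop parenthesised tags such as (P)."""
    return [PAREN_TAG.sub("", part).strip() for part in field.split(";") if part.strip()]


def parse_edict2_line(line):
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError("not an EDICT/EDICT2 line: %r" % line)

    forms = split_forms(m.group("head"))
    if m.group("kana") is None:
        # Kana-only entry: the leading field is the reading itself.
        headwords, readings = [], forms
    else:
        headwords, readings = forms, split_forms(m.group("kana"))

    fields = [f for f in m.group("body").split("/") if f]

    # The EntLnnnnnnnnX field, when present, is the last field of the line.
    seq, audio = None, False
    if fields:
        em = ENTL.match(fields[-1])
        if em:
            fields.pop()
            seq, audio = int(em.group(1)), em.group(2) == "X"

    return {
        "headwords": headwords,
        "readings": readings,
        "glosses": [f for f in fields if f != "(P)"],
        "common": "(P)" in fields,
        "seq": seq,
        "audio": audio,
    }


# The sequence number below is made up for the example.
sample = "収集(P);蒐集;拾集;収輯 [しゅうしゅう] /(n,vs) gathering up/collection/accumulation/(P)/EntL1234567X/"
entry = parse_edict2_line(sample)
```
<br />
The general information, such as "(n,vs)", is left attached to the first gloss; separating it would need the full tag lists linked below.<br />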
<br />
== PROJECT HISTORY ==<br />
<br />
The project was begun in 1991 by [http://nihongo.monash.edu/ Jim Breen] when an early DOS-based Japanese word-processor (MOKE - Mark's Own Kanji Editor) was released, containing an initial small version of the EDICT file. This was progressively expanded and edited over the following years. In 1999 the EDICT file, which by this time contained about 60,000 entries, was converted into an expanded format and the first XML-format JMdict file released. From that point both JMdict and the EDICT2/EDICT versions have been generated from the same source data.<br />
<br />
The EDICT2 format was created in 2003, primarily for use with the [http://nihongo.monash.edu/cgi-bin/wwwjdic.cgi?1C WWWJDIC] dictionary server, however it is now also used by other servers and applications.<br />
<br />
The growth in entries in the file is largely due to the efforts of the many people who have contributed entries to it over the years and who have participated in the editorial role. The increase in entry numbers has slowed as the file has achieved coverage of a large proportion of the Japanese lexicon. Much of the editorial work in recent years has concentrated on amendments and expansion to existing entries.<br />
<br />
A more expanded explanation of the early developments in the EDICT file can be found in the [http://www.edrdg.org/jmdict/edict_doc_old.html original documentation].<br />
<br />
== COPYRIGHT ==<br />
<br />
Dictionary copyright is a difficult point, because clearly the first lexicographer who published "inu means dog" could not claim a copyright violation over all subsequent Japanese dictionaries. While it is usual to consult other dictionaries for "accurate lexicographic information", as Nelson put it, wholesale copying is, of course, not permissible, and contributors have been advised to avoid direct copying from other sources. What makes each dictionary unique (and copyright-able) is the particular selection of words, the phrasing of the meanings, the presentation of the contents (a very important point in the case of this project), and the means of publication.<br />
<br />
The files of the project are copyright, and distributed in accordance with the Licence Statement, which can be found at the WWW site of the [http://www.edrdg.org/ Electronic Dictionary Research and Development Group ] who are the current owners of the copyright. As explained in the licence, the files are available for use for most purposes provided acknowledgement and distribution of the documentation is made.<br />
<br />
== LEXICOGRAPHICAL DETAILS ==<br />
<br />
===Inflections, etc.===<br />
In general no inflections of verbs or adjectives have been included, except in idiomatic expressions. Adverbs formed from adjectives (e.g., -ku or -ni) are generally not included. Verbs are, of course, in the plain or "dictionary" form.<br />Composed forms, such as adverbs taking the "to" particle, keiyoudoushi adjectives, etc. are only included in their root form, however the part-of-speech (POS) marker is used to indicate their status. <br />Nouns which can form a verb with the auxiliary verb "suru" only appear in their noun form, but have a POS marker: "vs", to indicate the existence of a verbal form. In general the gloss only relates to the noun itself, but entries are being progressively expanded to include the verbal glosses as well.<br />
===Part of Speech Marking===<br />
The dictionary includes one or more Part of Speech (POS) markings on almost every entry. Examples include: "adj-i" (adjective - 形容詞), "n" (noun - 名詞), "prt" (particle - 助詞), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_pos (Full POS list)]<br />
===Field of Application===<br />
A number of entries are marked with a specific field of application, e.g. "chem" (chemistry), "math" (mathematics), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld (Full field list)]<br />
===Miscellaneous Markings===<br />
A number of miscellaneous tags are included in entries to provide additional information in a standardized form, e.g. "col" (colloquialism), "sl" (slang), "uk" (term usually in kana), etc. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_misc (Full list) ]<br />
===Word Priority Marking===<br />
The ke_pri and equivalent re_pri fields in the JMdict file are provided to record information about the relative commonness or priority of the entry, and consist of codes indicating the word appears in various references which can be taken as an indication of the frequency with which the word is used. This field is intended for use either by applications which want to concentrate on entries of a particular priority, or to generate subset files. The current values in this field are:<br />
* news1/2: appears in the "wordfreq" file compiled by Alexandre Girardi from the Mainichi Shimbun. (See the ftp archive for a copy.) Words in the first 12,000 in that file are marked "news1" and words in the second 12,000 are marked "news2".<br />
* ichi1/2: appears in the "Ichimango goi bunruishuu", Senmon Kyouiku Publishing, Tokyo, 1998. (The entries marked "ichi2" were demoted from ichi1 because they were observed to have low frequencies in the WWW and newspapers.)<br />
* spec1 and spec2: a small number of words use this marker when they are detected as being common, but are not included in other lists.<br />
* gai1/2: common loanwords, also based on the wordfreq file.<br />
* nfxx: this is an indicator of frequency-of-use ranking in the wordfreq file. "xx" is the number of the set of 500 words in which the entry can be found, with "01" assigned to the first 500, "02" to the second, and so on.<br />
Entries with news1, ichi1, spec1/2 and gai1 values are marked with a "(P)" in the EDICT and EDICT2 files. While the priority markings accurately reflect the status of entries with regard to the various sources, they must be seen as only providing a crude indication of how common a word or expression actually is in Japanese. The "(P)" markings in the EDICT and EDICT2 files appear to identify a useful subset of "common" words, but there are clearly some marked entries which are not very common, and there are clearly unmarked entries which are in common use, particularly in the spoken language.<br />
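<br />
The priority rules above could be applied by client software roughly as follows. This is a sketch only, in Python; the function names are hypothetical, and the input is assumed to be the list of ke_pri/re_pri code strings taken from one entry.<br />
<br />
```python
import re

# Codes that, per the text above, lead to a "(P)" marking in EDICT/EDICT2.
P_CODES = {"news1", "ichi1", "spec1", "spec2", "gai1"}
NF_CODE = re.compile(r"^nf(\d{2})$")


def is_common(pri_codes):
    """True if any priority code would earn the entry a "(P)" marking."""
    return any(code in P_CODES for code in pri_codes)


def nf_rank_range(code):
    """Map an nfxx code to the (first, last) wordfreq ranks of its set of 500.

    Returns None when the code is not of the form nfxx.
    """
    m = NF_CODE.match(code)
    if m is None or int(m.group(1)) < 1:
        return None
    n = int(m.group(1))
    return (500 * (n - 1) + 1, 500 * n)
```
<br />
For example, nf_rank_range("nf02") gives (501, 1000), the second set of 500 words.<br />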
===Okurigana Variants===<br />
Okurigana variants in headwords are handled by including each variant form as a headword. This is to enable software to match with variant forms.<br />
===Spellings===<br />
As far as possible variants of English translation and spelling are included. Where appropriate different translations are included for national variants (e.g. autumn/fall, tap/faucet, etc.). Common spelling variations such as -our/-or and -ize/-ise are handled either by repeating the gloss in both spellings or appending spelling variants in parentheses. No attempt is made to tag English spellings according to country of usage.<br />
===Loanwords and Regional Words===<br />
For loanwords (gairaigo) which have not been derived from English words, the source language and the word in that language are included. Languages have been coded in the three-letter codes from the ISO 639-2:1998 "Codes for the representation of names of languages" standard, e.g. "(fre: avec)" in the EDICT/EDICT2 files and <lsource xml:lang="fre">avec</lsource> in the JMdict file. [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_lang (Full list of language tags)] In the case of gairaigo which have a meaning which is not apparent from the original (usually English) words, the words in the source language are included as: "lang: original words", e.g.<br />
: コンクール /(n) competition (fre: concours)/contest/ <br />
In some cases the entries are pseudo-loanwords that have been constructed in Japan from foreign (usually English) words or word fragments (e.g. 和製英語 - waseieigo). These are tagged with "wasei" in EDICT/EDICT2 entries, e.g.<br />
: アゲンストウィンド /(n) head wind (wasei: against wind)/adverse wind/ <br />
and in JMdict with the "ls_wasei" attribute e.g. <lsource ls_wasei="y">against wind</lsource>. A number of tags are used to indicate that a word or phrase is associated with a particular regional language variant within Japan, e.g. "ksb" (Kansai-ben). [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_dial (Full list) ]<br />
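<br />
The source-language notes in EDICT/EDICT2 glosses, as shown in the examples above, can be picked out with a regular expression. The following Python sketch assumes the notes always take the form "(lang: words)" with a three-letter code or "wasei"; the function name is hypothetical.<br />
<br />
```python
import re

# Matches "(fre: concours)" or "(wasei: against wind)" inside a gloss.
LSOURCE_NOTE = re.compile(r"\((wasei|[a-z]{3}):\s*([^)]*)\)")


def find_source_notes(gloss):
    """Return (code, source words) pairs found in one EDICT-style gloss."""
    return [(code, words.strip()) for code, words in LSOURCE_NOTE.findall(gloss)]
```
<br />
A parenthesised part-of-speech field such as "(n)" is not matched, as it contains no colon.<br />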
<br />
== OTHER LANGUAGES ==<br />
<br />
The JMdict file has the capacity to record glosses for Japanese headwords in many languages. JMdict is currently distributed in two versions: a basic version in which there are only English glosses, and a full version in which there are glosses included in German (133,000 entries), Russian (80,000), Hungarian (51,000), Spanish (39,000), Italian (38,000), Dutch (29,000), Swedish (16,000), French (15,000) and Slovenian (9,000). Details of the dictionary files used for the non-English glosses in JMdict can be found in the [http://www.edrdg.org/wwwjdic/wwwjdicinf.html#dicfilf_tag WWWJDIC documentation].<br />
<br />
As part of the daily build of the full JMdict file, the Japanese headwords are matched against the dictionary files for the other languages, and glosses are included where there is a match. The non-English glosses are added as separate sets of senses, and as far as possible are broken into individual senses using tags within those files (typically (1) .... (2) ....., etc.) At present there is no attempt to align senses between the languages as there is no consistency between the dictionaries as to the sense splitting. (There is some [[more information]] on the background to the current sense breakup.)<br />
<br />
== ROMAJI VERSIONS? ==<br />
<br />
None of the files in the JMdict/EDICT project use ローマ字 (romanized Japanese), except for proper names such as "Suzuki", "Fuji", etc. or in cases such as "ikebana" where the romanized Japanese has been adopted as an English term. See the [[Editorial_policy#Romanized_Japanese|Editorial Policy]] for more information on this.<br />
<br />
== RELATED PROJECTS ==<br />
<br />
A number of other Japanese dictionary projects are closely related to this one. Among them are:<br />
<br />
* the [http://www.edrdg.org/enamdict/enamdict_doc.html ENAMDICT/JMnedict] Japanese Proper Names Dictionary project, which currently has nearly 740,000 named entities. The files are available in EDICT or XML formats.<br />
* the [[KANJIDIC_Project| KANJIDIC]] project, which maintains and distributes databases of information about kanji.<br />
* the [http://www.edrdg.org/jmdict/compdic_doc.html COMPDIC] file in EDICT format of computing and telecomms terminology. In 2008 the COMPDIC material was included in the main EDICT/JMdict database with tagging indicating that the entries relate to ICT. A separate "COMPDIC" file is extracted for distribution.<br />
* the [http://www.edrdg.org/krad/kradinf.html RADKFILE/KRADFILE] file of visual elements in kanji, which can be used for finding kanji in dictionaries.<br />
<br />
== SERVERS & PACKAGES ==<br />
<br />
A large number of [[JMdictEDICT_software|WWW servers and software packages]] use the JMdict/EDICT files. <br />
== ACKNOWLEDGEMENTS ==<br />
<br />
Since 1991 a large number of people have contributed to this project; far too many to list here. All their contributions have been most welcome, indeed without the assistance of speakers and students of Japanese this project would not have achieved as much.<br />
<br />
The EDICT/JMdict has been granted approval to use material from the [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]. This approval is most welcome.<br />
<br />
== PUBLICATIONS ==<br />
<br />
Some publications by Jim Breen about the EDICT/JMdict project. A more complete and up-to-date list can be found in [http://www.edrdg.org/~jwb/papers.html Jim's publications page].<br />
<br />
* paper about JMdict presented at the COLING Multilingual Linguistic Resources Workshop in Geneva in August 2004. [http://www.edrdg.org/~jwb/paperdir/jmdictart.html (html)] [http://www.edrdg.org/~jwb/paperdir/jmdictart.pdf (pdf)]<br />''(This paper should be referenced when citing the dictionary in a publication.)''<br />
* an earlier [http://www.edrdg.org/~jwb/paperdir/ws2002_paper.html JMdict paper] about some of the practical issues, presented at the Papillon Project workshop in Tokyo in July 2002.<br />
* a paper presented to the Papillon Project workshop in 2003 in Sapporo on the [http://www.edrdg.org/~jwb/paperdir/dicexamples.html linking of example sentences in the Tanaka corpus to EDICT entries in WWWJDIC].<br />
* a 1999 workshop paper about WWWJDIC; [http://www.edrdg.org/~jwb/paperdir/wwwjdic_article2.html (updated 2003 version)] [http://nihongo.monash.edu/wwwjdic_article/wwwjdic_article.html (1999 version)].<br />
* an overview paper about EDICT presented at the JSAA conference in 1995; [http://www.edrdg.org/~jwb/paperdir/hpaper.html (html)]<br />
* An early technical report from 1993; [http://www.edrdg.org/~jwb/paperdir/ejdic_report1.pdf (pdf)]</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=936Editorial policy2021-10-11T03:58:50Z<p>JimBreen: </p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (i.e. ＭＰ３プレーヤー, not MP3プレーヤー).<br />
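<br />
A submission tool could normalise half-width letters and digits before an entry is proposed. The Python sketch below is one way to do this, under the assumption that only ASCII letters and digits need converting; the function name is hypothetical. It relies on the Unicode full-width forms sitting at a fixed offset (0xFEE0) from their ASCII counterparts.<br />
<br />
```python
FULLWIDTH_OFFSET = 0xFEE0  # distance from ASCII "A" (U+0041) to full-width "Ａ" (U+FF21)


def to_fullwidth(text):
    """Replace ASCII letters and digits with their full-width equivalents."""
    out = []
    for ch in text:
        if "0" <= ch <= "9" or "A" <= ch <= "Z" or "a" <= ch <= "z":
            out.append(chr(ord(ch) + FULLWIDTH_OFFSET))
        else:
            # Kana, kanji and anything else are passed through unchanged.
            out.append(ch)
    return "".join(out)
```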
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjective, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots that are not accepted.<br />
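<br />
The with/without middle-dot rule can be sketched in Python as follows. This assumes that the "JIS middle-dot" is U+30FB (KATAKANA MIDDLE DOT), and the list of look-alike characters it rejects is illustrative rather than exhaustive; the function name is hypothetical.<br />
<br />
```python
JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT (assumed to be the accepted one)

# A few look-alikes: middle dot, bullet, bullet operator, half-width katakana middle dot.
OTHER_DOTS = "\u00b7\u2022\u2219\uff65"


def gairaigo_variants(reading):
    """Return the ';'-joined reading variants for a multi-word gairaigo term."""
    for ch in OTHER_DOTS:
        if ch in reading:
            raise ValueError("unexpected middle-dot character U+%04X" % ord(ch))
    if JIS_MIDDLE_DOT not in reading:
        return reading
    # Undotted form first, then the dotted form, as in the example above.
    return reading.replace(JIS_MIDDLE_DOT, "") + ";" + reading
```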
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead, create a separate entry and create cross-references between them. Similarly, if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
====Reading Field Simplification====<br />
<br />
Until mid-2021, if the kanji field of an entry included both kanji and katakana for part of a form, e.g. アカバナ科 and 赤花科, then the reading field typically had matching kana forms, in this case アカバナか and あかばなか, with restrictions to align the kanji/readings pairs. This was done to assist with the generation of the legacy EDICT format. This is no longer a major issue and it is now considered acceptable to have a single reading (i.e. あかばなか) in such cases.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
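<br />
Some of the mechanical points in the list above (leading articles, patterns such as "colo(u)r", a comma after "e.g." or "i.e.") lend themselves to an automatic check before submission. The Python sketch below is an assumption about how such a check might look, not a description of what JMdictDB does; the names and messages are hypothetical, and a flagged gloss may still be acceptable.<br />
<br />
```python
import re

# (pattern, message) pairs; each corresponds to one guideline in the list above.
CHECKS = [
    (re.compile(r"^(?:a|an|the)\s", re.IGNORECASE), "leading article"),
    (re.compile(r"\w\(\w+\)\w"), "embedded optional letters"),
    (re.compile(r"\b(?:e\.g\.|i\.e\.),"), "comma after e.g./i.e."),
]


def lint_meanings(text):
    """Return (gloss, message) pairs for glosses in a ';'-separated sense."""
    problems = []
    for gloss in (g.strip() for g in text.split(";")):
        for pattern, message in CHECKS:
            if pattern.search(gloss):
                problems.append((gloss, message))
    return problems
```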
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
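The two examples above can be picked apart mechanically. The following is a minimal sketch only: the regular expression and the helper name <code>parse_lsrc</code> are assumptions inferred from those two examples, not part of JMdictDB.<br />

```python
import re

# Hypothetical pattern inferred from the two examples above:
# [lsrc=<three-letter code>:<source word, optionally in double quotes>]
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):("?)([^\]"]*)\2\]')


def parse_lsrc(text):
    """Return a list of (language_code, source_word) pairs found in text."""
    return [(m.group(1), m.group(3)) for m in LSRC_RE.finditer(text)]


if __name__ == "__main__":
    print(parse_lsrc('アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job'))
    print(parse_lsrc('アールデコ [1][n] [lsrc=fre:"art déco"] art deco'))
```

An empty source word (as in the bare pattern [lsrc=lng:]) comes back as an empty string, which matches the case where the source word is identical to the translation and is not repeated.<br />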
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
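As an illustration of the patterns just described, here is a small sketch that splits a cross-reference tag into its parts. The function name and the returned dictionary layout are hypothetical; the sketch also assumes the headword itself contains no middle-dot, which will not hold for targets such as ブル・シャーク.<br />

```python
import re

# Hypothetical pattern: [see=target], [see=kanji・reading], [see=target[n]]
XREF_RE = re.compile(
    r'\[(see|ant)=([^\[\]・]+)(?:・([^\[\]]+))?(?:\[(\d+)\])?\]'
)


def parse_xref(tag):
    """Split a cross-reference tag into type, target, reading and sense."""
    m = XREF_RE.fullmatch(tag)
    if m is None:
        return None  # not a recognisable [see=...] or [ant=...] tag
    kind, target, reading, sense = m.groups()
    return {
        "type": kind,
        "target": target,
        "reading": reading,                      # None when not given
        "sense": int(sense) if sense else None,  # None when not given
    }


if __name__ == "__main__":
    for tag in ("[see=言葉]", "[ant=何等]", "[see=金本位・かねほんい]", "[see=漢字[2]]"):
        print(tag, parse_xref(tag))
```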
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
A cross-reference back from the full form to the abbreviation may also be appropriate.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
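The "somen (sōmen)" convention mentioned above can be produced mechanically from the macron form. This is only a sketch; the helper names are made up for illustration, and deciding whether a term is "commonly used in English" remains an editorial judgement that the code does not attempt.<br />

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def strip_macrons(romaji):
    """Remove macrons from long vowels, e.g. 'Yōrō' -> 'Yoro'."""
    decomposed = unicodedata.normalize("NFD", romaji)
    without = "".join(ch for ch in decomposed if ch != COMBINING_MACRON)
    return unicodedata.normalize("NFC", without)


def gloss_with_macron_form(romaji):
    """Build 'plain (macron form)' for a term commonly used in English."""
    plain = strip_macrons(romaji)
    if plain == romaji:
        return romaji  # no macrons present, nothing to add
    return f"{plain} ({romaji})"


if __name__ == "__main__":
    print(gloss_with_macron_form("sōmen"))
    print(gloss_with_macron_form("karate"))
```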
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
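A rough check for these two spacing rules might look like the sketch below. The unit and symbol lists are illustrative assumptions and far from complete; "c" for cents is deliberately left out of the unit list because it is treated as a symbol.<br />

```python
import re

# Illustrative lists only; extend as needed.
UNITS = ("km", "cm", "mm", "kg", "ml", "m", "g")
SYMBOLS = ("°C", "%")

# A digit immediately followed by a unit, e.g. "100km".
MISSING_SPACE = re.compile(r"\d(?:%s)\b" % "|".join(UNITS))
# A digit, whitespace, then a symbol, e.g. "15 °C" or "9 %".
EXTRA_SPACE = re.compile(r"\d\s+(?:%s)" % "|".join(re.escape(s) for s in SYMBOLS))


def spacing_problems(gloss):
    """Return a list of spacing problems found in a gloss."""
    problems = []
    if MISSING_SPACE.search(gloss):
        problems.append("missing space before unit")
    if EXTRA_SPACE.search(gloss):
        problems.append("unwanted space before symbol")
    return problems


if __name__ == "__main__":
    for text in ("100 km", "100km", "15°C", "15 °C", "9%", "5c"):
        print(repr(text), spacing_problems(text))
```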
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
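The preferred date and time styles can be generated as in the following sketch. The function names are hypothetical. Note that the lifespan format follows the Mishima example above, which does not zero-pad the month and day.<br />

```python
from datetime import date, time

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def gloss_date(d, with_year=True):
    """'March 17, 2019' or, without the year, 'March 17'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(born, died):
    """'1925.1.14-1970.11.25' style used for individual people."""
    def fmt(d):
        return f"{d.year}.{d.month}.{d.day}"
    return f"{fmt(born)}-{fmt(died)}"


def gloss_time(t):
    """'2am' or '12:30pm' style."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"


if __name__ == "__main__":
    print(gloss_date(date(2019, 3, 17)))
    print(gloss_date(date(2019, 3, 17), with_year=False))
    print(lifespan(date(1925, 1, 14), date(1970, 11, 25)))
    print(gloss_time(time(2, 0)), gloss_time(time(12, 30)))
```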
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries, supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
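Before submitting, a contributor can get a rough idea of whether some text will survive into the EDICT/EDICT2 distributions. The sketch below uses Python's built-in codecs as a proxy, on the assumption that "iso2022_jp" is limited to ASCII plus JIS X 0208 and that "euc_jp" additionally accepts JIS X 0212. It is not the conversion code actually used to build the distributions, and the function names are invented for the example.<br />

```python
def encodable(text, codec):
    """True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def fits_edict(text):
    # Assumption: ISO-2022-JP approximates the JIS X 0208 repertoire.
    return encodable(text, "iso2022_jp")


def fits_edict2(text):
    # Assumption: EUC-JP approximates JIS X 0208 plus JIS X 0212.
    return encodable(text, "euc_jp")


if __name__ == "__main__":
    for sample in ("家", "art déco", "한글"):
        print(sample, fits_edict(sample), fits_edict2(sample))
```

Text that fails both checks, such as hangul in a note, is the case where a romanized version should be included as well.<br />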
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
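The basic rule can be written down in a few lines. This is a sketch of the counting step only: the class and function names are hypothetical, the field values are placeholders, and the caveats above (major forms only, related kana variants, common sense) still need an editor's judgement.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str     # major kanji headword
    reading: str   # major reading
    meaning: str   # meaning, normalised however the editor sees fit


def may_merge(a, b):
    """Two-out-of-three rule: at least two of the three fields must match."""
    matches = sum((
        a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ))
    return matches >= 2


if __name__ == "__main__":
    first = Candidate("K1", "R1", "M1")
    print(may_merge(first, Candidate("K2", "R1", "M1")))  # reading + meaning
    print(may_merge(first, Candidate("K1", "R2", "M2")))  # kanji only
```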
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether that can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
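The capitalization and formatting rules for simple binomial and trinomial names can be checked as in this sketch. The helper names are hypothetical, and the pattern deliberately ignores "subsp.", "var.", "f.", cultivar epithets and hyphenated epithets, which would need extra handling.<br />

```python
import re

# Capitalised genus followed by one or two lower-case epithets.
SIMPLE_NAME_RE = re.compile(r"[A-Z][a-z]+(?: [a-z]+){1,2}")


def is_simple_scientific_name(name):
    """True for names like 'Pica pica' or 'Ursus americanus cinnamomum'."""
    return SIMPLE_NAME_RE.fullmatch(name) is not None


def species_gloss(common_name, scientific_name):
    """Build 'common_name (scientific_name)' after a basic sanity check."""
    if not is_simple_scientific_name(scientific_name):
        raise ValueError(f"unexpected scientific name: {scientific_name!r}")
    return f"{common_name} ({scientific_name})"


if __name__ == "__main__":
    print(species_gloss("European magpie", "Pica pica"))
    print(is_simple_scientific_name("Tyrannosaurus Rex"))
```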
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+002D). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c, Unicode U+30FC). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d, Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
<br />
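The rule above lends itself to a simple automated check. The following is a minimal sketch in Python, not part of JMdictDB. The set of look-alike characters is an illustrative assumption and is not the list from the JMdict issue referenced in this section; the function name is hypothetical.

```python
import unicodedata

# The three bar-like characters the guideline allows (kept here for reference).
ALLOWED_BARS = {
    "\u002d",  # ASCII hyphen-minus
    "\u30fc",  # katakana-hiragana prolonged sound mark
    "\u2212",  # minus sign
}

# Illustrative look-alikes only (an assumption, not the official list).
LOOKALIKE_BARS = {"\u2010", "\u2013", "\u2014", "\u2015", "\uff0d"}


def find_disallowed_bars(text):
    """Return (index, character, Unicode name) for each look-alike bar in text."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in LOOKALIKE_BARS
    ]
```

Calling `find_disallowed_bars` on a proposed Kanji or Meanings field returns an empty list when only the permitted characters are present.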
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=935Editorial policy2021-09-17T04:44:16Z<p>JimBreen: /* Before Starting */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. The counts from the [https://www.edrdg.org/~jwb/ngramcounts.html Google WWW n-gram corpus] can assist here. (Counts from Google searches tend not to be very reliable.)<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. MP3プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
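The ordering rule above can be sketched as a sort key. This is only an illustration in Python: the dictionary layout, the `count` field (standing in for whatever frequency evidence the contributor has, e.g. n-gram counts) and the `joyo` flag are all hypothetical and do not correspond to any JMdictDB data structure.

```python
# Tags named in the guideline as marking irregular or incorrect forms.
IRREGULAR_TAGS = {"iK", "io", "ik"}


def order_forms(forms):
    """Order surface forms as the guideline suggests.

    Each form is a dict with hypothetical keys:
      'text'  - the surface form
      'count' - a frequency estimate supplied by the contributor
      'joyo'  - True if the form uses only 常用漢字
      'tags'  - iterable of tags such as 'iK'
    """
    return sorted(
        forms,
        key=lambda f: (
            bool(IRREGULAR_TAGS & set(f.get("tags", ()))),  # irregular forms go last
            -f["count"],                                    # then most frequent first
            not f.get("joyo", False),                       # ties: prefer 常用漢字 forms
        ),
    )
```

Because Python compares tuples element by element, the irregular-form test dominates, frequency comes next, and the 常用漢字 preference only matters when the counts are equal.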
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.<br />
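As an illustration of the middle-dot rule, the sketch below (Python) builds the "without;with" pair from a dotted reading and rejects look-alike dots. It assumes that the "JIS middle-dot" corresponds to U+30FB (KATAKANA MIDDLE DOT); the set of rejected characters is illustrative and the function name is hypothetical.

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT, assumed to be the "JIS middle-dot"

# Look-alike dots that would be rejected (illustrative, not exhaustive).
OTHER_DOTS = {"\u00b7", "\uff65", "\u2022", "\u2027"}


def reading_variants(dotted):
    """Return 'undotted;dotted' for a reading written with middle dots."""
    bad = sorted(ch for ch in set(dotted) if ch in OTHER_DOTS)
    if bad:
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError(f"unsupported middle-dot character(s): {codes}")
    undotted = dotted.replace(MIDDLE_DOT, "")
    if undotted == dotted:
        return dotted  # a single source word: nothing to expand
    return f"{undotted};{dotted}"
```

For the example in the text, passing the dotted form of アームレスチェア yields both versions separated by a semicolon, ready to paste into the Readings field.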
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
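Several of the points in the list above are mechanical enough to check automatically. The following Python sketch covers four of them: a leading article, the "colo(u)r" pattern, a comma after "e.g." or "i.e.", and an abbreviation left in parentheses instead of being a separate gloss. It is a rough aid under simple assumptions (for example, it does not handle a ";" inside parentheses) and is not part of the JMdictDB system; the function names are hypothetical.

```python
import re


def split_glosses(meaning):
    """Split a meaning string on ';' into individual glosses."""
    return [g.strip() for g in meaning.split(";") if g.strip()]


def lint_gloss(gloss):
    """Return a list of warnings for one gloss; an empty list means no issue found."""
    warnings = []
    # Glosses should not normally begin with "a", "an" or "the".
    if re.match(r"(a|an|the)\s", gloss, re.IGNORECASE):
        warnings.append("starts with an article")
    # Patterns such as "colo(u)r" cannot be searched for successfully.
    if re.search(r"\w\(\w\)\w", gloss):
        warnings.append("uses a pattern such as colo(u)r")
    # No comma after e.g. or i.e.
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        warnings.append("comma after e.g. or i.e.")
    # "three letter acronym (TLA)" should be "three letter acronym; TLA".
    if re.search(r"\(([A-Z]{2,})\)$", gloss):
        warnings.append("abbreviation in parentheses")
    return warnings
```

The warnings are advisory only, since the guidelines allow exceptions (e.g. an article that is needed to make the meaning clear).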
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
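For tools that consume entries in this edit format, the two examples above can be parsed with a small regular expression. This Python sketch handles only the forms shown here (a three-letter code, then an optional source word that may be quoted); the full JMdictDB syntax may allow more, so treat the pattern and the function name as assumptions.

```python
import re

# [lsrc=xxx:word] or [lsrc=xxx:"two words"]; the source word may be empty.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')


def parse_lsrc(sense_text):
    """Return (language_code, source_word) pairs found in a sense string."""
    pairs = []
    for m in LSRC_RE.finditer(sense_text):
        word = m.group(2) if m.group(2) is not None else m.group(3)
        pairs.append((m.group(1), word))
    return pairs
```

An empty source word (as in `[lsrc=fre:]`) comes back as an empty string, which matches the case where the source term is identical to the translation and is not repeated.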
<br />
Don't do this for (i) common Sino-Japanese vocabulary, (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
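The cross-reference patterns described above can be recognised as follows. This is a minimal Python sketch covering only the forms given in this section; it assumes the separator between kanji and reading is U+30FB, so a target whose headword itself contains that character would be split incorrectly. The function name is hypothetical.

```python
import re

# [see=target], [ant=target], [see=kanji・reading], [see=target[2]]
XREF_RE = re.compile(
    r"\[(see|ant)="          # cross-reference class
    r"([^\[\]\u30fb]+)"      # headword (kanji form where one exists)
    r"(?:\u30fb([^\[\]]+))?" # optional reading after the middle dot
    r"(?:\[(\d+)\])?"        # optional target sense number
    r"\]"
)


def parse_xrefs(sense_text):
    """Return (class, headword, reading, sense_number) tuples for each cross-reference."""
    return [
        (
            m.group(1),
            m.group(2),
            m.group(3),
            int(m.group(4)) if m.group(4) else None,
        )
        for m in XREF_RE.finditer(sense_text)
    ]
```

Missing parts are returned as `None`, so a caller can tell a plain headword reference from one restricted to a reading or sense.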
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
A cross-reference back from the full form to the abbreviation may also be appropriate.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
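Since macrons rather than circumflexes are to be used, romanized text copied from sources that use circumflexes can be converted mechanically. The Python sketch below does only that one substitution; it does not decide whether a term should carry macrons at all (e.g. "tofu"), and the function name is hypothetical.

```python
import unicodedata

# Map circumflexed vowels to their macron equivalents.
CIRCUMFLEX_TO_MACRON = str.maketrans("âîûêôÂÎÛÊÔ", "āīūēōĀĪŪĒŌ")


def use_macrons(romaji):
    """Replace circumflexed vowels with macron vowels in romanized Japanese."""
    # Normalize first so that vowel + combining circumflex is also caught.
    return unicodedata.normalize("NFC", romaji).translate(CIRCUMFLEX_TO_MACRON)
```

Apostrophes, as in "hon'yaku", are left untouched by this conversion.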
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
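The two spacing rules above can be checked with a pair of regular expressions. In this Python sketch the unit list and the symbol list are short illustrative assumptions, not a complete inventory, and the function name is hypothetical.

```python
import re

# Illustrative units that need a space after the number.
UNITS = ("km", "kg", "cm", "mm", "ml")
# A digit immediately followed by one of the units, e.g. "100km".
UNIT_RE = re.compile(r"\d(?:%s)\b" % "|".join(UNITS))
# A digit, whitespace, then a symbol that should attach directly, e.g. "9 %".
SYMBOL_RE = re.compile(r"\d\s+(?:%|°C)")


def spacing_problems(gloss):
    """Return a list of spacing problems found in a gloss."""
    problems = []
    if UNIT_RE.search(gloss):
        problems.append("missing space before unit")
    if SYMBOL_RE.search(gloss):
        problems.append("unwanted space before symbol")
    return problems
```

The cents example ("5c") is deliberately not covered, because a bare "c" after a digit is too ambiguous to flag reliably.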
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
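For anyone generating glosses programmatically, the formats above can be produced as in the following Python sketch. The function names are hypothetical; month names are taken from a fixed list so that the output does not depend on the system locale.

```python
from datetime import date, time

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_date(d, with_year=True):
    """'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def format_lifespan(born, died):
    """'1925.1.14-1970.11.25' (YYYY.MM.DD without zero padding, ASCII hyphen)."""
    return (
        f"{born.year}.{born.month}.{born.day}"
        f"-{died.year}.{died.month}.{died.day}"
    )


def format_time(t):
    """'2am' or '12:30pm'."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"
```

The lifespan format follows the Yukio Mishima example, which does not zero-pad the month or day.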
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who the regular contributors of amendments and new entries are, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
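<br />
The constraints above can be checked mechanically before submitting. The following is a minimal sketch, assuming Python's standard <code>iso2022_jp</code> codec (ASCII, JIS X 0201 Roman and JIS X 0208) and <code>iso2022_jp_1</code> codec (the same plus JIS X 0212) are acceptable stand-ins for the two codesets. The function names are hypothetical and this is not the conversion code used to build the EDICT/EDICT2 files.<br />

```python
def jis_coverage(ch: str) -> str:
    """Classify one character by the smallest codec that can encode it.

    Assumption: Python's iso2022_jp codec approximates "JIS X 0208"
    (it also accepts ASCII and JIS X 0201 Roman), and iso2022_jp_1
    approximates "JIS X 0208 plus JIS X 0212".
    """
    try:
        ch.encode("iso2022_jp")
        return "JIS X 0208"
    except UnicodeEncodeError:
        pass
    try:
        ch.encode("iso2022_jp_1")
        return "JIS X 0212"
    except UnicodeEncodeError:
        return "outside"


def report(text: str) -> dict:
    """Group the distinct characters of text by coverage class."""
    groups = {"JIS X 0208": [], "JIS X 0212": [], "outside": []}
    for ch in dict.fromkeys(text):  # keeps first-seen order, drops repeats
        groups[jis_coverage(ch)].append(ch)
    return groups


if __name__ == "__main__":
    # A kanji, a Latin letter with a diacritic, and a hangul syllable.
    print(report("家é한"))
```

Characters reported as "outside" would be dropped from both EDICT and EDICT2, so a romanized equivalent should accompany them; characters reported as "JIS X 0212" survive only in EDICT2.<br />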
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
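<br />
As a rough illustration of the two-out-of-three test, the sketch below compares only the principal kanji-headword, reading and meaning of two candidates. The <code>Candidate</code> class and <code>may_merge</code> function are hypothetical names, exact string equality is a simplifying assumption, and the kana-only case described above is deliberately left to editorial judgement.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str    # principal kanji headword, "" if the entry has none
    reading: str  # principal reading
    meaning: str  # principal meaning


def may_merge(a: Candidate, b: Candidate) -> bool:
    """Return True if at least two of the three fields are the same.

    An empty kanji field never counts as a match, so two kana-only
    entries are not merged by this test alone.
    """
    matches = sum([
        bool(a.kanji) and a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ])
    return matches >= 2


if __name__ == "__main__":
    ikebana_1 = Candidate("生け花", "いけばな", "flower arrangement")
    ikebana_2 = Candidate("生花", "いけばな", "flower arrangement")
    print(may_merge(ikebana_1, ikebana_2))
```

A positive result only means a merger is permitted; as noted above, rare or archaic forms should not be used as the basis for one.<br />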
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether these can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it can often be determined from the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
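<br />
The capitalization and subspecies rules above can be sketched as a small formatter. This is an illustration only: <code>scientific_name</code> and <code>species_gloss</code> are hypothetical helpers, varieties, forms and cultivars are not handled, and the output is the plain-text gloss without italics.<br />

```python
def scientific_name(genus, epithet, subspecies=None, plant=False):
    """Build a binomial or trinomial name.

    The generic name is capitalized and epithets are lower-cased.
    Plant subspecies take the "subsp." abbreviation; animal
    subspecies simply append the subspecific epithet.
    """
    name = f"{genus.capitalize()} {epithet.lower()}"
    if subspecies:
        if plant:
            name += f" subsp. {subspecies.lower()}"
        else:
            name += f" {subspecies.lower()}"
    return name


def species_gloss(common_name, genus, epithet, subspecies=None, plant=False):
    """Format a gloss as: common_name (scientific_name)."""
    return f"{common_name} ({scientific_name(genus, epithet, subspecies, plant)})"


if __name__ == "__main__":
    print(species_gloss("cinnamon bear", "Ursus", "americanus", "cinnamomum"))
    print(species_gloss("raspberry", "rubus", "Idaeus"))
```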
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+002D). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c, Unicode U+30FC). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d, Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
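<br />
A submission can be screened for stray bar-like characters with a check along the following lines. The three permitted code points come from the list above; the <code>LOOKALIKES</code> set is an assumed, illustrative selection of similar Unicode characters and is not taken from the JMdict issue, and the function name is hypothetical.<br />

```python
# The three characters permitted in the JMdict database (from the list above).
ALLOWED_BARS = {
    "\u002d",  # ASCII hyphen
    "\u30fc",  # long vowel symbol
    "\u2212",  # minus sign
}

# Assumption: an illustrative set of visually similar characters to reject.
LOOKALIKES = {
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2015",  # horizontal bar
}


def find_disallowed_bars(text):
    """Return (position, code point) pairs for each look-alike found."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in LOOKALIKES
    ]


if __name__ == "__main__":
    for sample in ["CD-ROM", "ローマ字", "CD\u2013ROM"]:
        print(sample, find_disallowed_bars(sample))
```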
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=934Editorial policy2021-08-29T02:12:11Z<p>JimBreen: /* Kanji/Special-Character Forms */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. ＭＰ３プレーヤー, not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. Provided the frequencies are equivalent, give preference to terms using 常用漢字. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
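<br />
The ordering rule above amounts to a two-level sort: regular forms before irregular ones, then decreasing frequency. The sketch below assumes each form is supplied with a usage count (for instance from an n-gram corpus) and its tags; the function name, the placeholder forms and the counts are all hypothetical, and the 常用漢字 tie-break is not implemented.<br />

```python
# Tags that mark a surface form as irregular or incorrect (from the text above).
IRREGULAR_TAGS = {"iK", "io", "ik"}


def order_surface_forms(forms):
    """Order (text, count, tags) tuples for entry in the Kanji section.

    Regular forms come first in decreasing order of count; forms carrying
    an irregular tag go to the rear, whatever their count.
    """
    def sort_key(form):
        _text, count, tags = form
        is_irregular = bool(IRREGULAR_TAGS & set(tags))
        return (is_irregular, -count)

    return sorted(forms, key=sort_key)


if __name__ == "__main__":
    # Placeholder forms with made-up counts, for illustration only.
    candidates = [
        ("form-A", 1200, ()),
        ("form-B", 9000, ("iK",)),
        ("form-C", 4500, ()),
    ]
    for text, count, tags in order_surface_forms(candidates):
        print(text, count, tags)
```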
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
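<br />
To show how the "[nokanji]" and "[restr=KKK]" annotations attach to individual readings, here is a minimal parsing sketch. It assumes readings are separated by ";" and that each annotation directly follows its reading in square brackets; the function name is hypothetical and this is not the parser used by the JMdictDB system.<br />

```python
import re

# Matches "[tag]" or "[tag=value]" annotations after a reading.
TAG_RE = re.compile(r"\[([^\]=]+)(?:=([^\]]*))?\]")


def parse_readings(field):
    """Split a readings field into (kana, tags) pairs.

    tags maps each annotation name to its value, or to None when the
    annotation has no value (as with "nokanji").
    """
    parsed = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        tags = {m.group(1): m.group(2) for m in TAG_RE.finditer(item)}
        kana = TAG_RE.sub("", item).strip()
        parsed.append((kana, tags))
    return parsed


if __name__ == "__main__":
    print(parse_readings("ぶっだ;ブッダ[nokanji]"))
    # Hypothetical field restricting a reading to one kanji form.
    print(parse_readings("いけばな;せいか[restr=生花]"))
```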
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.<br />
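<br />
The pair of readings for a multi-word 外来語 can be derived from the dotted form, as in this sketch. The JIS middle-dot is taken here to be U+30FB (the katakana middle dot); the <code>OTHER_DOTS</code> set is an assumed sample of similar characters that would be rejected, and the function name is hypothetical.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # katakana middle dot

# Assumption: a sample of other dot characters that look similar.
OTHER_DOTS = {"\u00b7", "\u2022", "\u2219", "\u22c5"}


def gairaigo_readings(dotted):
    """Return [undotted, dotted] readings for a dotted katakana term.

    Raises ValueError if a non-JIS dot character is present.
    """
    wrong = sorted({ch for ch in dotted if ch in OTHER_DOTS})
    if wrong:
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in wrong)
        raise ValueError(f"non-JIS middle dot(s) found: {codes}")
    return [dotted.replace(JIS_MIDDLE_DOT, ""), dotted]


if __name__ == "__main__":
    print(";".join(gairaigo_readings("アームレス\u30fbチェア")))
```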
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
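As an illustration only, the two examples above can be picked apart with a short Python sketch. The regular expression and the function name <code>parse_lsrc</code> are assumptions made for this example; they are not part of the JMdict tooling.<br />

```python
import re

# Hypothetical pattern for the [lsrc=lng:word] tag described above.
# Assumptions: the language code is three lowercase letters, and the
# source word may be empty or wrapped in double quotes.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):("?)([^\]"]*)\2\]')


def parse_lsrc(sense_text):
    """Return a list of (language_code, source_word) pairs found in a sense."""
    return [(m.group(1), m.group(3)) for m in LSRC_RE.finditer(sense_text)]


if __name__ == "__main__":
    print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
    print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```

An empty source word, as in <code>[lsrc=ger:]</code>, is returned as an empty string, which matches the case where the source word is identical to the translation and is not repeated.<br />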
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
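The three cross-reference shapes described above (plain target, kanji・reading, and target with a sense number) can be recognised with a small Python sketch. The names <code>XRef</code> and <code>parse_xrefs</code> and the regular expression are hypothetical and written only for this illustration; the linked detailed instructions remain the authority on the syntax.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str               # "see" or "ant"
    target: str             # headword (kanji form where one exists)
    reading: Optional[str]  # reading after the "・" separator, if given
    sense: Optional[int]    # sense number from a trailing "[n]", if given


# Assumption: "・" only appears as the kanji/reading separator. A headword
# that itself contains "・" would need different handling.
XREF_RE = re.compile(
    r'\[(see|ant)=([^\[\]・]+)(?:・([^\[\]]+))?(?:\[(\d+)\])?\]'
)


def parse_xrefs(text):
    """Return every [see=...] or [ant=...] cross-reference found in text."""
    refs = []
    for m in XREF_RE.finditer(text):
        kind, target, reading, sense = m.groups()
        refs.append(XRef(kind, target, reading, int(sense) if sense else None))
    return refs


if __name__ == "__main__":
    for tag in ("[see=言葉]", "[ant=何等]", "[see=金本位・かねほんい]", "[see=漢字[2]]"):
        print(parse_xrefs(tag))
```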
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
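For contributors who generate glosses programmatically, the date and time styles above might be produced as in the following Python sketch. The function names are hypothetical, and the lifespan format follows the Mishima example above, which does not zero-pad months or days.<br />

```python
from datetime import date, time

# Month names are spelled out here rather than taken from the locale,
# so the output is always in English.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_date(d, with_year=True):
    """'March 17, 2019' or, without the year, 'March 17'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def format_lifespan(born, died):
    """'1925.1.14-1970.11.25' (no zero padding, as in the example above)."""
    return (f"{born.year}.{born.month}.{born.day}"
            f"-{died.year}.{died.month}.{died.day}")


def format_time(t):
    """'2am' or '12:30pm'; minutes are omitted when they are zero."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"


if __name__ == "__main__":
    print(format_date(date(2019, 3, 17)))
    print(format_lifespan(date(1925, 1, 14), date(1970, 11, 25)))
    print(format_time(time(12, 30)))
```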
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
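A contributor who wants a rough check of whether some text will survive into the EDICT and EDICT2 distributions could try a Python sketch like the one below. It is an approximation under stated assumptions: Python's <code>iso2022_jp</code> codec is used as a stand-in for "ASCII plus JIS X 0208" and <code>iso2022_jp_1</code> for "ASCII plus JIS X 0208 and JIS X 0212". These codecs are used only as a membership test and say nothing about the encoding of the distributed files or about how the export scripts actually work.<br />

```python
def fits(text, codec):
    """True if every character in text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def distribution_support(text):
    """Rough guess at which legacy distributions can carry text unchanged.

    Assumption: iso2022_jp approximates the JIS X 0208 repertoire (EDICT)
    and iso2022_jp_1 approximates JIS X 0208 plus JIS X 0212 (EDICT2).
    """
    return {
        "EDICT": fits(text, "iso2022_jp"),
        "EDICT2": fits(text, "iso2022_jp_1"),
    }


if __name__ == "__main__":
    for sample in ("漢字", "art déco", "한글"):
        print(sample, distribution_support(sample))
```

Text that fails both checks, such as hangul in a note, is the case where a romanized version should be included as well.<br />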
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
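The core of the two-out-of-three rule can be written down as a short Python sketch. The dictionary keys and the function name <code>may_merge</code> are invented for this illustration, and exact string equality is only a stand-in for the editorial judgement described above (major forms only, related kana variants, and so on).<br />

```python
FIELDS = ("kanji", "reading", "meaning")


def may_merge(entry_a, entry_b):
    """Apply the basic two-out-of-three test to two candidate entries.

    Each entry is a dict holding its main kanji headword, reading and
    meaning. Two entries without kanji (kanji is None in both) count as
    agreeing on that field, so they pass if the meaning also matches;
    whether the kana forms are genuinely related is left to the editor.
    """
    matches = sum(entry_a[f] == entry_b[f] for f in FIELDS)
    return matches >= 2


if __name__ == "__main__":
    a = {"kanji": None, "reading": "ダイアモンド", "meaning": "diamond"}
    b = {"kanji": None, "reading": "ダイヤモンド", "meaning": "diamond"}
    print(may_merge(a, b))
```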
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally, a proposed entry needs to pass one or more of these criteria.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
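As a rough illustration of the rule above, the following Python sketch flags bar-like characters other than the three permitted ones. The SUSPECT set and the function name find_suspect_bars are assumptions made for this example; the authoritative list of excluded characters is in the JMdict issue on the topic, not here.<br />

```python
import unicodedata

# Permitted by the guideline: U+002D (ASCII hyphen), U+30FC (long vowel
# symbol) and U+2212 (minus sign). None of them appear in SUSPECT.

# Assumed set of look-alike bars to reject (hyphens, dashes, full-width
# hyphen-minus, half-width long vowel mark). Illustrative, not official.
SUSPECT = {
    "\u2010", "\u2011", "\u2012", "\u2013",
    "\u2014", "\u2015", "\uff0d", "\uff70",
}

def find_suspect_bars(text):
    """Return (index, code point, Unicode name) for each suspect bar."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in SUSPECT
    ]
```

A check of this kind could be run over the Kanji, Readings and Meanings fields before an entry is submitted.<br />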
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=933Editorial policy2021-08-29T02:08:47Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. MP3プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.<br />
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
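For tools that consume entries in this edit format, the [lsrc=lng:word] tag can be pulled out with a regular expression. The sketch below is only an illustration: the pattern and the function name parse_lsrc are assumptions for this example and are not the parser used by JMdictDB.<br />

```python
import re

# [lsrc=lng:word], where word may be bare, double-quoted, or empty.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')

def parse_lsrc(sense_text):
    """Return (language_code, source_word) pairs found in sense_text."""
    pairs = []
    for m in LSRC_RE.finditer(sense_text):
        # Group 2 holds a quoted word, group 3 an unquoted (possibly empty) one.
        word = m.group(2) if m.group(2) is not None else m.group(3)
        pairs.append((m.group(1), word))
    return pairs
```

Applied to the two examples above, this would yield the pairs ("ger", "Arbeit") and ("fre", "art déco").<br />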
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
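The cross-reference patterns described above can be broken into their parts as follows. This is a minimal sketch under stated assumptions: the regular expression, the dictionary keys and the function name parse_xref are invented for this example, and the code treats the first middle-dot as the headword/reading separator, so it would mis-split a target whose headword itself contains a middle-dot.<br />

```python
import re

# [see=target] or [ant=target], with an optional trailing [n] sense number.
XREF_RE = re.compile(r"\[(see|ant)=([^\[\]]+)(?:\[(\d+)\])?\]")

def parse_xref(tag):
    """Split a cross-reference tag into type, headword, reading and sense."""
    m = XREF_RE.fullmatch(tag)
    if m is None:
        return None
    kind, target, sense = m.groups()
    # U+30FB (katakana middle dot) separates headword and reading.
    headword, _, reading = target.partition("\u30fb")
    return {
        "type": kind,
        "headword": headword,
        "reading": reading or None,
        "sense": int(sense) if sense else None,
    }
```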
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
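The preference for macrons over circumflexes is easy to apply mechanically. The following sketch, in which the function name to_macrons and the mapping table are assumptions for this example, rewrites circumflexed vowels in a gloss as the corresponding macron forms. It does not decide whether a term should carry macrons at all; that remains an editorial judgement.<br />

```python
import unicodedata

# Circumflexed vowels (lower and upper case) mapped to macron vowels.
CIRCUMFLEX_TO_MACRON = str.maketrans(
    "\u00e2\u00ee\u00fb\u00ea\u00f4\u00c2\u00ce\u00db\u00ca\u00d4",
    "\u0101\u012b\u016b\u0113\u014d\u0100\u012a\u016a\u0112\u014c",
)

def to_macrons(gloss):
    """Replace circumflexed vowels in a romanized gloss with macron vowels."""
    # Compose any decomposed accents first so the table can match them.
    composed = unicodedata.normalize("NFC", gloss)
    return composed.translate(CIRCUMFLEX_TO_MACRON)
```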
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
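As an illustration of the two spacing rules, the sketch below formats a number with its unit or symbol. It is only a sketch: the <code>SYMBOLS</code> set is a hypothetical list built from the examples above, and a real checker would need a fuller list.<br />

```python
# Hypothetical set of suffixes treated as symbols (written without a space).
# Anything else is treated as a unit and separated by a space.
SYMBOLS = {"°C", "%", "c"}


def format_quantity(number, unit: str) -> str:
    """Join a number and its unit/symbol following the spacing rules."""
    separator = "" if unit in SYMBOLS else " "
    return f"{number}{separator}{unit}"
```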
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
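<br />
The date and time styles described above could be produced as in the following Python sketch. The helper names (<code>gloss_date</code>, <code>lifespan</code>, <code>gloss_time</code>) are assumptions for illustration and do not correspond to any existing JMdict code.<br />

```python
from datetime import date, time

# English month names are spelled out explicitly so the result does not
# depend on the locale of the machine running the code.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def gloss_date(d: date, with_year: bool = True) -> str:
    """'March 17, 2019' or, without the year, 'March 17'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(born: date, died: date) -> str:
    """Compact YYYY.MM.DD range for people, e.g. '(1925.1.14-1970.11.25)'."""
    def compact(d: date) -> str:
        return f"{d.year}.{d.month}.{d.day}"
    return f"({compact(born)}-{compact(died)})"


def gloss_time(t: time) -> str:
    """'2am' or '12:30pm' style, with minutes shown only when non-zero."""
    suffix = "am" if t.hour < 12 else "pm"
    hour = t.hour % 12 or 12
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"
```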
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the examples in the Nelson dictionaries are quite old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
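<br />
A contributor can get a rough idea of whether text will survive into the EDICT and EDICT2 distributions by trying to encode it. The Python sketch below is an approximation under stated assumptions: it uses Python's standard <code>iso2022_jp</code> codec as a stand-in for "ASCII plus JIS X 0208" and <code>euc_jp</code> as a stand-in for "JIS X 0208 plus JIS X 0212" (the latter codec also accepts half-width katakana). It is not the conversion code actually used to build the distributions.<br />

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def fits_edict(text: str) -> bool:
    # Approximation: iso2022_jp covers ASCII and JIS X 0208 only.
    return encodable(text, "iso2022_jp")


def fits_edict2(text: str) -> bool:
    # Approximation: euc_jp covers JIS X 0208 and JIS X 0212.
    return encodable(text, "euc_jp")
```

With these helpers, "é" passes <code>fits_edict2</code> but not <code>fits_edict</code>, while hangul such as "한" fails both, which matches the behaviour described above.<br />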
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
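<br />
The core of the "two-out-of-three" test can be sketched in Python as follows. This is an illustration only: the <code>Candidate</code> class and <code>may_merge</code> function are hypothetical names, each candidate is reduced to its major kanji form, reading and meaning, and the judgement calls described above (rare forms, related kana variants) are deliberately left to the editor.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str    # major kanji headword, or "" if the entry has none
    reading: str  # major reading
    meaning: str  # meaning, normalized so that equal meanings compare equal


def may_merge(a: Candidate, b: Candidate) -> bool:
    """Apply the two-out-of-three rule to the major forms of two entries."""
    matches = [
        # Two missing kanji headwords are not counted as "the same";
        # kana-only variants need a separate editorial judgement.
        bool(a.kanji) and a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ]
    return sum(matches) >= 2
```

For example, candidates for 合気道 and 合氣道 that share a reading and a meaning satisfy the rule, whereas two candidates that share only a kanji headword do not.<br />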
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
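<br />
A simple way to spot stray bar characters in a field is sketched below in Python. It is a heuristic, not the check used by the database: it flags any character in Unicode's "dash punctuation" category (Pd) other than the ASCII hyphen, which may not coincide exactly with the set of characters referred to above. The name <code>disallowed_bars</code> is an assumption.<br />

```python
import unicodedata

# The three bar-like characters accepted in JMdict, as listed above:
# ASCII hyphen, katakana-hiragana prolonged sound mark, and minus sign.
ALLOWED_BARS = {"\u002d", "\u30fc", "\u2212"}


def disallowed_bars(text: str) -> list:
    """Return dash-like characters in text that are not in ALLOWED_BARS."""
    return [
        ch for ch in text
        if unicodedata.category(ch) == "Pd" and ch not in ALLOWED_BARS
    ]
```

For instance, "CD-ROM" and "ローマ字" produce an empty list, while text containing an en dash (U+2013) is flagged.<br />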
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=932Editorial policy2021-08-29T02:07:27Z<p>JimBreen: /* References */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. ＭＰ３プレーヤー, not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
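The ordering rule above amounts to a two-level sort: regular forms before irregular ones, and higher frequency first within each group. A minimal Python sketch, assuming each form is supplied as a hypothetical <code>(text, frequency, tags)</code> tuple with the frequency taken from whatever corpus count the contributor has to hand:<br />

```python
# Tags that mark a surface form as irregular or incorrect.
IRREGULAR_TAGS = {"iK", "io", "ik"}


def order_forms(forms):
    """Sort (text, frequency, tags) tuples: regular forms first, then by
    decreasing frequency; tagged irregular forms go to the rear."""
    return sorted(
        forms,
        key=lambda form: (bool(IRREGULAR_TAGS & set(form[2])), -form[1]),
    )
```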
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjective, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.<br />
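<br />
The middle-dot rule above can be checked mechanically. The following is a minimal Python sketch, not part of the submission system: it assumes the "JIS middle-dot" is U+30FB (KATAKANA MIDDLE DOT), and the list of look-alike dots is illustrative rather than exhaustive. The function names are hypothetical.<br />

```python
import unicodedata

JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT (assumed to be the accepted one)

# A few look-alike dots that are easy to type by mistake (illustrative only).
LOOKALIKE_DOTS = {
    "\u00b7",  # MIDDLE DOT
    "\uff65",  # HALFWIDTH KATAKANA MIDDLE DOT
    "\u2022",  # BULLET
    "\u2219",  # BULLET OPERATOR
}


def find_bad_dots(reading: str) -> list[str]:
    """Return the Unicode names of any look-alike dots found in a reading."""
    return [unicodedata.name(ch) for ch in reading if ch in LOOKALIKE_DOTS]


def reading_variants(parts: list[str]) -> str:
    """Build the 'joined;dotted' pair of readings for a multi-word loanword."""
    return "".join(parts) + ";" + JIS_MIDDLE_DOT.join(parts)


print(reading_variants(["アームレス", "チェア"]))
print(find_bad_dots("アームレス\u00b7チェア"))
```

A check like this only catches the dots listed in LOOKALIKE_DOTS; any other substitute character would need to be added to the set.<br />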
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
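<br />
Several of the points in the list above are mechanical enough to be checked before submitting. The sketch below is only an illustration in Python, assuming that the glosses of a sense are given as one ";"-separated string; the function name and the three checks chosen are hypothetical and do not describe any existing tool. The results are warnings for a human to review, since (for example) a leading article is sometimes necessary.<br />

```python
import re

# Each check is a (pattern, warning message) pair applied to a single gloss.
CHECKS = [
    (re.compile(r"^(a|an|the)\s", re.IGNORECASE), "starts with an article"),
    (re.compile(r"\w\(\w+\)\w"), "optional-letter pattern such as colo(u)r"),
    (re.compile(r"\b(e\.g\.|i\.e\.),"), "comma after e.g. or i.e."),
]


def lint_glosses(meaning: str) -> list[tuple[str, str]]:
    """Split a sense into glosses on ';' and return (gloss, warning) pairs."""
    problems = []
    for gloss in (g.strip() for g in meaning.split(";")):
        for pattern, message in CHECKS:
            if pattern.search(gloss):
                problems.append((gloss, message))
    return problems


for gloss, warning in lint_glosses("full colo(u)r; the oracle"):
    print(f"{gloss!r}: {warning}")
```

Splitting on ";" is a simplification: it would mis-split a gloss that legitimately contains a semicolon inside parentheses.<br />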
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
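<br />
For client software that needs to read these markings, the two examples above can be parsed with a short regular expression. This is a sketch in Python that covers only the forms shown here (a three-letter code followed by an optional source word, which may be enclosed in double quotes); the full syntax accepted by the entry form may be wider, and the function name is hypothetical.<br />

```python
import re

# [lsrc=lng:word] or [lsrc=lng:"several words"]; the word part may be empty.
LSRC = re.compile(
    r'\[lsrc=(?P<lang>[a-z]{3}):'
    r'(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\]"]*))\]'
)


def parse_lsrc(sense: str) -> list[tuple[str, str]]:
    """Return (language code, source word) pairs found in a sense string."""
    pairs = []
    for m in LSRC.finditer(sense):
        word = m["quoted"] if m["quoted"] is not None else m["bare"]
        pairs.append((m["lang"], word))
    return pairs


print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```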
<br />
Don't do this for (i) common Sino-Japanese vocabulary, (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
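<br />
The three forms described above (plain target, kanji・reading pair, and target with a sense number) can be told apart as in the following Python sketch. It handles only the patterns shown in this paragraph and assumes the separator is U+30FB; the linked detailed instructions remain the authority on the syntax. The XRef type and parse_xrefs function are hypothetical names.<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str                # "see" or "ant"
    headword: str            # kanji form, or kana if there is no kanji
    reading: Optional[str]   # present only for kanji・reading targets
    sense: Optional[int]     # present only for targets such as 漢字[2]


XREF = re.compile(
    r"\[(?P<kind>see|ant)="
    r"(?P<headword>[^\[\]\u30fb]+)"
    r"(?:\u30fb(?P<reading>[^\[\]\u30fb]+))?"
    r"(?:\[(?P<sense>\d+)\])?"
    r"\]"
)


def parse_xrefs(text: str) -> list[XRef]:
    """Extract every [see=...] and [ant=...] cross-reference from text."""
    return [
        XRef(
            m["kind"],
            m["headword"],
            m["reading"],
            int(m["sense"]) if m["sense"] else None,
        )
        for m in XREF.finditer(text)
    ]


print(parse_xrefs("[see=金本位\u30fbかねほんい] [see=漢字[2]]"))
```

One known limitation of this sketch: a kana target that itself contains a middle dot would be misread as a headword/reading pair.<br />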
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation can also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
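<br />
The two spacing rules above can be sketched as a pair of regular expressions. The Python below is illustrative only: the unit and symbol lists are small samples chosen for the example, not a definitive inventory, and check_spacing is a hypothetical name.<br />

```python
import re

NUMBER = r"\d+(?:\.\d+)?"

# A number immediately followed by a unit (sample units only).
MISSING_SPACE = re.compile(NUMBER + r"(?:km|kg|cm|mm|m|g)\b")

# A number separated by whitespace from a symbol (sample symbols only).
# The lookahead stops "5 cents" or "5 cm" from being mistaken for "5 c".
UNWANTED_SPACE = re.compile(NUMBER + r"\s+(?:°C|%|c)(?!\w)")


def check_spacing(gloss: str) -> list[tuple[str, str]]:
    """Return (problem, matched text) pairs for spacing issues in a gloss."""
    problems = [("missing space", m.group(0)) for m in MISSING_SPACE.finditer(gloss)]
    problems += [("unwanted space", m.group(0)) for m in UNWANTED_SPACE.finditer(gloss)]
    return problems


print(check_spacing("100km at 15 °C"))
```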
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
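<br />
As an illustration of the preferred date and time styles, here is a small Python sketch. The function names are hypothetical. The month names are spelled out in the code rather than taken from the system locale, and the lifespan format follows the unpadded "1925.1.14" example above, which is an assumption about how YYYY.MM.DD is meant to be read.<br />

```python
from datetime import date, time

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def gloss_date(d: date, with_year: bool = True) -> str:
    """Format as 'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan_date(d: date) -> str:
    """Format as year.month.day without zero padding, e.g. 1925.1.14."""
    return f"{d.year}.{d.month}.{d.day}"


def gloss_time(t: time) -> str:
    """Format as '2am' or '12:30pm'."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"


print(gloss_date(date(2019, 3, 17)))
print(f"{lifespan_date(date(1925, 1, 14))}-{lifespan_date(date(1970, 11, 25))}")
print(gloss_time(time(12, 30)))
```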
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
*NN, CN, S&H, Halpern: the New Nelson, Classic Nelson, Spahn & Hadamitzky (The Kanji Dictionary), Jack Halpern (New Japanese-English Character Dictionary). These are the main 漢和字典 with example words with English meanings. Note that these are included as <b>examples</b> of the use of kanji and are not necessarily common or topical. Many of the words in the Nelson dictionaries are very old and are not really suitable as dictionary entries.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
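<br />
A contributor can get a rough idea of which distribution a character will survive into by trying to encode it. The Python sketch below is an approximation under stated assumptions, not the conversion actually used to build EDICT/EDICT2: it treats Python's "iso2022_jp" codec as a stand-in for ASCII plus JIS X 0208, and "euc_jp" as a stand-in for JIS X 0208 plus JIS X 0212 (that codec also accepts half-width katakana, which this sketch does not separate out). The function names are hypothetical.<br />

```python
def narrowest_format(ch: str) -> str:
    """Name the most constrained format that can (approximately) carry ch."""
    try:
        ch.encode("iso2022_jp")  # roughly ASCII + JIS X 0208
        return "EDICT"
    except UnicodeEncodeError:
        pass
    try:
        ch.encode("euc_jp")      # roughly adds JIS X 0212
        return "EDICT2"
    except UnicodeEncodeError:
        return "JMdict only"


def jmdict_only_chars(text: str) -> list[str]:
    """List the characters in text that neither legacy codec accepts."""
    return [ch for ch in text if narrowest_format(ch) == "JMdict only"]


for ch in "日é한":
    print(ch, narrowest_format(ch))
```

If jmdict_only_chars returns anything for a note or gloss, that is a hint that a romanized version should be included alongside, as described above.<br />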
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
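<br />
The two-out-of-three rule can be written down as a simple count. The following Python sketch is a deliberately crude illustration: it compares only one main kanji form, one main reading and one meaning string per entry, and uses exact string equality where an editor would use judgement. The Candidate type, the may_merge function and the sample values are all hypothetical. Two missing kanji fields are not counted as a match, so kana-only variants such as ダイアモンド/ダイヤモンド still need the human decision described above.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str    # main kanji headword, or "" if the entry has none
    reading: str  # main reading
    meaning: str  # meaning, reduced to a single string for this sketch


def may_merge(a: Candidate, b: Candidate) -> bool:
    """Apply the two-out-of-three test to the main forms of two entries."""
    matches = sum(
        [
            a.kanji != "" and a.kanji == b.kanji,
            a.reading == b.reading,
            a.meaning == b.meaning,
        ]
    )
    return matches >= 2


# Illustrative values only, not actual dictionary entries.
first = Candidate("言い付ける", "いいつける", "to tell")
second = Candidate("言いつける", "いいつける", "to tell")
print(may_merge(first, second))
```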
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2d, Unicode U+2d). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213c Unicode U+30fc). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215d Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
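<br />
A simple check for stray bar characters might look like the following Python sketch. The three permitted code points are those listed above; the <code>LOOKALIKES</code> set is an assumption made for illustration (a handful of well-known Unicode dashes) and is not taken from the JMdict issue, so it should not be read as the definitive list of excluded characters. The function name is likewise hypothetical.<br />

```python
# Sketch: report hyphen-like characters other than the three permitted ones.
PERMITTED = {"\u002d", "\u30fc", "\u2212"}  # ASCII hyphen, long vowel mark, minus sign

# Assumed look-alikes, for illustration only: hyphen, non-breaking hyphen,
# figure dash, en dash, em dash, horizontal bar, fullwidth hyphen-minus.
LOOKALIKES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2015", "\uff0d"}

def find_disallowed_bars(text):
    """Return (position, code point) pairs for look-alike bars found in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in LOOKALIKES]

print(find_disallowed_bars("CD-ROM"))        # ASCII hyphen only
print(find_disallowed_bars("CD\u2013ROM"))   # contains an en dash
```

A check of this kind only flags the character; whether it should become a hyphen, a long vowel symbol or a minus sign still depends on the context described above.<br />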
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Main_Page&diff=931Main Page2021-08-25T22:57:49Z<p>JimBreen: /* The KRADFILE/RADKFILE Project */</p>
<hr />
<div>==Electronic Dictionary Research and Development Group==<br />
<br />
Welcome to the Wiki of the [[About EDRDG | Electronic Dictionary Research and Development Group]]. The Wiki has been developed as a repository of information and documentation about the Group's work and projects.<br />
<br />
==User Accounts==<br />
<br />
Sorry, but we no longer provide user accounts. We've been hit by link spammers, which led to the disabling of self-creation of accounts, and it's all too much of a distraction.<br />
<br />
If you have any edits you would like to suggest, email Jim Breen (jimbreen-at-gmail.com) with the details.<br />
<br />
==The JMdict/EDICT Project==<br />
<br />
This project is to build and maintain a freely-usable general Japanese electronic dictionary database. <br />
<br />
===History===<br />
<br />
The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and new editions of the files were generated daily.<br />
<br />
In July 2010 maintenance of the JMdict data moved to an [[JMdictDB_Project|online database]], from which the daily distributions are prepared. In September 2014 the maintenance of the [http://www.edrdg.org/wiki/index.php/Main_Page#The_ENAMDICT.2FJMnedict_Project JMnedict] named-entity data was moved to that database too.<br />
<br />
===Documentation and Links===<br />
<br />
Some useful links are:<br />
<br />
*the main [[JMdict-EDICT_Dictionary_Project|documentation of the JMdict/EDICT dictionary files]]<br />
*some help with [[JMdict:_Getting_Started|getting started]] on putting in new entries or editing existing ones.<br />
*the [[Editorial Process]] for handling proposed new entries and amendments<br />
*the [[editorial policy|Editorial Policy]] and guidelines for the JMdict/EDICT files<br />
*the [[Editorial Board]] for JMdict/EDICT<br />
*the [https://github.com/JMdictProject/JMdictIssues/issues JMdict Issues] forum where matters such as structure, format, policies, tags, and other issues concerning dictionary content can be raised and discussed (currently hosted on GitHub.)<br />
*the [https://gitlab.com/yamagoya/jmdictdb/-/issues JMdictDB Issues] site for reporting problems and making feature requests concerning the JMdictDB web pages and software.<br />
*the [https://groups.google.com/search/groups?q=edict-jmdict mailing list] for project discussion. (That page should have a link for asking to join. Alternatively, email [mailto:jimbreen@gmail.com Jim Breen] and ask to be added.)<br />
*the [http://www.edrdg.org/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.<br />
*lists of [[JMdictEDICT_software|packages and servers]] using the JMdict/EDICT files<br />
*an [[Entries Under Development]] page, where people can place incomplete words and phrases for later filling out to become full entries. (Note that this is rather inactive and needs cleaning up.)<br />
<br />
== Current Version &amp; Downloads==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server] (formerly at Monash University), which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file.<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
<br />
==JMdictDB Database==<br />
The maintenance of the JMdict/EDICT and JMnedict/ENAMDICT dictionary files is now handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw, and operational since June 2010. For more information see:<br />
* an [[JMdictDB Project|overview]] of the database;<br />
* Stuart's [http://edrdg.org/~smg/ summary page];<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelpq.py quick overview] to editing entries;<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelp.py full help file] for editing entries.<br />
* a [http://www.edrdg.org/jmdictdb/JMdictEntries.html page] showing the current entry counts for the two dictionaries (updated daily).<br />
* project [https://gitlab.com/yamagoya/jmdictdb code] at GitLab.<br />
<br />
==The Tanaka Corpus==<br />
This project is to maintain and extend the [[Tanaka Corpus]] which is a large collection of parallel Japanese/English sentence pairs.<br />
<br />
The Corpus is now maintained within the [http://tatoeba.org/home Tatoeba Project]. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.<br />
<br />
An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], [http://jisho.org/ Denshi Jisho] , etc. This is achieved via a set of indices attached to each sentence pair. There is a [[Sentence-Dictionary Linking|detailed description]] of this process.<br />
<br />
==The KANJIDIC Project==<br />
<br />
The [[KANJIDIC Project]] has compiled files of comprehensive information on kanji used in Japanese text processing. The files<br />
cover the kanji in three Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds additional kanji.<br />
<br />
==The COMPDIC Project==<br />
<br />
The COMPDIC project involved the compilation of a glossary of terms used in the computing and telecommunications industries. The file was in the "EDICT" format. See the [http://www.edrdg.org/jmdict/compdic_doc.html brief documentation].<br />
<br />
In 2008 the entries in the COMPDIC file were included in the JMdict/EDICT file. While it is no longer maintained as a separate file, an extract of the entries relating to computing and telecommunications is still generated.<br />
<br />
==The ENAMDICT/JMnedict Project==<br />
<br />
The JMnedict/ENAMDICT files contain about 740,000 proper names in Japanese, covering place-names, surnames, given names, company names, names of artistic and literary works, product names, etc. There is a basic [http://www.edrdg.org/enamdict/enamdict_doc.html documentation page].<br />
<br />
* JMnedict (the Japanese-Multilingual named entity dictionary) is in XML format and is in Unicode/UTF-8 coding. [http://ftp.edrdg.org/pub/Nihongo/JMnedict.xml.gz (download)]<br />
<br />
* ENAMDICT is in a variant of the EDICT format, with part-of-speech and other tags omitted and replaced with some special tags to indicate the type of proper name. [http://ftp.edrdg.org/pub/Nihongo/enamdict.gz (download)]<br />
<br />
The information in the files is held in the same database as the JMdict/EDICT information. To use the online edit system<br />
follow [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= this link] and select "jmnedict" from the drop-down Corpus menu.<br />
<br />
==The KRADFILE/RADKFILE Project==<br />
<br />
This project provides a decomposition of kanji into a number of visual elements or radicals to support software that provides a lookup service using kanji components. These elements can be seen in the [http://nihongo.monash.edu/cgi-bin/wwwjdic?1R WWWJDIC] server, the [http://jisho.org/#radical Jisho.org] server, and [http://kanji.sljfaq.org/mr.html Ben Bullock's SLJFAQ] page.<br />
<br />
There is an [http://www.edrdg.org/krad/kradinf.html information page] about the data files. The files can be downloaded - use the links in that page.<br />
<br />
==The WWWJDIC Dictionary Server==<br />
<br />
WWWJDIC is a dictionary WWW server first developed by Jim Breen in 1998. Its (rather clunky) name came about because it is based on code and techniques developed in the earlier JDIC (DOS) and XJDIC (Unix/X11) applications.<br />
<br />
The home site of the server is [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C here], and there are several [http://www.edrdg.org/wwwjdic/wwwjdicmirrors.html mirror sites] which are updated daily from the home site. The server has links at the dictionary entry level to other sites and to the JMdict database for editing entries.<br />
<br />
The main documentation is the WWWJDIC [http://www.edrdg.org/wwwjdic/wwwjdicinf.html User's Guide].<br />
<br />
A number of elements in the server's display can be configured by users, and the interface language can be set to Japanese (as part of the [[WWWJDIC in Japanese]] project.)<br />
<br />
==Wishlist==<br />
<br />
This is a set of [[wishlist]] items for the various projects. Feel free to add suggestions.<br />
<br />
There is also an old [http://nihongo.monash.edu/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.<br />
<br />
==Mailing List==<br />
<br />
There is a [https://groups.google.com/g/edict-jmdict/ mailing list] for people engaged in the EDRDG projects.<br />
<br />
==How Can I Help?==<br />
<br />
From time to time people ask how they can best contribute to the projects. There are many ways of assisting, the main ones being:<br />
<br />
* adding to and enhancing the main (EDICT/JMdict) dictionary file. This is best done by using the [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search] and [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= New Entry] pages of the JMdictDB system.<br />
<br />
* adding extra Japanese-English sentence pairs to the collection based on the Tanaka Corpus. This is done by adding them to the [https://tatoeba.org/eng Tatoeba Project] as a linked sentence pair, then contacting Jim Breen to have them indexed.<br />
<br />
* assisting with the translation of the WWWJDIC interface into other languages. At present the priority is to make it fully available in Japanese. See the [[WWWJDIC in Japanese]] page.<br />
<br />
* working through the lists of words Paul Blay has placed on the [[Talk:Tanaka_Corpus]] page, which could become new dictionary entries.<br />
<br />
* joining and participating in the [https://groups.google.com/g/edict-jmdict mailing list] for people engaged in the EDRDG projects.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=KANJIDIC_Project&diff=930KANJIDIC Project2021-08-17T08:51:01Z<p>JimBreen: /* Radicals */</p>
<hr />
<div>=The KANJIDIC Project=<br />
<br />
''(Note that this page is in the process of being rewritten, so be patient with any aspects that seem incomplete.)''<br />
<br />
==Introduction==<br />
<br />
The KANJIDIC project, which began in 1991, has the goal of compiling and distributing comprehensive information on the kanji used in Japanese text processing. It covers the 13,108 kanji in three main Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 952 additional kanji.<br />
<br />
Three data files are distributed by this project:<br />
* the KANJIDIC2 file, which is in XML format and [https://en.wikipedia.org/wiki/UTF-8 Unicode/UTF-8] coding, and contains information about all 13,108 kanji. ([http://www.edrdg.org/kanjidic/kanjidic2.xml.gz download])<br />
* the KANJIDIC file, which is in [https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP EUC-JP] coding and covers the 6,355 kanji in JIS X 0208. ([http://www.edrdg.org/kanjidic/kanjidic.gz download])<br />
* the KANJD212 file, which also is in EUC-JP coding and covers the 5,801 kanji in JIS X 0212. ([http://www.edrdg.org/kanjidic/kanjd212.gz download])<br />
<br />
==Content & Format==<br />
The database and distributed data files contain an entry for each of the kanji, with each entry containing a number of fields of data about the kanji. The data is described in the following table. The format of the distributed files is as follows:<br />
* the KANJIDIC and KANJD212 files are text files with one line per kanji and the information fields separated by spaces. The format of each line is:<br />
** the kanji itself followed by the hexadecimal form of the JIS ''ku-ten'' coding, e.g. "亜 3021" (the decimal ''ku-ten'' code is 16-01);<br />
** information fields beginning with one or two-letter codes as per the table below. For example "S10" indicates a stroke count of 10;<br />
** the Japanese readings of the kanji. ON readings (音読み) are generally in ''katakana'' and KUN readings (訓読み) in ''hiragana''. An exception is the set of ''kokuji'' for measurements such as centimetres, where the reading is in ''katakana''. Hyphens are used to indicate prefixes/suffixes, and '.' indicates the portion of the reading that is ''okurigana''. There may be several classes of reading fields, with ordinary readings first, followed by members of the other classes, if any. The current other classes, and their tagging, are:<br />
***where the kanji has special ''nanori'' (i.e. name) readings, these are preceded by the marker "T1";<br />
***where the kanji is a radical, and the radical name is not already a reading, the radical name is preceded by the marker "T2".<br />
** the meanings (usually in English). Each field begins with an open brace '{' and ends at the next close brace '}'.<br />
* the KANJIDIC2 file is in XML and is structured according to its [http://www.edrdg.org/kanjidic/kanjidic2_dtdh.html DTD] (Document Type Definition). The DTD contains extensive annotations and is intended to be the primary documentation for the file. This [http://www.edrdg.org/kanjidic/kd2examph.html sample] illustrates the structure of a typical entry. Information fields are grouped by type within entities such as <dic_number> and <query_code>, with specific values indicated by an attribute code. For example the kanji 亜 has the number 43 in the original Nelson kanji dictionary and 81 in the New Nelson. This is recorded in the XML file as:<br/><br />
:<dic_number><br />
::<dic_ref dr_type="nelson_c">43</dic_ref><br />
::<dic_ref dr_type="nelson_n">81</dic_ref><br />
:: ....<br />
:</dic_number><br />
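<br />
For the older KANJIDIC/KANJD212 text format, the line layout described above can be read with a short Python sketch such as the one below. It is a simplified illustration, not the project's own tooling: the function names are hypothetical, tokens that begin with an ASCII letter are assumed to be information fields, everything else is treated as a reading, and the sample line is abridged for illustration and should be checked against the distributed file. The ''ku-ten'' conversion subtracts hex 20 from each byte of the JIS code, which gives 16-01 for "3021" as stated above.<br />

```python
import re

def jis_hex_to_kuten(jis_hex):
    """Convert a four-digit JIS hex code such as "3021" to ku-ten form "16-01"."""
    ku = int(jis_hex[:2], 16) - 0x20
    ten = int(jis_hex[2:], 16) - 0x20
    return f"{ku:02d}-{ten:02d}"

def parse_kanjidic_line(line):
    """Split one KANJIDIC-style line into its main parts (simplified sketch)."""
    # Meanings are enclosed in braces and may contain spaces, so take them first.
    meanings = re.findall(r"\{([^}]*)\}", line)
    tokens = re.sub(r"\{[^}]*\}", " ", line).split()
    entry = {
        "kanji": tokens[0],
        "jis_hex": tokens[1],
        "kuten": jis_hex_to_kuten(tokens[1]),
        "fields": [],         # coded information fields, e.g. "S7"
        "readings": [],       # ordinary ON/KUN readings
        "nanori": [],         # readings following the "T1" marker
        "radical_names": [],  # readings following the "T2" marker
        "meanings": meanings,
    }
    bucket = "readings"
    for tok in tokens[2:]:
        if tok == "T1":
            bucket = "nanori"
        elif tok == "T2":
            bucket = "radical_names"
        elif tok[0].isascii() and tok[0].isalpha():
            entry["fields"].append(tok)
        else:
            entry[bucket].append(tok)
    return entry

# Abridged, illustrative line; the real file carries many more fields.
sample = "亜 3021 U4e9c S7 ア つ.ぐ T1 や {Asia} {rank next}"
print(parse_kanjidic_line(sample))
```

The files themselves are distributed in EUC-JP coding, so they would need to be decoded accordingly (for example by opening them with <code>encoding="euc_jp"</code>) before lines are passed to a parser of this kind.<br />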
<br />
{| class="wikitable"<br />
|+ Kanjidic Information Fields<br />
|-<br />
! Field<br />
! Kanjidic Code<br/>(if any)<br />
! Group Entity<br />
! Entity plus Attribute(s)<br/>(if any)<br />
! Comment<br />
|-<br />
| Kanji<br />
| none<br />
| literal<br />
| <br />
|<br />
|-<br />
| JIS code-point<br />
| none<br />
| codepoint<br />
| cp_value cp_type="jis208" (or "jis212" or "jis213")<br />
| e.g. 亜 is "3021" in KANJIDIC and<br/>"1-16-01" in KANJIDIC2<br />
|-<br />
| Unicode code-point<br />
| U<br />
|codepoint<br />
| cp_value cp_type="ucs"<br />
| <br />
|-<br />
| Radical (Classical) (See Note 1 below)<br />
| B/C<br />
| radical<br />
| rad_value rad_type="classical"<br />
| Where Nelson uses the classical radical this has a "B" code, otherwise it has a "C" code<br />
|-<br />
| Radical (Nelson)<br />
| B<br />
| radical<br />
| rad_value rad_type="nelson_c"<br />
| <br />
|-<br />
| Grade<br />
| G<br />
| misc<br />
| grade<br />
| The "grade" of the kanji. <br/>- G1 to G6 indicates the grade level as specified by the Japanese Ministry of Education for kanji that are to be taught in elementary school (1026 Kanji). These are sometimes called the ''kyōiku'' (education) kanji and are part of the set of ''jōyō'' (daily use) kanji;<br/>- G8 indicates the remaining ''jōyō'' kanji that are to be taught in secondary school (additional 1130 Kanji);<br/>- G9 and G10 indicate ''jinmeiyō'' ("for use in names") kanji which in addition to the ''jōyō'' kanji are approved for use in family name registers and other official documents. G9 (649 kanji, of which 640 are in KANJIDIC) indicates the kanji is a "regular" name kanji, and G10 (212 kanji of which 130 are in KANJIDIC) indicates the kanji is a variant of a ''jōyō'' kanji.<br />
|-<br />
| Stroke count<br />
| S<br />
| misc<br />
| stroke_count<br />
| The stroke count of the kanji. If more than one, the first is considered the accepted count, while subsequent ones are common miscounts. (See the section later in this document on counting strokes for some of the rules applied especially to radicals.)<br />
|-<br />
| Frequency-of-use ranking<br />
| F<br />
| misc<br />
| freq<br />
| The 2,501 most-used characters have a ranking which expresses the relative frequency of occurrence of a character in modern Japanese. The data is based on an analysis of word frequencies in the Mainichi Shimbun over 4 years by Alexandre Girardi. Note: (a) these frequencies are biased towards words and kanji used in newspaper articles, and (b) the relative frequencies for the last few hundred kanji so graded are quite imprecise.<br />
|-<br />
| Variant JIS 0208 kanji<br />
| XJ0<br />
| misc<br />
| variant var_type="jis208"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0212 kanji<br />
| XJ1<br />
| misc<br />
| variant var_type="jis212"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0213 kanji<br />
| XJ2<br />
| misc<br />
| variant var_type="jis213"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the plane number (P: 1 or 2) plus the JIS hex code is used, and in the XML file the equivalent "P-nn-nn" kuten code is used.)<br />
|-<br />
| Variant kanji (De Roo index)<br />
| XJD<br />
| misc<br />
| variant var_type="deroo"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (NJECD index)<br />
| XH<br />
| misc<br />
| variant var_type="halpern_njecd"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (S&H index)<br />
| XI<br />
| misc<br />
| variant var_type="s_h"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (Nelson index)<br />
| XN<br />
| misc<br />
| variant var_type="nelson_c"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (O'Neill index)<br />
| XO<br />
| misc<br />
| variant var_type="oneill"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Radical name(s)<br />
| none<br />
| misc<br />
| rad_name<br />
| The name of the radical in ''hiragana''. In the KANJIDIC edition these are placed after the readings and preceded by the "T2" tag.<br />
|-<br />
| JLPT Level<br />
| J<br />
| misc<br />
| jlpt<br />
| The pre-2010 level of the Japanese Language Proficiency Test (JLPT) in which the kanji occurs (1-4). Note that the JLPT test levels changed in 2010, with a new 5-level system (N1 to N5) being introduced. No official kanji lists are available for the new levels. The new levels are regarded as being similar to the old levels except that the old level 2 is now divided between N2 and N3, and the old levels 3 and 4 are now N4 and N5. <br />
|-<br />
| Nelson (Classic) number<br />
| N<br />
| dic_number<br />
| dic_ref dr_type="nelson_c"<br />
| The index number in "The Modern Reader's Japanese-English Character Dictionary", edited by Andrew Nelson. If not present, the character is not in Nelson, or is considered to be a non-standard version, in which case it may have a variant. Note that many kanji glyphs currently used are what Nelson described as "non-standard".<br />
|-<br />
| Nelson (New) number<br />
| V<br />
| dic_number<br />
| dic_ref dr_type="nelson_n"<br />
| The index number in "The New Nelson Japanese-English Character Dictionary", edited by John Haig.<br />
|-<br />
| NJECD number<br />
| H<br />
| dic_number<br />
| dic_ref dr_type="halpern_njecd"<br />
| The index number in the "New Japanese-English Character Dictionary" (1990), edited by Jack Halpern.<br />
|-<br />
| Kodansha Kanji Dictionary number<br />
| DP<br />
| dic_number<br />
| dic_ref dr_type="halpern_kkd"<br />
| The index numbers used by Jack Halpern in the "Kodansha Kanji Dictionary" (2013), which is the revised version of the "New Japanese-English Character Dictionary" of 1990.<br />
|-<br />
|Kanji Learners Dictionary number<br />
|DK<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld"<br />
|The index numbers used by Jack Halpern in the "Kanji Learners Dictionary", published by Kodansha in 1999.<br />
|-<br />
|Kanji Learners Dictionary number (2nd ed)<br />
|DL<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld_2ed"<br />
|The index numbers used by Jack Halpern in the 2nd edition of the "Kanji Learners Dictionary", published by Kodansha in 2013. <br />
|-<br />
|Remembering The Kanji number<br />
|L<br />
|dic_number <br />
|dic_ref dr_type="heisig"<br />
|The index number used in "Remembering The Kanji" by James Heisig.<br />
|-<br />
|Remembering The Kanji number (6th ed)<br />
|DN<br />
|dic_number <br />
|dic_ref dr_type="heisig6"<br />
|The index number used in "Remembering The Kanji, 6th Edition" by James Heisig. <br />
|-<br />
|Gakken number<br />
|K<br />
|dic_number <br />
|dic_ref dr_type="gakken"<br />
|The index number in the Gakken Kanji Dictionary ("A New Dictionary of Kanji Usage"). Some of the numbers relate to the list at the back of the book, jouyou kanji not contained in the dictionary, and various historical tables at the end.<br />
|-<br />
|O'Neill's Japanese Names number<br />
|O<br />
|dic_number <br />
|dic_ref dr_type="oneill_names"<br />
|The index number in "Japanese Names", by P.G. O'Neill. (Weatherhill, 1972) (Note: some of the numbers end with 'A'.)<br />
|-<br />
|O'Neill's Essential Kanji number<br />
|DO<br />
|dic_number <br />
|dic_ref dr_type="oneill_kk"<br />
|The index numbers used in P.G. O'Neill's "Essential Kanji".<br />
|-<br />
|Morohashi number<br />
|MN/MP<br />
|dic_number <br />
|dic_ref dr_type="moro" m_vol m_page<br />
|The index number and volume.page respectively of the kanji in the 13-volume Morohashi Daikanwajiten. A terminal `P` in the number, e.g. 4879P, indicates that it is 4879' in the original. In some 500 cases, the number is terminated with an `X`, to indicate that the kanji in Morohashi has a close, but not identical, glyph to the form in the JIS X 0208 standard.<br/>In the XML the volume and page are attribute values.<br />
|-<br />
|Henshall number<br />
|E<br />
|dic_number <br />
|dic_ref dr_type="henshall"<br />
|The index number used in "A Guide To Remembering Japanese Characters" by Kenneth G. Henshall.<br />
|-<br />
|Kanji & Kana number<br />
|IN<br />
|dic_number <br />
|dic_ref dr_type="sh_kk"<br />
|The index number used in Spahn & Hadamitzky's "Kanji & Kana", 2nd edition (Tuttle).<br />
|-<br />
|Kanji & Kana number (2011 ed)<br />
|DA<br />
|dic_number <br />
|dic_ref dr_type="sh_kk2"<br />
|The index number used in the 2011 edition of Spahn & Hadamitzky's "Kanji & Kana".<br />
|-<br />
|Sakade number<br />
|DS<br />
|dic_number <br />
|dic_ref dr_type="sakade"<br />
|The index numbers used in the early editions of "A Guide To Reading and Writing Japanese", edited by Florence Sakade.<br />
|-<br />
|Japanese Kanji Flashcards number<br />
|DF<br />
|dic_number <br />
|dic_ref dr_type="jf_cards"<br />
|The index numbers used in the "Japanese Kanji Flashcards", by Max Hodges and Tomoko Okazaki (White Rabbit Press). <br />
|-<br />
|Henshall Guide number<br />
|DH<br />
|dic_number <br />
|dic_ref dr_type="henshall3"<br />
|The index numbers used in the 3rd edition of "A Guide To Reading and Writing Japanese" edited by Ken Henshall et al. <br />
|-<br />
|Tuttle Kanji Cards number<br />
|DT<br />
|dic_number <br />
|dic_ref dr_type="tutt_cards"<br />
|The index numbers used in the Tuttle Kanji Cards, compiled by Alexander Kask.<br />
|-<br />
|Crowley number<br />
|DC<br />
|dic_number <br />
|dic_ref dr_type="crowley"<br />
|The index numbers used in "The Kanji Way to Japanese Language Power" by Dale Crowley. <br />
|-<br />
|Kanji in Context number<br />
|DJ<br />
|dic_number <br />
|dic_ref dr_type="kanji_in_context"<br />
|The index numbers used in "Kanji in Context" by Nishiguchi and Kono.<br />
|-<br />
|Kodansha Compact Kanji Guide number<br />
|DG<br />
|dic_number <br />
|dic_ref dr_type="kodansha_compact"<br />
|The index numbers used in the "Kodansha Compact Kanji Guide".<br />
|-<br />
|Japanese For Busy People number<br />
|DB<br />
|dic_number <br />
|dic_ref dr_type="busy_people"<br />
|The index numbers used in "Japanese For Busy People" vols I-III, published by the AJLT. The codes are the volume.chapter. <br />
|-<br />
|Maniette number<br />
|DM<br />
|dic_number <br />
|dic_ref dr_type="maniette"<br />
|The numbers in Yves Maniette's "Les Kanjis dans la tête", the French adaptation of Heisig's "Remembering The Kanji".<br />
|-<br />
|SKIP code<br />
|P<br />
|query_code <br />
|q_code qc_type="skip"<br />
|The SKIP (System of Kanji Indexing by Patterns) developed by Jack Halpern. The code is of the form "l-m-n". See [[#SKIP_Codes|SKIP Codes]] section for more information.<br />
|-<br />
|S&H descriptor<br />
|I<br />
|query_code <br />
|q_code qc_type="sh_desc"<br />
|The index code in "The Kanji Dictionary" (Tuttle 1996), by Spahn & Hadamitzky. It is of the form nxnn.n, e.g. 3k11.2, where the kanji has 3 strokes in the identifying radical, it is radical "k" in the S&H classification system, there are 11 other strokes, and it is the 2nd kanji in the 3k11 sequence. <br />
|-<br />
|Four Corner code<br />
|Q<br />
|query_code <br />
|q_code qc_type="four_corner"<br />
|The Four Corner code for the kanji. See the [[#Four_Corner_Codes|Four Corner codes]] section for more information.<br />
|-<br />
|De Roo code<br />
|DR<br />
|query_code <br />
|q_code qc_type="deroo"<br />
|The codes developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). See the [[#De_Roo_Codes|De Roo Codes]] section for more information.<br />
|-<br />
|Misclassification code<br />
|ZPP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="posn"<br />
|SKIP misclassification by position.<br />
|-<br />
|Misclassification code<br />
|ZSP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_count"<br />
|SKIP misclassification by stroke count.<br />
|-<br />
|Misclassification code<br />
|ZBP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_and_posn"<br />
|SKIP misclassification by both position and stroke count.<br />
|-<br />
|Misclassification code<br />
|ZRP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_diff"<br />
|SKIP misclassification by differing opinions on stroke counts.<br />
|-<br />
|Chinese reading<br />
|Y<br />
|rmgroup<br />
|reading r_type="pinyin"<br />
|The PinYin (Chinese) reading of the kanji.<br />
|-<br />
|Korean reading (romanized)<br />
|W<br />
|rmgroup<br />
|reading r_type="korean_r"<br />
|The Korean reading of the kanji in the (Republic of Korea) Ministry of Education style.<br />
|-<br />
|Korean reading (hangul)<br />
|not included<br />
|rmgroup<br />
|reading r_type="korean_h"<br />
|The Korean reading of the kanji in the hangul script.<br />
|-<br />
|Vietnamese reading (chữ quốc ngữ)<br />
|not included<br />
|rmgroup<br />
|reading r_type="vietnam"<br />
|The Vietnamese reading of the kanji in chữ quốc ngữ.<br />
|-<br />
|Japanese on reading (''katakana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_on"<br />
| In the KANJIDIC edition the readings are placed between the information fields and the meanings.<br />
|-<br />
|Japanese kun reading (''usu. hiragana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_kun"<br />
|<br />
|-<br />
| Meanings<br />
| none<br />
| rmgroup<br />
| meaning m_lang="xx"<br />
| The kanji meaning(s). For languages other than English the m_lang attribute is used with two-letter ISO 639-1 language codes. In the KANJIDIC edition the meanings are placed at the end of the line.<br />
|-<br />
| Name reading(s) (''hiragana'')<br />
| T1<br />
| <br />
| nanori<br />
| The readings only associated with named-entities. In the KANJIDIC edition the first of these is preceded by the "T1" tag.<br />
|}<br />
Note 1: For the sake of consistency the classical radical is the one indicated in the JIS漢字字典 (日本規格協会).<br />
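<br />
As a rough illustration of how the XML columns in the table above map onto an entry, the following Python sketch reads a few fields from a hand-written fragment. The fragment is not copied from the distributed file: its nesting follows the group and element names in the table, its values are illustrative, and the second stroke_count is invented purely to show how a miscount would be handled. The function name summarize is just a label chosen for this example.<br />

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the spirit of a KANJIDIC2 entry. The values are
# illustrative, and the second stroke_count is invented to show a "miscount".
SAMPLE = """
<character>
  <literal>亜</literal>
  <misc>
    <grade>8</grade>
    <stroke_count>7</stroke_count>
    <stroke_count>8</stroke_count>
  </misc>
  <dic_number>
    <dic_ref dr_type="nelson_c">43</dic_ref>
  </dic_number>
  <query_code>
    <q_code qc_type="skip">4-7-1</q_code>
    <q_code qc_type="deroo">3273</q_code>
  </query_code>
</character>
"""


def summarize(character):
    """Collect a few of the fields described in the table from one entry."""
    counts = [int(e.text) for e in character.iter("stroke_count")]
    grade = character.find("misc/grade")
    return {
        "literal": character.findtext("literal"),
        "grade": int(grade.text) if grade is not None else None,
        # The first stroke count is the accepted one; the rest are miscounts.
        "stroke_count": counts[0] if counts else None,
        "miscounts": counts[1:],
        # Dictionary index numbers, keyed by their dr_type attribute.
        "dic_refs": {e.get("dr_type"): e.text for e in character.iter("dic_ref")},
        # Query codes keyed by qc_type, leaving out SKIP misclassifications.
        "query_codes": {
            e.get("qc_type"): e.text
            for e in character.iter("q_code")
            if e.get("skip_misclass") is None
        },
    }


entry = summarize(ET.fromstring(SAMPLE))
print(entry)
```

Filtering on the skip_misclass attribute keeps the misclassification codes from overwriting the kanji's real SKIP code, since both use qc_type="skip".<br />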
<br />
==Radical and Stroke Counting Rules==<br />
<br />
These rules apply to:<br />
#the stroke-counts themselves;<br />
#the stroke counts in the SKIP codes. Where this results in a SKIP which differs from that in the NJECD, or in the non-NJECD SKIPs provided by Jack Halpern, the Jack Halpern version is included prefixed with "ZR".<br />
===Radicals===<br />
The radicals listed below are ones where the various references take differing approaches to counting the strokes. The stroke counting in this file does not strictly follow any reference, but tends to be more aligned to Halpern.<br />
#B54 ENNYOU - 廴. Traditionally counted as 3 strokes, but more recently often counted as 2. S&H count this as 2; Nelson, Halpern, Koujien, etc., count it as 3. I treat it as 3.<br />
#B97 URI - 瓜. Traditionally counted as 5 strokes, as the middle portion looks like a katakana ム. Modern glyphs invariably make it look like 6 strokes. Nelson says it is 5 strokes. Halpern does too, but then counts the shape as 6 in other kanji. Koujien says 6, as do S&H. I treat it as 6.<br />
#B113 SHIMESU e.g. 礼, is counted as 4 strokes in that form, and 5 strokes in its older form, 祀 [http://www.edrdg.org/~jwb/U7940old.png (image)]. 18 kanji are in the 4-stroke form and 20 are in the 5-stroke form. (Nelson and S&H count it as 4; Halpern counts it as 4 or 5. [See Note 1.])<br />
#B131 SHIN/KERAI 臣. Counted as 7 (Nelson counts it as 6, Halpern as 7 (in the book), and S&H as both for different kanji.)<br />
#B136 MAI ASHI 舛. Counted as 7 (traditionally counted as 6, in accordance with the older writing of `ヰ'. Nelson counts as 6, S&H as 7, and Halpern as 7 for 常用 and 人名用漢字 and 6 for the rest.) Note this is also applied to counting 絳 and for kanji with the 韋 pattern.<br />
#B140 KUSA-KANMURI e.g. 苛 always counted as 3 strokes (Halpern counts this 4 strokes for the (mostly level 2) kanji where the older form is often printed.) Note that this has been carried through to kanji where this element is not the indexing radical, such as 朦.<br />
#B162 SHIN-NYUU e.g. 遙 or 逢 counted as 3 or 4 strokes. (Nelson and S&H count it as 2 strokes, and Halpern as either 3 or 4.) [See Note 1 below.]<br />
#B163 OOZATOZUKIRI & B170 KOZATO-HEN 邦 and 阡 always counted as 3 strokes (Nelson and S&H count it as 2, Halpern as 3.) This also applies where it appears mid-kanji, such as in 橢.<br />
#B184 SHOKU HEN 食, 飢, etc. is counted as 8 strokes in the 飢 form, and as 9 strokes in the 飭 and 餐 forms. (Nelson and S&H count it as 8 strokes, and Halpern as 8 or 9.) [See Note 1 below.]<br />
#B199 MUGI 麦 always counted as 7 strokes, except for 麥 & 麩 where it is counted as 11. (Nelson and Halpern do the same, and S&H avoid treating it as a radical, but count it as 12 in the remainder.)<br />
#The ROO or OI radical (老) has a variant consisting of the top 4 strokes. For example, it is in 者. Traditionally, this variant had an extra dot, and was counted as 5 strokes. I'm counting it as 4 throughout.<br />
<br />
===Other Stroke Patterns===<br />
# While the pattern 臼 is a 6-stroke radical, the top half of 叟 is made up of three distinct parts totalling 8 strokes. Note that this also is the case with 嫂, 溲, 艘 and 痩 despite the simplification in the JIS glyphs.<br />
#牙 (KIBA HEN) is a problem. It is classically counted as 4 strokes, but these days has a flick that makes it effectively 5. Halpern, Nelson and S&H usually have it as 5 strokes, so I'm standardizing on that.<br />
#Another little horror is 旡 (MU or NASHI), which is classically counted as 4 strokes. The most common variant has 5 strokes, but looks like 6. Halpern, S&H and the Classical Nelson count this as 4 strokes, and the New Nelson as 5. I'm making it 5 too.<br />
#The JUU or ASHIATO radical is at the bottom of 禽 and 禺. It is traditionally counted as 5 strokes, although sometimes it looks like 4. I'm using 5 throughout.<br />
#A related shape is ム, as in 瓜, 孤, 弧, etc. This is sometimes counted as two strokes (both Nelsons) and sometimes as three strokes (Halpern, S&H). Classically it is regarded as two strokes. I am using 6 strokes for 瓜.<br />
#The pattern to the left of 敝, which appears in several kanji, e.g. 幣 and 瞥, has 8 strokes. (There are 3 strokes at the top as in 尚.)<br />
#The "east" pattern (東) has 8 strokes. There is an older form in which there are two strokes in the box (柬). It is counted as 8 strokes here in the 東 form (e.g. 諌) and 9 in the 柬 form, as in 諫.<br />
#The pattern at the bottom of 雋 is counted as 4 strokes in modern dictionaries, although traditionally it was 5.<br />
#The pattern 巻, which appears in several kanji, is counted as 9 strokes. Several dictionaries count it as either 8 or 9.<br />
#The pattern on the left of 収 is variously handled as 2 strokes or 3 strokes. As more recent dictionaries make it 4, I will do so too.<br />
#The 攵 pattern has 3 and 4-stroke versions, and sometimes the glyphs can be confusing as to which is used. In the 緻 kanji, for example, it is traditionally counted as 3, but Spahn & Hadamitzky count it as 4 and the Nelsons include both.<br />
<br />
Note: The JIS X 0208-1990 standard does not formally specify the precise glyphs used for kanji; however, the glyphs it uses in the published version have become de facto standards for many font compilations. In the published standard, for several kanji, e.g. 辿/迚, 礼/祀, 飢/飭, the JIS Level 1 kanji use the simpler form, and the Level 2 kanji use the older more complex form. Just to make matters worse, many fonts for JIS X 0208 kanji are based on the bit-maps specified in the JIS X 9051-1984 standard, which defines the 16x16 patterns for JIS X 0208-1983 characters. According to Ken Lunde: "This standard was not very good, and JSA is no longer supporting it." Anyway, JIS X 9051-1984 had the simpler form for all these bushu in both Levels 1 and 2, as well as having simplifications of kanji like 濾. Thus, as the font foundries have freedom to choose whichever glyphs they like, what you see on your screen may well not agree with these rules. All the rules in this appendix relate to the glyphs as published in the JIS X 0208-1990 standard, and as appearing in font compilations based on them.<br />
<br />
==Kanji Dictionary Search Codes==<br />
<br />
===SKIP Codes===<br />
<br />
The System of Kanji Indexing by Patterns (SKIP) is a scheme for the classification and rapid retrieval of Chinese characters on the basis of geometrical patterns. Developed by Jack Halpern, it first appeared in the New Japanese-English Character Dictionary (Kenkyusha, Tokyo 1990; NTC, Chicago 1993), and in successor publications such as the "Kanji Learners Dictionary" (Kodansha 1999, 2013) and the "Kodansha Kanji Dictionary" (2013). A description of the coding system is [http://www.edrdg.org/wwwjdic/SKIP.html available].<br />
<br />
As examples, 割 has a SKIP code of 1-10-2, indicating it is divided into left-right portions with 10 strokes at the left and 2 at the right. 度 has a SKIP code of 3-3-6, indicating it has a 3-stroke enclosure with 6 strokes inside it.<br />
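<br />
A SKIP code in the "l-m-n" form can be taken apart mechanically. The Python sketch below is one possible way to do it, not part of the KANJIDIC distribution: parse_skip and the Skip tuple are names chosen for this example, and the pattern names "up-down" and "solid" come from Halpern's published description of the system rather than from the text above. The code 3-3-6 for 度 corresponds to a 3-stroke enclosure with 6 strokes inside.<br />

```python
from typing import NamedTuple

# Names for the first number of a SKIP code. Only "left-right" and "enclosure"
# appear in the text above; "up-down" and "solid" are taken from Halpern's
# published description of the system.
SKIP_PATTERNS = {1: "left-right", 2: "up-down", 3: "enclosure", 4: "solid"}


class Skip(NamedTuple):
    pattern: str
    first: int
    second: int


def parse_skip(code: str) -> Skip:
    """Split an "l-m-n" SKIP code into a pattern name and its two numbers."""
    parts = code.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"not an l-m-n SKIP code: {code!r}")
    pattern, first, second = (int(part) for part in parts)
    if pattern not in SKIP_PATTERNS:
        raise ValueError(f"unknown SKIP pattern {pattern} in {code!r}")
    return Skip(SKIP_PATTERNS[pattern], first, second)


for kanji, code in (("割", "1-10-2"), ("度", "3-3-6")):
    skip = parse_skip(code)
    # For patterns 1 to 3 the two numbers are the stroke counts of the two
    # parts, so their sum is the stroke count of the whole kanji. Pattern 4
    # uses the numbers differently, and this sketch does not interpret it.
    print(kanji, skip.pattern, skip.first, "+", skip.second, "=", skip.first + skip.second)
```

For both examples the two part counts add up to the stroke count of the whole kanji (12 for 割, 9 for 度), which gives a quick sanity check when comparing a SKIP code with the stroke_count field.<br />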
<br />
===De Roo Codes===<br />
<br />
The De Roo codes were developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). They are based on the shapes observed at the top and bottom of the character. A [http://www.edrdg.org/wwwjdic/deroo.html detailed description] is available.<br />
<br />
As an example, 亜 has a code of 3273, indicating that the top of the kanji is pattern number 32 (兀) and the bottom is pattern number 73 (a horizontal line with two vertical strokes above it).<br />
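<br />
The split described for 亜 can be written as a short Python sketch. It assumes a four-digit code in which the first two digits are the top pattern and the last two the bottom pattern, which is all that the example above establishes; codes of any other length are rejected rather than guessed at. The function name split_deroo is invented for this illustration.<br />

```python
from typing import Tuple


def split_deroo(code: str) -> Tuple[int, int]:
    """Split a four-digit De Roo code into (top pattern, bottom pattern)."""
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected four digits, got {code!r}")
    return int(code[:2]), int(code[2:])


top, bottom = split_deroo("3273")  # the code given above for 亜
print(f"top pattern {top}, bottom pattern {bottom}")
```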
<br />
===Four Corner Codes===<br />
The Four Corner coding system was invented by Wang Chen in 1928 and has since been widely used in dictionaries in China and Japan for classifying kanji and hanzi. In China it is losing popularity in favour of Pinyin ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten, have a Four Corner Index. An [http://www.edrdg.org/wwwjdic/FOURCORNER.html overview] of the coding system is available.<br />
In some cases a character may have two of these codes, as the system can be a little ambiguous, and Morohashi has some kanji coded differently from their traditional Chinese codes.<br />
The coding system indexes characters according to the shapes at the corners.<br />
<br />
==Proposing Changes==<br />
<br />
There is currently no online access to the database that holds the KANJIDIC contents (the information is mostly quite static). Anyone wishing to propose a change to the data for a kanji, e.g. to add or change a reading, will need to email Jim Breen at jimbreen@gmail.com.<br />
<br />
==Kanji Information Sites==<br />
''(Being expanded)''<br />
* Jim's [http://nihongo.monash.edu/kanjiinfo.html Kanji Information Page].<br />
* The [https://kanjialive.com/ Kanji alive] site at the University of Chicago.<br />
* The [https://www.kanjipedia.jp/ Kanjipedia] site (mostly in Japanese).<br />
<br />
==Legacy Documentation==<br />
<br />
The current Wiki page was compiled from several older documents, which are no longer being maintained. They are still available for historical purposes. They are:<br />
* a basic home page about [http://www.edrdg.org/kanjidic/kanjd2index_legacy.html KANJIDIC2];<br />
* an overview page about the [http://www.edrdg.org/kanjidic/kanjidic2_ov_legacy.html KANJIDIC2 structure];<br />
* an overview page about [http://www.edrdg.org/kanjidic/kanjidic_legacy.html KANJIDIC and KANJD212];<br />
* the original [http://www.edrdg.org/kanjidic/kanjidic_doc_legacy.html KANJIDIC] documentation;<br />
* the original [http://www.edrdg.org/kanjidic/kanjd212_doc_legacy.html KANJD212] documentation.<br />
<br />
==Copyright and Permissions==<br />
<br />
The KANJIDIC project files are released under a Creative Commons Attribution-ShareAlike Licence (V3.0). See the [http://www.edrdg.org/edrdg/licence.html EDRDG General Dictionary Licence Statement] for details.<br />
<br />
For the most part the information provided in the project's files is in the public domain. Information relating to the sequence numbers of kanji in published dictionaries is not considered to be subject to copyright. Descriptor and other search codes are considered to be the intellectual property of their developers. With regard to the codes included in the KANJIDIC files:<br />
* in 2014 the SKIP codes were placed by Jack Halpern under a CC-SA licence. See [http://www.kanji.org/kanji/dictionaries/skip_permission.htm this page] for his announcement. The codes are now under a [https://creativecommons.org/licenses/by-nc-sa/3.0/ Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported Licence].<br />
* Fr De Roo provided written permission for the De Roo codes to be included in KANJIDIC.<br />
* the Spahn and Hadamitzky descriptor codes were kindly supplied by Mark Spahn for inclusion in KANJIDIC.<br />
<br />
==History==<br />
<br />
''(some comments by Jim Breen)''<br />
<br />
KANJIDIC began around 1991 as two files: jis1detl.lst and jis2detl.lst, which were later merged into a single file.<br />
<br />
The first file was compiled initially from the file "kinfo.dat" supplied by Stephen Chung, who in turn compiled his file from a file prepared by Mike Erickson. I originally added about 1900 "meanings" by James Heisig keyed in by Kevin Moore from the book "Remembering The Kanji". I later added the meanings from Rik Smoody's files, compiled when he was working for Sony in Japan. These appear to have been based on Nelson.<br />
<br />
The second file was compiled from a complete JIS2 list with Bushu and stroke counts kindly supplied to me by Jon Crossley, to which I added Nelson numbers, yomikata and meanings extracted from Rik Smoody's file.<br />
<br />
Theresa Martin was an early assister with this file, particularly with tracking down and correcting many mistranscribed yomikata (the old zu/dzu, oo/ou, ji/dji, etc. problems).<br />
<br />
Jeffrey Friedl did a major overhaul in September-October 1992, in which he added the original frequency rankings, Halpern codes and SKIP patterns, updated the grading ("G" fields) to reflect the modern Jouyou lists, and corrected radical numbers, stroke counts and readings to fall in line with modern usage.<br />
<br />
Magnus Halldorsson corrected some erroneous Halpern numbers, and provided them for a lot of the radicals. He provided the list of Heisig indices, which he originally compiled himself, then verified and expanded using lists from Richard Walters and Antti Karttunen. He also passed on to me the list of Gakken indices compiled by Antti Karttunen.<br />
<br />
Lee Collins provided the Unicode mappings.<br />
<br />
Iain Sinclair has provided the yomikata, meanings and S&H indices of many of the obscure JIS2 kanji.<br />
<br />
Christian Wittern, a Sinologist working at Kyoto University, sent me a monster file prepared by Dr Urs App from Hanazono College. From this I have extracted the Four Corner and Morohashi information. Christian also provided the original Pinyin details, which were later replaced. I am very grateful for these significant contributions.<br />
<br />
In March 1994 the Morohashi indices were proof-read and corrected by Christian.<br />
<br />
Alfredo Pinochet supplied all the Henshall numbers.<br />
<br />
Ingar Holst has provided considerable assistance in regularizing the Bnnn and Cnnn radical classifications to remove some errors that were in the original JIS2 file, and to make it all conform to Nelson's classification.<br />
<br />
In mid-1993 I withdrew the SKIP codes from the distributed file as it appeared that their presence violated Jack Halpern's copyright on these codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission from his publisher for the codes to be included subject (initially) to copyright and usage restrictions. In March 1994 the Halpern indices and SKIP codes were checked against an extract from Jack's files, and the "Z" mis-classification codes added, again from his files. Jack has also made a lot of useful comments and suggestions about the content and format of the file. I am most grateful to Jack for his permission and assistance, and also to Jeffrey for making the contact.<br />
<br />
In May 1995, a number of updates took place. Jeffrey Friedl established contact with James Heisig, and obtained a further set of his indices. I contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided most of the missing S&H descriptors, and Jack Halpern released to me the SKIP codes of the kanji not in the New Japanese-English Character Dictionary. For all this material I am most grateful.<br />
<br />
In August 1995, I added the O'Neill index numbers. These were compiled by Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny & David for their assistance. <br />
<br />
In January and February 1996 the Morohashi numbers were checked thoroughly against two important sources: a file of Unicode-Morohashi data (Uni2Dict) which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221 standard, and the review draft of the proposed revision of the JIS X 0208 standard, which was prepared by the INSTAC Committee, and made available in a text file, thus enabling comparisons. All the mismatches between the three files were examined against the Morohashi text, and extensive corrections made to all three files. I am grateful to Koichi Yasuoka and Masayuki Toyoshima for their considerable assistance in this task.<br />
<br />
In March 1996 the Korean readings were added. They were provided by Dr Charles Muller, then of Toyo Gakuen University, to whom I am most grateful. Chuck's compilation of Korean readings is extremely thorough and scholarly, and I am pleased to be able to incorporate them.<br />
<br />
In April 1996 the readings of all the kanji were compared with those in the JIS X 0208 draft, and a number of corrections and additions made.<br />
<br />
In May 1996 I carried out a "unification" of the readings of the KANJIDIC and KANJD212 files, wherein all the readings of the "itaiji" were brought into line. The identification of these itaiji was drawn from a file posted to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp), which was compiled at the ETL from the itaiji identification in the JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added some extra sets which were indicated in the JIS X 0208-1996 draft.<br />
<br />
In July 1996 the Pinyin details were completely replaced by a new set. The original Pinyin were from an earlier compilation by Christian Wittern, and contained many errors. Two more reliable sources had become available: the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5 hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC file is a combination of the two, following the order in the Uni2Pinyin file.<br />
<br />
In August 1996 I corrected a few more missing and erroneous Nelson numbers, using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged the kokuji, so I added these to the readings fields as "{(kokuji)}".<br />
<br />
Also in August 1996 I deleted the handful of former "XJxxxx" cross-references, and replaced them with a much more comprehensive set, so that they now represent all the recognized "itaiji". The file I used for this was the corrected itaiji file mentioned above.<br />
<br />
In April 1997 I corrected a large number of bushu codes. Many of these had been identified as errors by Jean-Luc Leger who analyzed and examined all the Nelson bushu. I also identified and added a large number of missing Cnnn codes.<br />
<br />
Also in April 1997 I added the S&H "Kanji & Kana" indices. These had been keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must be an outbreak of kanji interest on Nancy.)<br />
<br />
In February 1998, the long-awaited inclusion of the "New Nelson" numbers took place. I had been waiting for the editor of the New Nelson, John Haig, to supply a list (as he had agreed some years before), but in the meantime, Jean-Luc Leger keyed a list, so they are now available.<br />
<br />
Also between December 1997 and February 1998 a large number of Level 2 kanji had their stroke counts corrected to bring them into line with the counting principles used in the Level 1 kanji. This usually aligned the counts with those used in the New Nelson and in S&H. Appendix E of this document was amended to reflect this. The leg-work in tracking this material down was done by Wolfgang Cronrath. <br />
<br />
During December 1998 & Jan 1999 I updated the stroke counts of many of the Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath. I also added the De Roo codes, which had been keyed by Jasmin Blanchette, who also typed the explanatory material. I contacted Fr De Roo in Tokyo who readily agreed to the inclusion of the codes.<br />
<br />
The extension of the S&H "Kanji & Kana" numbers to the 2nd edition was done by Enrique Sanchez Rosa.<br />
<br />
The Hangul versions of the Korean readings (which only appear in the XML version) were provided by Francis Bond and Kyonghee Paik.<br />
<br />
I did the Tuttle card numbers myself.<br />
<br />
James Rose provided the numbers from Crowley's "The Kanji Way to Japanese Language Power", Sakade's "A Guide To Reading and Writing Japanese", and also for that book's 3rd Edition edited by Henshall, Seeley & De Groot.<br />
<br />
The "Kodansha's compact Kanji guide" codes were provided by Richard Fremmerlid.<br />
<br />
The "Kanji in Context" codes were provided by Randy Foreman.<br />
<br />
The Spanish kanji meanings (which appear in the XML format, and may also appear in special versions of KANJIDIC) were compiled by Francisco Gutierrez and provided by Gabriel Sanroman.<br />
<br />
Alain Thierion translated the meanings of the kanji into French, and also provided the Maniette numbers.<br />
<br />
Andrew Slater provided updates to the JLPT numbers, and additional numbers for the Japanese Flashcards series.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=KANJIDIC_Project&diff=929KANJIDIC Project2021-08-07T00:41:11Z<p>JimBreen: /* Radicals */</p>
<hr />
<div>=The KANJIDIC Project=<br />
<br />
''(Note that this page is in the process of being rewritten, so be patient with any aspects that seem incomplete.)''<br />
<br />
==Introduction==<br />
<br />
The KANJIDIC project, which began in 1991, has the goal of compiling and distributing comprehensive information on the kanji used in Japanese text processing. It covers the 13,108 kanji in three main Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 952 additional kanji.<br />
<br />
Three data files are distributed by this project:<br />
* the KANJIDIC2 file, which is in XML format and [https://en.wikipedia.org/wiki/UTF-8 Unicode/UTF-8] coding, and contains information about all 13,108 kanji. ([http://www.edrdg.org/kanjidic/kanjidic2.xml.gz download])<br />
* the KANJIDIC file, which is in [https://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-JP EUC-JP] coding and covers the 6,355 kanji in JIS X 0208. ([http://www.edrdg.org/kanjidic/kanjidic.gz download])<br />
* the KANJD212 file, which also is in EUC-JP coding and covers the 5,801 kanji in JIS X 0212. ([http://www.edrdg.org/kanjidic/kanjd212.gz download])<br />
<br />
==Content & Format==<br />
The database and distributed data files contain an entry for each of the kanji, with each entry containing a number of fields of data about the kanji. The data is described in the following table. The format of the distributed files is as follows:<br />
* the KANJIDIC and KANJD212 files are text files with one line per kanji and the information fields separated by spaces. The format of each line is:<br />
** the kanji itself followed by the hexadecimal form of the JIS ''ku-ten'' coding, e.g. "亜 3021" (the decimal ''ku-ten'' code is 16-01);<br />
** information fields beginning with one or two-letter codes as per the table below. For example "S10" indicates a stroke count of 10;<br />
** the Japanese readings of the kanji. ON readings (音読み) are generally in ''katakana'' and KUN readings (訓読み) in ''hiragana''. An exception is the set of ''kokuji'' for measurements such as centimetres, where the reading is in ''katakana''. Hyphens are used to indicate prefixes/suffixes, and '.' indicates the portion of the reading that is ''okurigana''. There may be several classes of reading fields, with ordinary readings first, followed by members of the other classes, if any. The current other classes, and their tagging, are:<br />
***where the kanji has special ''nanori'' (i.e. name) readings, these are preceded by the marker "T1";<br />
***where the kanji is a radical, and the radical name is not already a reading, the radical name is preceded by the marker "T2".<br />
** the meanings (usually in English). Each field begins with an open brace '{' and ends at the next close brace '}'.<br />
* the KANJIDIC2 file is in XML and is structured according to its [http://www.edrdg.org/kanjidic/kanjidic2_dtdh.html DTD] (Document Type Definition). The DTD contains extensive annotations and is intended to be the primary documentation for the file. This [http://www.edrdg.org/kanjidic/kd2examph.html sample] illustrates the structure of a typical entry. Information fields are grouped by type within entities such as <dic_number> and <query_code>, with specific values indicated by an attribute code. For example the kanji 亜 has the number 43 in the original Nelson kanji dictionary and 81 in the New Nelson. This is recorded in the XML file as:<br/><br />
:<dic_number><br />
::<dic_ref dr_type="nelson_c">43</dic_ref><br />
::<dic_ref dr_type="nelson_n">81</dic_ref><br />
:: ....<br />
:</dic_number><br />
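<br />
The KANJIDIC/KANJD212 line layout described above can be read with a short script. The following Python sketch is only an illustration: the function names are hypothetical, and the sample line is an abridged one assembled for this example from values quoted on this page (it is not a verbatim line from the file). It also shows the arithmetic linking the JIS hex code to the decimal ''ku-ten'' code (each byte minus 0x20).<br />

```python
import re


def jis_hex_to_kuten(jis_hex):
    """Convert a 4-digit JIS hex code such as '3021' to ku-ten form '16-01'."""
    code = int(jis_hex, 16)
    ku = (code >> 8) - 0x20     # high byte minus 0x20 gives the row (ku)
    ten = (code & 0xFF) - 0x20  # low byte minus 0x20 gives the cell (ten)
    return f"{ku:02d}-{ten:02d}"


def parse_kanjidic_line(line):
    """Split one KANJIDIC/KANJD212 line into its main parts (hypothetical helper)."""
    # Meanings are the {...} fields; take them out first because they may contain spaces.
    meanings = re.findall(r"\{([^}]*)\}", line)
    tokens = re.sub(r"\{[^}]*\}", " ", line).split()
    entry = {
        "kanji": tokens[0],
        "jis_hex": tokens[1],
        "kuten": jis_hex_to_kuten(tokens[1]),
        "fields": [],         # coded information fields, e.g. "S7", kept as raw tokens
        "readings": [],       # ordinary ON/KUN readings
        "nanori": [],         # readings following the "T1" marker
        "radical_names": [],  # readings following the "T2" marker
        "meanings": meanings,
    }
    bucket = "readings"
    for token in tokens[2:]:
        if token == "T1":
            bucket = "nanori"
        elif token == "T2":
            bucket = "radical_names"
        elif token[0].isascii() and token[0].isalpha():
            # Information fields begin with an ASCII letter code.
            entry["fields"].append(token)
        else:
            # Kana readings, including those with a leading hyphen.
            entry[bucket].append(token)
    return entry


# Abridged, illustrative line (an assumption for this example, not copied from the file).
SAMPLE_LINE = "亜 3021 U4e9c S7 N43 V81 DR3273 ア つ.ぐ {Asia} {rank next}"
entry = parse_kanjidic_line(SAMPLE_LINE)
print(entry["kanji"], entry["kuten"], entry["fields"])
```

The sketch keeps each information field as a raw token instead of splitting code from value, since the codes vary in length (e.g. "S", "DR", "XJ0"). The distributed files are in EUC-JP, so they would need to be decoded (for example with <code>encoding="euc_jp"</code> when opening the file) before lines are passed to such a function.<br />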
<br />
{| class="wikitable"<br />
|+ Kanjidic Information Fields<br />
|-<br />
! Field<br />
! Kanjidic Code<br/>(if any)<br />
! Group Entity<br />
! Entity plus Attribute(s)<br/>(if any)<br />
! Comment<br />
|-<br />
| Kanji<br />
| none<br />
| literal<br />
| <br />
|<br />
|-<br />
| JIS code-point<br />
| none<br />
| codepoint<br />
| cp_value cp_type="jis208" (or "jis212" or "jis213")<br />
| e.g. 亜 is "3021" in KANJIDIC and<br/>"1-16-01" in KANJIDIC2<br />
|-<br />
| Unicode code-point<br />
| U<br />
|codepoint<br />
| cp_value cp_type="ucs"<br />
| <br />
|-<br />
| Radical (Classical) (See Note 1 below)<br />
| B/C<br />
| radical<br />
| rad_value rad_type="classical"<br />
| Where Nelson uses the classical radical this has a "B" code, otherwise it has a "C" code<br />
|-<br />
| Radical (Nelson)<br />
| B<br />
| radical<br />
| rad_value rad_type="nelson_c"<br />
| <br />
|-<br />
| Grade<br />
| G<br />
| misc<br />
| grade<br />
| The "grade" of the kanji. <br/>- G1 to G6 indicates the grade level as specified by the Japanese Ministry of Education for kanji that are to be taught in elementary school (1026 Kanji). These are sometimes called the ''kyōiku'' (education) kanji and are part of the set of ''jōyō'' (daily use) kanji;<br/>- G8 indicates the remaining ''jōyō'' kanji that are to be taught in secondary school (additional 1130 Kanji);<br/>- G9 and G10 indicate ''jinmeiyō'' ("for use in names") kanji which in addition to the ''jōyō'' kanji are approved for use in family name registers and other official documents. G9 (649 kanji, of which 640 are in KANJIDIC) indicates the kanji is a "regular" name kanji, and G10 (212 kanji of which 130 are in KANJIDIC) indicates the kanji is a variant of a ''jōyō'' kanji.<br />
|-<br />
| Stroke count<br />
| S<br />
| misc<br />
| stroke_count<br />
| The stroke count of the kanji. If more than one, the first is considered the accepted count, while subsequent ones are common miscounts. (See the section later in this document on counting strokes for some of the rules applied especially to radicals.)<br />
|-<br />
| Frequency-of-use ranking<br />
| F<br />
| misc<br />
| freq<br />
| The 2,501 most-used characters have a ranking which expresses the relative frequency of occurrence of a character in modern Japanese. The data is based on an analysis of word frequencies in the Mainichi Shimbun over 4 years by Alexandre Girardi. Note: (a) these frequencies are biased towards words and kanji used in newspaper articles, and (b) the relative frequencies for the last few hundred kanji so graded are quite imprecise.<br />
|-<br />
| Variant JIS 0208 kanji<br />
| XJ0<br />
| misc<br />
| variant var_type="jis208"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0212 kanji<br />
| XJ1<br />
| misc<br />
| variant var_type="jis212"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the JIS hex code is used and in the XML file the equivalent "1-nn-nn" kuten code is used.)<br />
|-<br />
| Variant JIS 0213 kanji<br />
| XJ2<br />
| misc<br />
| variant var_type="jis213"<br />
| Code-point of a similar or related kanji. (In the kanjidic file the plane number (P: 1 or 2) plus the JIS hex code is used, and in the XML file the equivalent "P-nn-nn" kuten code is used.)<br />
|-<br />
| Variant kanji (De Roo index)<br />
| XJD<br />
| misc<br />
| variant var_type="deroo"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (NJECD index)<br />
| XH<br />
| misc<br />
| variant var_type="halpern_njecd"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (S&H index)<br />
| XI<br />
| misc<br />
| variant var_type="s_h"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (Nelson index)<br />
| XN<br />
| misc<br />
| variant var_type="nelson_c"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Variant kanji (O'Neill index)<br />
| XO<br />
| misc<br />
| variant var_type="oneill"<br />
| Code-point of a similar or related kanji.<br />
|-<br />
| Radical name(s)<br />
| none<br />
| misc<br />
| rad_name<br />
| The name of the radical in ''hiragana''. In the KANJIDIC edition these are placed after the readings and preceded by the "T2" tag.<br />
|-<br />
| JLPT Level<br />
| J<br />
| misc<br />
| jlpt<br />
| The pre-2010 level of the Japanese Language Proficiency Test (JLPT) in which the kanji occurs (1-4). Note that the JLPT test levels changed in 2010, with a new 5-level system (N1 to N5) being introduced. No official kanji lists are available for the new levels. The new levels are regarded as being similar to the old levels except that the old level 2 is now divided between N2 and N3, and the old levels 3 and 4 are now N4 and N5. <br />
|-<br />
| Nelson (Classic) number<br />
| N<br />
| dic_number<br />
| dic_ref dr_type="nelson_c"<br />
| The index number in "The Modern Reader's Japanese-English Character Dictionary", edited by Andrew Nelson. If not present, the character is not in Nelson, or is considered to be a non-standard version, in which case it may have a variant. Note that many kanji glyphs currently used are what Nelson described as "non-standard".<br />
|-<br />
| Nelson (New) number<br />
| V<br />
| dic_number<br />
| dic_ref dr_type="nelson_n"<br />
| The index number in "The New Nelson Japanese-English Character Dictionary", edited by John Haig.<br />
|-<br />
| NJECD number<br />
| H<br />
| dic_number<br />
| dic_ref dr_type="halpern_njecd"<br />
| The index number in the "New Japanese-English Character Dictionary" (1990), edited by Jack Halpern.<br />
|-<br />
| Kodansha Kanji Dictionary number<br />
| DP<br />
| dic_number<br />
| dic_ref dr_type="halpern_kkd"<br />
| The index numbers used by Jack Halpern in the "Kodansha Kanji Dictionary" (2013), which is the revised version of the "New Japanese-English Character Dictionary" of 1990.<br />
|-<br />
|Kanji Learners Dictionary number<br />
|DK<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld"<br />
|The index numbers used by Jack Halpern in the "Kanji Learners Dictionary", published by Kodansha in 1999.<br />
|-<br />
|Kanji Learners Dictionary number (2nd ed)<br />
|DL<br />
|dic_number <br />
|dic_ref dr_type="halpern_kkld_2ed"<br />
|The index numbers used by Jack Halpern in the 2nd edition of the "Kanji Learners Dictionary", published by Kodansha in 2013. <br />
|-<br />
|Remembering The Kanji number<br />
|L<br />
|dic_number <br />
|dic_ref dr_type="heisig"<br />
|The index number used in "Remembering The Kanji" by James Heisig.<br />
|-<br />
|Remembering The Kanji number (6th ed)<br />
|DN<br />
|dic_number <br />
|dic_ref dr_type="heisig6"<br />
|The index number used in "Remembering The Kanji, 6th Edition" by James Heisig. <br />
|-<br />
|Gakken number<br />
|K<br />
|dic_number <br />
|dic_ref dr_type="gakken"<br />
|The index number in the Gakken Kanji Dictionary ("A New Dictionary of Kanji Usage"). Some of the numbers relate to the list at the back of the book, jouyou kanji not contained in the dictionary, and various historical tables at the end.<br />
|-<br />
|O'Neill's Japanese Names number<br />
|O<br />
|dic_number <br />
|dic_ref dr_type="oneill_names"<br />
|The index number in "Japanese Names", by P.G. O'Neill. (Weatherhill, 1972) (Note: some of the numbers end with 'A'.)<br />
|-<br />
|O'Neill's Essential Kanji number<br />
|DO<br />
|dic_number <br />
|dic_ref dr_type="oneill_kk"<br />
|The index numbers used in P.G. O'Neill's "Essential Kanji".<br />
|-<br />
|Morohashi number<br />
|MN/MP<br />
|dic_number <br />
|dic_ref dr_type="moro" m_vol m_page<br />
|The index number and volume.page respectively of the kanji in the 13-volume Morohashi Daikanwajiten. A terminal `P` in the number, e.g. 4879P, indicates that it is 4879' in the original. In some 500 cases, the number is terminated with an `X`, to indicate that the kanji in Morohashi has a close, but not identical, glyph to the form in the JIS X 0208 standard.<br/>In the XML the volume and page are attribute values.<br />
|-<br />
|Henshall number<br />
|E<br />
|dic_number <br />
|dic_ref dr_type="henshall"<br />
|The index number used in "A Guide To Remembering Japanese Characters" by Kenneth G. Henshall.<br />
|-<br />
|Kanji & Kana number<br />
|IN<br />
|dic_number <br />
|dic_ref dr_type="sh_kk"<br />
|The index number used in Spahn & Hadamitzky's "Kanji & Kana", 2nd edition (Tuttle).<br />
|-<br />
|Kanji & Kana number (2011 ed)<br />
|DA<br />
|dic_number <br />
|dic_ref dr_type="sh_kk2"<br />
|The index number used in the 2011 edition of Spahn & Hadamitzky's "Kanji & Kana".<br />
|-<br />
|Sakade number<br />
|DS<br />
|dic_number <br />
|dic_ref dr_type="sakade"<br />
|The index numbers used in the early editions of "A Guide To Reading and Writing Japanese", edited by Florence Sakade.<br />
|-<br />
|Japanese Kanji Flashcards number<br />
|DF<br />
|dic_number <br />
|dic_ref dr_type="jf_cards"<br />
|The index numbers used in the "Japanese Kanji Flashcards", by Max Hodges and Tomoko Okazaki (White Rabbit Press). <br />
|-<br />
|Henshall Guide number<br />
|DH<br />
|dic_number <br />
|dic_ref dr_type="henshall3"<br />
|The index numbers used in the 3rd edition of "A Guide To Reading and Writing Japanese" edited by Ken Henshall et al. <br />
|-<br />
|Tuttle Kanji Cards number<br />
|DT<br />
|dic_number <br />
|dic_ref dr_type="tutt_cards"<br />
|The index numbers used in the Tuttle Kanji Cards, compiled by Alexander Kask.<br />
|-<br />
|Crowley number<br />
|DC<br />
|dic_number <br />
|dic_ref dr_type="crowley"<br />
|The index numbers used in "The Kanji Way to Japanese Language Power" by Dale Crowley. <br />
|-<br />
|Kanji in Context number<br />
|DJ<br />
|dic_number <br />
|dic_ref dr_type="kanji_in_context"<br />
|The index numbers used in "Kanji in Context" by Nishiguchi and Kono.<br />
|-<br />
|Kodansha Compact Kanji Guide number<br />
|DG<br />
|dic_number <br />
|dic_ref dr_type="kodansha_compact"<br />
|The index numbers used in the "Kodansha Compact Kanji Guide".<br />
|-<br />
|Japanese For Busy People number<br />
|DB<br />
|dic_number <br />
|dic_ref dr_type="busy_people"<br />
|The index numbers used in "Japanese For Busy People" vols I-III, published by the AJLT. The codes are the volume.chapter. <br />
|-<br />
|Maniette number<br />
|DM<br />
|dic_number <br />
|dic_ref dr_type="maniette"<br />
|The numbers in Yves Maniette's "Les Kanjis dans la tête", the French adaptation of Heisig's "Remembering The Kanji".<br />
|-<br />
|SKIP code<br />
|P<br />
|query_code <br />
|q_code qc_type="skip"<br />
|The SKIP (System of Kanji Indexing by Patterns) developed by Jack Halpern. The code is of the form "l-m-n". See [[#SKIP_Codes|SKIP Codes]] section for more information.<br />
|-<br />
|S&H descriptor<br />
|I<br />
|query_code <br />
|q_code qc_type="sh_desc"<br />
|The index code in "The Kanji Dictionary" (Tuttle 1996), by Spahn & Hadamitzky. It is of the form nxnn.n, e.g. 3k11.2, where the kanji has 3 strokes in the identifying radical, it is radical "k" in the S&H classification system, there are 11 other strokes, and it is the 2nd kanji in the 3k11 sequence. <br />
|-<br />
|Four Corner code<br />
|Q<br />
|query_code <br />
|q_code qc_type="four_corner"<br />
|The Four Corner code for the kanji. See the [[#Four_Corner_Codes|Four Corner codes]] section for more information.<br />
|-<br />
|De Roo code<br />
|DR<br />
|query_code <br />
|q_code qc_type="deroo"<br />
|The codes developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). See the [[#De_Roo_Codes|De Roo Codes]] section for more information.<br />
|-<br />
|Misclassification code<br />
|ZPP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="posn"<br />
|SKIP misclassification by position.<br />
|-<br />
|Misclassification code<br />
|ZSP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_count"<br />
|SKIP misclassification by stroke count.<br />
|-<br />
|Misclassification code<br />
|ZBP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_and_posn"<br />
|SKIP misclassification by both position and stroke count.<br />
|-<br />
|Misclassification code<br />
|ZRP<br />
|query_code <br />
|q_code qc_type="skip" skip_misclass="stroke_diff"<br />
|SKIP misclassification by differing opinions on stroke counts.<br />
|-<br />
|Chinese reading<br />
|Y<br />
|rmgroup<br />
|reading r_type="pinyin"<br />
|The PinYin (Chinese) reading of the kanji.<br />
|-<br />
|Korean reading (romanized)<br />
|W<br />
|rmgroup<br />
|reading r_type="korean_r"<br />
|The Korean reading of the kanji in the (Republic of Korea) Ministry of Education style.<br />
|-<br />
|Korean reading (hangul)<br />
|not included<br />
|rmgroup<br />
|reading r_type="korean_h"<br />
|The Korean reading of the kanji in the hangul script.<br />
|-<br />
|Vietnamese reading (chữ quốc ngữ)<br />
|not included<br />
|rmgroup<br />
|reading r_type="vietnam"<br />
|The Vietnamese reading of the kanji in chữ quốc ngữ.<br />
|-<br />
|Japanese on reading (''katakana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_on"<br />
| In the KANJIDIC edition the readings are placed between the information fields and the meanings.<br />
|-<br />
|Japanese kun reading (''usu. hiragana'')<br />
|none<br />
|rmgroup<br />
|reading r_type="ja_kun"<br />
|<br />
|-<br />
| Meanings<br />
| none<br />
| rmgroup<br />
| meaning m_lang="xx"<br />
| The kanji meaning(s). For languages other than English the m_lang attribute is used with two-letter ISO 639-1 language codes. In the KANJIDIC edition the meanings are placed at the end of the line.<br />
|-<br />
| Name reading(s) (''hiragana'')<br />
| T1<br />
| <br />
| nanori<br />
| The readings only associated with named-entities. In the KANJIDIC edition the first of these is preceded by the "T1" tag.<br />
|}<br />
Note 1: For the sake of consistency the classical radical is the one indicated in the JIS漢字字典 (日本規格協会).<br />
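<br />
The group entities and attribute codes in the table above map directly onto element paths in KANJIDIC2. As a rough sketch, the Python below reads a small hand-made fragment (an abridged entry built only from values quoted on this page, not a complete entry from the file) with the standard library's <code>xml.etree.ElementTree</code>. The helper name <code>typed_values</code> is hypothetical.<br />

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative entry (an assumption for this example; real entries hold many more fields).
SAMPLE_ENTRY = """
<character>
  <literal>亜</literal>
  <codepoint>
    <cp_value cp_type="ucs">4e9c</cp_value>
    <cp_value cp_type="jis208">1-16-01</cp_value>
  </codepoint>
  <dic_number>
    <dic_ref dr_type="nelson_c">43</dic_ref>
    <dic_ref dr_type="nelson_n">81</dic_ref>
  </dic_number>
  <query_code>
    <q_code qc_type="deroo">3273</q_code>
  </query_code>
</character>
"""


def typed_values(character, path, attribute):
    """Map each attribute code (e.g. dr_type) to the text of its element."""
    return {el.get(attribute): el.text for el in character.iterfind(path)}


character = ET.fromstring(SAMPLE_ENTRY.strip())
dic_refs = typed_values(character, "dic_number/dic_ref", "dr_type")
codepoints = typed_values(character, "codepoint/cp_value", "cp_type")
print(character.findtext("literal"), dic_refs, codepoints)
```

A plain dictionary keeps only one value per attribute code, so this approach would lose information wherever a code can repeat within an entry, such as <code>qc_type="skip"</code> values carrying a <code>skip_misclass</code> attribute. A list of pairs would be safer in that case.<br />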
<br />
==Radical and Stroke Counting Rules==<br />
<br />
These rules apply to:<br />
#the stroke-counts themselves;<br />
#the stroke counts in the SKIP codes. Where this results in a SKIP which differs from that in the NJECD, or in the non-NJECD SKIPs provided by Jack Halpern, the Jack Halpern version is included prefixed with "ZR".<br />
===Radicals===<br />
The radicals listed below are ones where there are differing approaches to counting their strokes in the various references. The stroke counting in this file does not strictly follow any reference, but tends to be more closely aligned with Halpern.<br />
#B54 ENNYOU - 廴. Traditionally counted as 3 strokes, but more recently often counted as 2. S&H count this as 2; Nelson, Halpern, Koujien, etc., count it as 3. I treat it as 3.<br />
#B97 URI - 瓜. Traditionally counted as 5 strokes, as the middle portion looks like a katakana ム. Modern glyphs invariably make it look like 6 strokes. Nelson says it is 5 strokes. Halpern does too, but then counts the shape as 6 in other kanji. Koujien says 6, as do S&H. I treat it as 6.<br />
#B113 SHIMESU e.g. 礼, is counted as 4 strokes in that form, and 5 strokes in its older form, 祀 [http://www.edrdg.org/~jwb/U7940old.png (image)]. 18 kanji are in the 4-stroke form and 20 are in the 5-stroke form. (Nelson and S&H count it as 4; Halpern counts it as 4 or 5. [See Note 1.])<br />
#B131 SHIN/KERAI 臣. Counted as 7 (Nelson counts it as 6, Halpern as 7 (in the book), and S&H as both for different kanji.)<br />
#B136 MAI ASHI 舛. Counted as 7 (traditionally counted as 6, in accordance with the older writing of `ヰ'. Nelson counts as 6, S&H as 7, and Halpern as 7 for 常用 and 人名用漢字 and 6 for the rest.) Note this is also applied to counting 絳 and for kanji with the 韋 pattern.<br />
#B140 KUSA-KANMURI e.g. 苛 always counted as 3 strokes (Halpern counts this 4 strokes for the (mostly level 2) kanji where the older form is often printed.) Note that this has been carried through to kanji where this element is not the indexing radical, such as 朦.<br />
#B162 SHIN-NYUU e.g. 遙 or 逢 counted as 3 or 4 strokes. (Nelson and S&H count it as 2 strokes, and Halpern as either 3 or 4.) [See Note 1 below.]<br />
#B163 OOZATOZUKIRI & B170 KOZATO-HEN 邦 and 阡 always counted as 3 strokes (Nelson and S&H count it as 2, Halpern as 3.) This also applies where it appears mid-kanji, such as in 橢.<br />
#B184 SHOKU HEN 食, 飢, etc. is counted as 8 strokes in the 飢 form, and as 9 strokes in the 飭 and 餐 forms. (Nelson and S&H count it as 8 strokes, and Halpern as 8 or 9.) [See Note 1 below.]<br />
#B199 MUGI 麦 always counted as 7 strokes, except for 麥 & 麩 where it is counted as 11. (Nelson and Halpern do the same, and S&H avoid treating it as a radical, but count it as 12 in the remainder.)<br />
#The ROO or OI radical (老) has a variant consisting of the top 4 strokes. For example, it is in 者. Traditionally, this variant had an extra dot, and was counted as 5 strokes. I'm counting it as 4 throughout.<br />
<br />
===Other Stroke Patterns===<br />
# While the pattern 臼 is a 6-stroke radical, the top half of 叟 is made up of three distinct parts totalling 8 strokes. Note that this also is the case with 嫂, 溲, 艘 and 痩 despite the simplification in the JIS glyphs.<br />
#牙 (KIBA HEN) is a problem. It is classically counted as 4 strokes, but these days has a flick that makes it effectively 5. Halpern, Nelson and S&H usually have it as 5 strokes, so I'm standardizing on that.<br />
#Another little horror is 旡 (MU or NASHI), which is classically counted as 4 strokes. The most common variant has 5 strokes, but looks like 6. Halpern, S&H and the Classical Nelson count this as 4 strokes, and the New Nelson as 5. I'm making it 5 too.<br />
#The JUU or ASHIATO radical is at the bottom of 禽 and 禺. It is traditionally counted as 5 strokes, although sometimes it looks like 4. I'm using 5 throughout.<br />
#A related shape is ム, as in 瓜, 孤, 弧, etc. This is sometimes counted as two strokes (both Nelsons) and sometimes as three strokes (Halpern, S&H). Classically it is regarded as two strokes. I am using 6 strokes for 瓜.<br />
#The pattern to the left of 敝, which appears in several kanji, e.g. 幣 and 瞥, has 8 strokes. (There are 3 strokes at the top as in 尚.)<br />
#The "east" pattern (東) has 8 strokes. There is an older form in which there are two strokes in the box (柬). It is counted as 8 strokes here in the 東 form (e.g. 諌) and 9 in the 柬 form, as in 諫.<br />
#The pattern at the bottom of 雋 is counted as 4 strokes in modern dictionaries, although traditionally it was 5.<br />
#The pattern 巻, which appears in several kanji, is counted as 9 strokes. Several dictionaries count it as either 8 or 9.<br />
#The pattern on the left of 収 is variously handled as 2 strokes or 3 strokes. As more recent dictionaries make it 4, I will do so too.<br />
#The 攵 pattern has 3 and 4-stroke versions, and sometimes the glyphs can be confusing as to which is used. In the 緻 kanji, for example, it is traditionally counted as 3, but Spahn & Hadamitzky count it as 4 and the Nelsons include both.<br />
<br />
Note: The JIS X 0208-1990 standard does not formally specify the precise glyphs used for kanji; however, the glyphs it uses in the published version have become de facto standards for many font compilations. In the published standard, for several kanji, e.g. 辿/迚, 礼/祀, 飢/飭, the JIS Level 1 kanji use the simpler form, and the Level 2 kanji use the older, more complex form. Just to make matters worse, many fonts for JIS X 0208 kanji are based on the bit-maps specified in the JIS X 9051-1984 standard, which defines the 16x16 patterns for JIS X 0208-1983 characters. According to Ken Lunde: "This standard was not very good, and JSA is no longer supporting it." Anyway, JIS X 9051-1984 had the simpler form for all these bushu in both Levels 1 and 2, as well as having simplifications of kanji like 濾. Thus, as the font foundries have freedom to choose whichever glyphs they like, what you see on your screen may well not agree with these rules. All the rules in this appendix relate to the glyphs as published in the JIS X 0208-1990 standard, and as appearing in font compilations based on them.<br />
<br />
==Kanji Dictionary Search Codes==<br />
<br />
===SKIP Codes===<br />
<br />
The System of Kanji Indexing by Patterns (SKIP) is a scheme for the classification and rapid retrieval of Chinese characters on the basis of geometrical patterns. Developed by Jack Halpern, it first appeared in the New Japanese-English Character Dictionary (Kenkyusha, Tokyo 1990; NTC, Chicago 1993), and in successor publications such as the "Kanji Learners Dictionary" (Kodansha 1999,2011) and the "Kodansha Kanji Dictionary" (2013). A description of the coding system is [http://www.edrdg.org/wwwjdic/SKIP.html available].<br />
<br />
As examples, 割 has a SKIP code of 1-10-2, indicating it is divided into left-right portions with 10 strokes at the left and 2 at the right. 度 has a SKIP code of 3-3-6, indicating it has a 3-stroke enclosure with 6 strokes inside it.<br />
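<br />
A SKIP code can be taken apart mechanically. The Python sketch below is illustrative only (the function name is hypothetical). It assumes the four SKIP pattern numbers are 1 = left-right, 2 = up-down, 3 = enclosure and 4 = solid, and it assumes the code 3-3-6 for 度, which matches the description of a 3-stroke enclosure with 6 strokes inside.<br />

```python
# Pattern numbers used as the first element of a SKIP code (assumed mapping, see lead-in).
SKIP_PATTERNS = {1: "left-right", 2: "up-down", 3: "enclosure", 4: "solid"}


def parse_skip(code):
    """Split a SKIP code of the form 'l-m-n' into (pattern name, m, n)."""
    parts = code.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a SKIP code: {code!r}")
    pattern, first, second = (int(part) for part in parts)
    if pattern not in SKIP_PATTERNS:
        raise ValueError(f"unknown SKIP pattern number: {pattern}")
    return SKIP_PATTERNS[pattern], first, second


print(parse_skip("1-10-2"))  # ('left-right', 10, 2)
print(parse_skip("3-3-6"))   # ('enclosure', 3, 6)
```

For patterns 1 to 3 the last two numbers are the stroke counts of the two parts, as in the examples above. The sketch deliberately does not interpret them further, since the solid pattern uses them differently.<br />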
<br />
===De Roo Codes===<br />
<br />
The De Roo codes were developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha). They are based on the shapes observed at the top and bottom of the character. A [http://www.edrdg.org/wwwjdic/deroo.html detailed description] is available.<br />
<br />
As an example, 亜 has a code of 3273, indicating that the top of the kanji is pattern number 32 (兀) and the bottom is pattern number 73 (horizontal line with two vertical strokes above it).<br />
<br />
===Four Corner Codes===<br />
The Four Corner coding system was invented by Wang Chen in 1928, and has since been widely used in dictionaries in China and Japan for classifying kanji and hanzi. In China it is losing popularity in favour of Pinyin ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten, have a Four Corner index. An [http://www.edrdg.org/wwwjdic/FOURCORNER.html overview] of the coding system is available.<br />
In some cases a character may have two of these codes, as the system can be a little ambiguous, and Morohashi has some kanji coded differently from their traditional Chinese codes.<br />
The coding system indexes characters according to the shapes at the corners.<br />
<br />
==Proposing Changes==<br />
<br />
There is currently no online access to the database that holds the KANJIDIC contents (the information is mostly quite static). Anyone wishing to propose a change to the data for a kanji, e.g. to add or change a reading, will need to email Jim Breen at jimbreen@gmail.com.<br />
<br />
==Kanji Information Sites==<br />
''(Being expanded)''<br />
* Jim's [http://nihongo.monash.edu/kanjiinfo.html Kanji Information Page].<br />
* The [https://kanjialive.com/ Kanji alive] site at the University of Chicago.<br />
* The [https://www.kanjipedia.jp/ Kanjipedia] site (mostly in Japanese).<br />
<br />
==Legacy Documentation==<br />
<br />
The current Wiki page was compiled from several older documents, which are no longer being maintained. They are still available for historical purposes. They are:<br />
* a basic home page about [http://www.edrdg.org/kanjidic/kanjd2index_legacy.html KANJIDIC2];<br />
* an overview page about the [http://www.edrdg.org/kanjidic/kanjidic2_ov_legacy.html KANJIDIC2 structure];<br />
* an overview page about [http://www.edrdg.org/kanjidic/kanjidic_legacy.html KANJIDIC and KANJD212];<br />
* the original [http://www.edrdg.org/kanjidic/kanjidic_doc_legacy.html KANJIDIC] documentation;<br />
* the original [http://www.edrdg.org/kanjidic/kanjd212_doc_legacy.html KANJD212] documentation.<br />
<br />
==Copyright and Permissions==<br />
<br />
The KANJIDIC project files are released under a Creative Commons Attribution-ShareAlike Licence (V3.0). See the [http://www.edrdg.org/edrdg/licence.html EDRDG General Dictionary Licence Statement] for details.<br />
<br />
For the most part the information provided in the project's files is in the public domain. Information relating to the sequence numbers of kanji in published dictionaries is not considered to be subject to copyright. Descriptor and other search codes are considered to be the intellectual property of their developers. With regard to the codes included in the KANJIDIC files:<br />
* in 2014 the SKIP codes were placed by Jack Halpern under a CC-SA licence. See [http://www.kanji.org/kanji/dictionaries/skip_permission.htm this page] for his announcement. They are now under a [https://creativecommons.org/licenses/by-nc-sa/3.0/ Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported Licence].<br />
* Fr De Roo provided written permission for the De Roo codes to be included in KANJIDIC.<br />
* the Spahn and Hadamitzky descriptor codes were kindly supplied by Mark Spahn for inclusion in KANJIDIC.<br />
<br />
==History==<br />
<br />
''(some comments by Jim Breen)''<br />
<br />
KANJIDIC began around 1991 as two files: jis1detl.lst and jis2detl.lst, which were later merged into a single file.<br />
<br />
The first file was compiled initially from the file "kinfo.dat" supplied by Stephen Chung, who in turn compiled his file from a file prepared by Mike Erickson. I originally added about 1900 "meanings" by James Heisig keyed in by Kevin Moore from the book "Remembering The Kanji". I later added the meanings from Rik Smoody's files, compiled when he was working for Sony in Japan. These appear to have been based on Nelson.<br />
<br />
The second file was compiled from a complete JIS2 list with Bushu and stroke counts kindly supplied to me by Jon Crossley, to which I added Nelson numbers, yomikata and meanings extracted from Rik Smoody's file.<br />
<br />
Theresa Martin was an early assister with this file, particularly with tracking down and correcting many mistranscribed yomikata (the old zu/dzu, oo/ou, ji/dji, etc. problems).<br />
<br />
Jeffrey Friedl did a major overhaul in September-October 1992, in which he added the original frequency rankings, Halpern codes and SKIP patterns, updated the grading ("G" fields) to reflect the modern Jouyou lists, and corrected radical numbers, stroke counts and readings to fall in line with modern usage.<br />
<br />
Magnus Halldorsson corrected some erroneous Halpern numbers, and provided them for a lot of the radicals. He provided the list of Heisig indices, which he originally compiled himself, then verified and expanded using lists from Richard Walters and Antti Karttunen. He also passed on to me the list of Gakken indices compiled by Antti Karttunen.<br />
<br />
Lee Collins provided the Unicode mappings.<br />
<br />
Iain Sinclair has provided the yomikata, meanings and S&H indices of many of the obscure JIS2 kanji.<br />
<br />
Christian Wittern, a Sinologist working at Kyoto University, sent me a monster file prepared by Dr Urs App from Hanazono College. From this I have extracted the Four Corner and Morohashi information. Christian also provided the original Pinyin details, which were later replaced. I am very grateful for these significant contributions.<br />
<br />
In March 1994 the Morohashi indices were proof-read and corrected by Christian.<br />
<br />
Alfredo Pinochet supplied all the Henshall numbers.<br />
<br />
Ingar Holst has provided considerable assistance in regularizing the Bnnn and Cnnn radical classifications to remove some errors that were in the original JIS2 file, and to make it all conform to Nelson's classification.<br />
<br />
In mid-1993 I withdrew the SKIP codes from the distributed file as it appeared that their presence violated Jack Halpern's copyright on these codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission from his publisher for the codes to be included subject (initially) to copyright and usage restrictions. In March 1994 the Halpern indices and SKIP codes were checked against an extract from Jack's files, and the "Z" mis-classification codes added, again from his files. Jack has also made a lot of useful comments and suggestions about the content and format of the file. I am most grateful to Jack for his permission and assistance, and also to Jeffrey for making the contact.<br />
<br />
In May 1995, a number of updates took place. Jeffrey Friedl established contact with James Heisig, and obtained a further set of his indices. I contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided most of the missing S&H descriptors, and Jack Halpern released to me the SKIP codes of the kanji not in the New Japanese-English Character Dictionary. For all this material I am most grateful.<br />
<br />
In August 1995, I added the O'Neill index numbers. These were compiled by Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny & David for their assistance. <br />
<br />
In January and February 1996 the Morohashi numbers were checked thoroughly against two important sources: a file of Unicode-Morohashi data (Uni2Dict) which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221 standard, and the review draft of the proposed revision of the JIS X 0208 standard, which was prepared by the INSTAC Committee, and made available in a text file, thus enabling comparisons. All the mismatches between the three files were examined against the Morohashi text, and extensive corrections made to all three files. I am grateful to Koichi Yasuoka and Masayuki Toyoshima for their considerable assistance in this task.<br />
<br />
In March 1996 the Korean readings were added. They were provided by Dr Charles Muller, then of Toyo Gakuen University, to whom I am most grateful. Chuck's compilation of Korean readings is extremely thorough and scholarly, and I am pleased to be able to incorporate them.<br />
<br />
In April 1996 the readings of all the kanji were compared with those in the JIS X 0208 draft, and a number of corrections and additions made.<br />
<br />
In May 1996 I carried out a "unification" of the readings of the KANJIDIC and KANJD212 files, wherein all the readings of the "itaiji" were brought into line. The identification of these itaiji was drawn from a file posted to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp), which was compiled at the ETL from the itaiji identification in the JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added some extra sets which were indicated in the JIS X 0208-1996 draft.<br />
<br />
In July 1996 the Pinyin details were completely replaced by a new set. The original Pinyin were from an earlier compilation by Christian Wittern, and contained many errors. Two more reliable sources had become available: the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5 hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC file is a combination of the two, following the order in the Uni2Pinyin file.<br />
<br />
In August 1996 I corrected a few more missing and erroneous Nelson numbers, using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged the kokuji, so I added these to the readings fields as "{(kokuji)}".<br />
<br />
Also in August 1996 I deleted the handful of former "XJxxxx" cross-references, and replaced them with a much more comprehensive set, so that they now represent all the recognized "itaiji". The file I used for this was the corrected itaiji file mentioned above.<br />
<br />
In April 1997 I corrected a large number of bushu codes. Many of these had been identified as errors by Jean-Luc Leger who analyzed and examined all the Nelson bushu. I also identified and added a large number of missing Cnnn codes.<br />
<br />
Also in April 1997 I added the S&H "Kanji & Kana" indices. These had been keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must be an outbreak of kanji interest in Nancy.)<br />
<br />
In February 1998, the long-awaited inclusion of the "New Nelson" numbers took place. I had been waiting for the editor of the New Nelson, John Haig, to supply a list (as he had agreed some years before), but in the meantime, Jean-Luc Leger keyed a list, so they are now available.<br />
<br />
Also between December 1997 and February 1998 a large number of Level 2 kanji had their stroke counts corrected to bring them into line with the counting principles used in the Level 1 kanji. This usually aligned the counts with those used in the New Nelson and in S&H. Appendix E of this document was amended to reflect this. The leg-work in tracking this material down was done by Wolfgang Cronrath. <br />
<br />
During December 1998 & Jan 1999 I updated the stroke counts of many of the Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath. I also added the De Roo codes, which had been keyed by Jasmin Blanchette, who also typed the explanatory material. I contacted Fr De Roo in Tokyo who readily agreed to the inclusion of the codes.<br />
<br />
The extension of the S&H Kana & Kanji numbers to the 2nd edition was done by Enrique Sanchez Rosa.<br />
<br />
The Hangul versions of the Korean readings (which only appear in the XML version) were provided by Francis Bond and Kyonghee Paik.<br />
<br />
I did the Tuttle card numbers myself.<br />
<br />
James Rose provided the numbers from Crowley's "The Kanji Way to Japanese Language Power", Sakade's "A Guide To Reading and Writing Japanese", and also for that book's 3rd Edition edited by Henshall, Seeley & De Groot.<br />
<br />
The "Kodansha's compact Kanji guide" codes were provided by Richard Fremmerlid.<br />
<br />
The "Kanji in Context" codes were provided by Randy Foreman.<br />
<br />
The Spanish kanji meanings (which appear in the XML format, and may also appear in special versions of KANJIDIC) were compiled by Francisco Gutierrez and provided by Gabriel Sanroman.<br />
<br />
Alain Thierion translated the meanings of the kanji into French, and also provided the Maniette numbers.<br />
<br />
Andrew Slater provided updates to the JLPT numbers, and additional numbers for the Japanese Flashcards series.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=JMdictEDICT_software&diff=928JMdictEDICT software2021-08-05T07:32:55Z<p>JimBreen: /* Software Packages and Servers using the JMdict and EDICT Files */</p>
<hr />
<div>==Software Packages and Servers using the JMdict and EDICT Files==<br />
<br />
''(This list is far from up-to-date or complete. It is currently under revision. Feel free to send Jim details of suggested changes.)''<br />
===JMdict===<br />
* the [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= JMdictDB] database interface. The full search page enables some very sophisticated searching of the JMdict database.<br />
* the very popular [https://jisho.org/ jisho.org] server by Kim Ahlström<br />
* Petteri Kettunen's [http://neko.homeunix.net/~petterik/tkjmdict.html tkjmdict]<br />
* Cory Nelson's [http://int64.org/gozoku.html Gozoku] (formerly JMDict#)<br />
* Jean Soulat's [http://www.smartkanji.net SmartKanji.net] free on-line multilingual dictionary with text parser. Also included: links to Wieger's etymological lessons and two dictionaries of Chinese/Japanese Buddhist terms.<br />
* [http://www.jedict.com/ JEDict] for Macs<br />
<br />
===EDICT===<br />
* SERVERS<br />
<br />
** [https://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], Jim Breen's traditional and rather old-fashioned server. Now in its 23rd year of operations.<br />
** [http://jishobot.com Jishobot]<br />
** [http://rut.org/cgi-bin/j-e/dict Jeffrey's Server]<br />
** [http://www.rikai.com/perl/Home.pl Rikai.com]<br />
** [http://azalae.com/diku/ diku] - a very simple server<br />
** [http://www.glpwd.com/jtango-web/search.action jTango], another basic server<br />
** [http://dictionary.pspinc.com/ PSP's] basic server (old files used)<br />
** [http://www.foks.info/ FOKS] (Forgiving Online Kanji Search) - compensates for mistakenly guessed readings<br />
** nasty mangled romaji server at [http://www.freedict.com/onldict/jap.html freedict]<br />
** the [http://www.animelab.com/anime.manga/dictionary/ AnimeLab] server<br />
** [http://www.nihongoresources.com/ NihongoResources] server<br />
** Grzegorz Bober's [http://tangorin.com/ Tangorin] (uses the Tanaka Corpus too.)<br />
** [http://spencer.blackmarket.net/dic_word_search.asp Spencer's server]<br />
** [http://www.ss.ics.tut.ac.jp/pubdict/pubdict.html Heartful Dictionary] at the Toyohashi University of Technology <br />
** [http://japanod.com/ JapanOD] server, which has keitai options<br />
<br />
* PACKAGES<br />
<br />
** Rikai-derived [http://rikaixul.mozdev.org/ Mozilla plugin]<br />
** Similar [http://www.polarcloud.com/rikaichan/ Rikaichan] plugin for Firefox/Thunderbird<br />
** Popular [http://www.coolest.com/jquicktrans/ JQuickTrans] (Windows)<br />
** [http://www.physics.ucla.edu/~grosenth/jwpce.html JWPce] (free Windows WP with integrated dictionary)<br />
** [http://www.csse.monash.edu.au/~jwb/xjdic/ xjdic] - a clunky X11 terminal window program for Linux/Unix<br />
** [http://gjiten.sourceforge.net/ GJiten] - a rather cooler GUI-based Linux/Unix program<br />
** [http://www.boingo.org/dan/software/MacJDic.html MacJDic] for Macs, of course<br />
** [http://www.enfour.co.jp/unidict/e/about.html UniDict], also for Macs<br />
** [http://www.jedict.com/ JEDict] for Macs<br />
** [http://homepage.mac.com/andrewlindesay/le/page_wordlookup.html WordLookup], for Macs<br />
** [http://www.tensaimac.com/about/ Tensai], for Macs</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Main_Page&diff=927Main Page2021-08-05T05:51:38Z<p>JimBreen: /* Create an Account */</p>
<hr />
<div>==Electronic Dictionary Research and Development Group==<br />
<br />
Welcome to the Wiki of the [[About EDRDG | Electronic Dictionary Research and Development Group]]. The Wiki has been developed as a repository of information and documentation about the Group's work and projects.<br />
<br />
==User Accounts==<br />
<br />
Sorry, but we no longer provide user accounts. We've been hit by link spammers, which led to the disabling of self-creation of accounts, and it's all too much of a distraction.<br />
<br />
If you have any edits you would like to suggest, email Jim Breen (jimbreen-at-gmail.com) with the details.<br />
<br />
==The JMdict/EDICT Project==<br />
<br />
This project is to build and maintain a freely-usable general Japanese electronic dictionary database. <br />
<br />
===History===<br />
<br />
The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and new editions of the files were generated daily.<br />
<br />
In July 2010 maintenance of the JMdict data moved to an [[JMdictDB_Project|online database]], from which the daily distributions are prepared. In September 2014 the maintenance of the [http://www.edrdg.org/wiki/index.php/Main_Page#The_ENAMDICT.2FJMnedict_Project JMnedict] named-entity data was moved to that database too.<br />
<br />
===Documentation and Links===<br />
<br />
Some useful links are:<br />
<br />
*the main [[JMdict-EDICT_Dictionary_Project|documentation of the JMdict/EDICT dictionary files]]<br />
*some help with [[JMdict:_Getting_Started|getting started]] on putting in new entries or editing existing ones.<br />
*the [[Editorial Process]] for handling proposed new entries and amendments<br />
*the [[editorial policy|Editorial Policy]] and guidelines for the JMdict/EDICT files<br />
*the [[Editorial Board]] for JMdict/EDICT<br />
*the [https://github.com/JMdictProject/JMdictIssues/issues JMdict Issues] forum where matters such as structure, format, policies, tags, and other issues concerning dictionary content can be raised and discussed (currently hosted on GitHub.)<br />
*the [https://gitlab.com/yamagoya/jmdictdb/-/issues JMdictDB Issues] site for reporting problems and making feature requests concerning the JMdictDB web pages and software.<br />
*the [https://groups.google.com/search/groups?q=edict-jmdict mailing list] for project discussion. (That page should have a link for asking to join. Alternatively, email [mailto:jimbreen@gmail.com Jim Breen] and ask to be added.)<br />
*the [http://www.edrdg.org/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.<br />
*lists of [[JMdictEDICT_software|packages and servers]] using the JMdict/EDICT files<br />
*an [[Entries Under Development]] page, where people can place incomplete words and phrases for later filling out to become full entries. (Note that this is rather inactive and needs cleaning up.)<br />
<br />
== Current Version &amp; Downloads==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server] (formerly at Monash University), which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file.<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
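<br />
Since the generation date is carried in the header of each file, a quick way to check which version you have is to look at the first few lines of the compressed file. The following is a minimal Python sketch, not part of the project's tooling: the function name <code>read_header_lines</code> is hypothetical, the exact layout of the header is not assumed, and the UTF-8 default is an assumption that may not hold for the legacy EDICT-format files.<br />

```python
import gzip
import io
from itertools import islice


def read_header_lines(gz_stream, max_lines=20, encoding="utf-8"):
    """Return the first max_lines text lines of a gzip-compressed stream.

    gz_stream may be a filename or a binary file-like object. The encoding
    default is an assumption; pass another one if the file is not UTF-8.
    """
    with gzip.open(gz_stream, mode="rt", encoding=encoding) as text:
        # islice stops early, so a large file is never fully decompressed.
        return [line.rstrip("\n") for line in islice(text, max_lines)]


if __name__ == "__main__":
    # Stand-in for a downloaded file such as JMdict_e.gz; the content is invented.
    sample = gzip.compress("first line\nsecond line\nthird line\n".encode("utf-8"))
    for line in read_header_lines(io.BytesIO(sample), max_lines=2):
        print(line)
```

With a real download, pass the local filename (e.g. <code>read_header_lines("JMdict_e.gz")</code>) and look for the date among the returned lines.<br />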
<br />
==JMdictDB Database==<br />
The maintenance of the JMdict/EDICT and JMnedict/ENAMDICT dictionary files is now handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw, and operational since June 2010. For more information see:<br />
* an [[JMdictDB Project|overview]] of the database;<br />
* Stuart's [http://edrdg.org/~smg/ summary page];<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelpq.py quick overview] to editing entries;<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelp.py full help file] for editing entries.<br />
* a [http://www.edrdg.org/jmdictdb/JMdictEntries.html page] showing the current entry counts for the two dictionaries (updated daily).<br />
* project [https://gitlab.com/yamagoya/jmdictdb code] at GitLab.<br />
<br />
==The Tanaka Corpus==<br />
This project is to maintain and extend the [[Tanaka Corpus]] which is a large collection of parallel Japanese/English sentence pairs.<br />
<br />
The Corpus is now maintained within the [http://tatoeba.org/home Tatoeba Project]. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.<br />
<br />
An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], [http://jisho.org/ Denshi Jisho], etc. This is achieved via a set of indices attached to each sentence pair. There is a [[Sentence-Dictionary Linking|detailed description]] of this process.<br />
<br />
==The KANJIDIC Project==<br />
<br />
The [[KANJIDIC Project]] has compiled files of comprehensive information on kanji used in Japanese text processing. The files<br />
cover the kanji in three Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an extra 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds additional kanji.<br />
<br />
==The COMPDIC Project==<br />
<br />
The COMPDIC project involved the compilation of a glossary of terms used in the computing and telecommunications industries. The file was in the "EDICT" format. See the [http://www.edrdg.org/jmdict/compdic_doc.html brief documentation].<br />
<br />
In 2008 the entries in the COMPDIC file were included in the JMdict/EDICT file. While it is no longer maintained as a separate file, an extract of the entries relating to computing and telecommunications is still generated.<br />
<br />
==The ENAMDICT/JMnedict Project==<br />
<br />
The JMnedict/ENAMDICT files contain about 740,000 proper names in Japanese, covering place-names, surnames, given names, company names, names of artistic and literary works, product names, etc. There is a basic [http://www.edrdg.org/enamdict/enamdict_doc.html documentation page].<br />
<br />
* JMnedict (the Japanese-Multilingual named entity dictionary) is in XML format and is in Unicode/UTF-8 coding. [http://ftp.edrdg.org/pub/Nihongo/JMnedict.xml.gz (download)]<br />
<br />
* ENAMDICT is in a variant of the EDICT format, with part-of-speech and other tags omitted and replaced with some special tags to indicate the type of proper name. [http://ftp.edrdg.org/pub/Nihongo/enamdict.gz (download)]<br />
<br />
The information in the files is held in the same database as the JMdict/EDICT information. To use the online edit system<br />
follow [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= this link] and select "jmnedict" from the drop-down Corpus menu.<br />
<br />
==The KRADFILE/RADKFILE Project==<br />
<br />
This project provides a decomposition of kanji into a number of visual elements or radicals to support software which provides a lookup service using kanji components. These elements can be seen in the [http://nihongo.monash.edu/cgi-bin/wwwjdic?1R WWWJDIC] server, the [http://jisho.org/#radical Jisho.org] server, and [http://kanji.sljfaq.org/mr.html Ben Bullock's SLJFAQ] page.<br />
<br />
There is an [http://www.edrdg.org/krad/kradinf.html information page] about the data files.<br />
<br />
==The WWWJDIC Dictionary Server==<br />
<br />
WWWJDIC is a dictionary WWW server first developed by Jim Breen in 1998. Its (rather clunky) name came about because it is based on code and techniques developed in the earlier JDIC (DOS) and XJDIC (Unix/X11) applications.<br />
<br />
The home site of the server is [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C here], and there are several [http://www.edrdg.org/wwwjdic/wwwjdicmirrors.html mirror sites] which are updated daily from the home site. The server has links at the dictionary entry level to other sites and to the JMdict database for editing entries.<br />
<br />
The main documentation is the WWWJDIC [http://www.edrdg.org/wwwjdic/wwwjdicinf.html User's Guide].<br />
<br />
A number of elements in the server's display can be configured by users, and the interface language can be set to Japanese (as part of the [[WWWJDIC in Japanese]] project.)<br />
<br />
==Wishlist==<br />
<br />
This is a set of [[wishlist]] items for the various projects. Feel free to add suggestions.<br />
<br />
There is also an old [http://nihongo.monash.edu/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.<br />
<br />
==Mailing List==<br />
<br />
There is a [https://groups.google.com/g/edict-jmdict/ mailing list] for people engaged in the EDRDG projects.<br />
<br />
==How Can I Help?==<br />
<br />
From time to time people ask how they can best contribute to the projects. There are many ways of assisting, the main ones being:<br />
<br />
* adding to and enhancing the main (EDICT/JMdict) dictionary file. This is best done by using the [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search] and [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= New Entry] pages of the JMdictDB system.<br />
<br />
* adding extra Japanese-English sentence pairs to the collection based on the Tanaka Corpus. This is done by adding them to the [https://tatoeba.org/eng Tatoeba Project] as a linked sentence pair, then contacting Jim Breen to have them indexed.<br />
<br />
* assisting with the translation of the WWWJDIC interface into other languages. At present the priority is to make it fully available in Japanese. See the [[WWWJDIC in Japanese]] page.<br />
<br />
* work through the lists of words Paul Blay has placed on the [[Talk:Tanaka_Corpus]] page, which could become new dictionary entries.<br />
<br />
* join and participate in the [https://groups.google.com/g/edict-jmdict mailing list] for people engaged in the EDRDG projects.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=926Editorial policy2021-07-29T07:26:35Z<p>JimBreen: /* Hyphens and Similar Characters */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context must be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. MP3プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
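<br />
As an illustration of the full-width rule, the following Python sketch converts ordinary (half-width) ASCII letters, digits and punctuation to their full-width Unicode counterparts, and reports whether a string still contains any half-width ones. It is not part of JMdictDB; the function names are hypothetical, and the assumption that every printable ASCII character (rather than just letters and digits) should be converted is mine. Spaces are left untouched.<br />

```python
# Printable ASCII U+0021..U+007E maps onto the full-width block
# U+FF01..U+FF5E at a fixed offset of 0xFEE0.
FULLWIDTH_TABLE = {code: code + 0xFEE0 for code in range(0x21, 0x7F)}


def to_fullwidth(text):
    """Return text with printable ASCII replaced by full-width forms."""
    return text.translate(FULLWIDTH_TABLE)


def has_halfwidth_ascii(text):
    """True if text still contains a printable half-width ASCII character."""
    return any(ord(ch) in FULLWIDTH_TABLE for ch in text)


if __name__ == "__main__":
    headword = "MP3プレーヤー"  # typed with half-width Latin letters and digit
    print(has_halfwidth_ascii(headword))
    print(to_fullwidth(headword))
```

Kana and kanji are outside the ASCII range, so they pass through unchanged.<br />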
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur when<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.<br />
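<br />
The middle-dot rule is easy to get wrong because several Unicode characters look alike. The Python sketch below, which is illustrative only and not the check JMdictDB itself performs, assumes that the "JIS middle-dot" is U+30FB (KATAKANA MIDDLE DOT) and flags a few common look-alikes; the list of look-alikes is my own selection and the function names are hypothetical. It also builds the "without dot; with dot" pair of readings described above.<br />

```python
import unicodedata

# Assumed to be the dot the entry form accepts (JIS X 0208 middle dot).
KATAKANA_MIDDLE_DOT = "\u30fb"

# A selection of similar-looking characters; not an exhaustive list.
LOOKALIKE_DOTS = {"\u00b7", "\u2022", "\u2027", "\u22c5", "\uff65"}


def find_lookalike_dots(reading):
    """Return (index, Unicode name) for each look-alike dot in reading."""
    return [
        (index, unicodedata.name(ch))
        for index, ch in enumerate(reading)
        if ch in LOOKALIKE_DOTS
    ]


def with_and_without_dot(dotted_reading):
    """Build the 'undotted;dotted' pair of readings for a compound loanword."""
    undotted = dotted_reading.replace(KATAKANA_MIDDLE_DOT, "")
    return undotted + ";" + dotted_reading


if __name__ == "__main__":
    print(find_lookalike_dots("アームレス\u00b7チェア"))
    print(with_and_without_dot("アームレス\u30fbチェア"))
```

An empty result from <code>find_lookalike_dots</code> only means that none of the listed look-alikes is present, not that the reading is otherwise valid.<br />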
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
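Several of the mechanical rules above (no leading articles, no "colo(u)r" patterns, "e.g."/"i.e." placed in parentheses without a following comma) could be checked automatically. The following is only a rough sketch; the function name <code>lint_gloss</code> and the warning strings are hypothetical, and it is not part of the JMdict tooling. A leading article is sometimes necessary, so the results are warnings, not errors.<br />

```python
import re


def lint_gloss(gloss: str) -> list[str]:
    """Return style warnings for one English gloss (hypothetical helper)."""
    warnings = []
    # Leading articles should normally be omitted.
    if re.match(r"(?i)(a|an|the)\s", gloss):
        warnings.append("leading article")
    # Patterns such as "colo(u)r" cannot be searched for successfully.
    if re.search(r"\w\(\w+\)\w", gloss):
        warnings.append("embedded optional letters")
    # No comma directly after "e.g." or "i.e."
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        warnings.append("comma after e.g./i.e.")
    # Crude check: the expansion should open a parenthesis, i.e. "(e.g. ...".
    if re.search(r"(?<!\()\b(e\.g\.|i\.e\.)", gloss):
        warnings.append("e.g./i.e. outside parentheses")
    return warnings


print(lint_gloss("hand game, e.g. rock, paper, scissors"))
print(lint_gloss("hand game (e.g. rock, paper, scissors)"))
```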
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
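<br />
For people writing tools that read the edit-form text, the [lsrc=lng:word] pattern described in this section can be picked apart with a regular expression. This is a minimal sketch only: the name <code>parse_lsrc</code> is hypothetical, and the expression assumes just the three shapes discussed above (bare word, quoted word, and no word when it is identical to the translation).<br />

```python
import re

# Three-letter language code, then either a quoted word, a bare word, or nothing.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')


def parse_lsrc(text: str) -> list[tuple[str, str]]:
    """Return (language_code, source_word) pairs found in a sense string."""
    pairs = []
    for lang, quoted, bare in LSRC_RE.findall(text):
        # Exactly one of quoted/bare is non-empty, or both are empty.
        pairs.append((lang, quoted or bare))
    return pairs


print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```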
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
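<br />
The three cross-reference shapes described above (headword only, headword・reading, and headword with a sense number) can be recognised as in the sketch below. The names <code>XRef</code> and <code>parse_xrefs</code> are hypothetical, and the expression is an assumption based only on the examples in this section; in particular it would misread a katakana headword that itself contains "・".<br />

```python
import re
from typing import NamedTuple, Optional


class XRef(NamedTuple):
    kind: str               # "see" or "ant"
    headword: str
    reading: Optional[str]  # given after "・", if any
    sense: Optional[int]    # given as [n] after the target, if any


XREF_RE = re.compile(r"\[(see|ant)=([^\[\]・]+)(?:・([^\[\]]+))?(?:\[(\d+)\])?\]")


def parse_xrefs(text: str) -> list[XRef]:
    """Extract cross-references from a sense string."""
    refs = []
    for m in XREF_RE.finditer(text):
        kind, headword, reading, sense = m.groups()
        refs.append(XRef(kind, headword, reading, int(sense) if sense else None))
    return refs


for sample in ("[see=金本位・かねほんい]", "[see=漢字[2]]", "[ant=何等]"):
    print(parse_xrefs(sample))
```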
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
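<br />
As a small illustration of the macron rule, romanized text that was typed with circumflexes can be converted mechanically. This is a sketch only (the function name <code>use_macrons</code> is hypothetical); it should be applied only to romanized Japanese, since it would also alter legitimate circumflexes in, for example, French source words.<br />

```python
import unicodedata


def use_macrons(romaji: str) -> str:
    """Replace circumflexed vowels with macron vowels (e.g. "ô" becomes "ō")."""
    # Decompose so each accent becomes a separate combining character,
    # swap COMBINING CIRCUMFLEX (U+0302) for COMBINING MACRON (U+0304),
    # then recompose into single code points.
    decomposed = unicodedata.normalize("NFD", romaji)
    return unicodedata.normalize("NFC", decomposed.replace("\u0302", "\u0304"))


print(use_macrons("Y\u00f4r\u00f4"))
```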
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
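<br />
The two spacing rules can be applied mechanically, as in the following sketch. The function name <code>fix_spacing</code> is hypothetical and the short unit list (km, kg, cm, mm, m) is an assumption for illustration; the symbols are the ones given in the examples above.<br />

```python
import re


def fix_spacing(text: str) -> str:
    """Apply the number/unit and number/symbol spacing rules to a gloss."""
    # Rule 1: exactly one space between a number and a unit ("100km" -> "100 km").
    text = re.sub(r"(\d)\s*(km|kg|cm|mm|m)\b", r"\1 \2", text)
    # Rule 2: no space between a number and a symbol ("15 °C" -> "15°C").
    # A lone "c" (cents) is treated as a symbol; "cm" is not matched here.
    text = re.sub(r"(\d)\s+(°C|%|c\b)", r"\1\2", text)
    return text


print(fix_spacing("100km at 15 °C, 9 % and 5 c"))
```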
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
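<br />
A minimal sketch of the preferred date and time styles follows. The function names are hypothetical. Month names are listed explicitly so that the output does not depend on the system locale, and the lifespan format follows the example above, which does not zero-pad the month and day.<br />

```python
from datetime import date, time

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def format_date(d: date, with_year: bool = True) -> str:
    """'March 17, 2019', or 'March 17' when the year is not included."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def format_lifespan_date(d: date) -> str:
    """Compact form used for the dates of individual people, e.g. '1925.1.14'."""
    return f"{d.year}.{d.month}.{d.day}"


def format_time(t: time) -> str:
    """'2am' or '12:30pm' style, omitting the minutes when they are zero."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    minutes = f":{t.minute:02d}" if t.minute else ""
    return f"{hour}{minutes}{suffix}"


print(format_date(date(2019, 3, 17)), format_time(time(12, 30)))
```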
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
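<br />
Contributors who want to check in advance whether some text will survive into the EDICT/EDICT2 versions can approximate the test with Python's standard codecs. This is a sketch under assumptions: Python's <code>iso2022_jp</code> codec (ASCII plus JIS X 0208) is used as a stand-in for what EDICT can hold, and <code>iso2022_jp_1</code> (which adds JIS X 0212) for EDICT2. The actual export scripts may differ in detail, and the function names are hypothetical.<br />

```python
def can_encode(text: str, codec: str) -> bool:
    """True if every character of text is representable in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def distribution_level(text: str) -> str:
    """Most constrained distribution format that can hold the text unchanged."""
    if can_encode(text, "iso2022_jp"):      # ASCII + JIS X 0208
        return "EDICT"
    if can_encode(text, "iso2022_jp_1"):    # adds JIS X 0212
        return "EDICT2"
    return "JMdict only"


for sample in ("漢字", "é", "한"):
    print(sample, distribution_level(sample))
```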
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
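<br />
The basic "two-out-of-three" test can be stated very compactly, as in the sketch below. The function name and the dictionary layout are hypothetical, and plain string equality is a crude stand-in for an editor's judgement that two meanings are "the same". It also ignores the qualifications above about rare forms and kana-only entries.<br />

```python
def may_merge(a: dict, b: dict) -> bool:
    """True if at least two of kanji, reading and meaning are identical."""
    fields = ("kanji", "reading", "meaning")
    matches = sum(a[f] == b[f] for f in fields)
    return matches >= 2


# Illustrative data only: same kanji and meaning, different reading.
e1 = {"kanji": "明日", "reading": "あした", "meaning": "tomorrow"}
e2 = {"kanji": "明日", "reading": "あす", "meaning": "tomorrow"}
print(may_merge(e1, e2))
```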
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded bindweed, a subspecies of hedge bindweed, should be submitted as: "occluded bindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*Where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it can often be determined from the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
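As a rough illustration of the capitalization and layout rules above, the following Python sketch normalizes a scientific name and assembles a gloss in the ''common_name (scientific_name)'' form. The function names are hypothetical and are not part of any JMdict tooling; the sketch does not strip author names or add italics.<br />

```python
def normalize_scientific_name(name):
    """Capitalize the generic name and lower-case the epithets.

    Cultivar epithets in single quotes (e.g. 'Variegata') are left as written.
    """
    words = name.split()
    if not words:
        return ""
    result = [words[0].capitalize()]
    for word in words[1:]:
        if word.startswith("'"):
            result.append(word)          # cultivar epithet: keep capitalization
        else:
            result.append(word.lower())  # specific/subspecific epithet or rank
    return " ".join(result)


def species_gloss(common_name, scientific_name):
    """Build a gloss in the form: common_name (Scientific name)."""
    return f"{common_name} ({normalize_scientific_name(scientific_name)})"


print(species_gloss("European magpie", "pica pica"))
print(normalize_scientific_name("Tyrannosaurus Rex"))
```

Rank abbreviations such as "subsp." and "var." pass through unchanged because they are already lower-case.<br />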
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, contains virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, even though foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), carry some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that the "derog" or "vulg" tags be used where appropriate;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===<br />
In the Unicode character set, there are nine characters that represent some sort of mid-line bar. Only three of these are to be used in the JMdict database. They are:<br />
* the "normal" ASCII hyphen "-" (hex 2D, Unicode U+002D). This is to be used in the Meanings field for all situations. It is also to be used in the Kanji field in situations such as "CD-ROM".<br />
* the long vowel symbol "ー" which is used mainly with katakana and occasionally with hiragana (JIS hex 213C, Unicode U+30FC). This is only to be used in the appropriate Japanese text contexts such as "ローマ字".<br />
* the minus sign "−" (JIS hex 215D, Unicode U+2212). The minus character is only to be used in algebraic or arithmetic contexts.<br />
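A check along the following lines could flag bar-like characters other than the three permitted ones. This is only a sketch: the set of look-alike characters is an assumption chosen for illustration and is not the list from the JMdict issue, and the function name is hypothetical.<br />

```python
import unicodedata

# The three characters permitted by the policy above.
ALLOWED_BARS = {"\u002d", "\u30fc", "\u2212"}

# Assumed look-alikes (illustrative only; not taken from the policy).
LOOKALIKE_BARS = {
    "\u2010", "\u2011", "\u2012", "\u2013",
    "\u2014", "\u2015", "\uff0d",
}


def find_disallowed_bars(text):
    """Return (index, character, Unicode name) for each look-alike bar found."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in LOOKALIKE_BARS
    ]


print(find_disallowed_bars("CD\u2013ROM"))
```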
<br />
For details of the other characters see the JMdict [https://github.com/JMdictProject/JMdictIssues/issues/34 issue] on the topic.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=925Editorial policy2021-07-29T07:09:54Z<p>JimBreen: /* Other Issues/Policies */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context have to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, you should include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer that people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
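The ordering rule above can be sketched as a sort with two keys: irregular forms go last, and within each group forms are ordered by decreasing frequency. The tuple layout and the frequency counts are hypothetical; how frequency is measured is left to the editor.<br />

```python
# Tags named in the guideline above as marking irregular or incorrect forms.
IRREGULAR_TAGS = {"iK", "io", "ik"}


def order_surface_forms(forms):
    """Order (text, frequency, tags) tuples: regular forms first, most common first."""
    return sorted(
        forms,
        key=lambda form: (bool(IRREGULAR_TAGS & set(form[2])), -form[1]),
    )


forms = [("A", 10, []), ("B", 500, ["io"]), ("C", 50, [])]
print([text for text, _, _ in order_surface_forms(forms)])
```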
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur in the following cases:<br />
* a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
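For species names, the hiragana-first rule with a "[nokanji]" katakana form can be sketched as follows. The helper names are hypothetical, and the ";" separator is an assumption based on the way alternative readings are written elsewhere on this page.<br />

```python
def hiragana_to_katakana(text):
    """Convert hiragana to katakana; other characters are left unchanged.

    Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 are offset by 0x60.
    """
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def species_readings(hiragana):
    """Return the hiragana reading followed by the katakana form with [nokanji]."""
    return f"{hiragana};{hiragana_to_katakana(hiragana)}[nokanji]"


print(species_readings("ぜにがたあざらし"))
```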
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.<br />
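The two required forms of a multi-word 外来語 can be generated from its components, as in this sketch. It assumes that the "JIS middle-dot" corresponds to U+30FB (KATAKANA MIDDLE DOT); the two other dots in the check are merely examples of characters that might be typed by mistake, and the function names are hypothetical.<br />

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT (assumed to be the JIS middle-dot)

# Example look-alikes (assumption): MIDDLE DOT and HALFWIDTH KATAKANA MIDDLE DOT.
OTHER_DOTS = "\u00b7\uff65"


def gairaigo_readings(components):
    """Return the joined form and, for multi-word terms, the dotted form."""
    joined = "".join(components)
    if len(components) < 2:
        return joined
    return joined + ";" + MIDDLE_DOT.join(components)


def has_other_middle_dot(text):
    """True if the text contains one of the look-alike dots."""
    return any(ch in OTHER_DOTS for ch in text)


print(gairaigo_readings(["アームレス", "チェア"]))
```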
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
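A few of the mechanical points in the list above lend themselves to an automated hint. The following sketch checks a single gloss for a leading article, an unsearchable pattern such as "colo(u)r", and a comma after "e.g." or "i.e.". It is a heuristic with a hypothetical function name; a leading article is sometimes necessary, so the results are prompts for review rather than errors.<br />

```python
import re


def lint_gloss(gloss):
    """Return a list of possible style problems in one English gloss."""
    problems = []
    if re.match(r"(a|an|the)\s", gloss, re.IGNORECASE):
        problems.append("leading article")
    # Letters on both sides of the parentheses, as in "colo(u)r".
    if re.search(r"\w\(\w+\)\w", gloss):
        problems.append("optional-letter pattern such as colo(u)r")
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        problems.append("comma after e.g. or i.e.")
    return problems


for gloss in ("full colour (color)", "colo(u)r", "a rice field"):
    print(gloss, lint_gloss(gloss))
```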
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as, adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
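The agreement between the first part-of-speech tag and the form of the gloss can be checked roughly as follows. This is a heuristic sketch with a hypothetical function name: it treats any tag beginning with "v" as a verb tag and any tag beginning with "adj" as an adjective tag, which is a simplification of the real tag set.<br />

```python
def check_gloss_form(pos_tags, gloss):
    """Compare a gloss with the first POS tag of its sense; return any problems."""
    if not pos_tags:
        return []
    first = pos_tags[0]
    problems = []
    looks_like_verb = gloss.startswith("to ")
    if first.startswith("v") and not looks_like_verb:
        problems.append("verb sense: gloss should use the infinitive")
    if first == "n" and looks_like_verb:
        problems.append("noun sense: gloss is worded as a verb")
    if first.startswith("adj") and gloss.startswith(("be ", "is ")):
        problems.append("adjective sense: drop the copula")
    return problems


print(check_gloss_form(["n", "vs"], "to cook"))
print(check_gloss_form(["adj-na"], "lucky"))
```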
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
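A source-language tag in the format shown above could be assembled as in this sketch. The function name is hypothetical, and the rule of quoting a source word that contains a space is inferred from the "art déco" example rather than from a stated rule. The language code is only checked for its shape, not against the ISO 639-2 list.<br />

```python
def lsrc_tag(lang, source_word=""):
    """Build an [lsrc=lng:word] tag from a three-letter code and a source word."""
    if not (len(lang) == 3 and lang.isascii() and lang.isalpha() and lang.islower()):
        raise ValueError(f"expected a three-letter lower-case code, got {lang!r}")
    if " " in source_word:
        source_word = f'"{source_word}"'  # assumption: multi-word sources are quoted
    return f"[lsrc={lang}:{source_word}]"


print(lsrc_tag("ger", "Arbeit"))
print(lsrc_tag("fre", "art déco"))
```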
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; or (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
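The three cross-reference shapes described above (headword only, headword with reading, and headword with sense number) can be recognized with a single pattern, as in this sketch. It covers only the forms mentioned in this paragraph; the linked detailed instructions remain the authority on the full syntax, and the function name and result layout are hypothetical.<br />

```python
import re

# [see=headword], [see=headword・reading] or [see=headword[2]]; "ant" likewise.
# The separator is assumed to be U+30FB (KATAKANA MIDDLE DOT).
XREF_PATTERN = re.compile(
    r"\[(see|ant)=([^\[\]\u30fb]+)(?:\u30fb([^\[\]]+))?(?:\[(\d+)\])?\]"
)


def parse_xref(tag):
    """Split a cross-reference tag into its parts, or return None if it doesn't match."""
    match = XREF_PATTERN.fullmatch(tag)
    if match is None:
        return None
    kind, headword, reading, sense = match.groups()
    return {
        "type": kind,
        "headword": headword,
        "reading": reading,
        "sense": int(sense) if sense else None,
    }


print(parse_xref("[see=漢字[2]]"))
```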
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
A cross-reference back from the full form to the abbreviation may also be appropriate.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
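As a rough illustration of the two spacing rules, the sketch below normalizes a gloss with regular expressions. The unit and symbol lists are assumptions made for the example and are far from complete; ambiguous cases such as "5c" are deliberately left alone.<br />

```python
import re

# Hypothetical, non-exhaustive lists used only for this illustration.
UNITS = ["km", "cm", "mm", "kg", "m", "g"]
SYMBOLS = ["°C", "%"]

# A digit, optional whitespace, then a unit that ends at a word boundary.
_UNIT_RE = re.compile(r"(\d)\s*(" + "|".join(UNITS) + r")\b")
# A digit, at least one whitespace character, then a symbol.
_SYMBOL_RE = re.compile(r"(\d)\s+(" + "|".join(map(re.escape, SYMBOLS)) + r")")


def normalize_spacing(gloss: str) -> str:
    """Put exactly one space before units and no space before symbols."""
    gloss = _UNIT_RE.sub(r"\1 \2", gloss)
    gloss = _SYMBOL_RE.sub(r"\1\2", gloss)
    return gloss


print(normalize_spacing("100km"))   # 100 km
print(normalize_spacing("15 °C"))   # 15°C
print(normalize_spacing("9 %"))     # 9%
```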
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
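The date styles above could be produced as in the following sketch. The helper names are illustrative. The month and day are written without zero-padding, following the "1925.1.14" example rather than a literal reading of "YYYY.MM.DD". Month names are listed explicitly so that the output does not depend on the system locale.<br />

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def gloss_date(d: date, with_year: bool = True) -> str:
    """Format a date as "March 17" or "March 17, 2019"."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(name: str, born: date, died: date) -> str:
    """Format a person's dates in the compact dotted style."""
    def dotted(d: date) -> str:
        return f"{d.year}.{d.month}.{d.day}"
    return f"{name} ({dotted(born)}-{dotted(died)})"


print(gloss_date(date(2019, 3, 17), with_year=False))  # March 17
print(gloss_date(date(2019, 3, 17)))                   # March 17, 2019
print(lifespan("Yukio Mishima", date(1925, 1, 14), date(1970, 11, 25)))
# Yukio Mishima (1925.1.14-1970.11.25)
```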
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,355 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
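Before submitting, a contributor could get a rough idea of whether a character will survive into the EDICT/EDICT2 files by trying to encode it. The sketch below is an approximation, not the conversion actually used to build the distributions: it assumes Python's "iso2022_jp" codec as a stand-in for JIS X 0208 (it also accepts ASCII) and "euc_jp" as a stand-in for JIS X 0208 plus JIS X 0212 (it also accepts half-width katakana).<br />

```python
def classify(ch: str) -> str:
    """Roughly classify a single character by the JIS set that can hold it."""
    try:
        ch.encode("iso2022_jp")   # assumption: proxy for ASCII + JIS X 0208
        return "JIS X 0208"
    except UnicodeEncodeError:
        pass
    try:
        ch.encode("euc_jp")       # assumption: proxy that also covers JIS X 0212
        return "JIS X 0212"
    except UnicodeEncodeError:
        return "outside"


def report(text: str) -> dict:
    """Map each distinct non-JIS-X-0208 character in the text to its class."""
    return {ch: classify(ch) for ch in text if classify(ch) != "JIS X 0208"}


# 家 is a JIS X 0208 kanji; hangul lies outside both JIS sets.
print(classify("家"))
print(classify("한"))
print(report("café 한글"))
```

Any character reported as "outside" would be dropped from both EDICT and EDICT2, so a romanized equivalent should accompany it.<br />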
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
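The basic count behind the two-out-of-three rule can be sketched as follows. This is an illustration only. It compares just the main kanji form, main reading and a simplified meaning. It does not model the editorial judgement described above, such as ignoring rare forms or deciding whether two kana forms are variants of each other. The class and function names, and the shortened glosses, are invented for the example.<br />

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    kanji: str     # main kanji headword, or "" if the entry has none
    reading: str   # main reading in kana
    meaning: str   # simplified gloss used for comparison


def may_merge(a: Candidate, b: Candidate) -> bool:
    """True if at least two of kanji, reading and meaning are the same."""
    matches = [
        a.kanji != "" and a.kanji == b.kanji,  # missing kanji never counts as a match
        a.reading == b.reading,
        a.meaning == b.meaning,
    ]
    return sum(matches) >= 2


# Same reading and meaning, different kanji form: merge candidate.
print(may_merge(Candidate("受け付け", "うけつけ", "reception"),
                Candidate("受付", "うけつけ", "reception")))      # True

# Same kanji, but different reading and meaning: keep separate.
print(may_merge(Candidate("生物", "せいぶつ", "living thing"),
                Candidate("生物", "なまもの", "raw food")))        # False
```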
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. In general, a proposed entry needs to pass at least one of these criteria.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named-entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
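The capitalization and subspecies conventions above can be illustrated with a small formatter. This is a sketch under the assumption that italics are written with wiki-style doubled apostrophes, as in this page. The function names are invented, and varieties, forms and cultivars are not handled.<br />

```python
def scientific_name(genus, epithet, subspecies=None, plant=False):
    """Build an italicized scientific name following the rules above."""
    genus = genus[:1].upper() + genus[1:].lower()   # generic names are capitalized
    epithet = epithet.lower()                        # specific epithets never are
    if subspecies is None:
        return f"''{genus} {epithet}''"
    subspecies = subspecies.lower()
    if plant:
        # Plant subspecies take the abbreviation "subsp." outside the italics.
        return f"''{genus} {epithet}'' subsp. ''{subspecies}''"
    # Animal subspecies simply append the subspecific epithet.
    return f"''{genus} {epithet} {subspecies}''"


def species_gloss(common_name, genus, epithet, subspecies=None, plant=False):
    """Format a gloss as: common_name (scientific_name)."""
    return f"{common_name} ({scientific_name(genus, epithet, subspecies, plant)})"


print(species_gloss("European magpie", "pica", "Pica"))
# European magpie (''Pica pica'')
print(species_gloss("cinnamon bear", "Ursus", "americanus", "cinnamomum"))
# cinnamon bear (''Ursus americanus cinnamomum'')
print(scientific_name("Calystegia", "sepium", "erratica", plant=True))
# ''Calystegia sepium'' subsp. ''erratica''
```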
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.<br />
===Hyphens and Similar Characters===</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Main_Page&diff=924Main Page2021-07-27T22:14:07Z<p>JimBreen: </p>
<hr />
<div>==Electronic Dictionary Research and Development Group==<br />
<br />
Welcome to the Wiki of the [[About EDRDG | Electronic Dictionary Research and Development Group]]. The Wiki has been developed as a repository of information and documentation about the Group's work and projects.<br />
<br />
==Create an Account==<br />
<br />
People wishing to participate in this Wiki are welcome to have accounts. To get an account, email a request to either William Maton (wfms-at-acm.org) or Jim Breen (jimbreen-at-gmail.com). In your email say what login ID you'd like. You'll be mailed back a temporary password to enable your account.<br />
<br />
(Sorry for the hassle, but we've been hit by link spammers and we've disabled self-creation of accounts to stop them.)<br />
<br />
==The JMdict/EDICT Project==<br />
<br />
This project is to build and maintain a freely-usable general Japanese electronic dictionary database. <br />
<br />
===History===<br />
<br />
The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and new editions of the files were generated daily.<br />
<br />
In July 2010 maintenance of the JMdict data moved to an [[JMdictDB_Project|online database]], from which the daily distributions are prepared. In September 2014 the maintenance of the [http://www.edrdg.org/wiki/index.php/Main_Page#The_ENAMDICT.2FJMnedict_Project JMnedict] named-entity data was moved to that database too.<br />
<br />
===Documentation and Links===<br />
<br />
Some useful links are:<br />
<br />
*the main [[JMdict-EDICT_Dictionary_Project|documentation of the JMdict/EDICT dictionary files]]<br />
*some help with [[JMdict:_Getting_Started|getting started]] on putting in new entries or editing existing ones.<br />
*the [[Editorial Process]] for handling proposed new entries and amendments<br />
*the [[editorial policy|Editorial Policy]] and guidelines for the JMdict/EDICT files<br />
*the [[Editorial Board]] for JMdict/EDICT<br />
*the [https://github.com/JMdictProject/JMdictIssues/issues JMdict Issues] forum where matters such as structure, format, policies, tags, and other issues concerning dictionary content can be raised and discussed (currently hosted on GitHub.)<br />
*the [https://gitlab.com/yamagoya/jmdictdb/-/issues JMdictDB Issues] site for reporting problems and making feature requests concerning the JMdictDB web pages and software.<br />
*the [https://groups.google.com/search/groups?q=edict-jmdict mailing list] for project discussion. (That page should have a link for asking to join. Alternatively, email [mailto:jimbreen@gmail.com Jim Breen] and ask to be added.)<br />
*the [http://www.edrdg.org/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.<br />
*lists of [[JMdictEDICT_software|packages and servers]] using the JMdict/EDICT files<br />
*an [[Entries Under Development]] page, where people can place incomplete words and phrases for later filling out to become full entries. (Note that this is rather inactive and needs cleaning up.)<br />
<br />
== Current Version &amp; Downloads==<br />
<br />
The project's master database is continuously being updated and new versions of the files are generated daily. The date of generation is included in the header of the files.<br />
<br />
The files are currently distributed via the EDRDG [http://ftp.edrdg.org/pub/Nihongo/00INDEX.html ftp server] (formerly at Monash University), which also provides an rsync service. The main files available are:<br />
<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict.gz JMdict.gz ] - the full JMdict file, including English, German, French, Russian, Spanish, Hungarian, Slovenian and Dutch glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz JMdict_e.gz ] - the JMdict file with only English glosses;<br />
* [http://ftp.edrdg.org/pub/Nihongo/JMdict_e_examp.gz JMdict_e_examp.gz ] - the above JMdict file with example sentence pairs from the [[Tanaka_Corpus]];<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict.gz edict.gz ] - the "traditional" EDICT file.<br />
* [http://ftp.edrdg.org/pub/Nihongo/edict2.gz edict2.gz ] - the extended EDICT2 file.<br />
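Since the generation date is recorded in the header of each file, a quick way to check which version has been downloaded is to look at the first few lines. The following is a minimal sketch. It assumes the file has already been saved locally, makes no assumption about the exact header layout, and uses an invented helper name. UTF-8 is assumed for the JMdict files and EUC-JP for the EDICT-format files.<br />

```python
import gzip
from itertools import islice


def first_lines(path, n=5, encoding="utf-8"):
    """Return the first n lines of a gzip-compressed text file."""
    # errors="replace" keeps the sketch from failing on unexpected bytes.
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Example usage (assumes the files are in the current directory):
#   for line in first_lines("JMdict_e.gz", n=20):
#       print(line)
#   for line in first_lines("edict2.gz", n=1, encoding="euc_jp"):
#       print(line)
```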
<br />
==JMdictDB Database==<br />
The maintenance of the JMdict/EDICT and JMnedict/ENAMDICT dictionary files is now handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw, and operational since June 2010. For more information see:<br />
* an [[JMdictDB Project|overview]] of the database;<br />
* Stuart's [http://edrdg.org/~smg/ summary page];<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelpq.py quick overview] to editing entries;<br />
* the [http://edrdg.org/jmdictdb/cgi-bin/edhelp.py full help file] for editing entries.<br />
* a [http://www.edrdg.org/jmdictdb/JMdictEntries.html page] showing the current entry counts for the two dictionaries (updated daily).<br />
* project [https://gitlab.com/yamagoya/jmdictdb code] at GitLab.<br />
<br />
==The Tanaka Corpus==<br />
This project is to maintain and extend the [[Tanaka Corpus]] which is a large collection of parallel Japanese/English sentence pairs.<br />
<br />
The Corpus is now maintained within the [http://tatoeba.org/home Tatoeba Project]. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.<br />
<br />
An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], [http://jisho.org/ Denshi Jisho] , etc. This is achieved via a set of indices attached to each sentence pair. There is a [[Sentence-Dictionary Linking|detailed description]] of this process.<br />
<br />
==The KANJIDIC Project==<br />
<br />
The [[KANJIDIC Project]] has compiled files of comprehensive information on kanji used in Japanese text processing. The files<br />
cover the kanji in three Japanese standards:<br />
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes an additional 5,801 kanji.<br />
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds additional kanji.<br />
<br />
==The COMPDIC Project==<br />
<br />
The COMPDIC project involved the compilation of a glossary of terms used in the computing and telecommunications industries. The file was in the "EDICT" format. See the [http://www.edrdg.org/jmdict/compdic_doc.html brief documentation].<br />
<br />
In 2008 the entries in the COMPDIC file were included in the JMdict/EDICT file. While it is no longer maintained as a separate file, an extract of the entries relating to computing and telecommunications is still generated.<br />
<br />
==The ENAMDICT/JMnedict Project==<br />
<br />
The JMnedict/ENAMDICT files contain about 740,000 proper names in Japanese, covering place-names, surnames, given names, company names, names of artistic and literary works, product names, etc. There is a basic [http://www.edrdg.org/enamdict/enamdict_doc.html documentation page].<br />
<br />
* JMnedict (the Japanese-Multilingual named entity dictionary) is in XML format and is in Unicode/UTF-8 coding. [http://ftp.edrdg.org/pub/Nihongo/JMnedict.xml.gz (download)]<br />
<br />
* ENAMDICT is in a variant of the EDICT format, with part-of-speech and other tags omitted and replaced with some special tags to indicate the type of proper name. [http://ftp.edrdg.org/pub/Nihongo/enamdict.gz (download)]<br />
<br />
The information in the files is held in the same database as the JMdict/EDICT information. To use the online edit system, follow [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= this link] and select "jmnedict" from the drop-down Corpus menu.<br />
<br />
==The KRADFILE/RADKFILE Project==<br />
<br />
This project provides a decomposition of kanji into a number of visual elements or radicals to support software which provides a lookup service using kanji components. These elements can be seen in the [http://nihongo.monash.edu/cgi-bin/wwwjdic?1R WWWJDIC] server, the [http://jisho.org/#radical Jisho.org] server, and [http://kanji.sljfaq.org/mr.html Ben Bullock's SLJFAQ] page.<br />
<br />
There is an [http://www.edrdg.org/krad/kradinf.html information page] about the data files.<br />
<br />
==The WWWJDIC Dictionary Server==<br />
<br />
WWWJDIC is a dictionary WWW server first developed by Jim Breen in 1998. Its (rather clunky) name came about because it is based on code and techniques developed in the earlier JDIC (DOS) and XJDIC (Unix/X11) applications.<br />
<br />
The home site of the server is [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C here], and there are several [http://www.edrdg.org/wwwjdic/wwwjdicmirrors.html mirror sites] which are updated daily from the home site. The server has links at the dictionary entry level to other sites and to the JMdict database for editing entries.<br />
<br />
The main documentation is the WWWJDIC [http://www.edrdg.org/wwwjdic/wwwjdicinf.html User's Guide].<br />
<br />
A number of elements in the server's display can be configured by users, and the interface language can be set to Japanese (as part of the [[WWWJDIC in Japanese]] project.)<br />
<br />
==Wishlist==<br />
<br />
This is a set of [[wishlist]] items for the various projects. Feel free to add suggestions.<br />
<br />
There is also an old [http://nihongo.monash.edu/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.<br />
<br />
==Mailing List==<br />
<br />
There is a [https://groups.google.com/g/edict-jmdict/ mailing list] for people engaged in the EDRDG projects.<br />
<br />
==How Can I Help?==<br />
<br />
From time to time people ask how they can best contribute to the projects. There are many ways of assisting, the main ones being:<br />
<br />
* adding to and enhancing the main (EDICT/JMdict) dictionary file. This is best done by using the [http://www.edrdg.org/jmdictdb/cgi-bin/srchform.py?svc=jmdict&sid= Search] and [http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid= New Entry] pages of the JMdictDB system.<br />
<br />
* adding extra Japanese-English sentence pairs to the collection based on the Tanaka Corpus. This is done by adding them to the [https://tatoeba.org/eng Tatoeba Project] as a linked sentence pair, then contacting Jim Breen to have them indexed.<br />
<br />
* assisting with the translation of the WWWJDIC interface into other languages. At present the priority is to make it fully available in Japanese. See the [[WWWJDIC in Japanese]] page.<br />
<br />
* working through the lists of words Paul Blay has placed on the [[Talk:Tanaka_Corpus]] page, which could become new dictionary entries.<br />
<br />
* join and participate in the [https://groups.google.com/g/edict-jmdict mailing list] for people engaged in the EDRDG projects.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=923Editorial policy2021-06-23T05:02:35Z<p>JimBreen: /* Is it worth including? */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. ＭＰ３プレーヤー). The word/phrase should be written in full-width characters (e.g. ＭＰ３プレーヤー, not MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur:<br />
* where a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.<br />
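
The "with and without a middle dot" rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that the "JIS middle-dot" corresponds to U+30FB (KATAKANA MIDDLE DOT). The set of look-alike dots is a small sample, not a complete list.

```python
# Assumption: the "JIS middle-dot" is U+30FB KATAKANA MIDDLE DOT.
KATAKANA_MIDDLE_DOT = "\u30fb"

# A few look-alike characters (illustrative sample only).
LOOKALIKE_DOTS = {"\u00b7", "\u2022", "\u2027", "\uff65"}


def gairaigo_variants(parts):
    """Return the undotted and dotted forms of a multi-word loanword."""
    return ["".join(parts), KATAKANA_MIDDLE_DOT.join(parts)]


def has_wrong_dot(reading):
    """Return True if the reading contains one of the look-alike dots."""
    return any(ch in LOOKALIKE_DOTS for ch in reading)


if __name__ == "__main__":
    # Joins the two variants with ";" as in the example above.
    print(";".join(gairaigo_variants(["アームレス", "チェア"])))
```

A check such as `has_wrong_dot` could be run on a reading before it is submitted, since a wrong dot is hard to spot by eye.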
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
*it is best not to use "etc." with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
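
Several of the points above are mechanical enough to check automatically. The following is a minimal sketch, assuming the meanings of a sense are held as a single ";"-separated string. The function names and the choice of checks are illustrative only and are not part of JMdictDB.

```python
import re


def split_glosses(meaning):
    """Split a ';'-separated meaning string into individual glosses."""
    return [g.strip() for g in meaning.split(";") if g.strip()]


def lint_gloss(gloss):
    """Return a list of likely style problems in one gloss."""
    problems = []
    # Leading "a", "an" or "the" is normally unwanted.
    if re.match(r"(a|an|the)\s", gloss, re.IGNORECASE):
        problems.append("leading article")
    # Patterns such as "colo(u)r" cannot be searched for successfully.
    if re.search(r"\w\(\w+\)\w", gloss):
        problems.append("unsearchable spelling pattern")
    # No comma after "e.g." or "i.e."
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        problems.append("comma after e.g./i.e.")
    return problems


if __name__ == "__main__":
    for gloss in split_glosses("rice field; rice paddy; colo(u)r"):
        print(gloss, lint_gloss(gloss))
```

The leading-article check will also flag the rare cases where an article is genuinely needed, so its output should be treated as a prompt for review rather than as an error.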
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary, (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
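
For anyone processing sense text in this form, the [lsrc=...] pattern can be picked apart with a regular expression. The sketch below is based only on the two examples given above (an unquoted word, and a quoted phrase containing a space); the function name is hypothetical and the real entry form may accept variations that are not handled here.

```python
import re

# Three-letter language code, then either a quoted phrase or a bare word.
# The word part may be empty, as in [lsrc=lng:].
LSRC = re.compile(r'\[lsrc=([a-z]{3}):(?:"([^"]*)"|([^\]]*))\]')


def parse_lsrc(text):
    """Return (language code, source word) pairs found in a sense line."""
    pairs = []
    for m in LSRC.finditer(text):
        word = m.group(2) if m.group(2) is not None else m.group(3)
        pairs.append((m.group(1), word))
    return pairs


if __name__ == "__main__":
    print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
    print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```

The pattern does not check that the code is a valid ISO 639-2 code; that would need a lookup against the code list.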
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]].<br />
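
The three cross-reference shapes described above (headword only, headword・reading, and headword with a sense number) can be separated as follows. This is a sketch under stated assumptions: the function and field names are hypothetical, the separator is assumed to be U+30FB, and the detailed instructions linked above remain the authority on the full syntax.

```python
import re

# [see=...] or [ant=...], with an optional trailing [n] sense number.
XREF = re.compile(r"\[(see|ant)=([^\[\]]+?)(?:\[(\d+)\])?\]")

SEPARATOR = "\u30fb"  # assumed separator between headword and reading


def parse_xref(tag):
    """Split a cross-reference tag into kind, headword, reading and sense."""
    m = XREF.fullmatch(tag)
    if m is None:
        return None
    kind, target, sense = m.groups()
    headword, _, reading = target.partition(SEPARATOR)
    return {
        "kind": kind,
        "headword": headword,
        "reading": reading or None,
        "sense": int(sense) if sense else None,
    }


if __name__ == "__main__":
    for tag in ("[see=言葉]", "[see=金本位\u30fbかねほんい]", "[see=漢字[2]]"):
        print(parse_xref(tag))
```

One known limitation: a kana-only target that itself contains a middle dot would be split at that dot, so such targets need separate handling.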
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where useful, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
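
The two spacing rules above lend themselves to a simple automated check. The sketch below is illustrative only: the function name is hypothetical and the list of units is a small assumed sample, not a list taken from the policy.

```python
import re

# A digit directly followed by a unit (sample of units only).
UNIT_WITHOUT_SPACE = re.compile(r"\d(?:km|kg|cm|mm|m|g|l|ml)\b")
# A digit followed by whitespace and then a symbol.
SYMBOL_WITH_SPACE = re.compile(r"\d\s+(?:%|°)")


def spacing_problems(gloss):
    """Return a list of spacing problems found in a gloss."""
    problems = []
    if UNIT_WITHOUT_SPACE.search(gloss):
        problems.append("missing space before unit")
    if SYMBOL_WITH_SPACE.search(gloss):
        problems.append("unexpected space before symbol")
    return problems


if __name__ == "__main__":
    for text in ("100km", "100 km", "15°C", "9 %", "5c"):
        print(text, spacing_problems(text))
```

Because "c" for cents is treated as a symbol, it is deliberately left out of the unit list, so "5c" passes the check.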
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
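
For scripts that generate glosses, the preferred date formats can be produced as below. This is a sketch only: the function names are hypothetical, and the month names are spelled out in the code so that the output does not depend on the system locale.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def gloss_date(d, with_year=True):
    """Format a date as 'March 17' or 'March 17, 2019'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def lifespan(born, died):
    """Format a person's dates in the brief YYYY.MM.DD style."""
    def dotted(d):
        # No zero padding, matching the Mishima example above.
        return f"{d.year}.{d.month}.{d.day}"
    return f"({dotted(born)}-{dotted(died)})"


if __name__ == "__main__":
    print(gloss_date(date(2019, 3, 17)))
    print("Yukio Mishima", lifespan(date(1925, 1, 14), date(1970, 11, 25)))
```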
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who the regular contributors of amendments and new entries are, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
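<br />
A contributor who wants to know in advance which characters would be affected by the constraints above can approximate the check with Python's standard codecs. This is a sketch under stated assumptions: it treats the <code>iso2022_jp</code> codec as a stand-in for "ASCII plus JIS X 0208" and <code>iso2022_jp_1</code> as additionally covering JIS X 0212. The codecs also accept the few JIS X 0201 Roman characters, so the result is an approximation and not the test actually used when the EDICT/EDICT2 files are generated.<br />

```python
def _encodable(ch, codec):
    """True if the single character ch can be encoded with codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def classify(ch):
    """Roughly classify one character by the JIS set that contains it."""
    if ch.isascii():
        return "ASCII"
    if _encodable(ch, "iso2022_jp"):
        return "JIS X 0208"
    if _encodable(ch, "iso2022_jp_1"):
        return "JIS X 0212"
    return "outside JIS X 0208/0212"


def outside_jis(text):
    """Characters that would not survive into EDICT or EDICT2."""
    return [ch for ch in text if classify(ch).startswith("outside")]


for ch in "猫é한":
    print(ch, classify(ch))
```

Anything returned by <code>outside_jis</code> is a candidate for the advice above, e.g. adding a romanized version alongside hangul in a note.<br />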
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
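<br />
The core of the two-out-of-three rule can be written down in a few lines. The sketch below is illustrative only: the <code>Entry</code> class and <code>may_merge</code> function are hypothetical, each field holds just the major/common form, and exact string equality stands in for the editorial judgement of whether two fields are "the same".<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kanji: str     # major kanji headword ("" if none)
    reading: str   # major reading
    meaning: str   # summary of the meaning


def may_merge(a, b):
    """Two-out-of-three rule: at least two of the three fields must match."""
    matches = sum((a.kanji == b.kanji,
                   a.reading == b.reading,
                   a.meaning == b.meaning))
    return matches >= 2


# Illustrative data only.
ikebana_1 = Entry("生け花", "いけばな", "flower arrangement")
ikebana_2 = Entry("生花", "いけばな", "flower arrangement")
seika = Entry("生花", "せいか", "fresh flowers")

print(may_merge(ikebana_1, ikebana_2))  # reading and meaning match
print(may_merge(ikebana_2, seika))      # only the kanji matches
```

A positive result only means that a merger may be considered; as noted above, rare or archaic forms and related-but-different kana variants still call for common sense.<br />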
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in one or more 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this). Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
Tags such as "place", "work", "person", etc., which are used to classify named entities in the JMnedict database, may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it is often easy to determine based on the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess. <br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
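<br />
The capitalization and layout rules for species glosses given above can be sketched as a small helper. The function names are hypothetical, and the sketch covers only the simple cases (genus capitalized, epithets lower-cased, quoted cultivar epithets left alone); it does not validate names against any taxonomic authority.<br />

```python
def normalise_scientific(name):
    """Capitalize the genus and lower-case the remaining epithets."""
    parts = name.split()
    if not parts:
        raise ValueError("empty scientific name")
    fixed = [parts[0].capitalize()]
    for part in parts[1:]:
        # Cultivar epithets in single quotes keep their capital letter.
        fixed.append(part if part.startswith("'") else part.lower())
    return " ".join(fixed)


def species_gloss(scientific, common=None, description=None):
    """'common (Scientific name)' or, failing that, 'Scientific name (description)'."""
    sci = normalise_scientific(scientific)
    if common:
        return f"{common} ({sci})"
    if description:
        return f"{sci} ({description})"
    return sci


print(species_gloss("pica Pica", common="European magpie"))
print(species_gloss("Mola mola", description="a species of sunfish"))
```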
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, however, foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is not being followed completely now. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag, however, it needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word-frequencies that 板前 alone is much more widely-used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.</div>JimBreenhttps://www.edrdg.org/wiki/index.php?title=Editorial_policy&diff=922Editorial policy2021-06-23T05:01:17Z<p>JimBreen: /* Is it worth including? */</p>
<hr />
<div>==JMdict/EDICT Editorial Policy and Guidelines==<br />
<br />
These guidelines are intended for people preparing new entries or amendments for the JMdict/EDICT files. Typically these entries or amendments will be made via the [[JMdictDB_Project|JMdictDB]] on-line database system.<br />
<br />
==Before Starting==<br />
<br />
Before proposing a new entry or an amendment, you should:<br />
*familiarize yourself with the style of the dictionary, particularly the way the English meanings are typically worded;<br />
*make very sure it is not already an entry. An amazing number of "new" entries turn out to be in the dictionary already, or variants of existing entries. If it '''is''' a variant, add it to the existing entry. Check such things as:<br />
**common variants of writing 外来語, e.g. using either ー or イ for extending vowels, having a ー at the end (コンピューター/コンピュータ), etc.;<br />
**common okurigana variants, e.g. 生花/生け花;<br />
**modern and old kanji, e.g. 合気道/合氣道<br />
*check you have written it correctly. Has it the correct kanji? Is the reading correct, with the vowel length right, ず/づ issues resolved, etc.?<br />
*verify the source. There are excellent online dictionaries available, e.g. the Sanseido dictionaries at the [http://dictionary.goo.ne.jp/ Goo] site, and the various collections at the [https://kotobank.jp/ Kotobank] site. The Eijiro dictionary at the [http://www.alc.co.jp/ ALC] site is also useful. If the word or phrase can't be found in a dictionary, WWW references to where it is used may suffice, but the meaning and context has to be clear. Dictionary and other reference information '''must''' be included in the "Reference" section in the form. Include the precise URL - just "weblio" or "wiki" is no use at all to the editors. Note that if the page you are referencing is not <b>about</b> the proposed entry, include an extract from the reference text to help the editor(s) establish the validity of the proposed entry.<br />
*verify that the word or phrase is common enough to include in the dictionary. Page counts for Google or Yahoo are useful for this purpose. In general unless a word or phrase has more than about 50 hits on the WWW, it is not worth submitting.<br />
*decide whether it is really worth having as an entry. Some expressions are so obvious that it just clutters the dictionary to include them. (See the section below.)<br />
<br />
Note that it is not necessary to have an account and log in before proposing a new entry or a change to an existing entry. The login is really only for members of the Editorial Board. Changes can be proposed anonymously, but as explained below, we prefer if people identify themselves by name or nickname.<br />
<br />
==Dictionary Entry Fields==<br />
<br />
<br />
===Kanji/Special-Character Forms===<br />
<br />
The Kanji section of the entry form contains the form of the Japanese word/phrase which contains kanji, special characters or letters from non-Japanese scripts (e.g. MP3プレーヤー). The word/phrase should be written in full-width characters (e.g. it is not the half-width MP3プレーヤー).<br />
<br />
There may be more than one version of the word or phrase in this section. The usual reasons for having more than one version (also known as "surface forms" or "orthographical variants") are:<br />
*alternative kanji in the word, e.g. 合気道 and 合氣道<br />
*variations in ''okurigana'', e.g., 生け花 and 生花<br />
*part of a word being written either in kanji or kana, e.g., 言い付ける and 言いつける<br />
Where there are multiple forms of a word, enter them with the most commonly used form first, and then order them in decreasing frequency of use. In general irregular or incorrect forms, e.g. those tagged iK, io or ik, should be placed to the rear of the surface form list, even if they are commonly used on WWW pages.<br />
<br />
Synonyms should not be included here. Instead they should be entered as separate dictionary entries, and a cross-reference inserted to them.<br />
<br />
Some other points to note:<br />
*in the case of na-adjectives (形容動詞), the な is NOT included in the entry (some Japanese dictionaries include it.) Use a part-of-speech of "adj-na".<br />
*as most adverbs are derived from either regular adjectives (く form) or na-adjectives (に), there is no need to have an entry unless the adverb is not apparent from the adjective.<br />
*for verbs formed from adding する to a noun, do not include the する in the headword - instead use the part-of-speech of "vs". The exception to this is the group of single-kanji-plus-する verbs such as 愛する. For these include the complete verb and use the "vs-s" part-of-speech.<br />
*for adverbs that are indicated by と, e.g. まざまざと, do not include the と, instead note the part-of-speech as "adv-to".<br />
*for adjectives that use たる (and と in the adverbial form), e.g. 依然たる, 依然と, omit the たる and と and use "adj-t" as the part-of-speech.<br />
*for the -さ (-ness) and -く (adverb) inflections of adjectives, only include them if the meaning is not obvious from the gloss of the adjective itself.<br />
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_kinf tags], e.g. iK or oK, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Readings===<br />
<br />
In this section enter either:<br />
* the reading(s) of the word/phrase in the Kanji section, or<br />
* the word itself if it is written only in kana, such as a 外来語 or a word/phrase written only in hiragana.<br />
<br />
Readings associated with kanji should normally be in hiragana; the main exceptions being:<br />
* Chinese or Korean words and names, which are often transliterated using katakana;<br />
* the names of biological species which should be entered in both katakana and hiragana (if there is also a kanji form.)<br />
* older loanwords such as 硝子 (ガラス: glass) and 加里 (カリ: potassium). Included in this are some country names such as 加奈陀 (カナダ), 英吉利 (イギリス) and 亜米利加 (アメリカ).<br />
More than one reading can be entered where alternatives are possible. This can occur:<br />
* where a kanji has alternative readings;<br />
* where there are different transliterations of 外来語, e.g., ダイヤモンド and ダイアモンド;<br />
* where a species name is being recorded; in these cases both hiragana and katakana forms should be entered. The katakana form must have "[nokanji]" after it to indicate that it is used without the kanji form, and a "[uk]" should be included in the Meanings field. Place the hiragana form first (client software such as WWWJDIC will display the katakana form first.)<br />
* where a katakana form is commonly used and is identical to one of the readings (e.g. 仏陀-ぶっだ-ブッダ). Here also place "[nokanji]" after the katakana version and place it at the end of the readings.<br />
Where alternative readings are restricted to particular variants of the kanji form, specify this using the [restr=KKK] pattern after the reading.<br />
As in the Kanji section, place the more common reading(s) first.<br />
<br />
外来語 (in katakana) are entered in this section. '''Do not''' enter them in the kanji section. Where a 外来語 is a transliteration of several source words, include versions with and without a separating "middle dot", e.g. "アームレスチェア;アームレス・チェア". Note that the JIS middle-dot must be used - there are other Unicode middle-dots which are not accepted.<br />
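<br />
The middle-dot requirement above is easy to get wrong because several Unicode characters look alike. The sketch below assumes that the "JIS middle-dot" is U+30FB (KATAKANA MIDDLE DOT); the function names and the list of look-alikes are illustrative and not exhaustive.<br />

```python
JIS_MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT (assumed to be the accepted one)

# Look-alike characters that should not be used as the separator.
LOOKALIKE_DOTS = {
    "\u00b7": "MIDDLE DOT",
    "\uff65": "HALFWIDTH KATAKANA MIDDLE DOT",
    "\u2022": "BULLET",
    "\u2219": "BULLET OPERATOR",
    "\u22c5": "DOT OPERATOR",
}


def loanword_readings(parts):
    """Return the joined form and the middle-dot form of a multi-word loanword."""
    return ["".join(parts), JIS_MIDDLE_DOT.join(parts)]


def lookalike_dots(reading):
    """Names of any look-alike dots found in a reading."""
    return [LOOKALIKE_DOTS[ch] for ch in reading if ch in LOOKALIKE_DOTS]


print(";".join(loanword_readings(["アームレス", "チェア"])))
print(lookalike_dots("アームレス\u00b7チェア"))
```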
<br />
If a 外来語 (e.g. ベースボール) means the same as a native Japanese word (e.g. 野球), do not include the 外来語 form as a reading of the kanji. Instead create a separate entry and create cross-references between them. Similarly if two kana-only words have the same meaning, do not place them in the same entry unless they are related, e.g. spelling or pronunciation variants.<br />
<br />
If the kanji part contains katakana (e.g. 一眼レフ), use katakana in the Reading as well for the matching portion (いちがんレフ).<br />
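<br />
The rule above can be checked by confirming that every katakana run in the kanji form reappears, in order, in the reading. This is a minimal sketch with a hypothetical function name; the small ヵ and ヶ are deliberately excluded from the katakana range because they are normally read as か/が rather than carried over unchanged.<br />

```python
import re

# Katakana ァ to ヴ plus the long-vowel mark ー; ヵ and ヶ are excluded on purpose.
KATAKANA_RUN = re.compile(r"[\u30a1-\u30f4\u30fc]+")


def katakana_preserved(kanji_form, reading):
    """True if each katakana run of kanji_form occurs, in order, in reading."""
    pos = 0
    for run in KATAKANA_RUN.findall(kanji_form):
        found = reading.find(run, pos)
        if found < 0:
            return False
        pos = found + len(run)
    return True


print(katakana_preserved("一眼レフ", "いちがんレフ"))
print(katakana_preserved("一眼レフ", "いちがんれふ"))
```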
<br />
A set of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_rinf tags], e.g. ik or ok, can be applied to the words in this section. These should be used sparingly.<br />
<br />
===Meanings===<br />
<br />
The Meanings section of the entry form is divided into senses, i.e. distinct meanings. These are indicated by a sense number: [1], [2], etc. Each sense can have a number of [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_pos part of speech tags] (POS), e.g. [n], [adj-i] and [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#kw_misc miscellaneous tags], e.g. [abbr] and [col]. <br />
<br />
The meanings consist of one or more short translations or explanations of the Japanese word or phrase.<br />
====General====<br />
*do not copy translations, especially longer ones, directly from other dictionaries. For simple terms there may not be much in the way of alternatives, but for longer explanations use your own words, reword things, etc. Significant copying carries a risk of charges of plagiarism or copyright violation.<br />
*where the Japanese has more than one distinct meaning, break the section into senses.<br />
*make each translation a separate item, i.e. place a ";" between them. This makes reverse look-up and exact match on the English possible. Some examples:<br />
**abbreviations: "three letter acronym; TLA" not "three letter acronym (TLA)"<br />
**conjunctions: "rice field; rice paddy" not "rice field or paddy"<br />
*where different forms of English use different terms, include all major variants (e.g. both "snow pea" and "mange tout" or "tap" and "faucet".) <br />
*do not use capital letters unless referring to a proper name (person, place, etc.) Japanese theatrical forms should be given as "noh" and "kabuki"; not "Noh", "Kabuki", etc.<br />
*do not precede the meaning with the articles "a", "an" or "the" unless it is absolutely necessary to make the meaning clear.<br />
*when putting numbers into translations be consistent and concise. In general:<br />
** if the numbers are in the context of a formula, quantity, measurement, etc. use figures (e.g. 1.5 kilograms);<br />
** if the numbers are in something more descriptive or narrative, in general use words for numbers up to ten (e.g. three kings, five flowers), and figures for numbers over ten (e.g. 147 angels). In some cases, such as the 三十三所 entry, "thirty-three temples" looks more natural than "33 temples".<br />
** avoid mixing figures and words, even if it means relaxing the advice above. Writing "eat five to twenty raisins" or "eat 5 to 20 raisins" is fine, but "eat five to 20 raisins" looks unnatural.<br />
*make the translations as international as possible. For example, use "university" rather than "college" when referring to tertiary education, as outside the US the word "college" has much wider usage.<br />
*include both "British" and "American" spellings. For short meanings it is better to repeat the meaning with the alternative spelling, however it is also acceptable to just put the alternative at the end in parentheses, e.g. "full colour (color)". Do not use patterns such as "colo(u)r" as they can't be searched for successfully.<br />
*when using "e.g." to expand on the meaning of a word by giving examples, or when using "i.e." to qualify the meaning of a word, place the expansion in parentheses after the initial translation. For example say "hand game (e.g. rock, paper, scissors)", not "hand game, e.g. rock, paper, scissors". Also, do not include a comma after e.g. or i.e.<br />
* it is best not to use "etc" with a one-item list. In such cases, "e.g." is preferable.<br />
*as with the use of "e.g." and "i.e." above, it is OK to add a few words of extra information in parentheses after the translation. The situations where this is done include:<br />
**providing some context for the term;<br />
**short disambiguations;<br />
**short explanations for technical words or words where the meaning might not be clear to a literate user;<br />
**scientific names of species (but only when it is following a common name). This is explained more fully below.<br />
*provide useful explanations where appropriate. "type of card game" is not very useful - in such a case explain briefly what the card game entails.<br />
* '''never''' create an English meaning purely based on the translation of the meanings of the kanji making up a word. Sometimes it will be correct, but there are many cases where the result would be quite wrong. (魂柱 does not mean "spirit pillar").<br />
*when entering the scientific name of a plant, animal, etc. put it in brackets after the first common English name, e.g. "spectacled bear (Tremarctos ornatus)". Note that the first word of the scientific name will have a capital letter. (See the note on "Names of biological species" below.)<br />
*put any context in brackets, e.g.: "consulting (the oracle)" not "consulting the oracle".<br />
*when indicating a field or domain for an entry, e.g., "comp" or "ling", state it using the [fld=xxxx] pattern. The full list of field tags is [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_fld here]. For example:<br />
** [fld=comp] floating-point<br />
*when entering the name of a species of animal, plant, etc. do not use the "zool", "bot" field tags, as this should be obvious. Those tags are really to establish the context of a technical term.<br />
*short explanatory notes can be included as part of a sense. Use the pattern [note="this is a note"]. These should be kept short, and only used when it is necessary to include some information that can't go in a gloss. In general it is best to word the glosses so that further explanation is not needed. (These explanatory notes are not carried through to the legacy EDICT format of the dictionary, so it is permissible to have Japanese text in them.)<br />
*where the English meaning is an obscure technical term, add a short explanation in lay terms after it in parentheses. Do not add such explanations where the English meaning should be clear to a literate user (this is not an English dictionary.)<br />
*it is sometimes useful to indicate the literal meaning of an idiomatic expression, etc. In this case:<br />
** place "[lit]" at the front of the gloss;<br />
** place this gloss last, after the real translation(s).<br />
** note that the "[lit]" tag should not be used for such things as literal translations of the kanji in a jukugo.<br />
*if a gloss has a figurative meaning, this can be indicated by placing "[fig]" in front of it.<br />
*on occasions the usual translation may be a bit opaque, and a more complete explanation would be helpful. In this case add a more explanatory gloss with "[expl]" in front of it. Keep these to a minimum (it's a dictionary; not an encyclopedia.) The "[expl]" tag is usually not used if it is the only gloss in the sense. <br />
(At present the "expl", "lit" and "fig" tags are only used in the database - they are not yet exported to JMdict or EDICT.)<br />
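Several of the mechanical points above (separate glosses with ";", no "colo(u)r" patterns, no leading articles, no comma after "e.g." or "i.e.") lend themselves to a simple automated check. The following is only a minimal sketch: the function names and warning texts are hypothetical and not part of any JMdict tool, and the checks are heuristics that will produce false positives (e.g. a gloss that legitimately begins with "a" or contains "or").<br />

```python
import re


def split_glosses(sense_text: str) -> list[str]:
    """Split the text of one sense into its ';'-separated glosses."""
    return [g.strip() for g in sense_text.split(";") if g.strip()]


def lint_gloss(gloss: str) -> list[str]:
    """Return heuristic warnings for a single gloss (hypothetical helper)."""
    warnings = []
    # Patterns such as "colo(u)r" cannot be searched for successfully.
    if re.search(r"\w\(\w+\)\w", gloss):
        warnings.append("optional letters in parentheses")
    # Articles should only lead a gloss when really necessary.
    if re.match(r"(a|an|the)\s", gloss, flags=re.IGNORECASE):
        warnings.append("leading article")
    # No comma directly after "e.g." or "i.e."
    if re.search(r"\b(e\.g\.|i\.e\.),", gloss):
        warnings.append("comma after e.g./i.e.")
    # "X or Y" often means two glosses that should be split with ';'.
    if " or " in gloss:
        warnings.append("possible merged alternatives")
    return warnings


for gloss in split_glosses("rice field; rice paddy; the rice field or paddy"):
    print(gloss, lint_gloss(gloss))
```

Any warning still needs an editor's judgement; the sketch only points at glosses worth a second look.<br />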
<br />
====Which Reference Is Best?====<br />
<br />
On occasions references (see the list of dictionaries, etc. below) will differ as to the meanings of entries, and as to which senses are more important than others. Here are some suggestions for handling this:<br />
*our goal is to reflect '''modern Japanese''', so precedence should be given to sources that indicate up-to-date usage;<br />
*the major Japanese-English dictionaries tend to be more up-to-date and focussed in their translations than the 国語辞典;<br />
*広辞苑 lists its meanings in historical order, so use its material with caution;<br />
*in general 大辞林's meanings (especially in recent editions) appear to be more topical than those in 大辞泉 or 日国;<br />
*if a term only appears in 大辞泉 or 日国, consider tagging it "obsc" or "arch" as appropriate, unless it gets a reasonable n-gram count or number of WWW hits;<br />
*WWW pages can give confirmation of modern contexts, although quite a few pages may have to be scanned. Sometimes looking at the associated images can give a quick indication of which sense is dominant.<br />
<br />
====Part-Of-Speech (POS) Issues====<br />
*where a term can be used in multiple roles, e.g. as a noun, adjective, adverb, etc., the part-of-speech tags should usually be ordered with the most common role(s) first.<br />
**many nouns in Japanese can also be, or act as adjectives (e.g. tagged as adj-na, adj-no, or adj-f in JMdict.) These terms should generally have "n" as the first part-of-speech tag and be given a noun meaning. Exceptions can be made when the adjective usage is obviously much more common*, e.g. with 複雑, or when it's difficult to translate the term as a noun in English, e.g. スポーツ万能. The approach taken by major Japanese-English dictionaries can be a guide, as can the [http://nlp.cis.unimelb.edu.au/jwb/ngrams/ngramlookup.cgi?sent=%E8%A4%87%E9%9B%91%E3%81%95%E3%82%92+++%E8%A4%87%E9%9B%91%E3%82%92+ n-gram frequency counts].<br />
*in general the form of the meaning should agree with the '''first''' part-of-speech tag for the sense. If the Japanese word is marked as a noun, don't make the translation a verb (e.g. to xxxx) or an adjective.<br />
*do not list verb translations for nouns that can also be used as verbs (i.e. [n,vs]). See the 料理 entry, which has: "cooking; cookery; cuisine", not "cook".<br />
** if the verb sense is not easily derived from the noun form, include a second sense with a POS of "vs" in which the meaning will be "to ...".<br />
*if the POS of an entry is "vs" alone, the meaning will be given as a verb (such entries are rare.)<br />
*when entering a verb, use the infinitive in English (to run, to jump, etc.)<br />
*for adjectives, the English entry should be just the adjective, not the adjective and copula:<br />
** "lucky" not "be lucky" or "is lucky"<br />
*for entries marked "adj-no" or "adj-na", do not include "adj-f" as well, as the dropping of the の and な particles is quite common.<br />
*there is a range of archaic POS tags available, e.g. the ones associated with the 二段 and 四段 verb types (v2* and v4*). Most modern verb equivalents have an archaic verb equivalent, i.e. most verbs that are "v5k" could also be marked as "v4k", and most "adj-na" entries could also be flagged as "adj-nari". For most words such extra tags are quite redundant. The old verb, etc. POS tags should '''only''' be used for archaic words which never use a modern conjugation, e.g. 崇まふ.<br />
<br />
====Word Source====<br />
<br />
If the word or term comes from another language, mark this at the beginning of the sense(s) to which it applies. The format is [lsrc=lng:], where lng is the three-letter code from the [http://xml.coverpages.org/iso639a.html ISO 639-2:1998 "Codes for the representation of names of languages" standard], e.g.:<br />
* アルバイト [1][n,vs][lsrc=ger:Arbeit] part-time job <br />
* アールデコ [1][n] [lsrc=fre:"art déco"] art deco<br />
<br />
Don't do this for (i) common Sino-Japanese vocabulary; (ii) loan-words from English where the source word is among the translations; (iii) words/terms which are translations from other languages. If the word or term in the source language is identical to the translation, don't repeat it in the [lsrc=...] field. Note that where a loan-word from English was originally from another language, e.g. ベランダー/verandah, the usual practice is not to indicate a source language.<br />
<br />
Non-English source languages are usually indicated in the major 国語辞典 such as Daijirin and Daijisen, and also in 外来語 dictionaries such as the Gakken カタカナ 新語辞典. In cases of disagreement or doubt, e.g. where a term may have come from either English or French, omit any source language marking.<br />
<br />
Source words in languages that use a non-Latin script should be given in Latin transcription. Diacritical marks can be used. For the following languages, use these transcription systems:<br />
* Chinese: Pinyin (with tonal marks)<br />
* Russian: BGN/PCGN<br />
* Korean: [https://en.wikipedia.org/wiki/Romanization_of_Korean#Systems Revised Romanization] of Korean (not Yale or McCune–Reischauer)<br />
* Sanskrit: IAST<br />
<br />
The language markings apply both to loanwords (外来語), as with the examples above, and to transliterations (音写), typically the Buddhist terms taken from Sanskrit, which are not usually regarded as loanwords.<br />
<br />
Note that where ISO 639 discriminates between historical forms of a language, e.g. "grc" for Classical Greek and "gre" for Modern Greek, the modern tag is to be used as the discrimination cannot easily be applied at the word level.<br />
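As an illustration of the [lsrc=lng:word] pattern described in this section, the sketch below pulls the language code and source word out of a sense line. It assumes only the forms shown in the examples above (a three-letter code, a colon, and an optionally double-quoted word); the regular expression and the function name are hypothetical, and the real entry parser may accept more than this.<br />

```python
import re

# [lsrc=ger:Arbeit], [lsrc=fre:"art déco"], or [lsrc=xxx:] with no word given.
LSRC_RE = re.compile(r'\[lsrc=([a-z]{3}):("?)(.*?)\2\]')


def parse_lsrc(sense_text: str):
    """Return (language_code, source_word) or None if there is no lsrc tag."""
    match = LSRC_RE.search(sense_text)
    if match is None:
        return None
    return match.group(1), match.group(3)


print(parse_lsrc("[1][n,vs][lsrc=ger:Arbeit] part-time job"))
print(parse_lsrc('[1][n] [lsrc=fre:"art déco"] art deco'))
```

The source word comes back as an empty string when the tag gives only the language, which is the case where the source word is identical to the translation.<br />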
<br />
====Cross-References====<br />
Cross-references can be made to other dictionary entries where this enhances the value of the entry to the typical dictionary user. Examples of such useful cross-references are:<br />
: where one entry is an abbreviation of another, e.g. 学割 and 学生割引 (see below).<br />
: where the words are commonly associated or contrasted, e.g. 先輩/後輩, 税別/税込み, etc.<br />
: where there is a derivational relationship between words that it is useful to highlight, e.g. between かっけー and 格好いい, or between オケる and 空オケ.<br />
<br />
At present two classes of cross-reference are supported: a general "see" and an "ant" for antonyms. <br />
<br />
Specify the cross-reference using the pattern [see=言葉] or [ant=何等] (see the [http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py#syn_xref detailed instructions]). Where the reference is to a particular headword/reading combination, use the format: kanji・reading, e.g., [see=金本位・かねほんい]. Where the target word has a kanji form, that form should be used. For targets that are a particular sense of the target word use the format [see=漢字[2]]<br />
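The three cross-reference shapes just described (plain target, kanji・reading, and target with a sense number) can be recognised with one pattern. This is a sketch under assumptions: the regular expression and the returned field names are hypothetical, and it treats the first "・" as the kanji/reading separator, so a target whose headword itself contains "・" would need the handling described in the detailed instructions.<br />

```python
import re

XREF_RE = re.compile(
    r"\[(see|ant)="          # class of cross-reference
    r"([^\[\]・]+)"           # target headword
    r"(?:・([^\[\]]+))?"      # optional reading after the middle dot
    r"(?:\[(\d+)\])?"        # optional sense number, e.g. [2]
    r"\]"
)


def parse_xref(text: str):
    """Return a dict describing the first cross-reference in text, or None."""
    match = XREF_RE.search(text)
    if match is None:
        return None
    kind, headword, reading, sense = match.groups()
    return {
        "kind": kind,
        "headword": headword,
        "reading": reading,
        "sense": int(sense) if sense else None,
    }


print(parse_xref("[see=金本位・かねほんい]"))
print(parse_xref("[see=漢字[2]]"))
```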
<br />
Please note that the "ant" (antonym) tag should only be used for genuine opposites. Words such as "short" and "tall" are antonyms; "short person" and "tall person" are not - use the regular "[see=...]" form for these. (For more information, see the excellent [http://en.wikipedia.org/wiki/Opposite_%28semantics%29#Antonyms_.28gradable_opposites.29 Wikipedia article] on this.)<br />
<br />
Avoid adding cross-references to words which simply mean the same (or opposite), as it adds a lot of clutter to the entries without necessarily being helpful to users. There are related systems such as the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese WordNet] which specifically provide details of large numbers of synonyms. Some systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC] link to the Japanese WordNet as part of the entry display.<br />
====Abbreviations====<br />
Many Japanese terms are abbreviations of longer terms, for example 学割 is an abbreviation of 学生割引. When creating an entry for such an abbreviation:<br />
* add the tag "[abbr]" to indicate it is an abbreviation;<br />
* add a cross-reference to the full form (add an entry for the full form if necessary.) For example "[see=学生割引]".<br />
Where appropriate, a cross-reference back from the full form to the abbreviation may also be added.<br />
====Romanized Japanese====<br />
Romanized forms of Japanese words may be used within meanings in the following situations:<br />
* words such as "karate", "samurai" or "kimono" which have become part of the English lexicon. These would typically be the first meaning or gloss of the sense;<br />
* Japanese proper nouns such as Tokyo and Meiji;<br />
* romanized forms of Japanese terms which are in reasonably common use in particular contexts, e.g. "wasei eigo". These would not usually be the first meaning or gloss for the entry, but would follow more explanatory meaning(s).<br />
The [http://www.sljfaq.org/afaq/hepburn.html Hepburn romanization] system, in particular the revised (aka modified) version, will be used. That page can be taken as a guide, with the key points being:<br />
* where appropriate long vowels will be indicated using macrons (not circumflexes). Thus the era name 養老 should be written as "Yōrō"; not "Yourou", "Yooroo" or "Yoro".<br />
* where a Japanese term or name is commonly used in English, such as "tofu" and "judo", macrons would typically not be included on the long vowels. It may be appropriate to include the version with macrons in parentheses at the end of the gloss, e.g. "somen (sōmen)". Terms that are not regularly used in English should use macrons, e.g. "man'yōgana".<br />
* where ambiguities may occur, e.g. in words such as ほんやく or しんいち, apostrophes should be used to make the underlying kana forms clear, e.g. "hon'yaku" and "shin'ichi".<br />
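The apostrophe rule in the last point can be tested directly on the kana: an apostrophe is needed wherever ん is followed by a vowel kana or a や-row kana. The helper below is a hypothetical sketch of that test only; it does not romanize anything.<br />

```python
# Kana that would merge with a preceding "n" when romanized without an apostrophe.
VOWEL_OR_Y_KANA = set("あいうえおやゆよアイウエオヤユヨ")


def needs_apostrophe(kana: str) -> bool:
    """True if the romanized form needs an apostrophe after a syllabic n."""
    for current, following in zip(kana, kana[1:]):
        if current in "んン" and following in VOWEL_OR_Y_KANA:
            return True
    return False


print(needs_apostrophe("ほんやく"))  # hon'yaku
print(needs_apostrophe("ほんとう"))  # hontō
```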
====Old and Rarely Used Terms====<br />
Several miscellaneous tags are available for indicating that terms are no longer in current use or are rarely used. They are:<br />
* "arch" (archaism). This is typically used to indicate that the term was primarily used during or before the Edo period.<br />
* "obs" (obsolete). This is typically used for terms that were in use in the Meiji and early Showa periods, but are no longer in general use, e.g. they have been supplanted by another term.<br />
* "obsc" (obscure). This is used to indicate that a term, although in current use, is rarely encountered. A term that is included in one or more 国語辞典 but is not in Japanese-English dictionaries and has low occurrence levels in n-gram corpora would be a candidate for this tag. It is also particularly appropriate to add it to a term if there are other more common terms with the same meaning.<br />
* "hist" (historical). This is used to indicate a current term that refers to a concept in the past, e.g. an art-form common in the 18th century.<br />
* "dated". This is used to indicate an old term that is still used but sounds old fashioned and is possibly inappropriate in modern contexts.<br />
<br />
====Numbers with Units and Symbols====<br />
In general where a number is followed by a unit or symbol, the following spacing rules should be followed:<br />
* a space should be used between numbers and associated units. Please use "100 km"; not "100km".<br />
* where a number is followed by a symbol, do not include a space. Examples of this include "15°C" and "9%". Note that 5 cents would be "5c" as the "c" is treated as a symbol.<br />
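The two spacing rules above can be checked mechanically. The sketch below is a heuristic, not an existing tool: the list of letter "symbols" that attach directly to a number is an assumption (it includes "c" from the example above plus "am" and "pm" so that time formats such as "2am" are not flagged), and strings such as "3D" or "1st" would be reported as false positives.<br />

```python
import re

# Alphabetic suffixes written without a space (assumed list, extend as needed).
NO_SPACE_SUFFIXES = {"c", "am", "pm"}


def spacing_problems(text: str) -> list[str]:
    """Return substrings that appear to break the number/unit spacing rules."""
    problems = []
    # A unit glued to its number, e.g. "100km".
    for match in re.finditer(r"\d+(?:\.\d+)?([A-Za-z]+)\b", text):
        if match.group(1) not in NO_SPACE_SUFFIXES:
            problems.append(match.group(0))
    # A symbol separated from its number, e.g. "9 %" or "15 °C".
    for match in re.finditer(r"\d+\s+(?:%|°)", text):
        problems.append(match.group(0))
    return problems


print(spacing_problems("100km at 15 °C"))
```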
====Date and Time Formats====<br />
For the sake of consistency, the same format should be used when recording specific dates. The preferred formats are:<br />
*March 17 (where the year is not included)<br />
*March 17, 2019 (where the year is included)<br />
For the dates of individual people, e.g. in the named-entity dictionary, use the YYYY.MM.DD format for the sake of brevity, e.g. "Yukio Mishima (1925.1.14-1970.11.25)".<br />
<br />
Similarly, for specific times of the day, use the "2am" and "12:30pm" styles, both to be consistent and to use the minimum amount of space.<br />
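For anyone generating glosses programmatically, the preferred date and time styles above could be produced as follows. This is a sketch with hypothetical function names; month names are spelled out in a constant so the output does not depend on the system locale, and the lifespan format follows the Mishima example above, i.e. without zero-padding of month and day.<br />

```python
from datetime import date, time

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def format_date(d: date, with_year: bool = True) -> str:
    """'March 17, 2019' or, without the year, 'March 17'."""
    text = f"{MONTHS[d.month - 1]} {d.day}"
    return f"{text}, {d.year}" if with_year else text


def format_lifespan(born: date, died: date) -> str:
    """Compact form used for the dates of individual people."""
    return (f"{born.year}.{born.month}.{born.day}"
            f"-{died.year}.{died.month}.{died.day}")


def format_time(t: time) -> str:
    """'2am' or '12:30pm': 12-hour clock, minutes only when non-zero."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute == 0:
        return f"{hour}{suffix}"
    return f"{hour}:{t.minute:02d}{suffix}"


print(format_date(date(2019, 3, 17)))
print(format_lifespan(date(1925, 1, 14), date(1970, 11, 25)))
print(format_time(time(12, 30)))
```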
<br />
====Capital Letters====<br />
Capital letters should generally be confined to proper nouns, e.g. specific countries, places, people, products, etc. Astronomical objects such as the Sun, Saturn, etc. will have capitals, but moonlight and sunshine will not.<br />
====Use of French, etc. Diacritics====<br />
(in progress)<br />
<br />
===References===<br />
This is where you indicate the sources for the entry or amendment. It helps establish its validity, enables editors to check out the accuracy, e.g. of the translation from a 国語辞典, and leaves a record for other people to know where the entry and translation came from.<br />
* for proposed new entries supporting reference information MUST be provided. Proposals without any such information may be summarily rejected by an editor;<br />
* for amendments to existing entries, straightforward suggestions such as spelling changes or rewording of translations need not have references, but more substantial changes must be accompanied by references and/or a case for the change in the Comments field.<br />
<br />
The best references are to other dictionaries, and the more the better. Sometimes just the name of the dictionary will do, where the proposed entry is already an entry in the reference, however if the entry in the dictionary is readily visible online it is better to include the URL. Editors and regular contributors have developed a set of abbreviations and mnemonics for some of the popular sources:<br />
*koj: [http://www.iwanami.co.jp/kojien/ Kôjien, 広辞苑] - a major medium-sized 国語辞典.<br />
*daijr: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%9E%97 Daijirin, 大辞林] - another major medium-sized 国語辞典.<br />
*daijs: [http://ja.wikipedia.org/wiki/%E5%A4%A7%E8%BE%9E%E6%B3%89 Daijisen, 大辞泉] - another major medium-sized 国語辞典.<br />
*nikk: [http://www.nikkoku.net/ Nikkoku 日国/日本国語大辞典] - a major multi-volume 国語辞典.<br />
*GG5: [http://kod.kenkyusha.co.jp/service/ Kenkyusha 新和英大辞典第5版] - major Japanese-English dictionary (translators often refer to this as the "Green Goddess", hence the "GG".)<br />
*KOD追加語彙: addenda to the GG5, available via the Kenkyusha online dictionary site<br />
*ルミナス: Luminous ルミナス和英辞典 - medium Kenkyusha JE dictionary<br />
*GJD: 日本語大事典 (The Great Japanese Dictionary) - medium-sized 国語辞典 with brief English glosses<br />
*新和英中辞典: medium Kenkyusha JE dictionary<br />
*リーダーズ+プラス: medium-sized Kenkyusha English-Japanese dictionary<br />
*新英和大辞典: large Kenkyusha English-Japanese dictionary<br />
*新英和中辞典: medium-sized Kenkyusha English-Japanese dictionary<br />
*JWN: [http://compling.hss.ntu.edu.sg/wnja/index.en.html Japanese WordNet]<br />
*LSD: [https://lsd-project.jp/cgi-bin/lsdproj/ejlookup04.pl Life Sciences Dictionary] - major biomedical terminology dictionary<br />
*カタカナ新語辞典 (Gakken): a useful dictionary of loanwords<br />
*[https://unidic.ninjal.ac.jp/ Unidic]: morpheme dictionary from the National Institute for Japanese Language and Linguistics (NINJAL)<br />
*eij or alc: [http://www.eijiro.jp/ Eijiro, 英辞郎] - large word/phrase collection, available [http://www.alc.co.jp/ online] at the ALC site. In general this resource is '''not''' suitable as the sole reference for a proposed term (see the comment below).<br />
*[http://www.practical-japanese.com/ 実用日本語表現辞典], which is often used by the Weblio aggregator. This site is useful for helping understand expressions, etc. but should not be used as a sole reference for a proposed entry.<br />
<br />
Some of the above references are available via aggregator or reference WWW sites such as Goo, Weblio, Yahoo, etc. In such cases please make sure the reference URL is to the specific term on the site, and add the name of the actual dictionary being used for the reference (大辞林, 日国, etc.)<br />
<br />
If the references include online resources such as a dictionary entry or a Wikipedia article, quote the relevant URL. Please note that a Japanese Wikipedia article by itself is not necessarily a good source for a dictionary entry. Some articles are simply translations from an English page and not evidence that a term is in use in Japanese. Sometimes an article only covers one aspect of a term's usage, and there are other senses which need to be covered. It is best to check the term in other sources and state that in the References section.<br />
<br />
If the sources for the entry are other WWW-based documents, quote the URLs of at least one (preferably several), and use the Comments field to state your case for it being included.<br />
<br />
As noted above, the Eijiro glossary should not be the sole source of references for a proposed entry, although it may be used as a supplementary reference for confirming meanings. This is because the glossary is a collection of Japanese-English pairs which have apparently been collected from translations. In a [https://www.japantimes.co.jp/life/2015/09/21/language/translation-gets-tough-bow-green-goddess/ Japan Times article] Daniel Morales described it as "a smorgasbord of ''reibun'' and definitions, some of which err on the side of slang, often delighting the expat community. For example, the entry for ''nyūbō'' (乳房, breasts) has no fewer than 51 English options, including the ever-so-mature “funbags.” And ''kyūryōbi'' (給料日, payday) lists “when the eagle flies” (an American tribute to governmental pay), among other more colorful renditions."<br />
<br />
===Comments===<br />
Use this field to enter any additional information you think will help the editors when they assess the entry or amendment. These comments are kept with the entry as a record of the discussions. The Comments field will also be used by editors when providing feedback.<br />
===Name/Email address===<br />
While it is not mandatory, it is best if you include your name. Editors get to know who are regular contributors of amendments and new entries, and it is easier to establish some rapport if the contributor is identified.<br />
Also, having an email address enables editors to contact a contributor directly if there is a question they wish to raise. Note that email addresses '''cannot''' be seen by people browsing the database; they are only visible to editors who have logged into the system.<br />
<br />
==Other Issues/Policies==<br />
===Anonymous Submissions===<br />
There is no requirement for people submitting new entries or amendments to identify themselves. It is preferred, however, that people making regular contributions provide some identification, either their name or a pen-name, as it will add to the sense of community among the participants, and also enable the editors to take into account the quality of previous contribution(s) when examining a proposal.<br />
===Character Codes===<br />
Although the database supporting the dictionary uses Unicode coding and can contain any character from that set, the distributed forms of the dictionary are more constrained, in particular:<br />
* the (legacy) EDICT format can only contain characters in the JIS X 0208 set. This includes 6,356 kanji, alphanumerics and the Greek and Russian alphabets, but does not include Latin alphabet characters with diacritics, such as é and ö.<br />
* the EDICT2 format used by WWWJDIC and some other applications can contain characters from both JIS X 0208 and JIS X 0212. As well as containing an additional 5,801 kanji, JIS X 0212 adds a range of other characters including Latin alphabet characters with diacritics.<br />
<br />
The JMdict database is in Unicode and thus can contain any valid Unicode characters. <br />
<br />
Care needs to be taken with the inclusion in the database of characters outside the JIS X 0208 and JIS X 0212 codesets as this has implications for the EDICT and EDICT2 versions of the data. In particular:<br />
* any non-JIS208/212 character(s) will be removed. This means that if such characters are used, e.g. some hangul in a note, then the romanized version should be included as well.<br />
* for EDICT (but not EDICT2) alphabetics with diacritics will be replaced as appropriate, e.g. ö will be changed to oe.<br />
* if a kanji or reading part of an entry contains non-JIS characters, then the part will be removed entirely. JIS X 0212 kanji are retained in EDICT2, but in EDICT the entire kanji part is removed.<br />
<br />
Kanji which lie outside the JIS X 0208 and JIS X 0212 codesets, e.g. the additional kanji in JIS X 0213, can be included in the database and will be in the JMdict distributions, however, they will not be propagated into the EDICT/EDICT2 distributions.<br />
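Before submitting text, a contributor can get a rough idea of which distribution a string will survive in by trying to encode it. The sketch below uses Python's standard "iso2022_jp" codec as a stand-in for JIS X 0208 (plus ASCII) and "euc_jp" as a stand-in for JIS X 0208 plus JIS X 0212. Treating those codecs as proxies for the code sets is an assumption made here; the actual export scripts are not described in this guide and may behave differently.<br />

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of text can be encoded with the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def jis_level(text: str) -> str:
    """Rough classification of the smallest code set that covers text."""
    if encodable(text, "iso2022_jp"):   # ASCII + JIS X 0208
        return "JIS X 0208"
    if encodable(text, "euc_jp"):       # additionally covers JIS X 0212
        return "JIS X 0212"
    return "outside JIS X 0208/0212"


for sample in ("家", "é", "한"):
    print(sample, jis_level(sample))
```

A string reported as "outside JIS X 0208/0212" is the case where, as noted above, a romanized version should be supplied as well.<br />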
<br />
===Merging Entries/Two-out-of-three Rule===<br />
<br />
On occasions, two or more entries may be merged when there are grounds for assuming they are variants of each other. The basic principle that is applied is a "two-out-of-three" rule (first described in a [http://www.edrdg.org/~jwb/paperdir/jmdictart.html paper] in 2004). For the candidate entries, if at least two out of the (a) kanji-headword, (b) reading and (c) meaning fields are the same, the entries may be merged. Otherwise they must be separate entries. It is often not a simple decision, as there may be kanji-headwords which only apply to some of the readings.<br />
<br />
Where the entries have multiple kanji parts or readings, this rule really applies only to the major/common forms. Mergers should not be carried out on the basis of a rare or archaic kanji form or reading. Common sense must apply.<br />
<br />
Two entries with no kanji could be merged if they have the same meaning and the kana forms are related, e.g. are variants of each other, such as ダイアモンド and ダイヤモンド.<br />
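The core of the two-out-of-three rule is a simple count, sketched below. The class and function names are hypothetical and the sample values are purely illustrative, not a statement about actual entries. The sketch compares only one major kanji form, reading and meaning per entry using exact equality, so it does not capture the judgement calls described above, such as related kana variants or kanji that apply to only some readings.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kanji: str    # major kanji headword, "" if the entry has none
    reading: str  # major reading
    meaning: str  # meaning, normalised however the editor sees fit


def may_merge(a: Candidate, b: Candidate) -> bool:
    """True if at least two of kanji, reading and meaning are the same."""
    matches = [
        a.kanji != "" and a.kanji == b.kanji,
        a.reading == b.reading,
        a.meaning == b.meaning,
    ]
    return sum(matches) >= 2


same_reading_and_meaning = may_merge(
    Candidate("川", "かわ", "river"), Candidate("河", "かわ", "river"))
only_kanji_shared = may_merge(
    Candidate("金", "きん", "gold"), Candidate("金", "かね", "money"))
print(same_reading_and_meaning, only_kanji_shared)
```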
<br />
===Is it worth including?===<br />
An important issue is whether a possible entry is worth including. This question primarily arises with expressions such as XXXのYYY/XXXがYYY/etc. or compound nouns/multi-word expressions. Clearly we want to include entries that are useful and relevant, but we don't want to clutter the dictionary with things that are obvious. It is inevitably a value judgement and often leads to some debate between editors before a proposed entry is accepted or rejected. All dictionaries have to deal with this issue. It is worth reading the Wiktionary [https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion Criteria for inclusion] as it discusses many of the issues in considerable detail.<br />
The following is a list of criteria being used by the editors to assess whether a proposed entry should be included. Generally passing one or more of these criteria is needed.<br />
* is its meaning not obvious from the component parts? Note that many words/expressions have additional senses or nuances that cannot be deduced from the constituent parts (the former entry "僧になる" was removed because it failed this test, as well as the others)<br />
* is it not what someone reasonably proficient in Japanese would come up with when trying to express the English meaning in Japanese? (For example, 未収入金 is a reasonably common Japanese compound noun meaning "accounts receivable", but it is not necessarily what would be the result of translating "accounts receivable" into Japanese from scratch.)<br />
*is it already in one or more dictionaries? (Other dictionaries have had to address this issue, and if their editors have decided it is worth including, that is a good signal. Note that inclusion in Eijiro alone is not a good indication, as its coverage is vast and rather indiscriminate.) If a term is only in the 国語辞典 and has a low n-gram count and/or few WWW hits it should be tagged as "arch" or "obsc" as appropriate to signal that it is not in common use.<br />
*does it have a reading which is not obvious from the constituent kanji? (Some expressions use unusual or irregular readings, often because they are based on archaic forms.)<br />
*is it very, very common, with squillions of hits in WWW pages, etc.? (This is a rather weak test, and is mainly used with idiomatic expressions.)<br />
<br />
===Loanword Variants===<br />
Many loanwords (外来語) in Japanese have multiple surface forms which reflect such things as alternative mappings from the source language, variant vowel lengths, etc. Examples include ダイヤモンド/ダイアモンド, コンピュータ/コンピューター and ヴァイオリン/バイオリン.<br />
In general, all variants that are in regular use should be included, ranked in order of use (an n-gram corpus can be used to determine this.) Rarely-used variants can be omitted, or included with an "ik" (incorrect kana) tag.<br />
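Ordering variants by n-gram count is easy to automate once the counts are in hand. In the sketch below the counts are made-up placeholders, and the cut-off for calling a variant "rare" (under 1% of the most frequent form) is an arbitrary assumption rather than an editorial rule. Whether a rare variant is omitted or kept with an "ik" tag remains an editor's decision.<br />

```python
def rank_variants(counts: dict[str, int], rare_ratio: float = 0.01):
    """Return (form, is_rare) pairs ordered from most to least frequent."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if not ordered:
        return []
    top_count = ordered[0][1]
    return [(form, n < top_count * rare_ratio) for form, n in ordered]


# Placeholder counts for illustration only, not real corpus figures.
sample_counts = {"コンピュータ": 500, "コンピューター": 1000, "コンピュタ": 5}
for form, is_rare in rank_variants(sample_counts):
    print(form, "rare" if is_rare else "regular")
```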
<br />
===Proverbs/Kotowaza/Aphorisms/Sayings/etc.===<br />
In general, the dictionary is not the place for recording extended text passages, but there is scope for including short, pithy passages which are recognized as useful in Japanese. Tests that will be used by editors when assessing such passages for inclusion include whether they are clearly in common use in Japanese, and/or are included in one or more of the major 国語辞典.<br />
<br />
With regard to quotations and proverbs, the following guidelines are suggested for the use of the tags:<br />
* [quote] - used for entries that are passages from some text, either originally in Japanese or a translation from another language. Typically a ([note="..."]) note would be included to indicate the source/author.<br />
* [proverb] - used for entries which consist of a proverb, maxim, aphorism, pithy saying, etc. The popular Japanese ことわざ would also be tagged with this. Note that 四字熟語 have their own [yoji] tag and do not also get marked with the [proverb] tag.<br />
<br />
Some entries consist of a term or passage based on or derived from part of a historical text. These should not be marked as [quote] unless they are an actual translation. Where appropriate a note can be included indicating the original text, e.g. "deriv. from 史記 passage".<br />
<br />
===Proper Names===<br />
In general, the JMdict/EDICT dictionary is not intended to include [https://en.wikipedia.org/wiki/Proper_noun proper names] as these are included in the companion [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html ENAMDICT/JMnedict] dictionary. It is common, however, for small numbers of high-profile proper names to be included in general dictionaries, and this is the case with JMdict. Proper names included in JMdict are primarily place names, with emphasis on the names of significant places within Japan, and on the Japanese names of countries and major cities. (The proper names in JMdict will be in ENAMDICT/JMnedict as well.)<br />
<br />
The proper names considered appropriate for inclusion are:<br />
* Japanese prefectures<br />
* major Japanese cities, in particular, the [https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities designated cities] and the capitals of prefectures<br />
* Japanese regions (近畿, 北陸, 東北, etc.)<br />
* major Japanese geographical features, e.g. 本州, 北海道, 富士山, 能登半島, 琵琶湖, etc.<br />
* the former provinces in Japan<br />
* other countries and their capital cities and other significant cities<br />
* major geographical features (continents, oceans, major seas, lakes, mountain ranges, etc.)<br />
* states and provinces of English-speaking countries and their capital cities<br />
* provinces of China, major Chinese cities, and major cities in Korea<br />
* deities and other major religious figures of Japanese religions and other significant religions, in particular, the Abrahamic faiths<br />
* significant religious texts, Japanese works of literature, and reference books such as dictionaries<br />
* a select number of extremely important historical, scientific, literary, musical, etc. figures known worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.) <br />
* ministries, government departments and major organizational units, especially in Japan.<br />
* company names where they also refer to common services/platforms/products (e.g. Google, Twitter, Facebook, Netflix, LINE).<br />
<br />
The above covers most of the proper names in JMdict. Some other names have been included, e.g. major newspapers, and there is discussion as to whether they can be retained under a "grandfather" principle, or should be confined to ENAMDICT/JMnedict.<br />
<br />
The tags such as "place", "work", "person", etc. which are used to classify named entities in the JMnedict database may also be used for proper names in JMdict; however, they should only be used when the nature of the entry is not clear from the gloss itself. For example "バルセロナ (n) Barcelona (Spain)" does not need the addition of the "place" tag.<br />
<br />
As with other transcriptions of Japanese terms, the modified Hepburn system will be used. In most cases macrons will be used for long vowels, the only exceptions being cities such as Tokyo, Osaka and Kobe which are commonly used in English without macrons.<br />
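<br />
The macron rule above can be sketched as a small lookup. This is only an illustration: the function name and the exception table are hypothetical, and the table holds just the three cities named above rather than a complete list of exceptions.<br />

```python
import unicodedata

# Illustrative exception table (an assumption, not an official list):
# macron spelling -> spelling commonly used in English.
COMMON_ENGLISH_SPELLINGS = {
    "Tōkyō": "Tokyo",
    "Ōsaka": "Osaka",
    "Kōbe": "Kobe",
}


def gloss_spelling(hepburn: str) -> str:
    """Return the spelling to use in a gloss for a modified Hepburn name.

    Macrons are kept unless the name is one of the listed exceptions.
    """
    # Normalize so that precomposed and combining macrons compare equal.
    key = unicodedata.normalize("NFC", hepburn)
    return COMMON_ENGLISH_SPELLINGS.get(key, key)


print(gloss_spelling("Tōkyō"))   # exception: macrons dropped
print(gloss_spelling("Kyūshū"))  # not an exception: macrons kept
```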
<br />
===Names of biological species===<br />
The rules we are using for biological species are:<br />
*Whenever possible, both the common name and the scientific name (using binomial nomenclature) of a species should be provided. The preferred format is: ''common_name (scientific_name)'', e.g. European magpie (''Pica pica''). If the common name is unknown, the preferred format is: ''scientific_name (description)'', e.g. ''Mola mola'' (a species of sunfish).<br />
*Common names should be written in dictionary form. This means that only proper nouns and proper adjectives should be capitalized, even for officially standardized common names. e.g. "American kestrel", not "American Kestrel".<br />
*Generic names (and names of higher taxa) are always capitalized; specific epithets are never capitalized. e.g. "''Tyrannosaurus rex''", not "''tyrannosaurus rex''" or "''Tyrannosaurus Rex''"<br />
*Where applicable, subspecific taxonomic categories should be written out fully using ICZN or ICBN rules.<br />
**For animal subspecies, this consists of merely writing the subspecific epithet. For example, the cinnamon bear, a subspecies of American black bear, should be submitted as: "cinnamon bear (''Ursus americanus cinnamomum'')"<br />
**For plant subspecies, the abbreviation "subsp." should be used before the subspecific epithet. For example, occluded blindweed, a subspecies of hedge bindweed, should be submitted as: "occluded blindweed (''Calystegia sepium'' subsp. ''erratica'')"<br />
**For varieties, the abbreviation "var." must be used.<br />
**For forms, "f." must be used.<br />
**Cultivar epithets should be capitalized and placed in single quotes. (e.g. ''Taxus baccata'' 'Variegata')<br />
*Do not submit the author name. e.g., raspberry (''Rubus idaeus''), not raspberry (''Rubus idaeus'' L.) (The "L." stands for Linnaeus.)<br />
*Whenever possible, junior synonyms should not be submitted. Submit only the single scientific name currently accepted as the senior synonym. Wikipedia and The Encyclopedia of Life are good resources for finding the most up-to-date classifications.<br />
*Submissions should include the Japanese name in kanji, hiragana, and--in the vast majority of cases--katakana. Biological names are very often written in katakana, and thus a (uk) tag is usually warranted. Nevertheless, the katakana reading should always be placed ''after'' the hiragana reading. For example, 銭形海豹 [ぜにがたあざらし,ゼニガタアザラシ] (n) (uk) harbor seal (''Phoca vitulina'')/harbour seal/common seal<br />
*Where the katakana name is a transcription of an English name, e.g. ブルシャーク, also include the form with the components separated by a middle-dot, e.g. ブル・シャーク.<br />
*Names of higher taxa should include the headword written entirely in kanji, even though it may be only rarely used in practice. Reading restrictions will be used where appropriate. For example, セリ科,芹科 [セリか(セリ科),せりか(芹科)] (n) Apiaceae (parsley family of plants)/Umbelliferae<br />
*When unsure of a kanji headword, it can often be determined from the English translation or the appearance of the species. For example, the white-cheeked pintail (''Anas bahamensis'') is known as ホオジロオナガガモ in Japanese. This word does not appear in any Japanese dictionary, but it is rather obviously written as 頬白尾長鴨. Include a kanji headword whenever it can be determined in this manner, but never guess.<br />
Note that in Japanese a genus is always denoted by the use of 属/ぞく, as in:<br><br />
ハギ属<br><br />
ハギぞく<br><br />
(n) Lespedeza (genus comprising the bush clovers)<br />
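<br />
The gloss formats described in the rules above can be sketched as follows. This is a minimal illustration rather than part of any JMdict tooling: the function names and parameters are hypothetical, and the italics are written with the wiki's two-apostrophe markup.<br />

```python
from typing import Optional


def italic(text: str) -> str:
    """Wrap text in wiki italic markup (two apostrophes on each side)."""
    return f"''{text}''"


def scientific_name(genus: str, species: str, rank: Optional[str] = None,
                    epithet: Optional[str] = None,
                    kingdom: str = "animal") -> str:
    """Build a scientific name using the capitalization and rank rules above."""
    # Generic names are capitalized; specific epithets never are.
    binomial = f"{genus.capitalize()} {species.lower()}"
    if epithet is None:
        return italic(binomial)
    epithet = epithet.lower()
    if rank == "subsp." and kingdom == "animal":
        # Animal subspecies: simply append the subspecific epithet.
        return italic(f"{binomial} {epithet}")
    # Plant subspecies, varieties and forms: "subsp.", "var." or "f."
    # goes before the epithet and is not italicized.
    return f"{italic(binomial)} {rank} {italic(epithet)}"


def species_gloss(sci_name: str, common_name: Optional[str] = None,
                  description: Optional[str] = None) -> str:
    """Return 'common_name (scientific_name)', or
    'scientific_name (description)' when no common name is known."""
    if common_name:
        return f"{common_name} ({sci_name})"
    if description:
        return f"{sci_name} ({description})"
    raise ValueError("either a common name or a description is required")


print(species_gloss(scientific_name("Pica", "pica"), "European magpie"))
print(species_gloss(scientific_name("Mola", "mola"),
                    description="a species of sunfish"))
```

No author name is handled because the rules ask for it to be omitted; cultivar epithets are also left out of this sketch.<br />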
<br />
These guidelines were developed originally by [[User:ReneMalenfant|ReneMalenfant]] 21:05, 25 August 2009 (UTC) and revised by (most recently) [[User:JimBreen|JimBreen]] ([[User talk:JimBreen|talk]]) 00:21, 28 November 2017 (UTC)<br />
<br />
===Sensitive Terms===<br />
As in any language, there are words and terms in Japanese which need to be used with care and sensitivity, as they may be blunt, cause offence in some contexts, etc. In JMdict there is a "sens" tag which may be associated with one or more senses of an entry to indicate that the term should be used with a degree of caution. Determining which terms should be regarded as sensitive is quite difficult. In general the major Japanese-English and English-Japanese dictionaries do not attempt to indicate them, probably because they are usually compiled for Japanese users who do not need to be told this. <br />
<br />
A useful reference is a list of [http://www7b.biglobe.ne.jp/~marld/allow_to_follow/marld/nhk.html problem terms] (放送問題用語) based on a 1983 publication by NHK. That list, for example, includes virtually every term which includes 盲/めくら (blindness), so for 盲窓/めくら窓, it advises that "外見だけの窓" be used instead. Some of the prohibitions seem extreme; for example, 医者 is on the list, with the advice that 医師 or お医者さん be used instead, even though foreign learners of Japanese are usually taught 医者 without any qualification. Note that the list is over 30 years old, and there are reports that it is no longer being followed completely. The list is categorized according to whether terms are banned (×), have some reservations (△) or are uncertain (?), and the "×" tag is applied to 122 terms.<br />
<br />
While there can be no hard and fast rules, it is suggested that people submitting or amending entries apply the following guidelines when considering whether the entry should include a "sens" tag.<br />
* if the term is already tagged as "derog" (derogatory) or "vulg" (vulgar), there is no need for any additional "sens" tag. In fact, it is preferred that where appropriate "derog" or "vulg" tags be used;<br />
* inclusion on the NHK list referenced above, particularly if it has an "×" tag, may indicate the need for a "sens" tag; however, each term needs to be assessed on a case-by-case basis. The list, for example, says that 新平民 should not be used, but since it is an archaism there is no need to state it is sensitive. The list includes 板前 (chef) and recommends 板前さん be used instead, but it is clear from word frequencies that 板前 alone is much more widely used;<br />
* where appropriate consider a note indicating preferred alternatives, e.g. for 医者, a note "pref. 医師, お医者さん" may be appropriate.</div>JimBreen