WWWJDIC Japanese Dictionary Server
User Guide

Contents: Introduction  Operating Instructions  Translating Text  Dictionary Files  Multi-Radical  Links  Examples  Verb Conjugations  Submitting Amendments & New Entries  Stroke Order Diagrams  Japanese Interface  Codes  Copyright  FAQ  What's New  History  Planned Improvements  Known Bugs  Browsing in Japanese  Technical Bits  Bug Reports  Mirrors  Backdoor Entry/API  Donations  Disclaimer  Acknowledgements

Last updated: 2 May 2014

INTRODUCTION

Welcome to WWWJDIC, the dictionary server operated by the Electronic Dictionary Research and Development Group (EDRDG) and associated with the JMdict/EDICT and KANJIDIC projects.

Please note that this server is intended for people who have studied some Japanese and who can read at least kana. There is no display of romanized Japanese.

WWWJDIC operates at several mirror sites around the globe. All sites carry identical information. Check here for the location of the nearest mirror site.
[Return to the top]

OPERATING INSTRUCTIONS

These are minimal, as the operation of WWWJDIC is intended to be as intuitive and self-explanatory as possible. There is an FAQ section at the back of this page.

Romaji

Care is needed with the form of romaji used for input. WWWJDIC expects "wapuro romaji", i.e. it should be typed as though it were going into a Japanese-capable Input Method (IM or IME), e.g. with an editor or word-processor.

Note that WWWJDIC can accept both Hepburn and kunrei/nihon shiki; both sin'iti and shin'ichi map to the same kana. Also, as in many IMEs, xa, xi, etc. can be used for the small kana vowels.
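
The idea that Hepburn and kunrei/nihon shiki spellings reach the same kana can be illustrated with a small sketch. This is not WWWJDIC's own conversion code; the table below is a hand-made subset chosen only to cover the examples in the text, and the function name is hypothetical.

```python
# Tiny, illustrative wapuro-romaji table (an assumption, not the server's table).
ROMAJI_TO_KANA = {
    "shi": "し", "si": "し",   # Hepburn and kunrei spellings of the same kana
    "chi": "ち", "ti": "ち",
    "n'": "ん",                # apostrophe marks a syllabic "n"
    "i": "い",
    "xa": "ぁ", "xi": "ぃ",    # "x" prefix gives the small kana vowels
}

def romaji_to_hiragana(text):
    """Convert romaji to hiragana by greedy longest-match lookup."""
    out = []
    i = 0
    while i < len(text):
        for length in (3, 2, 1):
            chunk = text[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no kana for {text[i:]!r} in this sample table")
    return "".join(out)

print(romaji_to_hiragana("sin'iti"))    # kunrei-style input
print(romaji_to_hiragana("shin'ichi"))  # Hepburn-style input
```

Both calls produce the same kana string because each spelling variant is simply a separate key pointing at the same kana.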

If you are entering KUN readings when looking up kanji, note that the fixed and inflecting portions are divided by a "." (in ASCII). Normally entering a "." in romaji will result in a JIS ".", so WWWJDIC lets you specify an ASCII "." by using a comma. Thus, use "a,u" or "ka,keru". Note this only applies to the kanji database.

For people who don't like having to click the "romanized Japanese" box on the dictionary search page, you can enter romaji by prefixing it with an "@" character (for hiragana), e.g. "@koujou", or a "#" character (for katakana), e.g. "#va-jon". In fact this is the only way you can input the odd katakana such as the small "ke" character or the "vu" character.

Exact Match

An option on the Word Search page is "Require exact word-match", for non-Japanese search keys. If you select this option, only a restricted number of entries will be displayed, as one of the senses in the dictionary entry must match the key exactly. Two exceptions are made:
  1. any characters in parentheses before the keyword are ignored;
  2. the characters "to " preceding the keyword are ignored (thus allowing matches on English verbs).
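
The exact-match rule and its two exceptions can be sketched as follows. This is an illustration of the rule as described above, not the server's implementation; the function names and the idea of passing senses as a list of strings are assumptions.

```python
import re

def normalise_sense(sense):
    """Apply the two exceptions before comparing a sense with the key."""
    sense = sense.strip().lower()
    # Exception 1: ignore any parenthesised text before the keyword.
    sense = re.sub(r"^(\([^)]*\)\s*)+", "", sense)
    # Exception 2: ignore a leading "to " so that English verbs can match.
    if sense.startswith("to "):
        sense = sense[3:]
    return sense

def exact_match(key, senses):
    """True if at least one sense equals the key after normalisation."""
    key = key.strip().lower()
    return any(normalise_sense(s) == key for s in senses)

print(exact_match("read", ["(v5m) to read", "to count"]))
print(exact_match("book", ["bookshop"]))
```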

Searching for Japanese Words

In general, Japanese (and English) words can only be searched for from the beginning of the word. The only exception is when the search key begins with a kanji. In that case the match can occur anywhere in a word, although you may restrict it to occur at the beginning of the word.

Searching for English Words

You need to know that the dictionary files are based on Japanese head-words, and selecting entries using English keys can give misleading results. For example, looking for "book" in the full EDICT file will potentially return 350 entries. When searching the EDICT file, you may be able to get better results by setting the common word restriction via the checkbox on the initial menu. Using the "Exact Match" option may also improve the results. Checking the example sentences (if available) will help verify whether the word is suitable. At all times the user should exercise caution.

The server has a list of variant English words and spellings, and if one of these is entered, it will suggest possible alternatives. So if you put in "favourite", it will suggest also looking at "favorite"; if you put in "faucet", it will suggest "tap", etc. The suggestions are clickable links, so you can easily check them out. (The word list comes from the VarCon collection.)

Note that words of only one or two letters cannot be used as keys. This is to stop the dictionary index being filled with references to "if", "it", "of", "or", etc. A number of other common words such as "the" cannot be used as keys for the same reason.

Searching for multiple words

A search can be made using two words as the search key, e.g. "break out". In this case you will find all entries in which both words appear. The words can be a mixture of Japanese and English. For example, searching for "こう high" will find entries where the reading starts with こう and where the English meaning contains "high".

Short phrases, etc. can be searched by using an underscore character between words, e.g. "break_out". In this case only entries in which the words appear in succession will be displayed.
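
The difference between a two-word key and an underscore-joined phrase can be shown with a minimal sketch. It uses plain substring tests on a single string of entry text, which is a simplification: the real server works from its dictionary index, and the function names here are hypothetical.

```python
def matches_all_words(key, entry_text):
    """Space-separated key: every word must appear somewhere in the entry."""
    text = entry_text.lower()
    return all(word in text for word in key.lower().split())

def matches_phrase(key, entry_text):
    """Underscore-joined key: the words must appear in succession."""
    phrase = key.lower().replace("_", " ")
    return phrase in entry_text.lower()

entry = "to break a window out of spite"
print(matches_all_words("break out", entry))  # both words are present
print(matches_phrase("break_out", entry))     # but not next to each other
```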

Kanji Colours

In the regular dictionary display, the kanji are displayed in different colours according to their classification. The common jouyou (常用) kanji are in black, the jinmeiyou (人名用) kanji are in purple, and all others are in green. This feature can be disabled using the Customization feature, in which case all kanji will be black.

Taskbar Search Buttons

Some small Javascript programs are available which enable text to be marked and then dropped straight into various lookup functions by clicking on a Taskbar button. Buttons are available for searching for Japanese or English words, and for using the Translate Words in Text function. See the button generator page for details.

Multi-Radical Kanji Selection

The Multi-Radical Kanji Selection feature does not use the 214 classical radicals. Instead it uses a slightly different set which includes more basic shapes. Note that the identification of the kanji is based on the visual appearance of the elements, not on their classical radical.

Customizing

You have the opportunity to change many of the visual aspects of WWWJDIC's input and display. There is a "customization" page which lets you change the basic colours, lines/display, etc. It also lets you change from the default EUC input and output coding to either Shift-JIS or Unicode (UTF-8). For users with modern browsers, Unicode (UTF-8) may be worth using as it avoids the use of bit-mapped images.

The customization can take place either by setting a cookie in your browser, or by setting some URL parameters. Note that the cookies only work for the server which set them.

TRANSLATING TEXT

One of the options of WWWJDIC is to translate the words in Japanese text. Please note, the function does NOT attempt to translate Japanese text into English; it simply sets out to identify the words in the text and to display the translations of those words. The user is expected to know enough Japanese grammar to make sense of the results. The input text is displayed in sections, with the words detected/translated in red, or in blue where an inflected verb or adjective is assumed. If a user requests that a word/phrase only be translated once (see below), the text is displayed in brown for subsequent occurrences.

You can use this option in two ways:

  1. cut-and-paste text from another application into the text box on the browser screen. (It usually seems to go automatically into the EUC required by WWWJDIC, but if you are having problems, try the option of forcing the server to convert it to EUC.) In some cases the cut-and-paste may break characters up, resulting in a load of mojibake. Sorry if this happens, but it's a browser problem and can't be fixed in the server.
  2. specify the URL of a WWW page, and the server will fetch that page and translate the words in it. Note that in doing so, it deletes everything between < and >, i.e. all HTML labels, etc. and as a default deletes all non-Japanese characters, so all you get is the raw Japanese. (You can override this and get it to leave the non-Japanese in if you wish.) Where non-Japanese has been deleted, a "|" is inserted. (In this option, you may wish to set a new timeout value if the fetch of the WWW page takes longer than the default 60 seconds allowed.) Please note that WWWJDIC makes no attempt to handle cookies. If you can't use this facility because the site you are viewing requires cookies enabled, you will have to use the cut-and-paste alternative.

    Something you need to watch out for are URLs which don't actually point at the text you are seeing. Examples of this include text in a Frame. You need to give WWWJDIC the actual address of the frame - you can usually find this out from the browser if you right-click on the Frame text.
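
The clean-up applied to a fetched page (delete everything between < and >, then by default replace deleted non-Japanese text with "|") might look roughly like the sketch below. It works on a string that has already been fetched and decoded; the Unicode ranges used to decide what counts as "Japanese" and the function name are assumptions, not the server's actual rules.

```python
import re

# Assumed character ranges: CJK punctuation, hiragana, katakana, common kanji.
JAPANESE = r"\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff"

def extract_japanese(html, keep_non_japanese=False):
    """Strip HTML tags and, by default, collapse non-Japanese runs to '|'."""
    text = re.sub(r"<[^>]*>", "", html)  # delete everything between < and >
    if keep_non_japanese:
        return text
    return re.sub(f"[^{JAPANESE}]+", "|", text)

print(extract_japanese("<p>Hello 日本語 world</p>"))
```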

The default is for the original text to be displayed one line at a time, followed by a list of translated words. For the "cut-and-paste" text, there is a "hidden translation" option, in which the word translations are embedded in the text and become visible when the mouse pointer is held over the word (this option only works with browsers supporting HTML 4).

The server detects words in the text as follows:

  1. gairaigo in katakana are detected and looked up;
  2. jukugo beginning with kanji are detected;
  3. where a kanji is followed by two or more hiragana, an attempt is made to match the kana against known verb/adjective inflections. If this succeeds, the equivalent dictionary form of the word is sought. If this is successful, the match is displayed, and the matched text displayed in blue;
  4. single kanji which have not been detected in the above will be matched against dictionary entries (if any). (This may be turned off by the user.)
  5. sequences of four or more hiragana are matched against a small file of words and phrases typically written in kana alone. Only exact matches are reported. (This function may be expanded, but the possibility of false matches is high.)
  6. a special case is made of an o or go hiragana, or the GO kanji preceding a kanji. In this case a check is made to see if the word is present in the dictionary files with and without the prefix.
Matches against complete dictionary entries are favoured over partial matches of longer entries, and if two equivalent matches are found, the longer is returned. Matched jukugo which are followed by what appears to be a particle (i.e. "wa", "no", "ni", "na", etc.) are trimmed back to just the jukugo to avoid misreporting matches from phrases and similar long dictionary entries.
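
The first stage of the detection rules above, picking out candidate runs by script, can be sketched with a regular expression. This only classifies runs of characters; the dictionary lookups, de-inflection, prefix handling and particle trimming described above are not attempted. The group names and Unicode ranges are assumptions made for the illustration.

```python
import re

CANDIDATE = re.compile(
    r"(?P<katakana>[\u30a1-\u30fa\u30fc]+)"                # rule 1: katakana run
    r"|(?P<kanji_kana>[\u4e00-\u9fff]+[\u3041-\u3096]{2,})"  # rule 3: kanji + 2 or more hiragana
    r"|(?P<kanji>[\u4e00-\u9fff]+)"                        # rules 2 and 4: kanji run
    r"|(?P<hiragana>[\u3041-\u3096]{4,})"                  # rule 5: 4 or more hiragana
)

def candidate_runs(text):
    """Return (kind, run) pairs for each stretch of text worth looking up."""
    return [(m.lastgroup, m.group()) for m in CANDIDATE.finditer(text)]

for kind, run in candidate_runs("コンピュータで食べました"):
    print(kind, run)
```

In this example the single hiragana で is skipped, the katakana run would go to a gairaigo lookup, and 食べました would be passed on as a candidate inflected form.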

Users may request that translations only appear once for each Japanese word or phrase.

The user can invoke any dictionary file for the matching, but the combination GLOSSDIC file is the default, and is strongly recommended. (Note that using the main EDICT file in this function is not recommended, as its format is no longer fully compatible with the search system employed.) One advantage of using this combined file is that it increases the chance of getting a correct match for a word, particularly if the text contains names. Also, the component sub-files in GLOSSDIC are tagged, and the match function gives preference to entries in the following order (tags shown "EP", etc.):

The reason the EDICT subset is used is so that the appropriate match is made when there are several readings of a jukugo, for example the "adult" compound will be matched against the word "otona" instead of the less common "dainin".

The full details of all the dictionary files are provided below.

Further Comments on WWW Page Translation

Please note that if you are wanting to examine Japanese text within a frame, you may have to examine the source file (e.g. View/Source) to get the address of the actual file containing the text. An alternative is to open the frame in a window of its own.

Please appreciate that the function is somewhat crude and simplistic. It can occasionally mis-parse long strings of kanji, so users are advised to examine the results carefully, especially where the text only partially matches the dictionary entry. A small [Partial Match] marker is shown when this occurs.

A large amount of text will result in hundreds of dictionary searches, so the server may take a while to respond.

There is a front page for this function which uses frames so you can have the viewed page and WWWJDIC side-by-side.



DICTIONARY FILES

The dictionary files used by the server are:

Character Display

Some of the dictionary files contain characters used in languages such as French, German, Russian, Sanskrit, etc., which are not available in the common JIS X 0208 character set. These characters are coded in the extension set - JIS X 0212 - however most browsers cannot display these characters correctly in the default EUC-JP coding, and they are not available at all in Shift-JIS coding. For this reason

Please note that the dictionary material is for the most part copyright. Publication of material from WWWJDIC is permitted, provided appropriate acknowledgements are made. See the Copyright section below for more information on this.

MULTI-RADICAL KANJI SELECTION

The Multi-Radical Kanji Selection enables you to search for a kanji using the component "shapes" within the kanji. Each of the 12,356 kanji in the JIS X 0208 and JIS X 0212 standards has been analyzed and its components classified according to a set of 250 basic shapes. These shapes correspond approximately to the 214 "KangXi" or classical radicals used by many kanji dictionaries; however, a number of other common shapes such as 厶 and ユ are also used.

You may need to experiment with this function to get used to identifying the components of a kanji. Note that some components are further subdivided, e.g. the kanji 話 is classified by the shapes: 口, 舌 and 言.

This function uses the "radkfile" file, which contains the radical-element breakdown for the JIS kanji. The JIS X 0208 file was originally prepared by Michael Raine and revised and extended by Jim Breen, and the JIS X 0212 file was prepared by Jim Rose. These files are used to drive the multi-radical kanji-selection feature. (If you want a copy of the files, the current versions are here.) The files are inversions of the kanji-radical source files.
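
Selecting kanji by several components amounts to intersecting the sets of kanji listed under each component. The sketch below shows that idea with a hand-made sample index; the data is not taken from the radkfile and the file's actual format is not modelled.

```python
# Hypothetical component -> kanji index (sample data for illustration only).
COMPONENT_TO_KANJI = {
    "口": {"話", "語", "言", "古", "舌"},
    "舌": {"話", "舌"},
    "言": {"話", "語", "言"},
}

def kanji_with_components(*components):
    """Return the kanji that contain every one of the given components."""
    sets = [COMPONENT_TO_KANJI.get(c, set()) for c in components]
    return set.intersection(*sets) if sets else set()

print(kanji_with_components("口", "舌", "言"))
print(sorted(kanji_with_components("言")))
```

Each extra component narrows the candidate set, which is why selecting two or three shapes is usually enough to find a kanji.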

EXAMPLE SENTENCES

The WWWJDIC server includes a large file of Japanese/English sentences which have been linked to the EDICT dictionary file so that sentences can be displayed by clicking on the "Ex" tag after the entry. In addition, a number of sentences have been identified as suitable examples for particular entries, and are displayed whenever the entry is shown. The sentence file can also be searched, and there is a mechanism for submitting corrections online.

The examples are mostly drawn from the Tanaka Corpus, a collection of Japanese/English sentences initially compiled by Professor Yasuhito Tanaka at Hyogo University and his students. The original sentences appear to be mostly from educational material, text books, etc. The collection was placed in the Public Domain by Professor Tanaka, and has since been placed under a Creative Commons "CC-BY" licence.

The collection is large (approximately 150,000 pairs) and is being edited as there are a number of errors and duplications in both the Japanese and English texts. A number of additional sentences have been added to provide examples of word usage.

Any suggested corrections or sentences to add to the collection are welcome, and should be submitted using the Suggestion/Comment option on the page displaying the sentences. This will link you to the Tatoeba Project, where the sentences are now maintained.

If you would like to download a complete copy of the current file of example sentences, including the index words, it is available via http or ftp. (Date of the most recent version.) A subset file which is only about 30% the size of the full file is also available.

VERB CONJUGATIONS

Most of the verbs in the main EDICT file allow an optional display of a table of verb conjugations. Where this is available, a [V] tag appears to the right of the verb display.

The table of conjugations is generated automatically according to the part-of-speech tag in the entry. It should not be assumed that for every verb, any single conjugation is as frequently used or as natural as any other.
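
As a rough illustration of generating forms from the part-of-speech tag, the sketch below handles just two of the POS codes (v1 and v5k) and three forms. The rule table is a small assumption written for this example; it is not the table used by the server, and special classes such as v5k-s would need their own rules.

```python
# POS code -> (dictionary-form ending, {form name: replacement ending}).
RULES = {
    "v1":  ("る", {"negative": "ない",   "polite": "ます",   "te-form": "て"}),
    "v5k": ("く", {"negative": "かない", "polite": "きます", "te-form": "いて"}),
}

def conjugate(dictionary_form, pos):
    """Build a few conjugated forms from the dictionary form and its POS tag."""
    ending, forms = RULES[pos]
    if not dictionary_form.endswith(ending):
        raise ValueError(f"{dictionary_form} does not end in {ending}")
    stem = dictionary_form[: -len(ending)]
    return {name: stem + suffix for name, suffix in forms.items()}

print(conjugate("食べる", "v1"))
print(conjugate("書く", "v5k"))
```

Because the forms come from rules rather than from attested usage, a generated form may be grammatical without being common or natural.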

Associated with the table of conjugations is a page of supplementary comments which attempts to expand some of the more obscure points.



STROKE ORDER DIAGRAMS

Jack Halpern's Diagrams

Associated with the most common 2,200 kanji (i.e. the Jouyou and pre-2004 Jinmeiyou kanji) are animated Stroke Order Diagrams (SODs). Where these are available, a "SOD" link will appear at the end of the information display for a kanji.

The images used in this animation are the art-work from the New Japanese-English Character Dictionary (see http://www.kanji.org/), and are used with the kind permission of Mr Jack Halpern. They were scanned and cleaned up by Jeffrey Friedl to go into Jack's Kanji Learner's Dictionary.

The Stroke Order Diagram animation was carried out as follows:

  1. the source of the diagrams is the digitized multi-panel form from the printed kanji dictionaries, in which the kanji is built up stroke by stroke. Jack Halpern provided these as BMP files.
  2. each panel of the diagram was extracted into a separate file using a combination of a special utility program and the bmptopnm and ppmtogif utilities.
  3. for each kanji, the gifsicle utility was used to make an animated GIF of the whole kanji. Some twitch a bit due to occasional alignment inaccuracies.
All this took a bit of debugging, but once it was working, it only took a few minutes to generate the diagrams for the whole 2200 kanji. All this was done on a Sun system running Solaris, so the GIF files are quite legal under the Unisys patent.

Jim Rose's Diagrams

In addition, a further set of animated Stroke Order Diagrams is available from Jim Rose's SODER initiative at www.kanjicafe.com.

LINKS TO OTHER SYSTEMS

An interesting feature of WWWJDIC is the system of links to other servers and files. These are:

  1. to other WWW kanji/hanzi/hanja character dictionaries. These links go from the kanji information page, and enable direct access to the information about that kanji held on other databases. The databases currently linked are:

    The "unifying" code we use to implement these links is the Unicode (UCS2) code-point. We intend to have all the systems cross-linked. You can index from Chuck's and Rick's systems back to WWWJDIC.

  2. the jeKai Project. This project is developing a WWW-based dictionary of extended information about words & phrases in Japanese. WWWJDIC examines the jeKai index and when it displays a Japanese word which is in the jeKai files, it creates a link. [jeKai]
  3. the online Sanseido dictionary at Goo. The link goes from the normal word display, and triggers the JE server at that site. You can use the other dictionaries at that site, including the big Daijirin. [S]
  4. the Google search engine, which is called with the displayed Japanese word(s) as a search key. The "images" option can also be used. [G] and [GI]
  5. the Eijiro dictionary at the ALC server in Japan. [A]
  6. the Japanese Wikipedia. WWWJDIC maintains a list of all article headings in the Japanese Wikipedia, and where an article is available for a displayed dictionary entry, a link is provided. [W]
  7. the Japanese WordNet at NICT. As with the Japanese Wikipedia, WWWJDIC maintains a list of all words in the Japanese WordNet, and provides a link when a displayed entry matches. [JW]


SUBMITTING AMENDMENTS AND NEW ENTRIES

Users of WWWJDIC are welcome to submit amendments to the dictionary files, and also to submit new entries via the online dictionary database. There is an "[Edit]" link after each entry which will take you to the database edit page. For new entries use the link at the top of each page.

There is a page of basic advice about submitting an entry, and you should also read the editorial policy page on the EDRDG Wiki.

JAPANESE INTERFACE

Late in 2007 work began on modifying the server code and building parallel message tables so that users could opt for either English or Japanese as the language of the server interface. A major set of messages was translated in July/August 2008. At this stage many of the server functions were available entirely in Japanese.

The Japanese/English selection is an option on the Customize Page, and the language preference is set in the customization cookie. There is also a link on the front page which selects the language and resets the cookie.

The following people have made major contributions to the provision of Japanese messages in the server interface:



ABBREVIATIONS AND CODES USED IN DICTIONARY ENTRIES

The dictionary entries contain a number of abbreviations and codes, mainly to reduce storage usage and display space. (Full list of codes in alphabetical order.)

Part-of-Speech (POS) Codes

CODE                MEANING
adj-i               adjective (keiyoushi)
adj-kari            'kari' adjective (archaic)
adj-ku              'ku' adjective (archaic)
adj-f               noun, verb, etc. acting prenominally (incl. rentaikei)
adj-na              adjectival nouns or quasi-adjectives (keiyoudoushi)
adj-nari            archaic/formal form of na-adjective
adj-no              nouns which may take the genitive case particle "no"
adj-pn              pre-noun adjectival (rentaishi)
adj-shiku           'shiku' adjective (archaic)
adj-t               'taru' adjective
adv                 adverb (fukushi)
adv-to              adverb (with particle "to")
aux                 auxiliary
aux-v               auxiliary verb
conj                conjunction
ctr                 counter
exp                 Expressions (phrases, clauses, etc.)
id                  idiomatic expression
int                 interjection (kandoushi)
n                   noun (common) (futsuumeishi)
n-p                 proper noun
n-adv               adverbial noun (fukushitekimeishi)
n-t                 noun (temporal) (jisoumeishi)
pn                  pronoun
prt                 particle
pref                prefix
suf                 suffix
v1                  Ichidan verb
v2a-s, v2k-k, etc.  Nidan verb (lower/upper) with 'u', 'ku', etc. endings (archaic)
v4k, v4r, etc.      Yodan verb with 'ku', 'ru', etc. endings (archaic)
v5u, v5k, etc.      Godan verb with 'u', 'ku', etc. endings
v5k-s               Godan verb - Iku/Yuku special class
v5aru               Godan verb - -aru special class
vi                  intransitive verb
vs                  noun or participle which takes the aux. verb suru
vs-c                su verb - precursor to the modern suru
vs-i                expression using the aux. verb suru(*)
vs-s                suru verb - special class
vk                  Kuru verb - special class
vt                  transitive verb
vz                  Ichidan verb - -zuru special class (alternative form of -jiru verbs)
v-unspec            verb - unspecified (usu. archaic)
(*) This tag is also used for the する entry. It is primarily used to assist the verb conjugation table function in WWWJDIC.

Miscellaneous Codes

CODE      MEANING
abbr      abbreviation
arch      archaism
ateji     kanji used as phonetic symbol(s)
chn       children's language
col       colloquialism
fam       familiar language
fem       female term or language
gikun     gikun (meaning) reading
hon       honorific or respectful (sonkeigo) language
hum       humble (kenjougo) language
iK        word containing irregular kanji usage
ik        word containing irregular kana usage
io        irregular okurigana usage
joc       jocular or humorous term
male      male term or language
m-sl      manga slang
obs       obsolete term
obsc      obscure term
oK        word containing out-dated kanji
ok        out-dated or obsolete kana usage
on-mim    onomatopoeic or mimetic word
pol       polite (teineigo) language
sl        slang
sens      term with some sensitivity about its usage
uK        word usually written using kanji alone
uk        word usually written using kana alone
vulg      vulgar expression or word
P         "Priority" entry, i.e. among approx. 20,000 words deemed to be common in Japanese
X         rude or X-rated term (not displayed in educational software)
For more information about the P (Priority) markers, see the Word Priority Marking section in the JMdict/EDICT documentation.

Domain or Field Codes

These indicate that the word or expression has particular application (but not necessarily exclusive application) in the specified domain.
CODE      MEANING
archit    architecture
astron    astronomy, space, astronautics, etc.
Buddh     Buddhism
baseb     baseball
biol      biology
bot       botany
bus       business
comp      computing/telecommunications
econ      economics
eng       engineering
fin       finance
food      food
geol      geology, earth sciences, geophysics, etc.
geom      geometry
law       law, legal studies, etc.
ling      linguistics
MA        martial arts
math      mathematics
med       medicine, anatomy, pathology, etc.
mil       military
music     music
physics   physics
Shinto    Shinto
sports    sports
sumo      sumo
zool      zoology

Names Dictionary Codes

CODE  MEANING
s     surname
p     place-name
u     person name, as-yet unclassified
g     given name, as-yet not classified by sex
f     female given name
m     male given name
h     a full (family plus given) name of a historical person
c     company name
o     organization name
pr    product name
st    station name

Dictionary File Codes

The THE_LOT and GLOSSDIC files have the following codes attached to each entry to show the dictionary file from which it has been selected.
CODE    MEANING
AV      aviation
BU      buddhdic
CA      cardic
CC      concrete
CO      compdic
ED      edict (the rest)
EP      edict (priority subset)
ES      engscidic
EV      envgloss
FM      finmktdic
FO      forsdic_e
GE      geodic
KD      small hiragana dictionary for glossing
LG      lingdic
LS      lifscidic
LW1/2   lawdic1/2
MA      manufdic
NA      enamdict
PL      j_places (entries not already in enamdict)
PP      pandpdic
RH      revhenkan (kanji/kana with no English translation yet)
RW      riverwater
SP      special words & phrases
ST      stardict
WI1/2   wipfile (work-in-progress)

Regional and Dialect Codes

These tags indicate that a word or phrase is associated with a particular regional language variant within Japan.
CODE  MEANING
hob   Hokkaido
ksb   Kansai
ktb   Kantou
kyb   Kyouto
kyu   Kyushu
nab   Nara
osb   Kansai
rkb   Ryuukyuu
thb   Touhoku
tsb   Tosa
tsug  Tsugaru

Kanji Dictionary Codes

WWWJDIC uses the KANJIDIC file of kanji information. It has a system of letter codes in front of the various fields, e.g. "U798f B113 G3 S13 F467 ...". These are explained in the full documentation for that file. The "See an explanation ..." link below the kanji information display will give an expanded version of the fields.
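
As an illustration of how the letter prefixes can be read, the sketch below splits the sample string quoted above into named fields. Only the five codes in that sample are handled, and the meanings given for them (U = Unicode code point, B = radical number, G = grade, S = stroke count, F = frequency rank) reflect my reading of the KANJIDIC documentation, which should be treated as the authority. The function and field names are hypothetical.

```python
# Assumed meanings for the five prefixes in the sample string.
FIELD_NAMES = {
    "U": "unicode",
    "B": "radical",
    "G": "grade",
    "S": "strokes",
    "F": "frequency",
}

def parse_kanjidic_fields(text):
    """Split space-separated codes into a dict keyed by field name."""
    fields = {}
    for token in text.split():
        name = FIELD_NAMES.get(token[0])
        if name is not None:
            # Keep the first value if a code is repeated.
            fields.setdefault(name, token[1:])
    return fields

info = parse_kanjidic_fields("U798f B113 G3 S13 F467")
print(info)
print(chr(int(info["unicode"], 16)))  # the kanji the code point refers to
```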

Others

In addition to the codes above, for gairaigo which have not been derived from English words, the source language has been indicated using the three-letter codes from the ISO 639 "Code for the representation of names of languages" standard, e.g. "(fre: avec)".

In entries which are Japanese idiomatic expressions, aphorisms, etc. the literal translation of the Japanese is sometimes shown in parentheses, preceded by "lit:". Also where the Japanese word has been constructed by transliteration of two or more foreign words or word fragments (e.g., a waseieigo - Japanese-made English), the source words are indicated by "wasei:".

COPYRIGHT

The material displayed in WWWJDIC's pages is copyright. Much of it is drawn from dictionary files, and the copyright of most of these is held by the Electronic Dictionary Research and Development Group (EDRDG). Other material is associated with the WWWJDIC server and software. It is being made available under a Creative Commons Attribution-ShareAlike Licence (V3.0) (日本語バージョン). (Note that the Japanese-Dutch file has a no-commercial-use Creative Commons licence.)

What does this mean in practical terms? Well:

  1. you can use WWWJDIC in the same way as you use a published dictionary to assist you with translating text and words. The results of your translation may be published, sold, etc. If you make heavy use of WWWJDIC it would be nice to acknowledge that, but there is no requirement to do more;
  2. you can link to WWWJDIC, e.g. using the backdoor entry, from other servers, provided you acknowledge that use on your server, and provide links to WWWJDIC and its documentation.
  3. if you wish to publish significant extracts of the output from WWWJDIC, for example if you use the Translate Words in Text function to generate a vocabulary list for a textbook of reading passages, then this comes under the scope of the licence for the dictionary files, which permits publication of subsets of the files. You must acknowledge the source of this information. Other information produced by the server, e.g. the verb conjugation tables, may be published but the source must be acknowledged.
  4. the Stroke Order Diagrams are under either Jack Halpern's or Jim Rose's copyright. You may link to the pages displaying those images, but you must not download or reuse the images without their respective permissions.
  5. the example sentences are from the Tanaka Corpus and are in the Public Domain.
For more details, see the licence statement covering the dictionary files.

FAQ (Frequently Asked Questions)

Topics: Input; Display; Keitai/Cellphones/Mobile phones; Translate Words in Text; Running WWWJDIC locally; Miscellaneous

WWWJDIC HISTORY

(By Jim Breen)

No sooner had the WWW come into being than servers accessing my dictionary files began to appear. The first, which operated briefly in 1994, was a slight rework of my xjdic program by Otfried Schwarzkopf. It overtaxed his 386, and was closed down fairly quickly; however, by that stage Jeffrey Friedl's famous Dictionary engine was running. There was also Rafael Santos' system, the EVA/POETS engine at Notre Dame in Tokyo, PSP's ALISE-based system, etc. etc., as well as Lambert Schomaker's WWW edition of the KANJIDIC file. Most of these have faded away now.

I had intended to have a WWW version of xjdic right from the moment I knew about the WWW, and in 1994 collected some information on writing CGI programs ready for the assault. It always seemed too big a task, and anyway Jeffrey's server was doing a good job. Eventually in mid-1997 it got too much for me, as I wanted to experiment with some features not handled by Jeffrey's server, and I also wanted to see my name in the WWW lights too, so I filleted out the search-engine parts of xjdic and dashed off a new CGI-oriented front-end. It only took a week or two of spare time and was up and running. I could easily have done it years before.

WWWJDIC has proved popular, and has probably overtaken the early lead Jeffrey's server established. It has been relatively easy to modify, so I have tinkered with it quite a bit (see below.) In fact, it is now probably the major vehicle for me trying out things to do with Japanese dictionaries.

Starting late in 1998 I have installed a number of mirrors. The first two were quite a bit of work as I had effectively written a lot of hard-coded stuff pointing at the Monash site. The code is now fairly portable (for a Unix/Linux box running Apache.) Having a lot of mirrors brought in the problem of keeping them up-to-date. To handle this, in 2000 I set up an "rsync server" at Monash and have set "cron" scripts running at the mirror sites which periodically interrogate the Monash site and collect and install any updated files.



WHAT'S NEW



PLANNED IMPROVEMENTS



KNOWN BUGS & PROBLEMS



BROWSING IN JAPANESE

(This information is now rather historical, as most browsers and operating systems support the display of Japanese text.)

As WWWJDIC provides no support for the display of Japanese words in a romanized form (Romaji), you will require some capability for displaying Japanese kana and kanji. The best way to do this is to install the appropriate Japanese fonts and set your browser to use them. Most modern browsers support that facility. If you do not wish to do that, you may access WWWJDIC via a special server that will send out bit-mapped versions of Japanese characters (see below.)

If you are a Unix/Linux person using Mozilla, Netscape, Galeon, etc. all you have to do is make sure that a Japanese font file has been installed in the correct directory (e.g. /usr/X11R6/lib/X11/fonts/misc). Recent releases of Linux come with this included. You may have to make sure mkfontdir has been run too. You will then have to make sure that the browser knows to use this font when it encounters Japanese text. This is done (e.g. in Netscape) via the Edit/Preferences/Appearance/Fonts menu. If the WWW page is correctly marked as using Japanese, any Japanese text should appear immediately. Many WWW pages are not marked correctly, so you may have to turn on Japanese viewing via the View/Character Set/Japanese (autodetect) menu. (Note that some Unix/Linux browsers do not allow input of Japanese via input methods such as kinput2. I use Mozilla, which does support kinput2.)

For Windows users, probably the best method is to make sure a Japanese TrueType font (ttf) has been installed on your system, and to set your browser to use it. For Windows XP, etc. this should happen when you enable Japanese support. The Monash ftp archive has two Microsoft Japanese fonts available: Gothic and Mincho. These are both self-installing executable files. Once a font has been installed, you need to tell your browser to use that font for Japanese text. In Netscape this would be done via the Edit/Preferences/Appearance/Fonts menu. As ever, you will probably need to restart Windows to make it work.

Windows users also have a more complete solution, which is to install the language support Windows Update from Microsoft. It has become hard to find from that page, but fortunately it appears also to be available here. This brings in the Japanese Language Support and Japanese Input Method Editor, which allow users to view and input Japanese with reasonable ease. The IME works with MS-IE and from V4.72 also works with Netscape. (Note that even if you have no intention of using IE, you may need to have it installed in order to be able to install the IME.) Later versions of Windows based on NT (2000, XP) come with fonts and an IME already.

Macintosh users have various ways of browsing in Japanese. For an excellent description, see Christopher Bolton's Japanese for Your Mac page.

If you do not want to, or cannot, operate a full Japanese environment for your browser, you can access WWWJDIC via another server which will insert bit-mapped graphic characters as required. One such server is available on the Monash site here.

[Return to the top]

TECHNICAL BITS

Structure

WWWJDIC is a single C program which takes its parameters from the URL (QUERY_STRING) and from the various buttons (POST method). It carries as much as it can of the user's state by loading the values of the various radio/checkboxes. View the source of some of the screens if you want to see how the CGI stuff is working. (NB: As mentioned above, it uses the POST method for receiving parameters from the browser; not the GET method. Some WWW query systems can only use the GET method, and thus will not currently work with WWWJDIC.)

No database system is used. Each dictionary file is a single text file with a dictionary entry per line. Associated with each text file is a sorted index file containing pointers to each word or token in an entry. A binary search is used to find an entry/entries which contain the desired word, making the dictionary lookup extremely fast and efficient. The examples file is handled in a similar fashion, except a quasi-dictionary is used which has pointers to the sentences which contain particular words. (This method of dictionary indexing was introduced in 1990 in the original DOS "JDIC" program, and is also used in the xjdic program.)
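
As a rough illustration of the indexing idea (this is not the actual WWWJDIC code, which is written in C), the following Python sketch builds a sorted list of pointers to every token in a small text "file" and uses a binary search to find the entries containing a word. The sample entries, the token rule (`\w+`) and the function names are all assumptions made for the example.

```python
import re

# Assumed token rule for this sketch: any run of "word" characters.
TOKEN = re.compile(r"\w+")

# Made-up sample entries, one per line, loosely in EDICT style.
SAMPLE = (
    "先生 [せんせい] /(n) teacher/master/\n"
    "馬 [うま] /(n) horse/\n"
    "工場 [こうじょう] /(n) factory/plant/\n"
)


def token_at(text, offset):
    """Return the lower-cased token that starts at the given offset."""
    return TOKEN.match(text, offset).group().lower()


def build_index(text):
    """Return the offsets of all tokens, sorted by the token text they point at."""
    offsets = [m.start() for m in TOKEN.finditer(text)]
    offsets.sort(key=lambda off: token_at(text, off))
    return offsets


def lookup(text, index, word):
    """Binary-search the index and return every entry line containing the word."""
    word = word.lower()
    lo, hi = 0, len(index)
    while lo < hi:  # find the leftmost pointer whose token is >= word
        mid = (lo + hi) // 2
        if token_at(text, index[mid]) < word:
            lo = mid + 1
        else:
            hi = mid
    entries = []
    while lo < len(index) and token_at(text, index[lo]) == word:
        start = text.rfind("\n", 0, index[lo]) + 1  # back up to the line start
        end = text.find("\n", index[lo])
        if end == -1:
            end = len(text)
        line = text[start:end]
        if line not in entries:
            entries.append(line)
        lo += 1
    return entries


INDEX = build_index(SAMPLE)
print(lookup(SAMPLE, INDEX, "horse"))
```

Note that the index holds only offsets; the text itself is consulted during the search, so the index stays small compared with the dictionary file.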

The program runs under the Apache server and on a number of different Unix-like operating systems, including Solaris, AIX, FreeBSD and several Linux distributions. No attempt has been made to run it under Windows.

I originally planned to have a permanent dictionary search engine, with CGI programs calling it, as happens with Jeffrey's dictionary server. In the end I did not go ahead with this, as memory-mapped handling of the read-only dictionary files, and the significant caching carried out by the file system, achieves the same efficiency goal anyway.
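
To show what memory-mapped, read-only access to a dictionary file looks like in practice, here is a minimal Python sketch. It is only an illustration of the general technique: the function name and the simple linear `find` are assumptions, and the real server is a C program that searches via its index files.

```python
import mmap


def find_line(path, needle):
    """Return the first line (as bytes) containing `needle`, or None.

    The file is mapped read-only, so pages are shared with the file
    system cache rather than copied into the process.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            pos = view.find(needle)
            if pos == -1:
                return None
            start = view.rfind(b"\n", 0, pos) + 1  # beginning of the line
            end = view.find(b"\n", pos)
            if end == -1:
                end = len(view)
            return view[start:end]
```

Because each request maps the same read-only file, repeated lookups are normally served from pages the operating system already holds in memory, which is why a permanently running search engine gives little extra benefit.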

Japanese Character Codes

WWWJDIC uses the EUC-JP coding for all its files and all internal processing. EUC-JP is also the default coding for the HTML it generates.

The characters encoded in the files are from the JIS X 0208 character set, which contains the Japanese kana and the 6,355 most common kanji along with the Russian and Greek sets, plus the JIS X 0212 character set, which includes a further 5,801 kanji plus some Latin characters with diacritics (acute, grave, umlaut, etc.).

When pages are displayed using the EUC-JP or Shift_JIS encodings, characters from JIS X 0212 are displayed either as HTML entities or as 16x16 bit-mapped images. If the optional UTF-8 coding is used, all characters are displayed in that coding.
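
The HTML-entity fallback can be imitated in a few lines of Python. This is a sketch of the general idea only, not the server's own logic: the function name is made up, and it relies on Python's `xmlcharrefreplace` error handler, which replaces any character the target coding cannot represent with a numeric entity. (Python's own EUC-JP codec can encode JIS X 0212 directly, so the sketch demonstrates the effect with Shift_JIS.)

```python
def encode_for_page(text, page_encoding):
    """Encode text for output, writing unencodable characters as &#NNNN; entities."""
    return text.encode(page_encoding, errors="xmlcharrefreplace")


# "é" is a JIS X 0212 character and has no Shift_JIS code, so it becomes an entity.
print(encode_for_page("café", "shift_jis"))
# In UTF-8 every character can be encoded directly.
print(encode_for_page("café", "utf-8"))
```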
[Return to the top]

BUG REPORTS

Reports of errors in the server software or configuration, or in the dictionary and other files, are most welcome. The best ways to report these are:
  1. for errors in the dictionary files, use either the "Suggest an amendment" link below the display of entries, or the full New Entry/Amendment form.
  2. for errors in the example sentences (Tanaka corpus), use the "Send Comments/Correction" option on the page of displayed examples.
  3. for other errors, e.g. server malfunction, email me (Jim) at Jim.Breen@infotech.monash.edu.au.

[Return to the top]

MIRROR SITES

Mirror sites stay up-to-date by connecting to the master site at Monash once each day, retrieving a manifest file, then retrieving any updated source or data files. The file retrieval is done using the rsync system, which is excellent for retrieving small portions of large files. (There is an anonymous rsync server running at Monash for this purpose.) According to the settings in the manifest file, modified source files are compiled, index files are generated, etc. as part of this daily update.

I get a number of enquiries from people offering to host mirrors. I am not actively seeking many more mirrors; however, I like to have a reasonable geographic spread. The basic requirements for a mirror site are:

  1. I must have an account on the system. Installation is complicated and not well documented.
  2. it must be a permanent arrangement, or at least one capable of being used for several years. I don't want to go to trouble setting it up only to have it withdrawn.
  3. it must be a Unix-like operating system (Solaris, Linux, AIX, etc.) It would take a major rewrite to get it to work in Microsoft's ASP, and I have no motivation to do that.
  4. it must have an Apache server running, plus a full suite of utility software, including gcc, wget, lynx, rsync, etc.
  5. it must be very well connected to the Internet. Having a poorly connected mirror is a waste of time.
  6. about 200MB of disk space is required to hold the data and program files. The mirror will operate satisfactorily on a system with 256MB of RAM and 512MB of swap space; however, more is better, especially if other systems are sharing the server. The CPU load is relatively small, but a faster processor will reduce the time spent indexing the dictionary files during the daily update.
Note, I don't provide mirrors for individuals. Setting up and maintaining a mirror takes quite a lot of my time.

Personal Mirrors

I get a lot of requests from people wanting to have a mirror on their own machine for local off-line use. At present I have to say "no". The code and data files are reasonably complex and quite undocumented. I simply do not have the time or energy to write installation and maintenance documentation, or to answer the inevitable questions that would arise.

Also there is the issue of quality control. I make several changes to either the code or data every week. I can't guarantee personal mirrors would stay in step with all this, and I hate getting emails about things I have already fixed.
[Return to the top]

BACKDOOR ENTRY/API

If you want to interface to WWWJDIC from another page or a CGI program, there is a "backdoor" or API (application program interface) entry which enables simple searches to be initiated via the URL QUERY_STRING. To use this, you must use the URL associated with the WWWJDIC cgi program, with the "backdoor" code set. The format is: where: (Note that Japanese text has to be in the URL-escape coding with each byte as %xx.)

Examples

1MKU4ed8 - look up the kanji with the Unicode codepoint "4ed8"

1MMJ%E4%BB%98 - look up the kanji with the UTF8 code of "E4BB98".

1MMMB140=6-7= - look up the kanji with the Bushu code of "140" and with 6 and 7 strokes

4MDJkoujou - look up the Japanese word "koujou" (romanized) in dictionary 4.

1MDJ%C0%E8%C0%B8 - look up the Japanese word "sensei" (in kanji) using EUC-JP coding.

1MSJ%90%E6%90%B6 - as above, but in Shift_JIS.

1ZUJ%E5%85%88%E7%94%9F - as above, but in UTF-8 and producing a "raw output" display.

1MDErabbit - look up the word "rabbit" in EDICT

9MGG%xx%xx%xx%xx%xx%xx%xx - gloss the (EUC) text

Also, if you want to change the colour, number of lines per page, etc., you can add the URL customization parameters at the end of the URL string, e.g.:

1MDEhorse_2_25_5_pink - look up "horse" and return the results on a pink page in Shift_JIS with 25 lines/page.

Note that if you want to use this method with other sites, you will need to modify the URL accordingly.
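
For anyone generating these query strings from a program, the following Python sketch shows one way to produce the %xx escapes. The function names are made up for this example, and the prefixes ("1MDJ", "1MSJ", "1ZUJ") are simply copied from the examples above. Only the part after the "?" is built; the server URL is left to you.

```python
def percent_escape(text, encoding):
    """Encode text in the given coding and write every byte as %XX."""
    return "".join(f"%{byte:02X}" for byte in text.encode(encoding))


def backdoor_query(prefix, text, encoding):
    """Join a backdoor prefix (e.g. "1MDJ") to the escaped search key."""
    return prefix + percent_escape(text, encoding)


print(backdoor_query("1MDJ", "先生", "euc_jp"))     # 1MDJ%C0%E8%C0%B8
print(backdoor_query("1MSJ", "先生", "shift_jis"))  # 1MSJ%90%E6%90%B6
print(backdoor_query("1ZUJ", "先生", "utf-8"))      # 1ZUJ%E5%85%88%E7%94%9F
```

This escapes every byte, which is what Japanese text needs; plain ASCII keys such as "koujou" or "rabbit" can be appended to the prefix unescaped, as in the examples above.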

The "raw" dictionary display option is intended for calls from other programs, smartphones, etc. It omits all header and footer information from the pages, and displays the unedited dictionary entries in EDICT and KANJIDIC format, one-per-line and encapsulated by <pre> ... </pre>. In this option the output is always in UTF-8 coding.
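
A calling program can pick the entries out of a raw response with very little code. The Python sketch below assumes only what is stated above (UTF-8 output, entries one per line inside <pre> ... </pre>); the function name and the sample response are invented for illustration, and the real output may contain details this does not handle.

```python
import re

# Matches the first <pre> ... </pre> block, across multiple lines.
PRE_BLOCK = re.compile(r"<pre>(.*?)</pre>", re.DOTALL | re.IGNORECASE)


def extract_raw_entries(body):
    """Return the non-empty lines found inside the <pre> block of a raw response."""
    text = body.decode("utf-8")  # raw output is always UTF-8
    match = PRE_BLOCK.search(text)
    if match is None:
        return []
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


# Made-up response body, for illustration only.
sample = "<pre>\n先生 [せんせい] /(n) teacher/\n</pre>".encode("utf-8")
print(extract_raw_entries(sample))
```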
[Return to the top]

DONATIONS

Several kind people have asked if they can make donations to the WWWJDIC project, including the EDICT, ENAMDICT, etc. dictionary files. Well yes, they can. The project is part of the Electronic Dictionary Research and Development Group, and donations help fund the ongoing development of the dictionaries and software. Also, as the home site of WWWJDIC, EDICT, etc. is on a commercial site (Jim has now retired from Monash), it is great to have it self-funding and not have to rely on things like advertising.

If you are inclined to make a donation it would be most welcome. There are two ways of donating:


[Return to the top]

DISCLAIMER

The WWWJDIC server uses dictionary files from a wide variety of sources. Some of these files have been compiled and edited by Jim Breen and others associated with the JMdict/EDICT project, and while every effort has been made to ensure their accuracy, there are sure to be some errors. Other files have come from external sources and are of varying quality.

Monash University and other providers of the WWWJDIC server make NO WARRANTY as to the accuracy of the information provided by the servers and advise users that any use of the servers is ENTIRELY at their own risk.
[Return to the top]

ACKNOWLEDGEMENTS

I want to record my thanks to a few of the key people who have helped with the server. E&OE.
[Return to the top]
Go to Jim Breen's Japanese Page.