JMdictDB Project


JMdictDB Database Project

Overview

The JMdictDB online database has been developed by Stuart McGraw to support the maintenance of the JMdict/EDICT, JMNEdict/ENAMDICT and other dictionary files originally compiled by Jim Breen. From May 2010 the JMdict/EDICT file has been maintained using the database, with full public access enabled in July 2010.

The database can be accessed in several ways:

  • via the WWWJDIC servers, which link directly to the edit screen of the JMdictDB system when a user wishes to add a new entry or amend an existing entry.
  • via other servers using the JMdict/EDICT file, which are encouraged to offer similar links.
  • via the JMdictDB system's own search/lookup screens. These can look up entries using Japanese words, English words and the entries' sequence numbers. There is a basic search screen and an advanced search screen.

Users are able to propose new entries and edit existing entries. New entries and amended entries are held as "pending" until approved by one of the editors working with the project. The user submissions can be viewed using this page.

The contents of the JMdictDB database are released daily as the current JMdict and EDICT dictionary files, and are automatically added to the WWWJDIC dictionary server.


Processing Flow

User Creates/Amends an Entry

Users can enter a new entry using the entry form or edit an existing entry using the same form that has been preloaded with the entry details. (example) For existing entries, the loading of the screen could be via a WWW server such as WWWJDIC, or via the system's own search form.

On completing the entry/edit, the user clicks on the Next button, which leads to the confirmation page. If the entry is satisfactory, the user can click on the Submit button, which will mark the entry as Pending and queue it for consideration by an editor. At this stage a new entry will be allocated a Sequence Number which can be used for tracking its progress.

Editor Verifies Entry

An editor can view Pending entries and either approve them, perhaps after some modification, or in rare cases reject them. The editing process will be faster if the submission is accompanied by references such as dictionary extracts, quoted text, WWW site URLs, etc. The progress of an entry can be tracked by using the links on the View Updates page.

Dictionary Distribution

Once each day the dictionary database (approved entries only) is converted to an XML file from which the distribution formats (JMdict, EDICT2, EDICT, etc.) are generated. These are placed on the EDRDG FTP server and loaded into the EDRDG WWWJDIC server, from which the other WWWJDIC servers will progressively update their files.
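The flow described in this section (submission, editor review, daily export of approved entries) can be pictured with a small model. This is only an illustrative sketch: the names Status, Entry, submit, review and daily_export are hypothetical and do not reflect the actual JMdictDB schema or code, and the sketch covers new entries only.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Entry:
    text: str
    seq: int
    status: Status = Status.PENDING


# Hypothetical stand-in for the system's sequence number allocation.
_next_seq = count(1)


def submit(text: str) -> Entry:
    """A submitted new entry receives a sequence number and waits as pending."""
    return Entry(text=text, seq=next(_next_seq))


def review(entry: Entry, approve: bool, revised_text: Optional[str] = None) -> Entry:
    """An editor approves an entry (optionally after modifying it) or rejects it."""
    if entry.status is not Status.PENDING:
        raise ValueError("only pending entries can be reviewed")
    if approve:
        if revised_text is not None:
            entry.text = revised_text
        entry.status = Status.APPROVED
    else:
        entry.status = Status.REJECTED
    return entry


def daily_export(entries: List[Entry]) -> List[str]:
    """Only approved entries are included in the distributed files."""
    return [e.text for e in entries if e.status is Status.APPROVED]
```

In this model an entry that is still pending or has been rejected never appears in the output of daily_export, which mirrors the "approved entries only" rule above.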

Viewing Current and Previous Edits

If you wish to see what new entries or amendments are currently being processed, and recently-approved changes, there are several ways this can be done:

(a) you can get a display of all the not-yet-approved new entries and amendments. To do this:

- go to the Advanced Search Form
- check the Active, Deleted and Rejected boxes in "Status", and the Unapproved box in "Approved".
- click on the Search button

(b) you can use the Advanced Search Form to display all the approved and unapproved changes for a given day. To simplify this we have a View Updates Page where you can access this information by clicking on the day you wish to see.

(c) each day a "differences" page is created showing the old and new entries side-by-side. These are in the "EDICT2" format, but are still a handy way of seeing the additions, amendments and deletions. Go to the folder containing these files and click on the date you wish to see.

Interface from Other Systems

WWW servers and web-enabled devices using the JMdict or EDICT2 versions of the dictionary can link directly to edit screens in the JMdictDB system using the Entry Sequence Number in each entry. This is in the <ent_seq> element in the JMdict version and in the "EntLnnnnnnn" field at the end of each EDICT2 entry. The URL to use is:

http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=nnnnnnn where nnnnnnn is the sequence number

Using that URL results in an entry edit screen being loaded with the current contents of the entry.
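A client could build this link along the following lines. This is a minimal sketch: the function names seq_from_edict2 and edit_url are hypothetical, the pattern assumes the EntL field is "EntL" followed by digits as described above, and the sequence number in the usage note is a placeholder rather than a real entry.

```python
import re
from urllib.parse import urlencode

# Address of the edit form, taken from the URL given above.
EDFORM_URL = "http://www.edrdg.org/jmdictdb/cgi-bin/edform.py"

# Matches the "EntLnnnnnnn" field of an EDICT2 entry and captures the digits.
_ENTL = re.compile(r"EntL(\d+)")


def seq_from_edict2(line: str) -> str:
    """Return the sequence number found in the EntL field of an EDICT2 line."""
    match = _ENTL.search(line)
    if match is None:
        raise ValueError("no EntL field found")
    return match.group(1)


def edit_url(seq: str) -> str:
    """Build the edit-screen URL for an existing entry."""
    # "sid" is left empty, as in the URL pattern shown above.
    query = urlencode({"svc": "jmdict", "sid": "", "q": seq})
    return f"{EDFORM_URL}?{query}"
```

For example, edit_url(seq_from_edict2("何か [なにか] /(exp) something/EntL1234567/")) yields the edit-form URL ending in "q=1234567".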

Complete new entries can be submitted in the EDICT or EDICT2 format. For example, to submit the entry: "何か [なにか] /(exp) something/", the URL to use is:

http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&c=1&j=何か.......

The entry must be in UTF-8 encoding, and the Japanese and space characters must be "URL encoded" (percent-encoded). In the following example the parentheses are encoded as well: "%E4%BD%95%E3%81%8B%20[%E3%81%AA%E3%81%AB%E3%81%8B]%20/%28exp%29%20something/".
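The encoded string in the example above can be reproduced with Python's standard library. This is a sketch only: the function name encode_entry is hypothetical, and leaving "/", "[" and "]" unencoded simply mirrors the example; whether the server also accepts those characters in encoded form is not stated here.

```python
from urllib.parse import quote


def encode_entry(entry: str) -> str:
    """Percent-encode an EDICT-format entry as UTF-8 for use in a URL."""
    # Keep "/", "[" and "]" literal to match the example above; Japanese
    # characters, spaces and parentheses are percent-encoded.
    return quote(entry, safe="/[]", encoding="utf-8")


encoded = encode_entry("何か [なにか] /(exp) something/")
# encoded == "%E4%BD%95%E3%81%8B%20[%E3%81%AA%E3%81%AB%E3%81%8B]%20/%28exp%29%20something/"
```

The result would then be appended to the submission URL in place of the raw entry text.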