JMdictDB
This page contains information about the development
of a PostgreSQL database to support Jim Breen's Japanese-English dictionary
projects including
JMdict,
JMnedict,
Kanjidic2,
WWWJDIC,
and others.
Jim runs these projects under the auspices of the
Electronic Dictionary Research and Development Group
(EDRDG).
The goals of this project (in priority order) are:
- To create a database to serve as a master repository for the
information in the JMdict, EDICT, JMnedict, Examples, Kanjidic
and other related files distributed by Jim Breen and the EDRDG.
- To provide a web-based system for the submission, review, and
approval of corrections and new entries to these data.
- To provide freely available software to others who want to use
or build upon "JMdict in a database".
- To provide an open-source replacement for the principal author's
Microsoft Access based JMdict database. :-)
Discussion of this project takes place on the edict-jmdict@yahoo.com
mailing list
(http://groups.yahoo.com/group/edict-jmdict/).
Jim Breen maintains a web page describing the JMdict project's
use of JMdictDB at
http://www.edrdg.org/wiki/index.php/JMdictDB_Project.
There is also some older information at
http://www.csse.monash.edu.au/~jwb/edictredev/.
The project code is still under active development, but it is
currently in use as the primary repository for the JMdict project's
dictionary data. The web interface (which provides submission
and approval capabilities, among other things) is undergoing public
testing.
All the code developed for this project is GPL'd and maintained
in a publicly accessible Mercurial repository (links below).
Additional help is welcome; please post to the edict-jmdict
mailing list, or email the current principal developer at the
address at the bottom of this page.
The code currently consists of scripts to create a PostgreSQL database
and load JMdict (and related data such as the JMnedict "Japanese names"
file, or the Tanaka "examples" file) into it, some maintenance and other
command line tools, and a set of CGI scripts that allow access to and
updating of the database using a web browser. The code was originally
written in Perl but has been migrated entirely to Python as of 2008-05-02.
The code is developed and tested under Ubuntu Linux and Fedora 11 (both
with Apache web server), and Microsoft Windows XP-pro (with IIS web
server). More information on prerequisites is in the README.txt file.
News
New 2010-07-13:
In this email
on the Edict/JMdict list, Jim Breen announced that WWWjdic is now using
JMdictDB for accepting corrections and new entries from WWWjdic users worldwide.
New 2010-06-15:
In this email
on the Edict/JMdict list, Jim Breen announced that the WWWjdic testbed system
now updates the live JMdictDB database, from which the JMdict XML,
EDICT, WWWjdic and other EDRDG files are produced.
List members are asked to try making real submissions for inclusion in
WWWjdic and the EDRDG files.
New 2010-06-02:
Call for testers. In
this email
on the Edict/JMdict list, Jim Breen announced
that the WWWjdic testbed at
had been interfaced to the JMdictDB test database
and submission pages. The WWWjdic Suggest button will submit corrections
and new entries using JMdictDB. This is for testing only and changes
will not go into the real WWWjdic.
Caution: the WWWjdic testbed now updates the live JMdict database;
see the 2010-06-15 news item.
New 2010-06-01:
Call for testers. In
this email
on the Edict/JMdict list, Jim Breen requested testing of the JMdictDB database. URLs are:
Search for an entry:
http://www.edrdg.org/~jwb/cgi-bin/srchformq.py?svc=jmtest
Advanced Search (lots more options):
http://www.edrdg.org/~jwb/cgi-bin/srchform.py?svc=jmtest
New Entry:
http://www.edrdg.org/~jwb/cgi-bin/edform.py?svc=jmtest&c=1
Quick Overview:
http://www.edrdg.org/~jwb/cgi-bin/edhelpq.py
Full Help File:
http://www.edrdg.org/~jwb/cgi-bin/edhelp.py
Note that this is a test database and changes made will not go into
the real WWWjdic database and may be periodically discarded.
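The test URLs above share one base path and select the test database with
the svc=jmtest query parameter. A minimal sketch of composing them in
Python follows; the helper name jmdictdb_url is made up for illustration
and is not part of JMdictDB, and only the script names and parameters
listed above are taken from this page.

```python
from urllib.parse import urlencode

# Base path copied from the URLs listed in this news item.
BASE = "http://www.edrdg.org/~jwb/cgi-bin/"

def jmdictdb_url(script, **params):
    """Build a URL for one of the CGI scripts listed above.

    Hypothetical helper for illustration only. Query parameters are
    appended in the order given.
    """
    query = urlencode(params)
    return BASE + script + ("?" + query if query else "")

if __name__ == "__main__":
    print(jmdictdb_url("srchformq.py", svc="jmtest"))
    print(jmdictdb_url("edform.py", svc="jmtest", c=1))
```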
New 2010-05-09:
The JMdictDB database and web interface are now being used by Jim
Breen as the primary repository for JMdict data. The distribution
JMdict XML and related files are being produced from the database
data, and new entry and amendment submissions from WWWjdic are
processed into the database. See
http://tech.groups.yahoo.com/group/edict-jmdict/message/3716
for more details.
Try it!
Access to the online test version of JMdictDB.
(Note that these links are to the web pages provided in the JMdictDB
source code. The pages linked to from WWWJDIC are very similar but
have been tweaked to the needs of WWWJDIC.)
Find and edit existing entries: search /
advanced search
Add a new entry
Editing
quick overview
or
full help
Please feel free to try these out, including adding any real or junk
entries you want, but be aware that all changes will be thrown away
periodically and will NOT go into the real JMdict.
Code and Documentation
jmdictdb
-- Browsable (read-only) access to the JMdictDB code Mercurial repository.
tip.tar.gz
-- Download source code, latest development version (gzipped tar file).
README.txt
-- The README file, which includes install prerequisites and instructions (2010-03-10).
schema.html,
schema.pdf
-- Comprehensive description of the database schema (2008-11-12).
schema.png
-- Diagram of the database schema (200KB, 2008-11-12).
todo.html
-- To-do list for the project (2010-07-24).
T021.tar.gz
-- Source code for the last version implemented in Perl (obsolete, 2008-05-03; gzipped tar file).
Related files, but not part of JMdictDB...
The following HTML pages list all JMdict entries that share a common
kanji or reading text with at least one other entry. The entries
are sorted by the text, making it relatively easy to identify
entries that are very similar and possibly should be merged.
This data is based on the 2007-01-14 version of JMdict.
Shared kanji (800KB)
Shared readings (10MB)
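The grouping behind these two lists can be sketched roughly as follows.
This is only an illustration of the idea described above, not the code
that produced the pages: the function name and the (entry id, text) input
shape are assumptions, and the sort here is by plain Unicode code point,
which may differ from the ordering used in the actual lists.

```python
from collections import defaultdict

def shared_text_groups(entries):
    """Group entries that share a kanji or reading text.

    `entries` is an iterable of (entry_id, text) pairs; this input
    shape is an assumption made for the sketch. Returns a list of
    (text, [entry_ids]) sorted by text, keeping only texts that
    appear in more than one distinct entry.
    """
    by_text = defaultdict(set)
    for entry_id, text in entries:
        by_text[text].add(entry_id)
    return [
        (text, sorted(ids))
        for text, ids in sorted(by_text.items())
        if len(ids) > 1
    ]

if __name__ == "__main__":
    sample = [(1, "abc"), (2, "abd"), (3, "abc"), (4, "xyz"), (5, "abd")]
    for text, ids in shared_text_groups(sample):
        print(text, ids)
```

Because entries sharing a text end up on adjacent lines, near-duplicates
that might be merge candidates are easy to spot by eye.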
Matchup of Kale Stutzman's 2007-01-14
Google hit counts and corresponding JMdict entries
(Kale's email):
README.txt (also included in the .zip files)
kale-u.zip UTF-8 encoded files
kale-w.zip SJIS (Windows) encoded files
Kale Stutzman's original data file in alternate encodings:
edict-gfreq.euc EUC-JP encoded
edict-gfreq.utf UTF-8 encoded
Please send questions or comments about these pages or the JMdictDB
project in general to:
Stuart McGraw <smcg4191x@friix.com> (remove the x's)