Main Page: Difference between revisions

From EDRDG Wiki
(32 intermediate revisions by 4 users not shown)
Line 2: Line 2:


Welcome to the Wiki of the [[About EDRDG |  Electronic Dictionary Research and Development Group]]. The Wiki is being developed as a repository of information and documentation about the Group's projects.  
==Create an Account==
People wishing to participate in this Wiki are welcome to have accounts. To get an account, email a request to either William Maton (wfms-at-acm.org) or Jim Breen (jimbreen-at-gmail.com). In your email say what login ID you'd like. You'll be mailed back a temporary password to enable your account.
(Sorry for the hassle, but we've been hit by link spammers and we've disabled self-creation of accounts to stop them.)


==The JMdict/EDICT Project==


This project is to build a freely-usable general Japanese dictionary file. It began in 1991 with the EDICT Japanese-English file in a simple format, and in 1999 expanded into the XML-format JMdict file. From then the file has been maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file have been generated. Public input into the project has been mainly via WWW forms incorporated in the WWWJDIC server. A new edition of the files have been generated daily.
This project is to build and maintain a freely-usable general Japanese electronic dictionary database.  
 
===History===
 
The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and a new edition of the files was generated daily.
 
In July 2010 maintenance of the data moved to an [[JMdictDB_Project|online database]], from which the daily distributions are prepared.
 
===Documentation and Links===
 
Some useful links are:


*the [http://www.csse.monash.edu.au/~jwb/j_jmdict.html overview documentation of the JMdict file]
*the [http://www.edrdg.org/jmdict/j_jmdict.html overview documentation of the JMdict file]
*the [http://www.csse.monash.edu.au/~jwb/edict.html overview documentation of the EDICT file]
*the [[Edict_Overview|overview documentation]] of the EDICT file
*the main [http://www.csse.monash.edu.au/~jwb/edict_doc.html documentation of the JMdict/EDICT dictionary files]
*the main [[JMdict-EDICT_Dictionary_Project|documentation of the JMdict/EDICT dictionary files]]
*the [http://www.csse.monash.edu.au/~jwb/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.
*some help with [[JMdict:_Getting_Started|getting started]] on putting in new entries or editing existing ones.
*the [http://www.edrdg.org/edrdg/licence.html licence statement for use of the projects' files]. This licence also applies to the contents of this Wiki.
*lists of [[JMdictEDICT_software|packages and servers]] using the JMdict/EDICT files
*the [[editorial policy]] and guidelines for the JMdict/EDICT files (under development)
*the [[editorial policy|Editorial Policy]] and guidelines for the JMdict/EDICT files
*the [[Editorial Board]] for JMdict/EDICT
*the [[Editorial Process]] for handling proposed new entries and amendments
*an [[Entries Under Development]] page, where people can place incomplete words and phrases for later filling out to become full entries.
==JMdictDB Database==
From May 2010 the maintenance of the JMdict/EDICT dictionary files has taken place in the online JMdict Database (JMdictDB) system developed by Stuart McGraw. Full operation of this system is planned for June 2010. For more information see:
Since June 2010 the maintenance of the JMdict/EDICT dictionary files has been handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw. For more information see:
* an [[JMdictDB Project|overview]] of the database;
* Stuart's [http://edrdg.org/~smg/ summary page];
Line 27: Line 45:


The Corpus is now maintained within the [http://tatoeba.org/home Tatoeba Project]. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.
===Linking with Dictionary Systems===
An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C WWWJDIC], [http://jisho.org/ Denshi Jisho], etc. This is achieved via a set of indices attached to each sentence pair. There is a [[Sentence-Dictionary Linking|detailed description]] of this process.


==The KANJIDIC Project==
Line 57: Line 79:


The ENAMDICT file contains about 720,000 proper names in Japanese. It is in EDICT format, with some special tags to indicate the type of proper name. It is also available in XML format as the Japanese-Multilingual named entity dictionary (JMnedict). There is a basic [http://www.csse.monash.edu.au/~jwb/enamdict_doc.html documentation page].
The file will eventually be placed online for additions/amendments. As an interim step, here is a page of names which contain [[non-JIS208 kanji]] and hence cannot be in  the current file.


==The KRADFILE/RADKFILE Project==
Line 66: Line 90:
==The WWWJDIC Dictionary Server==


* [http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1C home page] of the server (at Monash).
* [http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1C home page] of the server


* [http://www.csse.monash.edu.au/~jwb/wwwjdicinf.html User's Guide]
* [http://www.edrdg.org/wwwjdic/wwwjdicinf.html User's Guide]


* [[WWWJDIC in Japanese]] project
Line 76: Line 100:
==Wishlist==


This is a set of wishlist items for the various projects. There is also an old [http://www.csse.monash.edu.au/~jwb/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.
This is a set of [[wishlist]]  items for the various projects. Feel free to add suggestions.
 
===JMdictDB database Project===
 
===JMdict/EDICT===
 
===WWWJDIC===
 
===JMNEdict/ENAMDICT===


===KANJIDIC===
There is also an old [http://www.csse.monash.edu.au/~jwb/edictredev/edictwishlist.html wishlist page]. Some of the items in this section have been copied from it.


==Mailing List==

Revision as of 03:09, 22 December 2013

Electronic Dictionary Research and Development Group

Welcome to the Wiki of the Electronic Dictionary Research and Development Group. The Wiki is being developed as a repository of information and documentation about the Group's projects.

Create an Account

People wishing to participate in this Wiki are welcome to have accounts. To get an account, email a request to either William Maton (wfms-at-acm.org) or Jim Breen (jimbreen-at-gmail.com). In your email say what login ID you'd like. You'll be mailed back a temporary password to enable your account.

(Sorry for the hassle, but we've been hit by link spammers and we've disabled self-creation of accounts to stop them.)

The JMdict/EDICT Project

This project is to build and maintain a freely-usable general Japanese electronic dictionary database.

History

The project began in 1991 with the EDICT Japanese-English text file in a simple format. In 1999 this was expanded into the XML-format JMdict file with a more complex format allowing for much better treatment of Japanese words and expressions. From 1999 the data was maintained by Jim Breen in a mark-up system from which the JMdict file, in both English and multiple-language editions, the EDICT file, and the extended EDICT2 file were generated. Public input into the project was mainly via WWW forms incorporated in the WWWJDIC server, and a new edition of the files was generated daily.

In July 2010 maintenance of the data moved to an online database, from which the daily distributions are prepared.

Documentation and Links

Some useful links are:

  • the overview documentation of the JMdict file
  • the overview documentation of the EDICT file
  • the main documentation of the JMdict/EDICT dictionary files
  • some help with getting started on putting in new entries or editing existing ones.
  • the licence statement for use of the projects' files. This licence also applies to the contents of this Wiki.
  • lists of packages and servers using the JMdict/EDICT files
  • the Editorial Policy and guidelines for the JMdict/EDICT files
  • the Editorial Board for JMdict/EDICT
  • the Editorial Process for handling proposed new entries and amendments
  • an Entries Under Development page, where people can place incomplete words and phrases for later filling out to become full entries.

JMdictDB Database

Since June 2010 the maintenance of the JMdict/EDICT dictionary files has been handled by the online JMdict Database (JMdictDB) system developed by Stuart McGraw. For more information see:

The Tanaka Corpus

This project is to maintain and extend the Tanaka Corpus, a large collection of parallel Japanese/English sentence pairs.

The Corpus is now maintained within the Tatoeba Project. This project has extended the file to include many other languages, and many sentences are available in three or more languages. The project WWW site has extensive facilities for searching and editing the sentences, and has an active community of people entering and editing sentences.

Linking with Dictionary Systems

An important aspect of the Tanaka Corpus and its ongoing maintenance and expansion is its use as a source of examples in dictionary systems such as WWWJDIC, Denshi Jisho, etc. This is achieved via a set of indices attached to each sentence pair. There is a detailed description of this process.

The KANJIDIC Project

The KANJIDIC project has compiled files of data on kanji used in Japanese text processing. The files cover the kanji in three Japanese standards:

  • JIS X 0208-1998, which includes 6,355 kanji.
  • JIS X 0212-1990, which includes an extra 5,801 kanji.
  • JIS X 0213-2004, which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 884 extra kanji.

Three data files are distributed by this project:

The COMPDIC Project

The COMPDIC project involved the compilation of a glossary of terms used in the computing and telecommunications industries. The file was in the "EDICT" format. See the brief documentation.

In 2008 the entries in the COMPDIC file were included in the JMdict/EDICT file. While it is no longer maintained as a separate file, an extract of the entries relating to computing and telecommunications is still generated.

The ENAMDICT/JMnedict Project

The ENAMDICT file contains about 720,000 proper names in Japanese. It is in EDICT format, with some special tags to indicate the type of proper name. It is also available in XML format as the Japanese-Multilingual named entity dictionary (JMnedict). There is a basic documentation page.

The file will eventually be placed online for additions/amendments. As an interim step, here is a page of names which contain non-JIS208 kanji and hence cannot be in the current file.

The KRADFILE/RADKFILE Project

This project provides a decomposition of kanji into a number of visual elements or radicals to support software which provides a lookup service using kanji components.

There is an information page about the files.
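
As a rough illustration of how software might use such a decomposition for component-based lookup, here is a minimal Python sketch. It is not the project's own code. The sample data, the "kanji : component component" line layout used here, and the function names build_index and lookup are all assumptions made for this example; the information page describes the actual file layout and encoding.

```python
from collections import defaultdict

# Hypothetical sample data, one kanji per line followed by its components.
# The layout and the decompositions are illustrative assumptions only.
SAMPLE = """\
明 : 日 月
朝 : 十 日 月
時 : 土 寸 日
林 : 木
"""


def build_index(text):
    """Build an inverted index mapping each component to the kanji containing it."""
    index = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        kanji, _, parts = line.partition(" : ")
        for component in parts.split():
            index[component].add(kanji)
    return index


def lookup(index, components):
    """Return the set of kanji that contain every one of the given components."""
    sets = [index.get(c, set()) for c in components]
    if not sets:
        return set()
    return set.intersection(*sets)


index = build_index(SAMPLE)
print(sorted(lookup(index, ["日", "月"])))
```

Selecting more components narrows the candidate set, which is how a multi-component lookup interface would typically let a user home in on a kanji whose reading is unknown.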

The WWWJDIC Dictionary Server

  • Common words - the 850 common words from Ogden's list. To be used to enhance English-Japanese lookups.

Wishlist

This is a set of wishlist items for the various projects. Feel free to add suggestions.

There is also an old wishlist page. Some of the items in this section have been copied from it.

Mailing List

There is a mailing list for people engaged in the EDRDG projects.

How Can I Help?

From time to time people ask how they can best contribute to the projects. There are many ways of assisting, the main ones being:

  • adding to and enhancing the main (EDICT/JMdict) dictionary file. This is best done by using the New Entry/Amendment page of WWWJDIC.
  • adding extra Japanese-English sentence pairs to the collection based on the Tanaka Corpus. There is a New Examples function in WWWJDIC for this.
  • assisting with the translation of the WWWJDIC interface into other languages. At present the priority is to make it fully available in Japanese. See the WWWJDIC in Japanese page.
  • working through the lists of words Paul Blay has placed on the Talk:Tanaka_Corpus page, which could become new dictionary entries.
  • joining and participating in the mailing list for people engaged in the EDRDG projects.