ENAMDICT/JMnedict

Japanese Proper Names Dictionary Files

Copyright (C) 2014 The Electronic Dictionary Research and Development Group

Introduction

The ENAMDICT/JMnedict files contain Japanese proper names: place-names, surnames, given names, (some) company names and product names. These were originally included in the EDICT file, along with other non-name entries. By late 1995, the name entries outnumbered the others, and the file was becoming unmanageably large, so the decision was made to split it. From this split came the ENAMDICT file.

The JMnedict (Japanese Multilingual Named Entity Dictionary) is simply the ENAMDICT file reformatted into an XML file in UTF-8 encoding. It also has a small number of names which use kanji from the JIS X 0212 character set.

Format

The format of the ENAMDICT file is similar to that of the EDICT file, and the EDICT documentation should be consulted for more information.

The names have classification codes associated with them. The codes are:

s - surname (138,500)
p - place-name (99,500)
u - person name, either given or surname, as-yet unclassified (139,000) 
g - given name, as-yet not classified by sex (64,600)
f - female given name (106,300)
m - male given name (14,500)
h - full (usually family plus given) name of a particular person (30,500)
pr - product name (55)
c - company name (34)
o - organization name
st - stations (8,254)
wk - work of literature, art, film, etc.

These codes are at the front of each group of translations, e.g. "(f) Hiroko" or "(s) Tanaka".
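As a rough illustration of how such a prefix might be separated from the name, here is a minimal Python sketch. It is not part of the ENAMDICT distribution: the function name split_codes and the table NAME_CODES are hypothetical, and the handling of several comma-separated codes inside one pair of parentheses is an assumption that should be checked against the EDICT documentation.

```python
import re

# Codes and descriptions taken from the list above.
NAME_CODES = {
    "s": "surname",
    "p": "place-name",
    "u": "person name, as-yet unclassified",
    "g": "given name, as-yet not classified by sex",
    "f": "female given name",
    "m": "male given name",
    "h": "full name of a particular person",
    "pr": "product name",
    "c": "company name",
    "o": "organization name",
    "st": "station",
    "wk": "work of literature, art, film, etc.",
}

# Assumption: the codes form one parenthesised group at the very start of
# the text, optionally holding several codes separated by commas.
_LEADING_CODES = re.compile(r"^\(([a-z]+(?:,[a-z]+)*)\)\s*")


def split_codes(gloss):
    """Return (codes, remaining_text) for one group of translations.

    If the text has no leading group, or the group contains anything that
    is not a known code, the text is returned unchanged with no codes.
    """
    match = _LEADING_CODES.match(gloss)
    if match is None:
        return [], gloss
    codes = match.group(1).split(",")
    if not all(code in NAME_CODES for code in codes):
        return [], gloss
    return codes, gloss[match.end():]


if __name__ == "__main__":
    for text in ('(f) Hiroko', '(s) Tanaka'):
        codes, name = split_codes(text)
        print(name, [NAME_CODES[code] for code in codes])
```

Text whose leading parentheses do not hold a known code is left untouched, so a parenthesised remark at the start of a translation is not mistaken for a classification.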

In addition, a number of country-names are added in parentheses after place-names.

The JMnedict is structured according to its DTD, which is at the front of the file.

Updating

At present the names file is held in text form and updated by Jim Breen. If you want to suggest amendments or additions, please email them to "jimbreen (at) gmail.com". There are plans to move the maintenance of the names data into the same database system as JMdict/EDICT.

Downloads

The files can be downloaded from the Monash FTP site: enamdict.gz and JMnedict.xml.gz.

Jim Breen
The Electronic Dictionary Research and Development Group.
December 2013
June 2014

Information about the formal usage arrangements for ENAMDICT can be found on the Group's WWW page.

APPENDIX

ENAMDICT COPYRIGHT STATEMENT

In March 2000, James William Breen assigned ownership of the copyright of the dictionary files assembled, coordinated and edited by him to The Electronic Dictionary Research and Development Group.

Information about the formal usage arrangements for ENAMDICT can be found on the Group's WWW page (http://www.edrdg.org/).

In summary, ENAMDICT can be freely used provided satisfactory acknowledgement is made, and a number of other conditions are met.