KANJIDIC2 HOME PAGE

Introduction

Welcome to the home page of the KANJIDIC2 project.

The files of this project are copyrighted by the Electronic Dictionary Research and Development Group and are available under the Group's licence.

The aim of the KANJIDIC2 project is to produce a consolidated XML-format kanji database that combines the information currently in the KANJIDIC (6,355 kanji from JIS X 0208) and KANJD212 (5,801 kanji from JIS X 0212) files (overview) (documentation), and adds information about the additional 952 kanji in JIS X 0213. (2,743 kanji are in both JIS X 0212 and JIS X 0213.)

Why do this? Well, XML is a great format for distributing data because many database packages can import XML files, and a growing number of software tools can handle XML. Many people want to use the data in KANJIDIC etc. but have trouble handling its format. In addition, I want to take advantage of the much richer data structure available in XML to add further information and features to the database.

As with the JMdict project, an internal format is used for storage and editing, and the XML version is generated from that. The original-format KANJIDIC/KANJD212 files will be generated from it as well.

The main documentation is in the form of comments in the DTD; however, an overview is also available. Information about what has changed in each release is on the What's New page.

The Files

Currently available are:

  1. the KANJIDIC2 overview;
  2. the current DTD (HTML) (.gz);
  3. the XSD schema created by Jan Eichhorn from the DTD;
  4. an alternative XSD schema, also created by Jan Eichhorn, which uses XML Schema restrictions and enumerations where possible;
  5. an example of an entry for one kanji;
  6. the current version of kanjidic2.xml (.gz);
  7. the documentation of the kanjidic and kanjd212 files, which contain detailed descriptions of the data, history, etc.
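
As a rough illustration of how the gzipped XML file might be read, here is a minimal Python sketch that streams entries one at a time rather than loading the whole file. The element names used (character, literal, misc/stroke_count) are assumptions that should be checked against the current DTD. The function names and the local file name are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_kanji(source):
    """Yield (literal, stroke_count) for each <character> element.

    `source` is a file name or a binary file object holding KANJIDIC2 XML.
    The element names are assumptions; verify them against the DTD.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "character":
            continue
        literal = elem.findtext("literal")
        # Only the first stroke_count is read; None if the element is absent.
        strokes = elem.findtext("misc/stroke_count")
        count = int(strokes) if strokes else None
        yield literal, count
        elem.clear()  # drop the finished entry's children to limit memory use


def iter_kanji_gz(path):
    """Same as iter_kanji, but reads a gzipped file such as kanjidic2.xml.gz."""
    with gzip.open(path, "rb") as fh:
        yield from iter_kanji(fh)
```

For example, `for literal, strokes in iter_kanji_gz("kanjidic2.xml.gz"): print(literal, strokes)` would list each kanji with its stroke count, assuming a local copy of the file under that name.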

General

The KANJIDIC2 file was officially released several years ago and is now relatively stable, although the data in it is updated from time to time. Additional fields are also added as relevant data becomes available. Don't assume anything is set in concrete if you use the file in a project.

One major change, which will only be implemented gradually as it will require a lot of manual work, is to group the readings and the matching meanings.

The structure allows for meanings to be in more than one language. There are sources of material in French, Portuguese and Spanish, which it would be good to add.
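
To make the grouping of readings and meanings, and the handling of meanings in several languages, more concrete, here is a small Python sketch. It assumes that readings and meanings sit together inside reading_meaning/rmgroup elements, that reading carries an r_type attribute, and that meaning carries an optional m_lang attribute whose absence means English. All of these should be checked against the DTD. The sample entry is abbreviated and hand-written for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated, hand-written sample; not copied from the distributed file.
SAMPLE = """<character>
  <literal>亜</literal>
  <reading_meaning>
    <rmgroup>
      <reading r_type="ja_on">ア</reading>
      <reading r_type="ja_kun">つ.ぐ</reading>
      <meaning>Asia</meaning>
      <meaning>rank next</meaning>
      <meaning m_lang="fr">Asie</meaning>
    </rmgroup>
  </reading_meaning>
</character>"""


def reading_meaning_groups(character, default_lang="en"):
    """Return one dict per rmgroup: readings by type, meanings by language."""
    groups = []
    for rmgroup in character.iterfind("reading_meaning/rmgroup"):
        readings = defaultdict(list)
        for reading in rmgroup.iterfind("reading"):
            readings[reading.get("r_type")].append(reading.text)
        meanings = defaultdict(list)
        for meaning in rmgroup.iterfind("meaning"):
            # Assumption: a meaning without m_lang is in English.
            meanings[meaning.get("m_lang", default_lang)].append(meaning.text)
        groups.append({"readings": dict(readings), "meanings": dict(meanings)})
    return groups


groups = reading_meaning_groups(ET.fromstring(SAMPLE))
print(groups[0]["meanings"].get("fr", []))
```

Keeping each group as its own dictionary means a consumer keeps working unchanged if an entry is later split into several groups, each pairing particular readings with their meanings.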

Comments to Jim.

Jim Breen
April 2008
August 2009
May 2012