KANJIDIC Project: Difference between revisions
Line 1: | Line 1: | ||
=The KANJIDIC Project= | =The KANJIDIC Project= | ||
The KANJIDIC project has | ''(Note that this page in the process of being rewritten, so be patient with any aspects that seems incomplete.)'' | ||
==Introduction== | |||
The KANJIDIC project, which began in 1991, has the goal of compiling and distributing comprehensive information on the kanji used in Japanese text processing. It covers the 13,108 kanji in three main Japanese standards: | |||
* [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji. | * [https://en.wikipedia.org/wiki/JIS_X_0208 JIS X 0208-1998], which includes 6,355 kanji. | ||
* [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes extra 5,801 kanji | * [https://en.wikipedia.org/wiki/JIS_X_0212 JIS X 0212-1990], which includes extra 5,801 kanji | ||
* [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds additional kanji. | * [https://en.wikipedia.org/wiki/JIS_X_0213 JIS X 0213-2012], which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 952 additional kanji. | ||
Three | Three data files are distributed by this project: | ||
* the KANJIDIC2 file, which is in XML format, and contains all | * the KANJIDIC2 file, which is in XML format and [https://en.wikipedia.org/wiki/UTF-8 Unicode/UTF-8] coding, and contains information about all 13,108 kanji. For this file the following information is available: | ||
** a project [http://www.edrdg.org/kanjidic/kanjd2index.html overview page] | ** a project [http://www.edrdg.org/kanjidic/kanjd2index.html overview page] | ||
** a file [http://www.edrdg.org/kanjidic/kanjidic2_ov.html overview] | ** a file [http://www.edrdg.org/kanjidic/kanjidic2_ov.html overview] | ||
** the [http://www.edrdg.org/kanjidic/kanjidic2_dtdh.html DTD] | ** the [http://www.edrdg.org/kanjidic/kanjidic2_dtdh.html DTD] | ||
** a [http://www.edrdg.org/kanjidic/kd2examph.html sample entry] | ** a [http://www.edrdg.org/kanjidic/kd2examph.html sample entry] | ||
* the KANJIDIC file, which covers the 6,355 kanji in JIS X 0208. For this there is the | * the KANJIDIC file, which in in EUC-JP coding and covers the 6,355 kanji in JIS X 0208. For this there is the | ||
** [http://www.edrdg.org/kanjidic/kanjidic_doc.html original documentation] | ** [http://www.edrdg.org/kanjidic/kanjidic_doc.html original documentation] | ||
* the KANJD212 file, which covers the 5,801 kanji in JIS X 0212. | * the KANJD212 file, which also is in EUC-JP coding and covers the 5,801 kanji in JIS X 0212. For this there is the | ||
** [http://www.edrdg.org/kanjidic/kanjd212_doc.html original documentation] | ** [http://www.edrdg.org/kanjidic/kanjd212_doc.html original documentation] | ||
There is also a [http://www.edrdg.org/kanjidic/kanjidic.html combined overview] of the KANJIDIC/KANJD212 files. | There is also a [http://www.edrdg.org/kanjidic/kanjidic.html combined overview] of the KANJIDIC/KANJD212 files. |
Revision as of 02:08, 6 September 2018
The KANJIDIC Project
(Note that this page is in the process of being rewritten, so please be patient with any aspects that seem incomplete.)
Introduction
The KANJIDIC project, which began in 1991, has the goal of compiling and distributing comprehensive information on the kanji used in Japanese text processing. It covers the 13,108 kanji in three main Japanese standards:
- JIS X 0208-1998, which includes 6,355 kanji.
- JIS X 0212-1990, which includes an additional 5,801 kanji.
- JIS X 0213-2012, which extends JIS X 0208, overlaps with some of JIS X 0212, and adds 952 additional kanji.
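The three counts listed above add up to the stated total of 13,108 kanji. A minimal sketch of that check follows; the dictionary keys are labels chosen here for illustration only.

```python
# Kanji counts per standard, as stated in the list above.
kanji_counts = {
    "JIS X 0208": 6355,
    "JIS X 0212": 5801,
    "JIS X 0213 (additional)": 952,
}

# Sum the per-standard counts to recover the project-wide total.
total = sum(kanji_counts.values())
print(total)  # 13108
```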
Three data files are distributed by this project:
- the KANJIDIC2 file, which is in XML format and Unicode/UTF-8 coding, and contains information about all 13,108 kanji. For this file the following information is available:
  - a project overview page
  - a file overview
  - the DTD
  - a sample entry
- the KANJIDIC file, which is in EUC-JP coding and covers the 6,355 kanji in JIS X 0208. For this there is the original documentation.
- the KANJD212 file, which is also in EUC-JP coding and covers the 5,801 kanji in JIS X 0212. For this there is the original documentation.
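Because the KANJIDIC2 file is XML in UTF-8, it can be read with a standard XML parser. The following is a minimal sketch in Python, not an official loader. The element names (`character`, `literal`, `meaning`) and the inline sample are assumptions made for illustration and should be checked against the project's DTD and sample entry; `load_meanings` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only. The element names are
# assumptions and should be verified against the project's DTD.
SAMPLE = """<kanjidic2>
  <character>
    <literal>亜</literal>
    <reading_meaning>
      <rmgroup>
        <meaning>Asia</meaning>
        <meaning>rank next</meaning>
      </rmgroup>
    </reading_meaning>
  </character>
</kanjidic2>"""


def load_meanings(xml_text):
    """Map each kanji literal to the list of its meaning strings."""
    root = ET.fromstring(xml_text)
    result = {}
    for character in root.iter("character"):
        literal = character.findtext("literal")
        # Collect every <meaning> element nested under this character.
        result[literal] = [m.text for m in character.iter("meaning")]
    return result


meanings = load_meanings(SAMPLE)
print(meanings)
```

For the real file, `ET.parse(path)` on a local copy would probably be the more natural entry point, since it reads the encoding from the file itself.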
There is also a combined overview of the KANJIDIC/KANJD212 files.
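Since the KANJIDIC and KANJD212 files use EUC-JP rather than UTF-8, a reader has to name the encoding explicitly. The sketch below shows one way to do that with Python's built-in `euc_jp` codec. The sample line is made up for illustration and is not taken from the actual file; the temporary file stands in for a local copy of the data.

```python
import os
import tempfile

# Made-up illustrative line, not real file content.
line = "亜 3021\n"

# Encode as EUC-JP, the coding used by the KANJIDIC and KANJD212 files.
encoded = line.encode("euc_jp")

# Write the bytes to a temporary file standing in for a downloaded copy.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(encoded)
    path = tmp.name

try:
    # Name the encoding explicitly; the platform default is usually not EUC-JP.
    with open(path, encoding="euc_jp") as f:
        decoded = f.read()
finally:
    os.remove(path)

print(encoded[:2])      # the two EUC-JP bytes of the kanji: b'\xb0\xa1'
print(decoded == line)  # True
```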