
Re: [edict-jmdict] Database project



On 09/08/07, salvati marc <salvati_marc@yahoo.fr> wrote:

> > That sounds as though you may have lost some of the current information.
> > The restriction information in JMdict is quite complicated, because
> > Japanese orthography is complicated. It's one of the reasons I did my
> > own DTD rather than try and use something like the TEI one, as no-one else
> > was catering for the concept of an entry having both kanji and reading
> > forms with the potential for more than one of each, and the possibility
> > that not all combinations were legal.
>
>  I thought a lot about it when I read the DTD. But in the end, I think
>  that for each sense (not each entry), there is a list of corresponding
>  readings and writings that are valid, perhaps with a relation between
>  reading and writing.

There is ALWAYS a relationship between the "writing" (if any), i.e. the kanji
part, and the reading.

>  I think that the combination of stagk, stagr, re_restr, etc. can be
>  interpreted as a list of valid readings and writings at the sense level.

The way it is held at present in JMdict provides enough information
to prune the illegal combinations from the list of all possible kanji/reading
pairs. It also makes the generation of human-readable versions easier, and it
simplifies data entry (since only a small proportion of entries have
restrictions).
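
As a rough sketch of that pruning (an illustration only, not the code used
to build JMdict or EDICT): assume an entry has been reduced to a list of
kanji forms plus a list of readings, where each reading may carry a list of
re_restr values. The function name valid_pairs, the dict layout and the
K1/r1 placeholder data are all hypothetical, and re_nokanji is ignored here.

```python
from itertools import product


def valid_pairs(kanji_forms, readings):
    """Return the (kanji, reading) pairs that survive the re_restr check.

    readings is a list of dicts such as {"reb": "r2", "re_restr": ["K2"]}.
    A missing or empty re_restr means the reading goes with every kanji form.
    """
    pairs = []
    # Start from all possible combinations and drop the illegal ones.
    for keb, reading in product(kanji_forms, readings):
        restr = reading.get("re_restr") or []
        if not restr or keb in restr:
            pairs.append((keb, reading["reb"]))
    return pairs


# Placeholder entry, not real JMdict data.
kanji_forms = ["K1", "K2"]
readings = [
    {"reb": "r1"},                      # unrestricted: pairs with K1 and K2
    {"reb": "r2", "re_restr": ["K2"]},  # only legal with K2
]

print(valid_pairs(kanji_forms, readings))
```

Because an unrestricted reading needs no extra data at all, the common case
(no restrictions) stays as simple as it looks in the entry itself.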

The stagk/stagr elements then allow the senses to be either pruned (e.g. when
EDICT is generated) or tagged (EDICT2).
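
The two treatments might be sketched roughly as follows. This is an
assumption-laden illustration, not the actual EDICT/EDICT2 generator: the
function names, the dict layout, the placeholder senses and the
"(... only)" label format are all made up for the example.

```python
def senses_for_pair(senses, keb, reb):
    """Pruning: keep only the glosses whose stagk/stagr admit this pair."""
    return [
        s["gloss"]
        for s in senses
        if (not s.get("stagk") or keb in s["stagk"])
        and (not s.get("stagr") or reb in s["stagr"])
    ]


def tagged_senses(senses):
    """Tagging: keep every gloss, prefixing restricted ones with a note."""
    out = []
    for s in senses:
        only = s.get("stagk", []) + s.get("stagr", [])
        # Hypothetical label format, chosen only for this sketch.
        prefix = f"({', '.join(only)} only) " if only else ""
        out.append(prefix + s["gloss"])
    return out


# Placeholder senses, not real JMdict data.
senses = [
    {"gloss": "meaning A"},                   # applies to every form
    {"gloss": "meaning B", "stagr": ["r2"]},  # applies to reading r2 only
]

print(senses_for_pair(senses, "K1", "r1"))
print(tagged_senses(senses))
```

Pruning suits a format with one kanji/reading pair per line, while tagging
suits a format that lists all the forms of an entry together.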

Jim

-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/