
Re: [edict-jmdict] pos/misc entities



2008/6/12 Jean-Luc Leger <reiga@dspnet.fr.eu.org>:

> A real solution would be to add a new element around senses, because that's
> what we mostly do with those POSes spreading from sense to sense: we group
> senses.
> But the expected answer to this is: wait till the Master File has been
> moved into a Database.

I see. That's what many dictionaries do, but as you say, I
can't really consider it in the present rig.

> Yes, and I would rather have data changed in the master file than modify
> the generators (the edict generator is now almost correct, though some parts
> concerning POS and restrictions are so tricky they could easily become buggy
> again...)

Yes, the EDICT generation is the tricky one. Making JMdict is simple,
because it's mostly line-by-line formatting through a simple state machine.
(And making KANJIDIC2.xml is even simpler, because I designed the internal
format with the FSM in mind.) With EDICT generation, more exceptions have to
be considered, e.g. deleting senses if there are restrictions, generating
kanji/kana combinations, etc. I don't like to touch it too much, as it's
easy to break.
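
To illustrate why those exceptions are fiddly, here is a minimal Python
sketch of the two steps mentioned above: pairing kanji with kana, and dropping
senses that are restricted away from a given pair. It is not the actual EDICT
generator. The field names restr, stagk and stagr are borrowed from the JMdict
elements re_restr, stagk and stagr; the classes, the function names, the
placeholder entry and the simplified output line are all assumptions made for
illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Reading:
    kana: str
    # Hypothetical field: kanji this reading is restricted to (empty = all).
    restr: List[str] = field(default_factory=list)


@dataclass
class Sense:
    gloss: str
    # Hypothetical fields: kanji / readings this sense is restricted to.
    stagk: List[str] = field(default_factory=list)
    stagr: List[str] = field(default_factory=list)


def combinations(kanji: List[str], readings: List[Reading]) -> List[Tuple[str, str]]:
    """Return every (kanji, kana) pair allowed by the reading restrictions."""
    return [
        (k, r.kana)
        for k in kanji
        for r in readings
        if not r.restr or k in r.restr
    ]


def senses_for(pair: Tuple[str, str], senses: List[Sense]) -> List[str]:
    """Keep only the glosses whose restrictions admit this kanji/kana pair."""
    k, r = pair
    return [
        s.gloss
        for s in senses
        if (not s.stagk or k in s.stagk) and (not s.stagr or r in s.stagr)
    ]


def edict_lines(kanji: List[str], readings: List[Reading], senses: List[Sense]) -> List[str]:
    """Emit one simplified EDICT-style line per pair that still has senses."""
    lines = []
    for k, r in combinations(kanji, readings):
        glosses = senses_for((k, r), senses)
        if glosses:  # a pair whose senses were all deleted yields no line
            lines.append(f"{k} [{r}] /" + "/".join(glosses) + "/")
    return lines


# Placeholder entry, not real dictionary data.
kanji = ["K1", "K2"]
readings = [Reading("r1"), Reading("r2", restr=["K2"])]
senses = [Sense("first"), Sense("second", stagk=["K1"])]

for line in edict_lines(kanji, readings, senses):
    print(line)
```

Even in this toy form, one entry fans out into several lines with different
sense lists, which is where a small change can easily break things.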

Jim

-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/