
Merging the JIS208/JIS212 entries.



Greetings,

As many will know, I have kept a small set (~200) of entries
apart from the rest because they contain JIS X 0212 kanji.
For legacy reasons the original edict file can't really have
these kanji (the file can, but it will make a lot of software
crash).

Over the next few days, I will be migrating these entries into
the main file. In some cases it will mean just adding an extra
kanji string to the main entry, e.g.
偽る 【いつわる】 (v5r,vi) to lie; to cheat; to falsify; to deceive; to pretend
譃わる 【いつわる】 (v5r,vi) to lie; to cheat; to falsify; to deceive; to pretend
will become:
偽る;譃わる 【いつわる】 (v5r,vi) to lie; to cheat; to falsify; to deceive; to pretend
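
For anyone who wants to reproduce this kind of merge on their own copy, here is
a rough sketch in Python. It is not the script used for the real migration. It
assumes entries in the display form shown above ("KANJI 【READING】 rest"), and
the names ENTRY_RE and merge_entries are made up for illustration. Entries are
merged only when the reading and everything after it match exactly.

```python
import re

# Matches the display form used in this message: "KANJI 【READING】 rest".
# (Assumption: this is not the raw EDICT file syntax.)
ENTRY_RE = re.compile(r"^(?P<kanji>\S+) 【(?P<reading>[^】]+)】 (?P<rest>.+)$")


def merge_entries(lines):
    """Merge entries whose reading and remaining text are identical.

    The kanji headwords of matching entries are joined with ';' in the
    order they are first seen. Lines that do not match the pattern are
    skipped.
    """
    merged = {}  # (reading, rest) -> list of kanji headwords
    for line in lines:
        m = ENTRY_RE.match(line.strip())
        if m is None:
            continue
        key = (m["reading"], m["rest"])
        heads = merged.setdefault(key, [])
        for head in m["kanji"].split(";"):
            if head not in heads:
                heads.append(head)
    return [
        f"{';'.join(heads)} 【{reading}】 {rest}"
        for (reading, rest), heads in merged.items()
    ]
```

Feeding it the two いつわる lines above should give the single merged line.
Entries whose glosses differ even slightly are left separate, which is the
cautious choice for a sketch like this.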

In a few cases, e.g.
川魣 【かわかます; カワカマス】 (n) (uk) pike (esp. the Amur pike, Esox reichertii); pickerel
a complete new entry will move across.

The approach I am taking in generating the EDICT file is
to drop completely any kanji field that contains a JIS212
kanji. In a few cases that may end up with a kana-only
entry. Where appropriate I'll construct kanji headwords
with kana replacing the uncommon kanji, e.g. for
the 川魣 entry above, I'll make it 川魣;川かます (which
is sometimes used too).
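
The filtering step described above could look roughly like the following
Python sketch. It is not the actual generation code. The function names are
invented for illustration, and the default test assumes that Python's euc_jp
codec encodes JIS X 0212 characters as three-byte sequences starting with
0x8F. If that assumption does not hold on your system, pass in your own
predicate instead.

```python
def is_jis212_char(ch):
    """Guess whether ch is a JIS X 0212 character.

    Assumption: Python's euc_jp codec emits a 0x8F prefix for JIS X 0212
    characters. Characters the codec cannot encode at all return False.
    """
    try:
        return ch.encode("euc_jp")[:1] == b"\x8f"
    except UnicodeEncodeError:
        return False


def edict_kanji_field(kanji_field, is_jis212=is_jis212_char):
    """Drop every ';'-separated headword that contains a JIS X 0212 kanji.

    Returns the remaining headwords joined with ';'. An empty string
    means the entry would become kana-only in the EDICT file.
    """
    kept = [
        head
        for head in kanji_field.split(";")
        if not any(is_jis212(ch) for ch in head)
    ]
    return ";".join(kept)
```

With a predicate that flags 魣, the field 川魣;川かます reduces to 川かます, and
a lone 川魣 reduces to an empty string, i.e. a kana-only entry.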

Just letting people know. If there are any problems,
tell me.

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne