Re: [edict-jmdict] ID numbers in Edict
On 11 April 2010 17:55, Paul Blay <blay.paul@googlemail.com> wrote:
> I think I've suggested this before, but how about including the unique
> Entry ID numbers in the Edict2 file? (Or having an Edict3 file if
> necessary)
>
> I think there would be a good 'market' for that addition from all
> those who aren't up to coping with XML but want an easier way of matching up
> Edict Entries between updates than is possible at present.
I could easily do this, as it's currently an option in the utility that makes
the edict2 format. At present it can do it in one of two ways:
- simply dumps the sequence number at the end of the entry as though it were
a meaning, e.g. 漢字 [かんじ] /(n) kanji/1001000/
- flags it with "EntL", e.g. 漢字 [かんじ] /(n) kanji/EntL1001000/ (this is the form
used by WWWJDIC.)
Neither is exactly what a developer or user wants to be hit with unannounced,
but I don't really want to make an "edict3" at this stage.
Maybe I could simply pop the number in: 漢字 [かんじ] /(n) kanji/#1001000/
and deal with the flak (if any). That "#" indicates it's some sort of sequence
number. I must say I have no idea who uses the "edict2" file, or for what.
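For anyone wondering what this would mean on the consuming side, here is a rough Python sketch of how a reader of the file might peel the trailing number off a line in any of the three forms above (bare, "EntL", or "#"). The function name split_entry_id and the regular expression are made up for illustration and are not part of any existing tool. It assumes the number is always the last slash-delimited field, and note that the bare form cannot be told apart from a gloss that happens to be all digits.

```python
import re

# Hypothetical pattern: an optional "EntL" or "#" flag followed by digits.
_ID_FIELD = re.compile(r"(?:EntL|#)?(\d+)")

def split_entry_id(line):
    """Return (line_without_id, entry_id), or (line, None) if no ID is found."""
    line = line.rstrip("\n")
    fields = line.split("/")
    # An edict2 line ends with "/", so the final split piece is empty and
    # the candidate ID is the piece just before it.
    if len(fields) >= 3 and fields[-1] == "":
        match = _ID_FIELD.fullmatch(fields[-2])
        if match:
            rest = "/".join(fields[:-2]) + "/"
            return rest, int(match.group(1))
    return line, None

if __name__ == "__main__":
    for sample in (
        "漢字 [かんじ] /(n) kanji/1001000/",
        "漢字 [かんじ] /(n) kanji/EntL1001000/",
        "漢字 [かんじ] /(n) kanji/#1001000/",
    ):
        print(split_entry_id(sample))
```

Software that treats every field between slashes as a meaning would instead show the number as an extra gloss, which is the sort of surprise I mean.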
Comments, anyone?
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne