
Re: [edict-jmdict] Examples file duplicate id numbers



On 5 April 2010 11:56, Francis Bond <bond@ieee.org> wrote:
> I would prefer that we use a combination of numbers
> jpn-id:eng-id so that every pair gets a unique ID.

Within the WWWJDIC system I could use a combination like that.
The CSV dump I download includes the IDs of both the Japanese
and English sentences. I chose to put the English one in the file,
as 90%+ of corrections from WWWJDIC users are for the English.
I could put a pair of sequence numbers in the file and also allow
a choice of whether to link to the English or the Japanese.

None of this changes how things happen within the Tatoeba project,
but it would give each entry in the file used by WWWJDIC a unique ID.
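
As a rough illustration of the jpn-id:eng-id idea, here is a minimal
Python sketch. The sample ID values and the helper names (pair_id,
link_id, duplicates) are hypothetical, not taken from WWWJDIC or
Tatoeba, and it assumes duplicates arise when one English sentence is
paired with more than one Japanese sentence.

```python
from collections import Counter


def pair_id(jpn_id: int, eng_id: int) -> str:
    """Build a composite ID of the form 'jpn-id:eng-id'."""
    return f"{jpn_id}:{eng_id}"


def link_id(pid: str, lang: str = "eng") -> int:
    """Pick which half of a composite ID to link to ('eng' or 'jpn')."""
    jpn, eng = pid.split(":")
    return int(eng) if lang == "eng" else int(jpn)


def duplicates(ids):
    """Return the IDs that occur more than once, sorted."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)


# Hypothetical (jpn_id, eng_id) pairs: English sentence 2002 is assumed
# to be paired with two different Japanese sentences.
pairs = [(1001, 2001), (1002, 2002), (1003, 2002)]

eng_only = [str(eng) for _, eng in pairs]
combined = [pair_id(jpn, eng) for jpn, eng in pairs]

dup_eng = duplicates(eng_only)       # English ID alone repeats
dup_combined = duplicates(combined)  # composite IDs do not repeat
```

With the English ID alone, 2002 appears twice; with the combined form
every pair is distinct, and link_id can still recover either half for
linking.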

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne