
Re: [edict-jmdict] Future EDICT/JMdict, etc. maintenance system



[Pawel Szymczykowski (Re: [edict-jmdict] Future EDICT/JMdict, etc. maintenance system) writes:]
>> On 9/1/06, Jim Breen <Jim.Breen@infotech.monash.edu.au> wrote:
>> > I had done a mental MySQL design in which one table could completely
>> > hold about 95% of entries. The rest, i.e. entries with many kanji/kana
>> > variants or lots of glosses, would go into overflows.
>> 
>> If you get a chance, could you describe this mental schema in a bit
>> more detail? I think it might help to jumpstart some of the
>> brainstorming on proposed interfaces and how they might relate to the
>> underlying data store.

Well, I had a concept of a basic table as follows:

* entry number
* flag indicating if this entry has been deleted or merged with another
* comment associated with delete/merge
* up to 3 instances of kanji headword, info tags, priority tags
* flag indicating if there are more kanji headwords.
* up to 3 instances of reading text, no-kanji flag, reading restriction,
  reading info and reading priority tags.
* flag indicating if there are more readings
* language, dialect & etymology fields
* up to 3 instances of sense. Each sense consisting of:
    kanji restriction
    reading restriction
    part-of-speech
    cross-ref(s)
    antonym(s)
    domain/field(s)
    misc info (tags)
    comment field
    up to 5 glosses
* flag indicating if there are more senses
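
To make the "base row plus overflow" idea concrete, here is a rough
sketch in Python rather than MySQL. It only shows how an entry might be
split according to the limits listed above (3 kanji, 3 readings, 3
senses, 5 glosses per sense). All the names (Entry, Sense, to_rows, the
dictionary keys) are made up for illustration, and the shape of the
overflow rows is purely an assumption; none of it is a worked-out design.

```python
from dataclasses import dataclass, field

# Limits taken from the list above; every name here is hypothetical.
MAX_KANJI = 3
MAX_READINGS = 3
MAX_SENSES = 3
MAX_GLOSSES = 5


@dataclass
class Sense:
    glosses: list
    pos: str = ""


@dataclass
class Entry:
    entry_number: int
    kanji: list = field(default_factory=list)
    readings: list = field(default_factory=list)
    senses: list = field(default_factory=list)


def split(items, limit):
    """Return (part kept in the base row, part sent to overflow)."""
    return items[:limit], items[limit:]


def to_rows(entry):
    """Split one entry into a base row and a list of overflow rows."""
    kanji, extra_kanji = split(entry.kanji, MAX_KANJI)
    readings, extra_readings = split(entry.readings, MAX_READINGS)
    senses, extra_senses = split(entry.senses, MAX_SENSES)

    base = {
        "entry_number": entry.entry_number,
        "kanji": kanji,
        "more_kanji": bool(extra_kanji),
        "readings": readings,
        "more_readings": bool(extra_readings),
        "senses": [s.glosses[:MAX_GLOSSES] for s in senses],
        "more_senses": bool(extra_senses),
    }

    # Assumed overflow layout: (kind, value) pairs keyed by the entry number.
    overflow = (
        [("kanji", k) for k in extra_kanji]
        + [("reading", r) for r in extra_readings]
        + [("sense", s.glosses) for s in extra_senses]
    )
    # Glosses beyond the fifth in a base-row sense also go to overflow,
    # tagged with the index of the sense they belong to.
    for i, s in enumerate(senses):
        overflow += [("gloss", (i, g)) for g in s.glosses[MAX_GLOSSES:]]
    return base, overflow
```

An entry with four kanji headwords would keep the first three in the
base row, set the more-kanji flag, and produce one overflow row. An
entry within all the limits produces no overflow rows at all, which is
the case the single table is meant to cover.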

At present about 3% of entries have 2 or more senses marked.

Just my thoughts

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学