
Re: [edict-jmdict] (vs) words in the Tanaka Corpus



[Paul Blay (Re: [edict-jmdict] (vs) words in the Tanaka Corpus) writes:]
JB>> > Would you drop the 為る(する)? Or could you have:
JB>> >
JB>> > 野宿[2]{野宿しました} 為る(する){}   ?
JB>> >
JB>> > There is a downside to dropping the 為る(する). At present the
JB>> > indices (as you have extended them) cover all the morphemes in
JB>> > the sentence. If what you propose actually drops the する from
JB>> > the index list, the sentences could be less useful for some
JB>> > future NLP task.

>> This is related to the indexing of (exp)'s.  Every time I
>> index something to になると it ceases to be indexed to
>> に 成る and と.
>> 
>> However the case of (vs) does have the advantage that it could
>> be fully reverse engineered.  Unlike となると (exp),
>> "野宿 (2) (vs)" gives you all the information you need to
>> recognize 野宿[2]{野宿しました} as containing する and,
>> given a list of する (and possibly できる) conjugations,
>> split it into its component parts.

OK. Of course the reverse-engineering would have to be done using the
dictionary as a reference tool. Anyway, that's not really our/my
problem.
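
For what it's worth, the splitting Paul describes could look roughly
like the sketch below. It is only an illustration: the function name,
the regular expression and the (deliberately incomplete) list of
する/できる forms are all assumptions, and the check that the headword
really is tagged (vs) in the dictionary is left out.

```python
import re

# Assumed, deliberately incomplete sample of conjugated suru/dekiru forms.
SURU_FORMS = {
    "する", "します", "しました", "しません", "して", "した", "しない",
    "できる", "できます", "できました",
}

# Index entry as it appears in this thread: head(reading)[sense]{surface},
# where every part after the head is optional.
INDEX_ENTRY = re.compile(
    r"^(?P<head>[^\[\{\(]+)"
    r"(?:\((?P<reading>[^)]*)\))?"
    r"(?:\[(?P<sense>\d+)\])?"
    r"(?:\{(?P<surface>[^}]*)\})?$"
)


def split_vs_entry(entry):
    """Split a (vs) index entry into (noun, suru_form).

    Returns None if the entry does not parse, has no surface form,
    or the surface form is not the headword plus a known suru form.
    Hypothetical helper; the caller is assumed to have confirmed
    from the dictionary that the indexed sense is (vs).
    """
    m = INDEX_ENTRY.match(entry)
    if m is None or not m.group("surface"):
        return None
    head, surface = m.group("head"), m.group("surface")
    if not surface.startswith(head):
        return None
    tail = surface[len(head):]
    return (head, tail) if tail in SURU_FORMS else None


print(split_vs_entry("野宿[2]{野宿しました}"))
```

A real version would need a complete conjugation table, which is
where the dictionary comes in as the reference tool.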

>> If I was going to be perfectionist I might suggest that (vs)
>> glosses should not be a separate sense in the same way as
>> other senses are handled but more of a "same number, different
>> POS" system.  I recognize that this would render the database
>> structure more complicated though so I don't intend to push
>> the idea much ;-)

I think I raised that before. Ideally there should be the ability to
have several POSs within a sense, each with its own set of glosses.
At present it's just too messy with my utilities, but when we get to the
Real Database, that can be revisited.
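
As a rough sketch of what "several POSs within a sense" might mean
structurally (the class and field names are hypothetical, and the
glosses are illustrative rather than copied from the dictionary):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PosGlosses:
    """One part of speech together with its own set of glosses."""
    pos: str
    glosses: List[str]


@dataclass
class Sense:
    """A single numbered sense that may carry several POS groups."""
    number: int
    by_pos: List[PosGlosses] = field(default_factory=list)

    def glosses_for(self, pos):
        """Return the glosses recorded for `pos`, or an empty list."""
        for group in self.by_pos:
            if group.pos == pos:
                return group.glosses
        return []


# Illustrative data only: one sense, same number, two POS groups.
sense = Sense(
    number=1,
    by_pos=[
        PosGlosses("n", ["sleeping outdoors"]),
        PosGlosses("vs", ["to sleep outdoors"]),
    ],
)

print(sense.glosses_for("vs"))
```

The point is only that the (vs) glosses hang off the same sense
number instead of being a separate sense.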

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学