Re: [edict-jmdict] Re: P words in EDICT



[collin372 ([edict-jmdict] Re: P words in EDICT) writes:]
>> 
JB >I am interested in filling in gaps in this class of words,
JB >and hence invite people to nominate words which are not 
JB >currently on the (P) list, but which could be considered
JB >for inclusion.
>> 
>> I've been finding a problem with words marked P that I'm
>> pretty certain shouldn't be.  I recently wrote a little
>> program to take the edict file and a list of kanji I know
>> I can write, and spit me out a list of kanji words I
>> should be able to write.
>> 
>> Just in the first few pages I ended up running across a number
>> of words that were actually pretty uncommon, as confirmed both
>> by trying to track them down (i.e. Eijiro, Google, etc.), and 
>> confirming with my wife (an NSoJ).
>> 
>> In addition to finding new ones for P, it might be good to
>> solicit nominations on removals.  In any case, I'll try to
>> pass them along as I can find them.
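The program described above was not posted. What follows is only a minimal Python sketch of that kind of filter, under stated assumptions: the names `is_kanji` and `writable_words` are hypothetical, the sample lines are illustrative rather than copied from the real file, and each line is assumed to start with the headword followed by a space, with P entries carrying a `/(P)/` field.

```python
def is_kanji(ch):
    # Assumption: "kanji" means the CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def writable_words(edict_lines, known_kanji, only_p=False):
    """Yield headwords whose kanji are all contained in known_kanji."""
    for line in edict_lines:
        line = line.strip()
        if not line:
            continue
        # The headword is taken to be everything before the first space.
        headword = line.split(" ", 1)[0]
        kanji = [ch for ch in headword if is_kanji(ch)]
        if not kanji:
            continue  # kana-only headword, so not a "kanji word"
        if only_p and "/(P)/" not in line:
            continue  # optionally restrict the output to P-marked entries
        if all(ch in known_kanji for ch in kanji):
            yield headword


# Illustrative lines in EDICT-style layout (hypothetical sample data).
sample = [
    "日本語 [にほんご] /(n) Japanese (language)/(P)/",
    "食べる [たべる] /(v1) to eat/(P)/",
    "憂鬱 [ゆううつ] /(adj-na,n) depression/",
    "ひらがな /(n) hiragana/",
]
known = set("日本語食")
result = list(writable_words(sample, known))
```

With the real file, the lines would come from something like `open(path, encoding="euc_jp")`, assuming the file is in its traditional EUC-JP encoding. Passing `only_p=True` gives the list of P words one should be able to write, which is where doubtful P entries would show up.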

I am removing quite a few too. One of the sources I used
was "Ichimango goi bunruishuu", published in 1998 by Senmon Kyouiku
Publishing. A wordlist from that book slid off a passing truck. Its
words are marked as "ichi1" in JMdict, which results in a P. When I come
across one which is not all that common, I change the tag to "ichi2".
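As a rough illustration of that retagging, here is a small Python sketch. It assumes the priority tags live in `ke_pri` and `re_pri` elements of a JMdict XML entry. The function name `demote_ichi1` and the sample entry (including its placeholder headword) are hypothetical, and this is not a description of the actual editing procedure.

```python
import xml.etree.ElementTree as ET


def demote_ichi1(entry, tags=("ke_pri", "re_pri")):
    """Change every ichi1 priority tag in one <entry> to ichi2.

    Returns the number of tags changed.
    """
    changed = 0
    for tag in tags:
        for el in entry.iter(tag):
            if el.text == "ichi1":
                el.text = "ichi2"
                changed += 1
    return changed


# Illustrative fragment only; the headword is a placeholder,
# not a real nomination for removal.
xml = """<entry>
<k_ele><keb>例語</keb><ke_pri>ichi1</ke_pri></k_ele>
<r_ele><reb>れいご</reb><re_pri>ichi1</re_pri><re_pri>news2</re_pri></r_ele>
</entry>"""
entry = ET.fromstring(xml)
n = demote_ichi1(entry)
```

Other priority tags on the entry are left untouched, so whether the entry then loses its P depends on what else it carries.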

Another source was a commercial, erm, wordlist which I won't mention
by name. It contains quite a few doubtful words, so I dropped all those
which weren't among the first 24,000 in the newspaper list.
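That cutoff amounts to a rank filter. A minimal Python sketch, assuming the newspaper list is available as a sequence of words ordered from most to least frequent; the function name `keep_frequent` and the tiny placeholder data are hypothetical:

```python
def keep_frequent(candidates, ranked_words, cutoff=24000):
    """Return the candidates found within the first `cutoff` ranked words."""
    top = set(ranked_words[:cutoff])
    # Preserve the candidates' own order and drop anything outside the cutoff.
    return [w for w in candidates if w in top]


# Placeholder ranking, most frequent first (not real frequency data).
ranked = ["の", "に", "は", "を", "た"]
kept = keep_frequent(["は", "憂鬱", "た"], ranked, cutoff=4)
```

Here the cutoff is shrunk to 4 so the effect is visible on five words: a word ranked fifth is dropped just like one that is absent from the list.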

By all means tell me about ones to drop.

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学