Re: [edict-jmdict] Re: P words in EDICT
[collin372 ([edict-jmdict] Re: P words in EDICT) writes:]
>>
JB >I am interested in filling in gaps in this class of words,
JB >and hence invite people to nominate words which are not
JB >currently on the (P) list, but which could be considered
JB >for inclusion.
>>
>> I've been finding a problem with words marked P that I'm
>> pretty certain shouldn't be. I recently wrote a little
>> program to take the edict file and a list of kanji I know
>> I can write, and spit me out a list of kanji words I
>> should be able to write.
>>
>> Just in the first few pages I ended up running across a number
>> of words that were actually pretty uncommon, as confirmed both
>> by trying to track them down (i.e. Eijiro, Google, etc.), and
>> confirming with my wife (an NSoJ).
>>
>> In addition to finding new ones for P, it might be good to
>> solicit nominations on removals. In any case, I'll try to
>> pass them along as I can find them.
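
For anyone wanting to try the same check, here is a minimal sketch of the kind of filter described in the quoted message. It is not the poster's actual program. It assumes each EDICT line starts with the headword followed by a space, and it treats only the basic CJK Unified Ideographs block as kanji; the names is_kanji and writable_words are made up for illustration.

```python
def is_kanji(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def writable_words(edict_lines, known_kanji):
    """Return headwords whose kanji are all in known_kanji."""
    known = set(known_kanji)
    result = []
    for line in edict_lines:
        # Assumption: the headword is everything before the first space.
        headword = line.split(" ", 1)[0]
        kanji = [ch for ch in headword if is_kanji(ch)]
        # Skip kana-only entries; keep words made only of known kanji.
        if kanji and all(ch in known for ch in kanji):
            result.append(headword)
    return result


# Tiny hand-made sample in EDICT-like form, not real file contents.
sample = [
    "日本 [にほん] /(n) Japan/(P)/",
    "日本語 [にほんご] /(n) Japanese (language)/(P)/",
    "かな /(n) kana/",
]
print(writable_words(sample, "日本"))
```

Kana in a headword is ignored, so okurigana does not block a match. The iteration mark 々 falls outside the assumed range and is ignored as well.
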
I am removing quite a few too. One of the sources I used
was "Ichimango goi bunruishuu" published in 1998 by Senmon Kyouiku
Publishing. A wordlist from that book slid off a passing truck. Its words are
marked as "ichi1" in JMdict and result in a P. When I come across
one which is not all that common, I change the tag to "ichi2".
Another source was a commercial, erm, wordlist which I won't mention
by name. It contains quite a few doubtful ones, so I dropped all
which weren't in the first 24,000 in the newspaper list.
By all means tell me about ones to drop.
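
The cut-off against the newspaper list amounts to something like the sketch below. It assumes the newspaper list is an ordered list of words, most frequent first; the function name keep_frequent and the sample words are hypothetical, not taken from the real lists.

```python
def keep_frequent(candidates, ranked_words, cutoff=24000):
    """Keep candidates that appear among the first `cutoff` ranked words."""
    # Assumption: ranked_words is ordered from most to least frequent.
    top = set(ranked_words[:cutoff])
    return [w for w in candidates if w in top]


# Illustrative data only.
ranked = ["日本", "今日", "学校", "稀覯"]
candidates = ["学校", "稀覯", "未知"]
print(keep_frequent(candidates, ranked, cutoff=3))
```
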
Jim
--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学