Re: [edict-jmdict] Japanese Wordnet and WWWJDIC
2009/5/25 Jim Breen <jimbreen@gmail.com>:
> The number of JWordNet "words" potentially linked from WWWJDIC is about
> 79,000. I say "potentially" because that is the size of the file I extracted.
> Not all are in JMdict/EDICT. From a quick inspection I'd say the intersection
> was about 90%. Also jwordnet has entries like あいまいに, whereas JMdict
> only has あいまい, etc. etc.
>
> I'll see if I can produce a diff file. It might make a useful list of
> target new entries.
That 90% was optimistic. It's about 65%: ~52,000 match and ~27,000 don't.
Quite a few of the non-matches are orthographic alternatives, e.g.
あそび女 for 遊び女 and あき場所 for 秋場所.
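For anyone wanting to reproduce that kind of match/non-match split, here is a minimal sketch. It assumes both word lists are already loaded in memory; the function name and the tiny sample data are hypothetical and are not the script actually used for the figures above.

```python
def split_by_membership(wordnet_words, jmdict_headwords):
    """Split wordnet_words into (matched, unmatched), preserving input order.

    jmdict_headwords should be a set so each lookup is a constant-time check.
    """
    matched, unmatched = [], []
    for word in wordnet_words:
        if word in jmdict_headwords:
            matched.append(word)
        else:
            unmatched.append(word)
    return matched, unmatched


# Illustrative sample only; these are not the real file contents or counts.
jmdict = {"あいまい", "遊び女", "秋場所"}
wordnet = ["あいまい", "あいまいに", "あそび女", "あき場所", "秋場所"]

matched, unmatched = split_by_membership(wordnet, jmdict)
ratio = len(matched) / len(wordnet)
```

An exact-string comparison like this counts orthographic alternatives such as あそび女 as non-matches, which is why the unmatched list needs a further look before it is treated as a list of missing entries.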
I have looked at the 27,000, and about 11,000 are in the "wipfile"
(WI) collection.
E.g.
躓き [つまずき] /stumbling/failure/WI1/
鉤頭虫類 [こうとうちゅうるい] /Acanthocephala/WI1/
...
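Lines in that shape can be picked apart with a short parser. The sketch below assumes every line looks like the two examples above (headword, reading in square brackets, then slash-delimited fields) and that a field of the form WI followed by digits marks the WI collection. Both are assumptions drawn only from these examples, and the names are hypothetical.

```python
import re

# headword, [reading], then /field/field/.../ to the end of the line
LINE_RE = re.compile(r"^(\S+)\s+\[([^\]]+)\]\s+/(.*)/\s*$")
# Assumed shape of the collection marker, e.g. "WI1"
WI_TAG_RE = re.compile(r"^WI\d+$")


def parse_entry(line):
    """Parse one entry line into a dict, or return None if it does not fit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    headword, reading, body = m.groups()
    fields = body.split("/")
    return {
        "headword": headword,
        "reading": reading,
        # Everything that is not a WI marker is treated as a gloss.
        "glosses": [f for f in fields if not WI_TAG_RE.match(f)],
        "in_wi": any(WI_TAG_RE.match(f) for f in fields),
    }


entry = parse_entry("躓き [つまずき] /stumbling/failure/WI1/")
```

Entries without a bracketed reading would not match this pattern and come back as None, so a real pass over the file would need to handle those separately.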
Since they are real enough words for Francis et al. to put in jwordnet, they
probably should be considered for early inclusion in JMdict. I'll see about
putting a list in an accessible place.
Jim
--
Jim Breen
Adjunct Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/