Re: [edict-jmdict] [rare] tag for obscure kanji?



I am one of the Kyoto people.

We have a newer (and larger) 10B web corpus, and I can compute n-gram statistics for it.
The thing is, I am not sure how to match JMDict entries with JUMAN/Juman++ analysis results.

JMDict is not a morphological analysis dictionary the way Unidic/Jumandic/IPAdic are, and multiple many-to-many matchings are possible. Phrases are problematic as well. We also don't really do disambiguation of hiragana.
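
To make the matching problem concrete, here is a rough Python sketch. It is only an illustration, not how our pipeline works: the token list and the headword set are toy data I made up (not real Juman++ output or an actual JMDict extract), and `match_spans` and `max_len` are hypothetical names. It joins runs of consecutive tokens and looks the result up in the headword set.

```python
def match_spans(tokens, headwords, max_len=4):
    """Return (start, end, headword) for every run of consecutive tokens
    whose concatenated surface form appears in `headwords`."""
    matches = []
    for start in range(len(tokens)):
        # Try runs of 1..max_len tokens beginning at `start`.
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            surface = "".join(tokens[start:end])
            if surface in headwords:
                matches.append((start, end, surface))
    return matches


# Toy segmentation and toy headword set (assumptions for illustration only).
tokens = ["東京", "都", "知事"]
headwords = {"東京", "東京都", "都", "都知事", "知事"}

matches = match_spans(tokens, headwords)
for start, end, surface in matches:
    print(start, end, surface)
```

Even with three tokens, five overlapping spans match, so one token can count toward several entries and one entry can cover several tokens. Any frequency count then depends on which matches are kept, and that is the part I do not know how to decide.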