
Re: [edict-jmdict] [rare] tag for obscure kanji?



Darren,

Here are morpheme unigrams from 3B sentences, cut off at a count of 10, to give you a feel for what is at the bottom of the list.
https://tulip.kuee.kyoto-u.ac.jp/ngrams/3B/unigrams.gz

There is of course some mismatch between the JMDict and Jumandic dictionaries, but any sane word should have at least some frequency in a 10B-sentence corpus.

As for unification, there is surely some undercounting, but language is highly variable and the general tendencies are definitely preserved. We compile Japanese case frames from that data, and they make a lot of sense.

Arseny