Re: [edict-jmdict] Offtopic: frequency information Japanese words



> They do not seem to have cleaned up the list, though:
> differently inflected forms of the same word appear as
> separate entries.

Well, they've gone for a half-way house.  They strip (most of) the
ending off inflected forms of words.

For example:

なる <- dictionary form
なっ <- なった, なって, なったり etc.
なり <- なります, なりません, なりませんでした etc.
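If you wanted one count per word, the truncated stems would have to be
folded back into the dictionary form. A rough sketch of that, assuming a
hand-made stem table (STEM_TO_DICTIONARY and merge_stems are my own
names, and the counts are made up, not taken from the list). A real
table would be far larger, and some stems are ambiguous, so treat this
as an illustration only:

```python
from collections import Counter

# Hypothetical stem table built by hand from the example above.
# Not part of the frequency list itself.
STEM_TO_DICTIONARY = {
    "なっ": "なる",
    "なり": "なる",
}


def merge_stems(counts):
    """Sum the counts of truncated stems into their dictionary form."""
    merged = Counter()
    for form, count in counts.items():
        # Forms missing from the table are kept unchanged.
        merged[STEM_TO_DICTIONARY.get(form, form)] += count
    return merged


# Made-up counts, for illustration only.
sample = {"なる": 5, "なっ": 3, "なり": 2, "犬": 1}
merged = merge_stems(sample)
```

With the sample above, all three なる entries end up under a single key
and 犬 is left alone.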

It also seems that the kanji, hiragana and katakana spellings of a
word get separate entries. I don't know how they chose the sources,
but the list seems remarkably short on some words that I would have
thought would be fairly common (犬, for example).