I also use re_pri; I just didn't remember
all the details. If a word has both ke_pri and re_pri tags, I
average them, and then increase that value based on the frequency of
its kanji. For words with no kanji, the only difference is that there
is no extra increase from kanji. These words still rank high
enough in the search results that I never felt I had to alter
the algorithm because of them.
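
A minimal Python sketch of this kind of scoring, under some loudly stated assumptions. The post doesn't say how the ke_pri/re_pri tags are turned into numbers, so the function takes already-converted values where higher means more common. `word_frequency`, `KANJI_WEIGHT` and the bonus formula are hypothetical, and kanji frequency numbers are assumed to be KANJIDIC-style ranks from 1 (most common) to 2500. In line with the rest of the post, only the most common kanji counts, and a word with neither tag scores 0.

```python
from typing import Optional, Sequence

# Hypothetical constants: kanji frequency ranks assumed to run from
# 1 (most common) to 2500; the weight of the kanji bonus is made up.
MAX_KANJI_FREQ = 2500
KANJI_WEIGHT = 10.0


def word_frequency(ke_pri: Optional[float],
                   re_pri: Optional[float],
                   kanji_freqs: Sequence[Optional[int]]) -> float:
    """Score one written form of a word; higher means more common.

    ke_pri / re_pri: priority tags already converted to numbers (the
    conversion is an assumption left to the caller), or None if absent.
    kanji_freqs: frequency numbers of the kanji in this form, None for
    kanji without one; empty for kana-only words.
    """
    tags = [p for p in (ke_pri, re_pri) if p is not None]
    if not tags:
        # Neither ke_pri nor re_pri: the kanji don't count, the word gets 0.
        return 0.0
    base = sum(tags) / len(tags)  # average when both tags are present

    known = [f for f in kanji_freqs if f is not None]
    if not known:
        # Kana-only word (or no ranked kanji): no extra increase.
        return base
    # Only the most common kanji (lowest frequency number) is used.
    best = min(known)
    bonus = KANJI_WEIGHT * (MAX_KANJI_FREQ + 1 - best) / MAX_KANJI_FREQ
    return base + bonus
```

For example, `word_frequency(40, 30, [1, 300])` averages the tags to 35 and adds the full bonus of 10 for the rank-1 kanji, giving 45.0. A kana-only word such as `word_frequency(None, 20, [])` just keeps its tag value of 20.0.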

By the way, I just checked my data conversion script: it only uses the kanji with the lowest frequency number, i.e. the most common one. So if a word has several written forms, the form with the most common kanji gets a higher frequency. Also, if a word has neither ke_pri nor re_pri, its kanji don't count and the word gets a frequency of 0.

I can't stress enough, though, that this is only my method, and there can be countless others that work as well as mine or better. zkanji only runs on PC, so I don't know whether this method could be used on phones and tablets, but it's fast enough to show all search results in real time while typing. (Having the whole dictionary in memory helps.) At first it was slow, of course, but only because it looked up the search string's position in each result's meanings at every comparison. The current version does this once before the sort, so only numbers are compared. This simple optimization was enough to make the sort fast.

The problem with using only precomputed numbers from your data is that the current search string won't influence the order. It is possible (and, before I completed my algorithm, it happened a lot) that when you look up a word, you first get 10-30 not very interesting results, simply because they have a higher frequency. You could probably store the length of the meanings and use only that, but you will have to experiment with this yourself.
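
One way to do the "compute before the sort" step, sketched in Python. The `Entry` type and `sort_results` are hypothetical, and ranking by match position first and frequency second is an assumption, not necessarily zkanji's actual rule. Python's `sorted` calls the `key` function once per element, so the string search in the meanings runs once per result and the sort itself only compares tuples of numbers, while the search string still influences the order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    word: str
    meaning: str      # all meanings joined into one string (assumption)
    frequency: float  # precomputed score from the data conversion step


def sort_results(results: List[Entry], search: str) -> List[Entry]:
    """Order results by where the search string appears in the meaning,
    then by frequency (hypothetical ranking rule)."""
    needle = search.lower()

    def sort_key(e: Entry):
        # Called once per entry by sorted(), not once per comparison.
        pos = e.meaning.lower().find(needle)
        if pos < 0:
            # Not found: rank after every match, then by frequency.
            return (1, 0, -e.frequency)
        # Earlier match first, then higher frequency first.
        return (0, pos, -e.frequency)

    return sorted(results, key=sort_key)


if __name__ == "__main__":
    results = [
        Entry("inu", "dog", 30.0),
        Entry("neko", "cat", 45.0),
        Entry("koneko", "kitten; small cat", 20.0),
        Entry("nekojita", "cat's tongue", 10.0),
    ]
    print([e.word for e in sort_results(results, "cat")])
    # prints ['neko', 'nekojita', 'koneko', 'inu']
```

Here "nekojita" comes before the more frequent "koneko" because its meaning starts with the search string, which is exactly the kind of influence a purely precomputed frequency can't provide.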