On Jul 23, 2010, at 5:19 PM, Jim Breen wrote:

> I want to (re)visit the ordering of kana parts (Jim Rose

My first concern is how the data is interpreted outside of this group. Sure, we know to output the katakana first when it's marked [uk], but would the typical user of the file, who is not a party to these discussions? My understanding is that they will read something in the documentation of the dictionary's structure and conclude that the readings are ordered by frequency. But alas, we now say they are ordered by frequency UNLESS a kanji compound is present. So we violate our own ordering rule whenever a kanji compound happens to be present, and give arbitrary and misleading primacy to the readings of the kanji, but we don't say WHY. That is tantamount to saying that we give primacy to words written in kanji regardless of their actual use. Is it because we love kanji? Do we feel antipathy toward katakana? I don't get it.

So the workaround for putting hiragana first when kanji is present is to have the documentation explicitly note that we are creating an exception to our own principle. Justifying that exception is where I think this gets nebulous, because I don't understand why this is "structurally cleaner". I consider it structurally convoluted, though historically entrenched: a short-run view rather than a long-run one. As far as I can tell, we would be creating this exception to the ordering simply because that's how it was done in the past, under arbitrary rules that had no meaningful basis. Is that a valid reason to continue doing so? Perhaps if we could get an estimate of how many taxonomic entries exist in this format now, and compare that to the number that could eventually exist in JMDICT, we might see that we are building in a structural flaw by taking a short-run view of the data set.

My second concern is what effect, if any, this has on the EDICT legacy file, since that is, and probably will remain for some time, the principal file being incorporated into the world's various software platforms, both those already here and those being born in the near future. Is there an effect at all? Maybe not. But we should ponder that.

Either way, we need to set the rule now and hope that we pick the right direction. Whatever is decided, I'm behind it 100% once it's decided. Until then, I'm for katakana first within the scope of taxonomic entries.

Jim R