> For EDICT, and continuing into JMdict, I have tried to
> maintain a fairly strict 1-to-1 mapping of kana between the
> kanji part of an entry and the kana/reading part. In
> particular, I have always made the katakana portion match.
> Thus for ローマ字, the reading is ローマじ; not ろーまじ.

I don't see any merit in having the reading match the script of the headword, and I would advocate normalizing all readings to hiragana. The two scripts don't sound any different, so it's just useless extra information that could just as well be inferred from the headword.

Taking it a step further, is there any point in having the reading use the same vowel extensions? Why support ろーまじ, ろうまじ, and ろおまじ as three separate entries when they all sound exactly the same?

Personally, I would normalize the reading field and search strings so that when searching for any word by how it sounds, all homophones would match regardless of their particular orthography.
I wonder whether you are not confusing different points. Search strings in WWWJDIC (but not strings in the "translate words" function) already pay no heed to hiragana vs katakana differences. Extending that so that ー is viewed as identical to the appropriate long vowel in searches (while not something I am in favour of) would not require or produce changes to the WWWJDIC display or the (old) EDICT and EDICT2 dictionary files.

Changing the actual 'reading' for all entries to normalise on full hiragana is problematic in a number of ways. There are semi-anomalous entries (or 'specially distinguished' entries, if you prefer) where some or all of the headword is in the roman alphabet (fullwidth characters), e.g.:

ALS 【エーエルエス】 (n) (abbr) amyotrophic lateral sclerosis (ALS)

Changing to hiragana would give the (false) impression that えええるえす might be used to search for this word, besides which えええるえす is just ugly. エーエルエス, on the other hand, can be used to search for relevant sites (although you do a lot better with the roman letters, which is why they are given as the headword).
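To make the distinction between search-time folding and changing the stored readings concrete, here is a minimal sketch of a search key that leaves the dictionary files untouched. It is not WWWJDIC's actual code: the function names (`to_hiragana`, `expand_long_vowels`, `search_key`) are hypothetical, and taking "the appropriate long vowel" for ー to be the vowel of the preceding kana is an assumption made for illustration.

```python
# Illustrative sketch only; all names here are hypothetical, not WWWJDIC's code.

KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KANA_OFFSET = 0x60       # distance between the katakana and hiragana Unicode blocks
LONG_VOWEL_MARK = "ー"

# Hiragana grouped by the vowel they end in (assumption: ー repeats that vowel).
VOWEL_ROWS = {
    "あ": "あかさたなはまやらわがざだばぱぁゃ",
    "い": "いきしちにひみりぎじぢびぴぃ",
    "う": "うくすつぬふむゆるぐずづぶぷぅゅゔ",
    "え": "えけせてねへめれげぜでべぺぇ",
    "お": "おこそとのほもよろをごぞどぼぽぉょ",
}
VOWEL_OF = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}


def to_hiragana(text):
    """Fold katakana to hiragana; every other character (including ー) is kept."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def expand_long_vowels(text):
    """Replace each ー with the vowel of the preceding hiragana, when there is one."""
    out = []
    for ch in text:
        if ch == LONG_VOWEL_MARK and out and out[-1] in VOWEL_OF:
            out.append(VOWEL_OF[out[-1]])
        else:
            out.append(ch)
    return "".join(out)


def search_key(text, fold_long_vowels=False):
    """Build a comparison key for searching; stored readings are never modified."""
    key = to_hiragana(text)
    return expand_long_vowels(key) if fold_long_vowels else key
```

With the default settings, `search_key("ローマじ")` and `search_key("ろーまじ")` compare equal, which corresponds to the script-insensitive matching described above. Passing `fold_long_vowels=True` additionally turns エーエルエス into えええるえす for comparison purposes only. Note that this sketch does not merge ろうまじ with ろおまじ; deciding when おう is a long vowel rather than two separate vowels would need information beyond the kana string.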