
Romanization of names



I want to raise the potential of cleaning up some rather clunky
romanization in the JMnedict/enamdict names dictionary.

Some background: when I was first building the names
dictionary, I scraped a vast number of names from various sources
and generated romanized forms from the kana. Since I was keen to
get round-trip issues under control (i.e. being able to recover the
kana unambiguously from the romanization), I used ワープロローマ字, i.e.
"you" for よう, "kyou" for きょう, etc. I also used "dzu" for づ to
distinguish it from ず, which got "zu". That has since become a
non-issue: Unicode is now the norm and supports the more regular
forms such as "yō", and I'm getting requests to fix up the file and
make the romanization more like what people expect to see.
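For illustration only, here is a minimal sketch of the kind of rewrite involved, assuming the existing romanized strings are post-processed directly (the function name `wapuro_to_macron` and its rules are hypothetical, not the actual preprocessing or the bulk updater). It maps doubled vowels to macrons and "dzu" to "zu". Being a naive string rewrite, it cannot see morpheme boundaries, so something like 小売 (こうり, "kouri") would wrongly come out as "kōri"; "ei" is deliberately left alone, as is common in modified Hepburn.

```python
import re

# Hypothetical mapping from wapuro doubled vowels to macron forms.
# Covers lower-case and capitalised (name-initial) variants only.
_LONG_VOWELS = {
    "ou": "ō", "oo": "ō", "uu": "ū",
    "Ou": "Ō", "Oo": "Ō", "Uu": "Ū",
}

# Matches exactly the keys of _LONG_VOWELS, scanning left to right.
_LONG_VOWEL_RE = re.compile(r"[Oo][uo]|[Uu]u")


def wapuro_to_macron(romaji: str) -> str:
    """Naively convert wapuro-style romaji to a macron (Hepburn-like) form.

    Assumption: every "ou"/"oo"/"uu" is a long vowel. This is wrong
    across morpheme boundaries (e.g. こうり "kouri" -> "kōri").
    """
    # づ was written "dzu" to keep it distinct from ず; fold it back to "zu".
    s = romaji.replace("dzu", "zu").replace("Dzu", "Zu")
    # Replace each doubled vowel with its macron equivalent.
    return _LONG_VOWEL_RE.sub(lambda m: _LONG_VOWELS[m.group(0)], s)
```

With these rules, `wapuro_to_macron("Manjuuyachou")` gives "Manjūyachō", matching the preferred form below. Working from the kana reading rather than the romaji would be the more robust route, but still needs some way of flagging morpheme boundaries.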

Doing all this by hand is out of the question. A quick inspection of
the おう/こう/そう/とう/のう/etc. instances indicates there are over
110k entries to change. Fortunately we have a bulk updater utility
that can be fed files of changes, and it seems to do the trick (see
https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=5103937).
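As a rough sketch of that sort of inspection, assuming the readings are available as plain hiragana strings (the helpers `has_long_vowel` and `count_candidates` are hypothetical, and nothing here reads the actual JMnedict file format), one could flag readings containing an o-row kana followed by う or お, or a u-row kana followed by う. Katakana readings are ignored, and like any kana-pattern test it also flags morpheme-boundary cases such as こうり, so it over-counts rather than under-counts.

```python
import re

# Hypothetical pattern for kana sequences that wapuro romaji spells with
# doubled vowels: an o-row kana (incl. small ょ) followed by う or お
# (こう, ちょう, おお), or a u-row kana (incl. small ゅ) followed by う
# (くう, じゅう). Hiragana only.
_LONG_VOWEL = re.compile(
    r"[おこごそぞとどのほぼぽもよょろ][うお]|[うくぐすずつづぬふぶぷむゆゅる]う"
)


def has_long_vowel(reading: str) -> bool:
    """Return True if the reading contains a candidate long-vowel sequence."""
    return _LONG_VOWEL.search(reading) is not None


def count_candidates(readings) -> int:
    """Count readings whose romanization would need revisiting."""
    return sum(1 for r in readings if has_long_vowel(r))
```

For example, `count_candidates(["まんじゅうやちょう", "さくら", "おおの"])` returns 2.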

At this stage it's just a heads-up, as there's quite a bit of preprocessing
to be done, but if anyone has any objections, suggestions, comments,
etc., they'll be welcome. Frankly, I find entries like:
饅頭屋町 【まんじゅうやちょう】 (p) Manjuuyachou
a bit of an embarrassment, and I'd prefer:
饅頭屋町 【まんじゅうやちょう】 (p) Manjūyachō

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/