JMdict database moves
Greetings,
I want to fill people in on where we are with moving
to an online database for JMdict/EDICT, etc.
We are getting close to the start of a period where I'll
be doing parallel updates of my system and Stuart's
database and comparing the resulting dictionary files
to see if there are any problems. In preparation for
this:
(a) Stuart has developed routines that take the
new/amend form outputs from WWWJDIC and
convert them into (effectively) loaded up edit
screens for the new system. This will greatly
reduce the parallel running load on me.
(b) I have developed a program that turns the
JMdict xml file back into my internal format. I
can do complete error-free round-trips with it.
This will enable us to continue creating the
edict/edict2/Mac xml/etc. formats without
doing anything extra. It also provides a fall-back
if there is a disaster with the database.
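
As an illustration only, the comparison step in the parallel running
could be sketched along these lines. It assumes both systems emit an
EDICT-style file with one entry per line; the function name is
hypothetical and this is not the actual tooling.

```python
def compare_outputs(old_lines, new_lines):
    """Compare two dictionary outputs, one entry per line (assumed).

    Returns (only_in_old, only_in_new) as sorted lists, ignoring
    blank lines and entry order.
    """
    old = {line.rstrip("\n") for line in old_lines if line.strip()}
    new = {line.rstrip("\n") for line in new_lines if line.strip()}
    return sorted(old - new), sorted(new - old)
```

Two empty lists would mean the two systems generated the same set of
entries; anything else points at the entries to check by hand.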
Once we're happy with the parallel running,
I plan to turn off my (manual) updates at
Monash and just update via the database,
stripping the data out daily to generate the
dictionaries. And once that's stable the big
final step will be to direct the new/amend
forms in WWWJDIC to go straight to the
database, with users doing their own thing
there.
There's still quite a bit of work to do, and
Stuart has family commitments which will
limit his availability mid-year. I really hope
that by the end of 2010 the management
of the main dictionary (and perhaps
enamdict too) will be totally different.
My thanks to Stuart for the massive development
task. JMdict is not a simple database structure.
And thanks to William Maton for the huge task
of bringing arakawa.edrdg.org up on its new
host. It wasn't a trivial job, but it's running very
well, and the new host is rather faster than the
old one (and ~50% of the price).
Cheers
jim
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne