
Re: [edict-jmdict] Re: Adding old kanji forms with a script



On 29 June 2012 07:09, scott.edict <scottn.canada@gmail.com> wrote:
> I'm reviving an old thread here, but I was reminded of this by Rene's
> post concerning Jean-Luc's extraordinary scripting abilities!
>
> Is there still some interest in developing a script to add old kanji
> into Edict? (e.g. automatically adding 適當 (tekitou) to
> 適当 (tekitou))

I wouldn't mind seeing entries extended by adding variants using
旧字/異体字, provided they actually have some currency out there.
In other words, I'd like to see some evidence that they were in use in
Japanese texts somewhere, and are not just constructs. I note that
many entries in Buddhdic use old kanji.

As for adding them, we already have a bulk-update program from
Stuart which we used a year or two ago to add POS tags to
expressions. It could be modified to add other fields (Stuart
wrote it in a quite general way with this intention).

Anyway, let's see if a sound list of variants is forthcoming. Something
in a format like:

seq-no[tab]current-kanji[tab]additional-kanji would be useful for
starters. E.g.

1586050   合気道   合氣道
#In Daijr, Koj, etc. 330k hits
.....
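
As a rough illustration of how a list in that format might be read,
here is a minimal Python sketch. It is not Stuart's bulk-update
program; the names VariantRow and parse_variants are made up for the
example. It assumes that fields are separated by whitespace (tabs in
the proposed format), that a line starting with "#" is a note on the
entry just before it, and that blank lines or lines of dots can be
skipped.

```python
from typing import List, NamedTuple


class VariantRow(NamedTuple):
    """One proposed addition (hypothetical structure, for illustration)."""
    seq: int            # entry sequence number
    current: str        # kanji form already in the entry
    additional: str     # old/variant kanji form to add
    notes: List[str]    # "#" comment lines that follow the entry


def parse_variants(text: str) -> List[VariantRow]:
    rows: List[VariantRow] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and filler lines made only of dots.
        if not line or set(line) == {"."}:
            continue
        # Assumption: a comment annotates the most recent entry.
        if line.startswith("#"):
            if rows:
                rows[-1].notes.append(line[1:].strip())
            continue
        # Kanji forms contain no whitespace, so a plain split is enough.
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdecimal():
            raise ValueError(
                f"line {lineno}: expected seq-no, current-kanji, additional-kanji"
            )
        rows.append(VariantRow(int(fields[0]), fields[1], fields[2], []))
    return rows


sample = "1586050\t合気道\t合氣道\n#In Daijr, Koj, etc. 330k hits\n"
for row in parse_variants(sample):
    print(row.seq, row.current, row.additional, row.notes)
```

Rejecting malformed lines outright, rather than skipping them, seems
the safer choice for a list that will drive bulk edits.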

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Webmaster: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne