
Re: [edict-jmdict] Japanese Wordnet and WWWJDIC



Jean-Luc raised with me the question of putting (j)wordnet synset codes
into JMdict.

The links from WWWJDIC to the online jwordnet are done using the data in
the file "wnjpn-0.9.tab.gz", which has entries like the following:

....
00001837-r	西暦	multi
00002098-a	できない	hand
00002142-r	紀元前	multi
00002296-r	紀元前	multi
00002325-v	呼吸	multi
00002527-a	腹側	multi
....

I extracted the middle column, and simply create a link whenever a
headword matches.
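
As a rough illustration of that matching step, here is a minimal Python
sketch. It is not the actual WWWJDIC code: the function names are made up
for the example, and it assumes the file is UTF-8 text with three
tab-separated columns, as in the excerpt above.

```python
import gzip

def load_headwords(lines):
    """Collect the middle (headword) column from tab-separated lines."""
    headwords = set()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not a three-column record
        headwords.add(parts[1])
    return headwords

def load_headwords_from_gz(path):
    """Same as load_headwords, reading a gzipped file (UTF-8 is assumed)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return load_headwords(f)

# The sample rows quoted above, with explicit tabs.
SAMPLE = [
    "00001837-r\t西暦\tmulti\n",
    "00002098-a\tできない\thand\n",
    "00002142-r\t紀元前\tmulti\n",
    "00002296-r\t紀元前\tmulti\n",
    "00002325-v\t呼吸\tmulti\n",
    "00002527-a\t腹側\tmulti\n",
]

headwords = load_headwords(SAMPLE)

def has_link(headword):
    """A link is created only if the headword occurs in the file."""
    return headword in headwords
```

With the real file, load_headwords_from_gz("wnjpn-0.9.tab.gz") would take
the place of the in-memory sample.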

Assuming that it's a Good Thing to have synsets identified in
JMdict, it could be done two ways:

(a) embed them in the source database;
(b) add them on the fly from the file above at the time
JMdict is built.

I am more inclined to add them at build time. I suspect that
synset allocations are not set in stone, and that building them
into the database would mean that changes within jwordnet would
have to be tracked and matched.
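
To make the build-time option concrete, here is a hedged Python sketch.
The entry structure (a dict with a "headword" key) and the function names
are hypothetical stand-ins, not the real JMdict build code. Note that one
headword can map to several synsets, as 紀元前 does in the excerpt above,
so the index keeps a list per headword.

```python
from collections import defaultdict

def build_synset_index(lines):
    """Map each headword to the synset codes listed for it, in file order."""
    index = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # ignore malformed rows
        synset, headword, _source = parts
        if synset not in index[headword]:
            index[headword].append(synset)
    return index

def annotate(entries, index):
    """Return copies of the entries with a 'synsets' list attached.

    Entries here are plain dicts with a 'headword' key; that shape is an
    assumption made for this example only.
    """
    return [
        dict(entry, synsets=list(index.get(entry["headword"], [])))
        for entry in entries
    ]

# Rows taken from the excerpt quoted earlier in this message.
SAMPLE = [
    "00001837-r\t西暦\tmulti\n",
    "00002142-r\t紀元前\tmulti\n",
    "00002296-r\t紀元前\tmulti\n",
    "00002325-v\t呼吸\tmulti\n",
]

index = build_synset_index(SAMPLE)
annotated = annotate([{"headword": "紀元前"}, {"headword": "猫"}], index)
```

Because the index is rebuilt from the current jwordnet file on every
build, nothing stored in the source database needs updating if synset
allocations change.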

Anyway, what are people's views? Is it worth adding synset codes?
Is that the best way of doing it?

Cheers

Jim

-- 
Jim Breen
Adjunct Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/