|
On 12/03/2013 07:30 AM, J Greely wrote:
Indeed, that is the first thing I had to do too. Having the same data without the annoying expansion would greatly simplify things for developers and apps... and help save the planet. (You first have to parse JMdict to get the entities, optionally sorting them into the field/dial/pos/misc categories (I do), build a map to unexpand them, and then parse JMdict again for the real stuff.)

It took me at least a day to find the best way to do that in Java with an XmlPullParser (and that was my second try: at first, I manually wrote my own Kanjidic/JMdict parsers to do just that).

I don't mind how they are coded as long as it makes sense and they are easy to parse, for example <pos>n</pos> or <pos type="n"/>.

I love the pos/misc/field/dial tags in JMdict and have a use for most of them. I would just love to have more meaningful, useful metadata. To me, the best metadata tags are the ones that can be understood and used by a program, like stagk, stagr, re_nokanji?, re_restr, re_pri... Tags that can only be understood by humans, like re_inf or s_inf, are great for people learning Japanese but not so great for developers because, apart from displaying them on screen, what can you do with them programmatically? There could be anything in those tags!

Also, I would love to have more languages in JMdict... The second thing I do when parsing Kanjidic and JMdict is add Russian/German/Spanish meanings. (I would love to add Chinese... there would be something like 1 billion people who could then benefit from JMdict/Kanjidic... but well... haven't)

Keep up the good work. JMdict and Kanjidic rock!

Olivier
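
For reference, the two-pass approach described above (collect the entities, build a map to unexpand them, then parse again for the real data) can be sketched roughly as follows. This is only a minimal illustration in Python rather than Java with an XmlPullParser, and it is not the code discussed in this message: the tiny sample document and the names `build_reverse_map` and `parse_pos_codes` are assumptions made up for the example, and real JMdict entity declarations may need more careful handling than a single regular expression.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-in for JMdict; the real file is far larger
# and declares many more entities in its internal DTD subset.
SAMPLE = """<!DOCTYPE JMdict [
<!ENTITY n "noun (common) (futsuumeishi)">
<!ENTITY vt "transitive verb">
]>
<JMdict>
<entry><sense><pos>&n;</pos><gloss>dictionary</gloss></sense></entry>
<entry><sense><pos>&vt;</pos><gloss>to read</gloss></sense></entry>
</JMdict>
"""

# Matches simple declarations of the form <!ENTITY name "expansion">.
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>')


def build_reverse_map(xml_text):
    """Pass 1: map each expanded entity text back to its short code."""
    return {expansion: code for code, expansion in ENTITY_RE.findall(xml_text)}


def parse_pos_codes(xml_text):
    """Pass 2: parse normally (entities get expanded), then unexpand <pos>."""
    reverse = build_reverse_map(xml_text)
    root = ET.fromstring(xml_text)
    result = []
    for entry in root.iter("entry"):
        # Fall back to the expanded text if no matching entity is known.
        codes = [reverse.get(pos.text, pos.text) for pos in entry.iter("pos")]
        result.append(codes)
    return result


if __name__ == "__main__":
    print(parse_pos_codes(SAMPLE))
```

The reverse map is keyed on the expanded text, so this only works as long as no two entities share the same expansion.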
|