I completely agree with these points. IMO, JMdict is really targeted at developers. The consumers of JMdict aren't humans; they are mostly computer algorithms. Software developers take JMdict and turn the machine-readable bits into apps. They mostly throw away the human-readable bits (or display them to English-speaking users). 99.99% of JMdict consumers use it through these apps. This is why I personally would welcome any move that makes JMdict significantly easier for machines to read.

<rant> This is also why I am a bit concerned by the way i18n is handled and Hispadic is merged into JMdict. IMO, the Dutch/German/Hispadic/French data should be merged once, with senses aligned whenever possible (using POS; I had some success disambiguating this way), and then human editors could correct it over time to improve the data's accuracy and coverage. As I understand it, Hispadic is merged every day (without using POS)... which means that the Spanish senses are put into (and lost in) the first English sense.
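A minimal Python sketch of the POS-based alignment idea, assuming each sense is reduced to a set of POS tags plus its glosses. The `Sense` type, `align_senses` and the Jaccard scoring are illustrative choices of mine, not an existing JMdict or Hispadic API. Each foreign sense goes to the JMdict sense whose POS tags overlap most; when there is no overlap at all, its glosses are set aside for a human editor instead of being dumped into the first sense.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of an entry: its part-of-speech tags and its glosses (hypothetical model)."""
    pos: frozenset[str]
    glosses: list[str] = field(default_factory=list)


def pos_overlap(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity between two POS tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_senses(target: list[Sense], foreign: list[Sense]) -> dict[int, list[str]]:
    """Attach each foreign sense's glosses to the index of the target sense
    whose POS tags overlap best (ties go to the earliest sense). Foreign
    senses with no POS overlap are returned under key -1 rather than being
    appended to the first sense."""
    aligned: dict[int, list[str]] = {}
    for f in foreign:
        scores = [pos_overlap(f.pos, t.pos) for t in target]
        # max() returns the first index among equal scores; -1 if target is empty
        best = max(range(len(target)), key=lambda i: scores[i], default=-1)
        if best == -1 or scores[best] == 0.0:
            best = -1  # no usable POS evidence: leave it for a human editor
        aligned.setdefault(best, []).extend(f.glosses)
    return aligned
```

With a made-up entry that has an adj-na sense and an adv sense, a Spanish adv sense lands on the adverbial sense, while a sense tagged only with a POS the entry lacks comes back under key -1. As JMdict's POS tags drift away from frozen ones, more senses end up under -1, which is exactly the drift problem with merging frozen data.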
The issues I have with merging Hispadic this way are:
- No sense alignment (-> my app breaks for Spanish users: I need sense alignment for glossing texts).
- The more time passes, the further JMdict drifts away from Hispadic (JMdict's POS tags change while Hispadic's are frozen, so aligning senses through POS becomes harder and fails more often).
- No chance for human editors to check and improve the Spanish data.
</rant>

Also, I have been computing JMdict glosses for other languages through wordnets (many thanks to the Open Multilingual Wordnet), with a success rate that is good enough for my apps/users but never excellent: http://www.spartan-entertainment.com/android/languageSupport.html

Human-readable markers in an entry, such as "lit:", "(lit)" or "literally" in front of the gloss text, or a human-readable precision in parentheses at the end of the gloss, make this much harder. I would much rather have an XML attribute or tag providing such info in a machine-readable form, avoiding filtering/processing/painful disambiguation.

Olivier
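The kind of filtering this forces on consumers looks roughly like the following Python sketch. The marker list ("lit:", "(lit)", "literally") comes from the post; the regular expressions, the `CleanGloss` type and `clean_gloss` are hypothetical heuristics. They will misfire on glosses whose parentheses are part of the meaning, which is why a machine-readable attribute would be preferable.

```python
import re
from typing import NamedTuple, Optional

# Markers of a literal translation in front of the gloss text:
# "lit:", "(lit)" and "literally" (heuristic, case-insensitive).
LITERAL_PREFIX = re.compile(r"^\s*(?:\(lit\.?\)|lit\.?:|literally\b:?)\s*", re.IGNORECASE)
# A trailing human-readable precision in parentheses, e.g. "bank (of a river)".
TRAILING_NOTE = re.compile(r"\s*\(([^()]*)\)\s*$")


class CleanGloss(NamedTuple):
    text: str            # the gloss with the annotations removed
    literal: bool        # True if a "lit" marker was found and stripped
    note: Optional[str]  # trailing parenthetical precision, if any


def clean_gloss(raw: str) -> CleanGloss:
    """Heuristically split a human-oriented gloss into machine-usable parts."""
    text, n = LITERAL_PREFIX.subn("", raw, count=1)
    note = None
    m = TRAILING_NOTE.search(text)
    if m and m.start() > 0:  # don't strip a gloss that is only a parenthesis
        note = m.group(1).strip()
        text = text[: m.start()]
    return CleanGloss(text.strip(), n > 0, note)
```

For example, `clean_gloss("lit: fire flower")` returns `CleanGloss(text='fire flower', literal=True, note=None)`. An attribute on the gloss element would replace all of this guessing with a single lookup.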