Hello, and many thanks for the fast and accurate answers.
On 10/21/2013 03:19 AM, Jim Breen wrote:
I see, this helps!
If something can be done programmatically to fix/improve this, I may be able to help; but if it requires expertise in kanji/Japanese, I won't be able to, as I don't have it. :/
I see. I'll try to propose a fitting sense (again, I'm quite good at technical skills, but not so good at artistic/stylistic/literary decisions, so it will have to be edited by someone more competent).

I see. This helps. I'll look into those directions. I got the same kind of advice from the author of the j.depP library (a C++ Japanese dependency parser): he told me to consider using JUMAN. I have no idea how the different Japanese dictionaries compare in quality/accuracy (IPADIC, JUMAN, NAIST, UniDic, ...). I have been using IPADIC because the Kuromoji parser that ships with Lucene comes with it out of the box, and it has much lower memory requirements than MeCab (everything fits in under 10 MB, whereas MeCab's files take about 90 MB). Using JUMAN or UniDic with the Kuromoji parser should definitely be possible, but it would require me to adapt the Kuromoji parser to those files (there is heavy optimization in there to reduce memory consumption and redundancy), and I haven't seriously looked into it or put time into it yet. Would you happen to know which of the two (JUMAN, UniDic) is the best/most accurate/best maintained?
Didn't know that. I'll look into those links.
Again, thanks for your time and the (very helpful) answers. I had some more questions. I might be able to help by contributing grouped reading-meanings to kanjidic, but I don't know what would be acceptable (regarding the license/copyright):

1) Is it acceptable if I look into a published kanji dictionary to help me decide which on/kun reading should be associated with which English meaning? Note that grouping readings with meanings only requires shuffling existing glosses around inside the right rmgroup tags of the kanjidic2 file; I wouldn't write senses/glosses, nor copy/paste material from copyrighted works.

2) Is it acceptable to use algorithms/NLP techniques/JMdict to help group meanings with readings?

3) Grouping readings and meanings would probably take a lot of time. What if, at first, only the English meanings (and the French ones) were grouped? Or if the English meanings were slowly but steadily grouped with readings, say first 1%, then 2%, then 3%? Would those updates be published daily (which would allow other people to edit/contribute), or would you want all English meanings to be grouped before they are pushed to a public release?

4) What if I built an HTML5 page, backed by a database and the right functions, that would allow many people to group meanings with readings efficiently? Would it help?

Best regards,
Olivier
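Since point 1 only moves existing glosses between rmgroup tags, the mechanics can be sketched with the standard library alone. A minimal sketch, assuming the documented kanjidic2 layout (a reading_meaning element holding rmgroup elements with reading and meaning children); the regroup helper and its assignment-dict input are mine, not part of any existing tool:

```python
# Sketch: split a kanjidic2 <rmgroup> into several groups according to a
# caller-supplied assignment of readings and meanings to 0-based group
# indices. Element names follow the kanjidic2 DTD; everything else
# (function name, assignment format) is hypothetical.
import xml.etree.ElementTree as ET

def regroup(character, reading_group, meaning_group):
    """reading_group / meaning_group map element text -> group index."""
    rm = character.find("reading_meaning")
    old = rm.find("rmgroup")
    readings = old.findall("reading")
    meanings = old.findall("meaning")
    rm.remove(old)
    n = 1 + max(list(reading_group.values()) + list(meaning_group.values()))
    groups = [ET.Element("rmgroup") for _ in range(n)]
    for i, g in enumerate(groups):
        rm.insert(i, g)          # keep rmgroups ahead of any <nanori>
    for r in readings:           # DTD order inside a group: readings first
        groups[reading_group.get(r.text, 0)].append(r)
    for m in meanings:
        groups[meaning_group.get(m.text, 0)].append(m)

doc = ET.fromstring(
    "<character><literal>生</literal><reading_meaning><rmgroup>"
    '<reading r_type="ja_on">セイ</reading>'
    '<reading r_type="ja_kun">い.きる</reading>'
    "<meaning>life</meaning><meaning>to live</meaning>"
    "</rmgroup></reading_meaning></character>")
regroup(doc, {"セイ": 0, "い.きる": 1}, {"life": 0, "to live": 1})
```

Because the existing reading/meaning elements are moved rather than rewritten, their attributes (r_type, m_lang, etc.) survive unchanged, which is the point: no new wording is introduced, only grouping.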
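For point 2, one crude but self-contained way to bootstrap the grouping would be string similarity between a kanjidic gloss and the glosses of JMdict entries that use the kanji with a given reading. A sketch using only difflib from the standard library; the jmdict_glosses sample below is illustrative, not extracted from JMdict:

```python
# Sketch of idea 2: assign each kanjidic gloss to the reading whose
# JMdict glosses it most resembles. Stdlib only; the sample data is
# made up for illustration.
from difflib import SequenceMatcher

def best_reading(gloss, jmdict_glosses):
    """jmdict_glosses: reading -> glosses of JMdict entries that use
    this kanji with this reading; returns the best-matching reading."""
    def score(reading):
        return max(SequenceMatcher(None, gloss, g).ratio()
                   for g in jmdict_glosses[reading])
    return max(jmdict_glosses, key=score)

jmdict_glosses = {
    "セイ": ["student", "pupil", "life"],
    "い.きる": ["to live", "to exist"],
}
assignment = {g: best_reading(g, jmdict_glosses)
              for g in ["life", "to live"]}
```

Anything like this would of course only produce candidate groupings for a human to review, not final data.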
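And for point 4, the database behind such a page could start very small: a table of proposed (kanji, gloss, reading) groupings with a vote count, so that independent contributors converge on a consensus. A hypothetical sketch with sqlite3 (requires SQLite 3.24+ for the upsert); none of these names come from an existing project:

```python
# Sketch of a backing store for idea 4: contributors propose a
# (kanji, gloss, reading) grouping; the most-voted reading wins.
# Schema and function names are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE proposal (
    kanji   TEXT NOT NULL,
    gloss   TEXT NOT NULL,
    reading TEXT NOT NULL,
    votes   INTEGER NOT NULL DEFAULT 1,
    UNIQUE (kanji, gloss, reading))""")

def propose(kanji, gloss, reading):
    # One more contributor says this gloss belongs with this reading.
    db.execute("""INSERT INTO proposal (kanji, gloss, reading)
                  VALUES (?, ?, ?)
                  ON CONFLICT (kanji, gloss, reading)
                  DO UPDATE SET votes = votes + 1""",
               (kanji, gloss, reading))

def consensus(kanji, gloss):
    # Reading with the most votes for this kanji/gloss pair, if any.
    row = db.execute("""SELECT reading FROM proposal
                        WHERE kanji = ? AND gloss = ?
                        ORDER BY votes DESC LIMIT 1""",
                     (kanji, gloss)).fetchone()
    return row[0] if row else None

propose("生", "life", "セイ")
propose("生", "life", "セイ")
propose("生", "life", "い.きる")
```

The HTML5 page would then just be a thin front end over propose/consensus, and the agreed groupings could be exported back into kanjidic2 form for review.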