
Re: [edict-jmdict] Furigana algorithm, continued




On Aug 15, 2007, at 12:41 PM, Oukoulele wrote:

Not to demean anybody's work, but surely other people have tackled
this before?
Including Vincent as I've linked above. Unless you meant with 100%
accuracy results,
even then it depends on what one considers "accurate" results.


I cannot see what great value there is in having incomplete and inaccurate results... and equating that with success.  The difference between being able to parse 90% and 100% is that instead of writing, or in your case copying, some 40 lines of code in 30 minutes, you have to invest real time and real energy and push forward to completion.  Equating the two efforts strikes me as nonsense.



But I guess we have different goals. For my purposes, for the
learner's purpose,
I see only the regular, documented, On and Kun (and maybe nanori)
readings as
valuable to the learner when displayed in a character-by-character
(split) fashion.


You can't see the value in knowing that 今日[きょう] is not read in the standard way?  There's no value in teaching idiomatic readings, or that sometimes a group of characters owns a sound instead of the individual kanji?  Or are you limiting what you think is valuable to teach to what you can accomplish with a minimum amount of personal effort?


Vincent's code will match it as an On-Kun, but it could be an On-On
reading as well.
I see you have no indication of the on/kun status in your file, but
for me this
is important. I want to be able to tell the learner, for example,
these are all
the Chinese readings, or Japanese readings that are covered in your
vocabulary.
In this case it seems there is not enough data in JMDICT to infer
which is the
proper reading for the second kanji.


When you have two possible ways to parse a word, you choose the method where both readings are either on or kun... you don't mix them.  That's the correct reading and the correct yomigana.
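
A minimal sketch of that tie-break rule in Python, for illustration only. The names `Segment`, `is_consistent` and `pick_parse` are hypothetical, as is the idea that each candidate parse arrives as a list of (kanji, kana, reading type) tuples; none of this comes from the actual furigana code discussed in this thread. Falling back to the first candidate when every parse mixes on and kun is also an assumption, not part of the rule stated above.

```python
from typing import List, Tuple

# One segment of a parse: (kanji, kana, reading type), where the
# reading type is "on" or "kun". This layout is an assumption.
Segment = Tuple[str, str, str]


def is_consistent(parse: List[Segment]) -> bool:
    """True when every segment uses the same reading type (all on or all kun)."""
    return len({kind for _, _, kind in parse}) <= 1


def pick_parse(candidates: List[List[Segment]]) -> List[Segment]:
    """Prefer the first candidate that does not mix on and kun readings."""
    if not candidates:
        raise ValueError("no candidate parses given")
    for parse in candidates:
        if is_consistent(parse):
            return parse
    # Assumption: if every candidate mixes reading types, keep the first one.
    return candidates[0]


# Placeholder data, not a real word: "X" and "Y" stand in for two kanji
# where the second kana could be classified as either kun or on.
on_kun = [("X", "ka", "on"), ("Y", "ta", "kun")]
on_on = [("X", "ka", "on"), ("Y", "ta", "on")]
chosen = pick_parse([on_kun, on_on])
```

With the placeholder data above, `chosen` is the on-on parse, because it is the only candidate whose readings all share one type.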