On Aug 17, 2007, at 12:36 PM, Paul Blay wrote:

> You could use a method of checking to see if a longer, more precise

The goal is to parse this word out of a sentence:

  得になる (exp) to do (a person) good; to bring profit

But instead you currently parse:

  得{とく} (adj-na,n,vs) profit; gain; interest
  に (prt) indicates such things as location of person or thing, location of short-term action, etc.
  なる(為る) (v5r) to change; to be of use; to reach to
  OR なる(生る) (v5r) to bear fruit
  OR なる(成る) (v5r) (uk) to become

So on the first pass, you join adjacent B line glosses and check for JMDICT entries:

  得に

but you strike out. But because に (ni) is kana, you take a chance and add the next parsed word:

  得になる (exp) to do (a person) good; to bring profit

Your program now erases three glosses from the B line and adds one new one.

Extending string theory could be used, and a genetic algorithm could use readings instead of kanji, or vice versa, for each test mutant. Some arbitrary number of bins could be set as the max bin size. Of course this will require that you dedicate a computer for perhaps a few days to see how many it can dig out, but once it's done, it's done, and you currently have no method of knowing how many TC lines are affected by this.

Not perfectly, because some words will have inflections. You write a short set of instructions and walk away from it. Difficulty?

> and I would also think that the longer a compound is, the more

No, Paul, longer implies NOUN. Compounds, the long ones, tend to be names of things, and not verbs. Therefore the longer a word is, the more likely it is not to have inflected forms, and thus to be the same as its dictionary version.
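
For what it's worth, the join-and-lookup pass could be sketched roughly like this in Python. This is only an illustration under assumptions: `merge_glosses`, `is_kana`, and the `lexicon` set are hypothetical names, the set stands in for a real JMdict headword lookup, and the tokens are the surface forms already parsed into the B line. It matches dictionary forms only, so inflected words will slip through.

```python
def is_kana(text):
    """True if every character is hiragana or katakana (U+3040..U+30FF)."""
    return all('\u3040' <= ch <= '\u30ff' for ch in text)


def merge_glosses(tokens, lexicon):
    """Join adjacent tokens when the joined string is a dictionary headword.

    tokens  -- surface forms already parsed for one sentence (assumed input)
    lexicon -- set of headwords, a stand-in for a JMdict lookup (assumption)
    """
    merged = []
    i = 0
    while i < len(tokens):
        span = 1  # how many tokens the current output word consumes
        word = tokens[i]
        if i + 1 < len(tokens):
            pair = tokens[i] + tokens[i + 1]
            if pair in lexicon:
                # First pass: the two adjacent tokens form a known entry.
                word, span = pair, 2
            elif is_kana(tokens[i + 1]) and i + 2 < len(tokens):
                # The pair struck out, but the second token is kana,
                # so take a chance and add the next parsed word.
                triple = pair + tokens[i + 2]
                if triple in lexicon:
                    word, span = triple, 3
        merged.append(word)
        i += span
    return merged


lexicon = {"得になる"}
print(merge_glosses(["彼", "の", "得", "に", "なる"], lexicon))
# ['彼', 'の', '得になる']
```

Here 得に is not an entry, に is kana, so the pass extends to 得になる and replaces the three glosses with one. Counting how often `span` exceeds 1 over the whole corpus would give the number of affected TC lines.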