Re: [edict-jmdict] Compound verbs
2008/7/11 Hendrik <hiz--dic@islandnet.com>:
> Jim Breen wrote:
>> Since 叩き分ける is not in the database, recognizing it as an entity would
>> not actually achieve much. The usefulness of that recognition would only
>> come when there was a proper gloss for 叩き分ける (hint, hint).
>
> Yes, i thought of that before seeing your hint. :-) But my problem is
> that i don't know many compound verbs well enough to add them to the
> database. I will ask about those on some other lists...!
I've included as many compound verbs as I can find. There are a couple
of books of compound verbs, and years ago I made sure everything they
listed was incorporated.
Compound verbs are not that well lexicalized (e.g. 叩き分ける) and the
meanings are not always apparent to a non-native speaker.
>> The weakness at present is that when it gets something like 遊び回りました
>> it will treat it as 遊び + 回る because the longest complete match it can make
>> is 遊び. The verb-deinflection only cuts in when there is an unmatched
>> kanji+kana, kanji+kana+kanji+kana or kanji+kanji+kana pattern. I really
>> must experiment with longer lookahead to see if I can trigger the
>> deinflection even if the first portion of a compound verb matches an entry.
I should have checked properly before I wrote the above. In fact the
check for kanji+kana+kanji+kana in order to trigger a verb analysis is
almost the first thing done.
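
For anyone following along, here is a rough sketch of what a check for
those three shapes could look like. It is not the actual WWWJDIC code:
the character ranges, the names KANJI, KANA, SHAPES and verb_shape, and
the choice of Python are all my assumptions for illustration. It only
classifies the shape at the start of a string, trying the compound
pattern first.

```python
import re

# Assumed character classes: CJK unified ideographs (plus the repeat mark 々)
# and hiragana. A real glosser may well use different ranges.
KANJI = r"[\u4e00-\u9fff\u3005]"
KANA = r"[\u3041-\u309f]"

# The three shapes named in the thread, with the compound shape tried first.
SHAPES = [
    ("kanji+kana+kanji+kana", re.compile(rf"{KANJI}{KANA}+{KANJI}{KANA}+")),
    ("kanji+kanji+kana", re.compile(rf"{KANJI}{{2}}{KANA}+")),
    ("kanji+kana", re.compile(rf"{KANJI}{KANA}+")),
]


def verb_shape(text):
    """Return the name of the first shape matching at the start of text, else None."""
    for name, pattern in SHAPES:
        if pattern.match(text):
            return name
    return None
```

A shape match is only a trigger for verb analysis. Plenty of non-verbs
have the same shape, so a lookup still has to confirm the candidate.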
> Actually, i would be happiest if such a lookup were to proceed along a
> slightly different line, namely such that whenever either of kanji+kana,
> kanji+kana+kanji+kana or kanji+kanji+kana appears, this would
> automatically _also_ trigger the presentation of the dictionary form of
> the associated verb.
Hmmm. That would be good IF the kanji+kana happened to be the first part of
a compound verb not already in the dictionary. If it was an ordinary case
of the kanji+kana being a participle, prenominal, etc. it would possibly
just make things worse.
> Here is an illustration:
>
> Let me enter 叩き into Word Search (EDICT) - what i get is the following:
>
> 叩き 【はたき】 (n) (feather) duster; (P) [G][GI][S][A][W]
> 叩き(P); 三和土; 敲き 【たたき(P); タタキ(P)】 (n) (1) (叩き, 敲き,
> タタキ only) (uk) mince (minced meat or fish); (n) (2) (叩き, 敲き,
> タタキ only) (sl) robbery; extortion; (n) (3) (usu. 三和土 (gikun))
> hard-packed dirt (clay, gravel, etc.) floor; concrete floor; (P)
>
> Now i paste the same item into Word Translation (GLOSSDIC), as per my
> previous example (shown here again), and i get the following:
>
> 指・手のひら・手首で叩き分ける。
>
> * 指 【ゆび】 (n) finger; (P); EP
> * 手のひら 【てのひら】 (n) the palm (of one's hand); (P); EP
> * 手首 【てくび】 (n) wrist; (P); EP
> * 叩き 【たたき】 (n) (1) hard-packed dirt (clay, gravel, etc.)
> floor; concrete floor; (P); EP
> * Possible inflected verb or adjective: (plain verb)
> 分ける 【わける】 (v1,vt) to divide; to separate; to make
> distinctions; to differentiate (between); (P); EP
[Aside: I got a mild panic attack when I saw the above, because
the short-form gloss of 叩き is missing two senses. Something was
wrong with the generation from the database. After much poking around
in the horrible code of the generator, I found it was because senses
1 and 2 of 叩き have restrictions which are a mix of kanji and katakana.
This is the only entry which has this, and it causes the generator to
get confused. I have now partially fixed it.]
> Now let me compare that with the results for 叩く:
>
> Possible inflected verb or adjective: (plain verb)
> 叩く 【たたく】 (v5k,vt) (1) to strike; to clap; to dust; to beat; (2)
> to play drums; (3) to abuse; to flame (e.g. on the Internet); to insult;
> (P); EP (GLOSSDIC)
>
> 叩く(P); 敲く 【たたく(P); はたく】 (v5k,vt) (1) to strike; to clap; to
> dust; to beat; (2) (たたく only) to play drums; (3) (たたく only) to
> abuse; to flame (e.g. on the Internet); to insult; (4) (はたく only) to
> use up money; (P) (EDICT)
>
> Why is that useful? Because the meanings of the verb are usually carried
> over into the noun form, depending on the context, so that with the
> combined results i can form the following image in my mind (based on the
> GLOSSDIC example):
>
> 叩き 【たたき】 (n) hard-packed dirt (clay, gravel, etc.) floor;
> concrete floor; strike; clapping; dusting; beat; beating; drumming;
> abuse; flame (e.g. on the Internet); insult;
>
> Even the non-existing compound verb will be halfway guessable if i am
> shown 叩き and 叩く and 分ける instead of just 叩き and 分ける.
The problem is that 叩き is already an entry. Since the text glossing is
intended to present the glosses of the segments of the text, I can't
really ignore the gloss of the 叩き entry and instead present that of the
(possibly related) verb 叩く.
Better solutions are:
- add an entry for 叩き分ける
- expand the gloss of 叩き so that it helps understand situations where it
appears as the first part of compound verbs. Something like:
(4) (in compound verbs) striking, hitting, beating
would do.
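
To make the first option concrete, here is a toy greedy longest-match
segmenter. It is a sketch under stated assumptions, not the glossing
code itself: the function name segment and the tiny headword set are
invented for illustration, and 分ける is listed directly rather than
reached through deinflection.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation over a set of headwords.

    Characters that start no known headword are emitted one at a time.
    """
    longest = max(map(len, lexicon), default=1)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest possible headword first, then shorter ones.
        for n in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                pieces.append(text[i:i + n])
                i += n
                break
        else:
            pieces.append(text[i])
            i += 1
    return pieces


# Hypothetical headword list, just enough for the example sentence.
toy_lexicon = {"指", "手のひら", "手首", "で", "叩き", "分ける"}
sentence = "指・手のひら・手首で叩き分ける。"

before = segment(sentence, toy_lexicon)
after = segment(sentence, toy_lexicon | {"叩き分ける"})
```

With the toy list as given, 叩き is the longest match and the verb is
split into 叩き and 分ける. Once 叩き分ける is a headword, the same
procedure returns it whole.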
> In fact, when i use the GLOSSDIC during the preparation of a
> translation, i habitually put all the -i forms in the text through EDICT
> as -u forms to make sure i capture a sufficiently wide range of
> meanings - i do manually what i would like to see the interface do
> automatically, if possible. :-)
I think if that were the standard approach, it would confuse the many
people who use the function to gloss the words. It could conceivably
be part of a "super" version, which is not just glossing the words
in the text, but doing a rather deeper analysis of their components.
By "i-forms" I presume you mean 叩き -> 叩く? Do you do it when the compound
verb is in the dictionary and presented OK, or just when it breaks up the
verb?
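
If that is what you mean, the manual step could be sketched roughly as
below. This is only an illustration of the idea, assuming the -i form
is a godan or ichidan stem: the names I_TO_U, dictionary_form_candidates
and related_verbs are made up, and irregular verbs such as する and 来る
are ignored.

```python
# Final kana of a godan continuative (-i) stem mapped to its dictionary (-u) ending.
I_TO_U = {"い": "う", "き": "く", "ぎ": "ぐ", "し": "す", "ち": "つ",
          "に": "ぬ", "び": "ぶ", "み": "む", "り": "る"}


def dictionary_form_candidates(stem):
    """Return possible dictionary forms for a stem, godan guess before ichidan."""
    candidates = []
    if stem and stem[-1] in I_TO_U:
        # Godan reading: swap the i-row kana for its u-row partner.
        candidates.append(stem[:-1] + I_TO_U[stem[-1]])
    if stem:
        # Ichidan reading: the stem plus る.
        candidates.append(stem + "る")
    return candidates


def related_verbs(stem, verb_headwords):
    """Keep only the candidates that are actual verb headwords."""
    return [c for c in dictionary_form_candidates(stem) if c in verb_headwords]
```

For example, related_verbs("叩き", {"叩く", "分ける", "遊ぶ"}) keeps 叩く
and discards the ichidan guess 叩きる, since only the former is a
headword in that set.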
Cheers
Jim
--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/