Re: [edict-jmdict] Conjugations and PoS tags for だ, くれる
On 07/20/2014 03:19 AM, Jim Breen jimbreen@gmail.com [edict-jmdict] wrote:
>
> Catching up on emails that arrived when I was in Indonesia. These POS
> issues really need to be settled one way or another. I've been
> looking at Rene's comments (below) and Stuart's following detailed
> response.
>
> On 3 July 2014 05:35, René Malenfant rene_malenfant@hotmail.com
> [edict-jmdict] <edict-jmdict@yahoogroups.com> wrote:
>
>> 1) The entry for くれる is marked more clearly now for human usage
>> than it would be if the note were removed and the PoS changed to an
>> obscure PoS tag. くれろ used to be common, and it still gets used, so
>> I think having it in the conjugation table with a usage note on the
>> entry (as at present) makes more sense.
>
> As I said earlier I'm a bit uncomfortable with having a special POS
> just for one verb with an odd imperative. No-one else handles it that
> way (the 大辞林 entry begins: "命令形は「くれ」が普通", but has the regular POS
> markup.)
>
> I take the point that it would make table-driven conjugation easier,
> but it's a major call to say that くれろ is irregular because one mood
> form is commonly different from the norm in modern usage. I really
> think the current note is enough, as in 大辞林.
It is not so much that it makes table-driven conjugation simpler
as that it extracts and isolates a particular bit of information
and thus makes any machine processing simpler. That is one of the
major goals/benefits of organizing information in a structured form
like a database or XML.
I think your position (and 大辞林's) is perfectly reasonable when
the information is seen as being for presentation to a human
consumer, as-is. That is obviously the case with 大辞林, arguably
with Edict, but should it be the case with JMdict?
If an information source is also to be used for machine processing,
a human-friendly form that requires understanding a note becomes
sub-optimal. Another example is allowing glosses to have a 'lit'
tag rather than just throwing a "lit:" or "(lit)" or "literally"
in front of the gloss text. Another reason the tagged form is
preferable is that it is simpler to generate the free-text form
from the tagged form than the reverse. In the くれる case it is easy
to present it as a 'v1' verb (should you want to) if you are told
it is a 'v1-ik' verb. To go the other way requires code which
currently looks only at the PoS to also be given access to the
kanji, entry seq number or some other auxiliary information in
order to provide special handling for a tiny subset (of 1) of 'v1'
entries.
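
A rough sketch of that asymmetry, in Python. The 'v1' and 'v1-ik' tags and the 'lit' gloss tag are the ones discussed above; the COARSEN table and the function names are made up for illustration and do not correspond to any existing JMdict tooling.

```python
# Hypothetical mapping from a fine-grained PoS tag to the coarser tag a
# human-oriented display might prefer. Only the 'v1-ik' -> 'v1' pair comes
# from this thread; the table itself is an illustration.
COARSEN = {"v1-ik": "v1"}


def display_pos(pos):
    """Fine -> coarse: a plain lookup that needs nothing but the tag."""
    return COARSEN.get(pos, pos)


def refine_pos(pos, reading):
    """Coarse -> fine: the tag alone is not enough, so the caller must
    also supply entry-level data (here the reading) to spot the exception."""
    if pos == "v1" and reading == "くれる":
        return "v1-ik"
    return pos


def render_gloss(text, lit=False):
    """Structured 'lit' flag -> "(lit)" prefix is a one-line formatting step."""
    return "(lit) " + text if lit else text
```

The first and third functions need only the tag they are handed. The second cannot work without reaching into the entry itself, which is the extra coupling described above.
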
I don't think it is at all a stretch to call くれる irregular, any
more than calling 行く irregular is. Obviously definitions of
"irregular" vary but I think a definition that requires words
of the same "conjugation class" to follow the same conjugation
rules is more useful than a squishy one that says they mostly
follow the same rules except for a few exceptions, whether few=1
or 1000.
The commonly used imperative form of くれる is くれ. This is
different from the imperative form of any other v1 verb. IMO
this information should be captured in a form that is easily
understandable by an algorithm, particularly since a mechanism
to do this (a unique PoS per conjugation type) is already in use
and applies to every other word (with 'aux-v' as a catchall for
"irregular in a way we don't care about capturing").
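
To illustrate, here is a minimal sketch of a table-driven imperative lookup keyed only by PoS tag. The 'v1-ik' tag is the one proposed in this thread; the table layout and function name are assumptions for the example, and whether くれろ should also be listed under 'v1-ik' as a secondary form is left open.

```python
# Imperative endings per PoS tag, applied after dropping the final る.
# 'v1-ik' is the tag proposed in this thread; the table is illustrative only.
IMPERATIVE_ENDINGS = {
    "v1": ["ろ", "よ"],   # e.g. 食べる -> 食べろ, 食べよ
    "v1-ik": [""],        # くれる -> くれ (bare stem)
}


def imperative(word, pos):
    """Return the imperative forms of an ichidan-type verb from its PoS tag."""
    stem = word[:-1]  # strip the final る
    return [stem + ending for ending in IMPERATIVE_ENDINGS[pos]]
```

With a separate tag the conjugator never has to look at the word itself to decide which row of the table applies.
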
> [We are a bit spoiled by Japanese regularity. I remember when I was
> boning up on French prior to a sabbatical in France in 93/4 I wrote a
> verb conjugator to help me do drills. I gave up eventually, as it
> seemed every second verb that is regarded as regular has an odd twist
> somewhere.]
>
>> 2) いい could be handled simply by splitting it out from the entry
>> for よい and making よい [adj-i] while making いい an [exp] or [unc] with
>> an xref to よい and a note that いい doesn’t inflect. I don’t think an
>> additional PoS is needed and if one is added, it definitely
>> shouldn’t include よい; よい is just a regular old i-adjective. And いい
>> is just a modified version of よい that has no inflections of its
>> own, so I think it would be wrong to say that it has its own PoS
>> with its own inflection pattern that includes よくない, etc.; those
>> forms belong to the regular adjective よい.
>
> I found this suggestion, splitting the いい and よい into different
> entries, a bit radical to start with, but as I have thought it over,
> it's gained appeal. Part of the appeal is that we have a heap of
> entries with structures like XX[の|が][よい|いい], and they are rather
> messy with all the restrictions to line the kanji surface forms with
> the readings. I just added a lot more because I noticed that quite a
> few had crept in the noun tags, and as I corrected them I also added
> a lot of ...[よい|いい], forms. Splitting would certainly result in much
> cleaner entries, indeed I've never been happy with the rather messy
> いい/よい situation.
>
> I notice that GG5, apart from in the 良い entry itself, never writes いい
> as 良い. If we go ahead with a よい/いい split, and I have to say I'm
> tempted, I'm inclined not to use 良い in the kanji form for the いい.
> Thus we'd have entry pairs such as:
>
> 頭のいい [あたまのいい] /(exp,adj-f) (See 頭がいい) bright/intelligent/
> 頭の良い;頭のよい [あたまのよい] /(exp,adj-i) (See 頭のいい) bright/intelligent/
頭のいい is conjugatable, is it not? (Google turns up a lot of
頭がよかった's). That is not the case with other 'adj-f' words
(e.g. A級). Why throw away information by taking entries that
currently form a distinct class (that conjugates consistently)
and rolling them into a larger, vaguer class? (This seems like
taking oddly conjugating verbs like する and calling them 'aux-v'
to avoid needing to maintain a 'vs-i' PoS tag.)
And would 'adj-f' also be the PoS tag for いい? Or would there
still be an 'adj-y' (or whatever) tag for いい? If not, then one
again needs くれる-like tricks to provide a table of conjugations
for いい -- a word of fundamental interest to Japanese learners.
This seems a step in the wrong direction.
However, if an 'adj-f' tag makes sense in describing the use of
いい, then what about an additional PoS tag to indicate that
this is an 'adj-f' word that follows a specific conjugation
pattern?
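
As a sketch of what such a tag would buy, assuming a placeholder name 'adj-y' for it (the name and the function below are illustrative, not existing JMdict tooling): the only difference from a regular 'adj-i' word is that the inflected forms are built on よ rather than い.

```python
def conjugate_adj(word, pos):
    """Return a few forms of an i-adjective, given only the word and its PoS.

    'adj-i' is the regular pattern; 'adj-y' is a placeholder tag for words
    ending in いい whose inflected forms are built on よ instead.
    """
    if pos == "adj-y":
        stem = word[:-2] + "よ"   # 頭のいい -> 頭のよ
    else:
        stem = word[:-1]          # 高い -> 高
    return {
        "plain": word,
        "negative": stem + "くない",
        "past": stem + "かった",
    }
```

The same function then covers both いい itself and the XXのいい / XXがいい expressions, with no need to inspect anything beyond the tag.
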
>[...]