
Re: [edict-jmdict] adj-i/adj-ix issues



Thanks for the pointer on かっこよさすぎる.

Turns out 無い and よい aren't really identical in their irregularities. When they take そう they behave the same (無さそう、よさそう), but 無い is always 無さすぎる while よい is actually よすぎる, as Alexander pointed out.

(a little irrelevant but よさすぎる seems to at least be considered natural-sounding by some, based on conferring with native speakers and this:
https://oshiete.goo.ne.jp/qa/5365615.html?pg=1 )

If we want to fully account for these irregularities, I guess we should really have two new PoS: either [adj-i-nai] for 無い and [adj-ix] for 良い, or alternatively [adj-i] for both, with an additional [adj-i-sasou] tag on both and an [adj-i-sasugiru] tag on 無い alone. (This is of course a mostly separate issue from the noinfl tag.)
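
To make the second option concrete, here is a minimal sketch of how an application might read those tags. The tag names [adj-i-sasou] and [adj-i-sasugiru] are only the ones proposed above (neither exists in JMdict), and the function name is made up for illustration.

```python
# Hypothetical tags from the proposal above; not real JMdict tags.
def sou_and_sugiru(word: str, tags: set[str]) -> tuple[str, str]:
    """Return the (-そう, -すぎる) forms of an i-adjective ending in い."""
    if not word.endswith("い"):
        raise ValueError("expected an i-adjective ending in い")
    stem = word[:-1]
    # Each tag independently inserts さ between the stem and the suffix.
    sou_stem = stem + "さ" if "adj-i-sasou" in tags else stem
    sugiru_stem = stem + "さ" if "adj-i-sasugiru" in tags else stem
    return sou_stem + "そう", sugiru_stem + "すぎる"


print(sou_and_sugiru("無い", {"adj-i", "adj-i-sasou", "adj-i-sasugiru"}))
print(sou_and_sugiru("よい", {"adj-i", "adj-i-sasou"}))
print(sou_and_sugiru("高い", {"adj-i"}))
```

With two independent tags, 無い and よい can share [adj-i] and still differ only in the すぎる form.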

くるな is a derivation. な is attaching to the verb くる. There are 
derivations beyond just inflection. A "no inflection" tag doesn't say 
much about exactly what kinds of derivation are forbidden, let alone 
what kinds of inflection, for more reasons than just the fact that the 
line between inflections and compounds is very fuzzy in Japanese.

What inflections are forbidden-in-practice is governed exactly by the 
specific word that is being tagged. "No inflection" assumes that any 
word that might get tagged with it, within a given part of speech, has 
the same kinds of restrictions for what kinds of derivations are not 
allowed. 

OK, I finally get what you're saying, sorry for taking a while. 

If we theoretically had an entry like

kanji: 来る
reading: くる;クル[nokanji]

then arguably the [noinfl] should attach to クル despite the fact that it could supposedly be used with derivatives: "クルな!"
Right. But then you yourself make a distinction between inflections on the one hand and derivatives on the other. 
We have no reason to define [noinfl] to mean "no inflections and no derivatives". Whatever we might name the tag, it'd just be a shorthand. It might be practical to be precise in our definition of the tag and not just define it as "this reading doesn't inflect" but instead as something like "this surface form appears only in 終止形・連体形".
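
As a rough illustration of that narrower definition, here is a sketch using the いい/よい pair from earlier in the thread, since plain i-adjective endings are easy to enumerate. The [noinfl] tag is only the proposal under discussion, and the ending list, data layout and function names are assumptions made up for the example.

```python
# A few regular i-adjective endings; deliberately incomplete.
ADJ_I_ENDINGS = ["い", "く", "くて", "かった", "くない", "ければ"]

# Hypothetical readings of one entry, each with its own tag set.
READINGS = {
    "よい": frozenset(),
    "いい": frozenset({"noinfl"}),  # proposed tag, not in JMdict
}


def surface_forms(reading: str, tags: frozenset) -> set[str]:
    """Surface forms a glosser should accept for this reading."""
    if "noinfl" in tags:
        # "Appears only in 終止形・連体形": the dictionary form itself.
        return {reading}
    stem = reading[:-1]
    return {stem + ending for ending in ADJ_I_ENDINGS}


def lookup(surface: str) -> list[str]:
    """Readings whose accepted surface forms include `surface`."""
    return [r for r, tags in READINGS.items() if surface in surface_forms(r, tags)]


print(lookup("よかった"))  # matches よい
print(lookup("いい"))      # matches いい in its dictionary form
print(lookup("いかった"))  # no match: いい is tagged noinfl
```

Defined this way, the tag says nothing about what may attach after the dictionary form, which is the distinction between inflection and derivation made above.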

Also, by far the most common form for かっこいい here is かっこよすぎる, the 
form expected of it being an i-adjective. More common than かっこよさすぎる by 
orders of magnitude. Even かっこいすぎる, which is almost certainly a 
corruption, is more common than かっこよさすぎる. Compounds that end in word 
X don't necessarily have the same distribution of derivations as word X 
itself.

Addressing this again in the context of whether or not we should implement a [noinfl] tag: so かっこよすぎる is actually the prescribed/correct version. But even if "かっこいすぎる" gets more web hits than "かっこよさすぎる", this doesn't necessarily mean it's more common. N-gram counts and Google hits are not the be-all and end-all. Ask native speakers and I think it will take you a long time indeed before you find somebody who will accept "かっこいすぎる" or "かっこいいすぎる" as natural-sounding (with possible exceptions for specific dialects). 

This is actually a very good example of why a "no inflection" tag 
wouldn't be useful to tell very much about what kinds of restrictions 
there are on the derivations the word can actually take in practice. 
Exactly what it means for nontrivial derivations would always be 
different from word to word. 

I don't think it is a good example at all. In the vast majority of cases you will get a non-negligible number of hits for things like かっこいすぎる/かっこいいすぎる and similar constructions, yet those forms won't be considered natural-sounding Japanese, let alone correct. But besides that, I don't think it's really the dictionary file's job to make sure every single incorrect inflection will get picked up by text glossers - if you really want that level of absolute coverage, I think the onus is on the text glosser's developer to simply throw out the rules and create incorrect matches for various possible conjugations. Just ignore any noinfl tag. Apply v1 inflections to v5r verbs, and vice versa. Yes, you will get a handful of hits even for things like "食べりたい" and "走た", so if you want to cover かっこいいすぎる, you should probably cover them too? 
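
For what it's worth, the "throw out the rules" approach is easy to do on the glosser side. The sketch below is a toy deconjugator with a strict mode and a permissive mode. The rule table, the two-word lexicon and the function name are all invented for illustration and do not reflect any existing glosser.

```python
# (surface ending, dictionary ending, verb class the rule belongs to)
RULES = [
    ("たい", "る", "v1"),     # 食べる -> 食べたい
    ("た", "る", "v1"),       # 食べる -> 食べた
    ("りたい", "る", "v5r"),  # 走る -> 走りたい
    ("った", "る", "v5r"),    # 走る -> 走った
]

# Toy lexicon: lemma -> part of speech.
LEXICON = {"食べる": "v1", "走る": "v5r"}


def deconjugate(surface: str, strict: bool = True) -> list[str]:
    """Return lemmas that `surface` could come from.

    strict=True applies a rule only to lemmas of the rule's own class;
    strict=False applies every rule to every lemma.
    """
    hits = set()
    for ending, dict_ending, rule_class in RULES:
        if not surface.endswith(ending):
            continue
        lemma = surface[: -len(ending)] + dict_ending
        pos = LEXICON.get(lemma)
        if pos is None:
            continue
        if strict and pos != rule_class:
            continue
        hits.add(lemma)
    return sorted(hits)


print(deconjugate("走った"))                    # standard form
print(deconjugate("食べりたい"))                # rejected in strict mode
print(deconjugate("食べりたい", strict=False))  # accepted in permissive mode
print(deconjugate("走た", strict=False))        # accepted in permissive mode
```

The dictionary data is the same in both modes. Only the glosser's policy changes, which is why I'd keep this out of the dictionary file.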

It's worth noting though that we do include nonstandard inflections that are widespread, like せん/せえへん/しいひん for しない, but I don't think any of かっこいすぎる/かっこいいすぎる/かっこよさすぎる are common enough to warrant inclusion.


On Tue, Jun 12, 2018 at 3:19 PM Alexander Nadeau wareya@********* [edict-jmdict] <edict-jmdict@***************> wrote:
 


On 2018/06/12 1:07, Marcus Richert superbrightfuture@*********
[edict-jmdict] wrote:
> A [noinfl]/[noconj] tag would not be applied to something like くるな,
which is only tagged as "expl" in the database, which isn't a PoS that
inflects/conjugates.

くるな is a derivation. な is attaching to the verb くる. There are
derivations beyond just inflection. A "no inflection" tag doesn't say
much about exactly what kinds of derivation are forbidden, let alone
what kinds of inflection, for more reasons than just the fact that the
line between inflections and compounds is very fuzzy in Japanese.

What inflections are forbidden-in-practice is governed exactly by the
specific word that is being tagged. "No inflection" assumes that any
word that might get tagged with it, within a given part of speech, has
the same kinds of restrictions for what kinds of derivations are not
allowed.

> かっこいいすぎる is not correct Japanese - it'd be "かっこよさすぎる", and
this is precisely the information we're trying to convey with noinfl,
that いい doesn't inflect - only よい does. (and the -さ- in the middle of
it is why we need a special PoS for よい and ない)

It might be proscribed (I don't know), but natives use it on purpose
without correcting themselves. If I deconjugate 食べすぎる then I'm
definitely going to want to deconjugate かっこいいすぎる as well. (This doesn't
mean that I'm displaying かっこいいすぎる as an example of how to conjugate
かっこいい to すぎる, my dictionary doesn't generate conjugations, it only
parses them.)

Also, by far the most common form for かっこいい here is かっこよすぎる, the
form expected of it being an i-adjective. More common than かっこよさすぎる by
orders of magnitude. Even かっこいすぎる, which is almost certainly a
corruption, is more common than かっこよさすぎる. Compounds that end in word
X don't necessarily have the same distribution of derivations as word X
itself.

This is actually a very good example of why a "no inflection" tag
wouldn't be useful to tell very much about what kinds of restrictions
there are on the derivations the word can actually take in practice.
Exactly what it means for nontrivial derivations would always be
different from word to word.

A good dictionary application would end up with special cases for
everything marked with "no inflection" if it wants its handling of
inflections to be robust.

You could use such a tag to prevent an application from generating a
conjugation table, but replacing a special word class with a normal one
and adding "no inflection" wouldn't really be appropriate for
applications that don't deal with conjugations that way. It doesn't tell
a deconjugator what it needs to know to handle the word in a robust manner.