
Re: [edict-jmdict] adj-i/adj-ix issues



Hi.

I’m fine with leaving it the way it is now (i.e., いい = adj-ix; よい = adj-i), but I sense that there’s a desire to simplify the situation because all the phrases that include いい and よい are something of a redundant mess.

What I don’t think makes sense is merging よい and いい into a redefined adj-ix.  Those two words don’t have the same PoS.  They can’t possibly have the same PoS: the former conjugates and the latter doesn’t.  (Or if they do have the same PoS, they’re both just adj-i... if we are allowing for those unusual conjugations of いい as in Daijirin.)  いい being an “irregular” adjective that has よくない as its negative form is something that gets taught in intro Japanese, but it’s an incorrect simplification.


Rene


On Mar 6, 2018, at 12:11 AM, s mcgraw smcg6347@*********** [edict-jmdict] <edict-jmdict@***************> wrote:

I concur with Chris here. I have no opinion about separate or merged いい/よい 
entries but I don't see the benefit of dropping the adj-ix tag. For those to 
whom it makes no difference, it is trivial to map adj-ix to adj-i prior to 
processing. However, going the other way is more difficult. For example, 
the jmdictdb conjugator is driven solely by the PoS tag with no need to look 
at the text of the word itself. Were that needed it would be a significant 
architectural change. 
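
To make the asymmetry concrete, here is a minimal sketch of a tag-driven conjugator. It is not the jmdictdb code: the rule table, the function names and the two forms shown are assumptions for illustration, and it only handles kana readings. The tag alone selects the rule, and collapsing adj-ix into adj-i is a one-line mapping.

```python
# Hypothetical sketch, NOT the jmdictdb implementation.
# Rules are keyed by PoS tag and form: (suffix to strip, suffix to append).
# Only kana readings are handled here, to keep the example short.
RULES = {
    "adj-i": {
        "negative": ("い", "くない"),
        "past": ("い", "かった"),
    },
    "adj-ix": {
        "negative": ("いい", "よくない"),
        "past": ("いい", "よかった"),
    },
}


def conjugate(reading, pos, form):
    """Conjugate a kana reading using only the rule chosen by its PoS tag."""
    strip, append = RULES[pos][form]
    if not reading.endswith(strip):
        raise ValueError(f"{reading!r} does not end in {strip!r}")
    return reading[: -len(strip)] + append


def collapse_adj_ix(pos_tags):
    """For consumers that do not care: map adj-ix to adj-i before processing."""
    return ["adj-i" if tag == "adj-ix" else tag for tag in pos_tags]


print(conjugate("あたまがいい", "adj-ix", "negative"))
print(conjugate("たかい", "adj-i", "past"))
print(collapse_adj_ix(["adj-ix", "n"]))
```

Without the adj-ix tag, the choice between the two rule sets would have to be made by inspecting the reading, which is the change described above.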

Another point is that the adj-ix tag captures the information that has been 
proposed to be duplicated in far less machine-useable form as a note in the 
entry. That is, the note can easily be generated from the tag. Again, going 
the other way (looking for some particular text in the note in order to 
rephrase it for example), is more fragile. 
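
As a rough illustration of generating the note from the tag, something like the following would do. The note wording and the names are made up for this sketch and are not proposed entry text.

```python
# Hypothetical sketch: the note text below is illustrative only.
NOTE_TEMPLATES = {
    "adj-ix": "inflected forms use the よい stem (e.g. よくない, よかった)",
}


def notes_for(pos_tags):
    """Return the display notes implied by an entry's PoS tags."""
    return [NOTE_TEMPLATES[tag] for tag in pos_tags if tag in NOTE_TEMPLATES]


print(notes_for(["adj-ix"]))
print(notes_for(["adj-i"]))
```

Rewording the note then means editing one template, rather than searching entry notes for a particular phrase.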

It seems to me that even if いい is sometimes conjugated the same way as よい, 
いい/よい (and friends) is a sufficiently unique case, and one that is described 
as such even to beginning students of Japanese, as to justify a separate tag. 

Given the number of PoS tags already in use it doesn't seem like a huge gain 
to eliminate this one. 

-- Stuart 

On 03/05/2018 08:16 PM, Chris Vasselli clindsay@********* [edict-jmdict] wrote: 
> Hi all, 
> 
> Just wanted to give my 2 cents as a developer of an app <http://nihongo-app.com> that relies on this data. For some background, my app has a dictionary component, but it also uses the part of speech data to find the words that are contained in a piece of Japanese text. I rely on the part of speech information to power a deconjugation system that can take something like 頭がよくなさそう and link it to the dictionary entry for 頭が良い. 
> 
> I can definitely adapt to whatever decision you guys make here, but it’s certainly useful for me for entries that end in 良い to have a separate part of speech, or at least some sort of special tagging. To me, it’s useful information that can be provided by the database, that I will otherwise need to infer on my own. More information is better than less information, and the [adj-ix] marking would help me distinguish the cases where I need to do special handling. 
> 
> For example, Marcus mentioned that “we could maybe consider instituting a rule that よい should always come before いい in the kanji/readings.” In that case, this would make the conjugation/deconjugation logic work properly. But I would need to special-case these entries when they show up in the dictionary component of my app to show the いい version as the primary reading of the word. I want to display the most common reading, which is the いい version. 
> 
> Without any sort of tagging that this is an いい adjective, I will probably have to just look for words that end in よい and also contain an alternate いい reading, and assume that those need to be flipped for display. This seems a little brittle, and I’d rather have the database provide that information. 
> 
> The alternative of listing the いい version as the first version has its own problems. For example, when generating conjugation tables, we’d want to use the よい version. Even if, as Rene mentioned, いくない could in some circumstances be a valid conjugation, it’s certainly not the conjugation I would want to present to my users, at least as the default conjugation. So again, I need a way to distinguish entries that need this kind of special treatment. 
> 
> In short, if we merge よい/いい entries back together, app developers still need a way to know that these entries need special handling. [adj-ix] seems like a reasonable way to do this in my mind. 
> 
> Just my 2 cents. Thanks! 
> 
> Chris 
> 
> 
> On Mar 6, 2018, 11:40 AM +0900, Jim Breen jimbreen@********* [edict-jmdict] <edict-jmdict@***************>, wrote: 
>> 
>> Yes, I'd missed that comment in Daijirin too. I had been leaning towards 
>> merging them back together with "adj-ix", but now I think that they can be 
>> "adj-i" with a note on the いい entry. Any software that conjugates adjectives 
>> will need to be aware that 良い/いい needs a bit of special treatment. 
>> 
>> Thanks for the very useful discussion. 
>>