
Re: [edict-jmdict] Conjugations and PoS tags for だ, くれる



A few comments.

1) The entry for くれる is marked more clearly now for human usage than it would be if the note were removed and the PoS changed to an obscure PoS tag.  くれろ used to be common, and it still gets used, so I think having it in the conjugation table with a usage note on the entry (as at present) makes more sense.

2) いい could be handled simply by splitting it out from the entry for よい and making よい [adj-i] while making いい an [exp] or [unc] with an xref to よい and a note that いい doesn’t inflect.  I don’t think an additional PoS is needed and if one is added, it definitely shouldn’t include よい; よい is just a regular old i-adjective.  And いい is just a modified version of よい that has no inflections of its own, so I think it would be wrong to say that it has its own PoS with its own inflection pattern that includes よくない, etc.; those forms belong to the regular adjective よい.

3) A number of the entries we have marked as [aux] or [aux-v] have their own conjugation patterns (see, for example, https://ja.wikipedia.org/wiki/%E5%8A%A9%E5%8B%95%E8%A9%9E_%28%E5%9B%BD%E6%96%87%E6%B3%95%29).  In addition, which conjugation of だ is currently not covered by its own entry in EDICT? だろう・だろ, で, な, and なら are all in there already (though the entry for な needs work), and I’ve just submitted an entry for だった, which as far as I can tell is the only form that was missing.  (And of course である and です, etc. are in there too.)  Does it really need its own PoS when all of its conjugated forms have their own entries?  Have I forgotten some needed forms?  The [aux]/[aux-v] tags are there as a grab-bag for these kinds of items, so I think they are appropriate for だ.



Rene


On Jun 29, 2014, at 12:45 AM, Olivier Binda olivier.binda@wanadoo.fr [edict-jmdict] <edict-jmdict@yahoogroups.com> wrote:

> 
> In my apps, I do some NLP-related processing (glossing, projecting ipadic postag to jmdict postag, and lately linking (writtenForm|pos|synset) of JWordnet and (Keb|Reb|pos|Sense|Gloss) of Jmdict...)
> 
> What is useful to me is when the metadata that comes from jmdict is useful to a machine/software intelligence (and not only to a human intelligence),
> i.e. metadata that doesn't put me in a situation where I have to do (very complex and only partially working) disambiguation.
> 
> Adding PoS tags for だ, くれる, いい/よい forces me to update my code but doesn't make things more complex for me and might reduce complexity, just slightly, because it's really easy to differentiate da/kureru/ii/yoi from other entries in their PoS group...
> So I vote blank...
> 
> 
> In my opinion, 
> 1) metadata/pos tags should represent/indicate an intrinsic characteristic of the language 
> 2) if you can imagine an interesting use for a PoS, it should probably be in there. If you can't, maybe it shouldn't
> 
> 
> Olivier
> 
> 
> 
> 
> 
> On 06/29/2014 03:12 AM, Jim Breen jimbreen@gmail.com [edict-jmdict] wrote:
>>  
>> Pardon the top-posting. I'm trying to be brief. Re the points
>> Stuart has raised:
>> - I'm quite comfortable with the concept of the いい/よい adjectives
>> having a PoS tag of their own. It's something that needs to be highlighted
>> to students. I suggested "adj-iy" to Stuart as the "y" might aid
>> recognition. "adj-ii" might not trigger anything with romaji-thinking
>> people. Just "adj-y" is possible too.
>> - For くれる I'm a bit uncomfortable as AFAIK the only irregularity
>> is the くれ imperative, and you can find people actually using the
>> full くれろ, although it's not as common. (In WWWJDIC I peek at
>> the kanji (呉) and display "くれ[ろ]".) That said, I could live with "v1-i"
>> (or even "v1-s"). Neither is used at present.
>> - the だ case is, er, interesting. While I'm comfortable with the "aux",
>> which Rene described as `grab-bag for "things that conjugate, but
>> not according to anything like the regular Japanese verb conjugation
>> patterns"', I can't see a major problem with giving it a PoS. I don't
>> like having PoSs for single entries, but I guess if any is going to
>> be justified, it's "the copula". Any improvements on "cop"?
>> 
>> Apropos of だ, Makino & Tsutsui in the "Conjugations" appendix in
>> their DBJG have a table entry for だ, ではない, だった, etc. labelled
>> "Copula", so it's not as if we'd be going against the flow.
>> 
>> I'll be very interested to hear other people's views on this.
>> 
>> Cheers
>> 
>> Jim
>> 
>> On 29 June 2014 08:43, Stuart McGraw smcg4191@frii.com [edict-jmdict]
>> <edict-jmdict@yahoogroups.com> wrote:
>> > I would like to re-raise an issue that was previously discussed
>> > here around 2010-10-17, Subject: "A 'cop' PoS tag?"
>> > https://groups.yahoo.com/neo/groups/edict-jmdict/conversations/topics/4315
>> > I have been discussing the issue with Jim Breen in email and he
>> > suggested raising the issue here...
>> >
>> > tl;dr... I would like to request that だ and くれる get unique
>> > PoS tags that convey the fact they conjugate differently than
>> > other words that share their current PoS's.
>> >
>> > Back in 2010 I asked about a 'cop' PoS tag for だ because of
>> > its usefulness in conjugating words from JMdict. It has become
>> > of more than theoretical interest to me lately because, as you
>> > may have seen, the JMdictDB submission system entry pages now
>> > have a "Conjugations" link. (This is my own implementation; it
>> > was developed independently of Jim's conjugator code although I
>> > made considerable use of the conjugations WWWjdic provides in
>> > building the data tables for my version).
>> >
>> > It seems to work well [*1] and provides conjugations for a
>> > number of PoS classes that WWWjdic doesn't. However there
>> > are problems with the following classes of words:
>> >
>> > 良い・いい -- Jim has agreed to a special PoS for these
>> > and it and its conjugations are already in my local
>> > code base so this problem will go away soon.
>> >
>> > くれる -- I would (as in 2010) like the 'v1i-k' PoS tag that
>> > this word formerly had restored so that its irregular
>> > imperative form can be automatically generated.
>> >
>> > だ -- Right now this has an 'aux' tag which is not conjugatable
>> > at all. In much the same way as WWWjdic conjugates 'vs' words
>> > by conjugating an affixed する, I "conjugate" 'n' and 'adj-na'
>> > words by conjugating an affixed だ. I'd prefer to instead
>> > just conjugate the word だ directly. If だ had a PoS that
>> > identified it as being in a unique conjugation class, doing
>> > that would be much easier.
>> >
>> > Below are some points from the previous discussion and my recent
>> > discussion with Jim in a Q&A format...
>> >
>> > Why don't I just special-case だ and くれる?
>> >
>> > Because the conjugator is table-driven: all the information
>> > needed to conjugate a word is obtained (primarily) from a
>> > table that is indexed by PoS and conjugation type. The assumption
>> > is that the PoS in effect defines the rules for conjugating
>> > words of that class. This table-driven approach allowed
>> > me to implement the conjugations feature completely in SQL
>> > which has advantages for the JMdictDB project.
>> >
>> > There also seems to be a dearth of open-source code for doing
>> > Japanese conjugations. The conjugations tables in JMdictDB are
>> > open-source and need not be implemented in a database; they
>> > can easily be read by or embedded in code in any programming
>> > language to do conjugations. The actual code needed to generate
>> > a conjugated form from the info extracted from the tables is
>> > trivial [*2]. They are thus of benefit beyond just the
>> > JMdictDB project. However, if every program that wanted to use
>> > them also had to include code to special-case some words, their
>> > value would be substantially reduced.
>> >
>> > Aren't there a lot of words with unique conjugation rules
>> > which will lead to a lot of unique PoS tags?
>> >
>> > I don't know. In the 2010 discussion that was a point raised
>> > but most of the words mentioned I think were archaic. AFAIK
>> > the only common modern words in JMdict that violate the PoS-
>> > defines-conjugation-rules assumption are the three mentioned
>> > above.
>> >
>> > It would be nice to provide conjugations for archaic words as
>> > well and there may be ways to do so (maybe by structuring the
>> > PoS tags into a two-level hierarchy?) but I think being able to
>> > uniformly handle just modern words has enough value to justify
>> > being addressed independently.
>> >
>> > Shouldn't 'cop' (copula) also apply to other entries like である?
>> >
>> > Perhaps. The word "copula" has meaning that extends beyond
>> > the syntactical. Perhaps some abbreviation other than 'cop'
>> > would be better for what I am requesting. 'da'? 'cop-da'?
>> > 'da-predicate'? I care less about the actual abbreviation used
>> > than that there is one that says that this word だ conjugates
>> > in this particular way.
>> >
>> > Comments?
>> >
>> > ----
>> > [*1]
>> > Corrections will of course be gratefully received.
>> >
>> > [*2]
>> > I wrote a simple command line demo program that conjugates
>> > words and runs completely independently of the JMdictDB database.
>> > The conjugation is done in the 4-line function construct() which
>> > uses only info extracted from the conjugation tables (read
>> > from .csv files) to do its work. See
>> > http://www.edrdg.org/~smg/cgi-bin/hgweb-jmdictdb.cgi/file/eb393788c541/python/conj.py#l237
>> > Although the rest of the code is a little lengthy, that is mostly
>> > because of copious comments, argument parsing, output formatting
>> > and rearranging the tables for more efficient lookups.
>> >
>> >
>> > ------------------------------------
>> > Posted by: Stuart McGraw <smcg4191@frii.com>
>> > ------------------------------------
>> >
>> >
>> 
>> -- 
>> Jim Breen
>> Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
>> 
> 
> 
>
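
For anyone following the quoted discussion, here is a minimal sketch of the table-driven approach Stuart describes, where a conjugated form is built purely from a row looked up by PoS and conjugation type. Everything in it is an illustrative assumption: the table rows, the function names, and the "v1-s" and "cop" tags (which are only candidates floated in this thread). It is not the JMdictDB code or its actual tables.

```python
# Hypothetical conjugation table, keyed by (PoS tag, conjugation type).
# Each row means: drop N trailing characters from the dictionary form,
# then append the given suffix. Rows are illustrative only.
CONJ_TABLE = {
    # Regular ichidan verbs, e.g. 食べる
    ("v1", "negative"): (1, "ない"),
    ("v1", "past"): (1, "た"),
    ("v1", "imperative"): (1, "ろ"),
    # Assumed special tag for くれる: same as v1 except the imperative
    ("v1-s", "negative"): (1, "ない"),
    ("v1-s", "past"): (1, "た"),
    ("v1-s", "imperative"): (1, ""),
    # Assumed tag for the copula だ
    ("cop", "negative"): (1, "ではない"),
    ("cop", "past"): (1, "だった"),
}


def build_form(word, drop, suffix):
    """Strip `drop` trailing characters and append `suffix`."""
    return word[:len(word) - drop] + suffix


def conjugate(word, pos, conj):
    """Return the conjugated form, or None if the table has no row."""
    row = CONJ_TABLE.get((pos, conj))
    if row is None:
        return None
    drop, suffix = row
    return build_form(word, drop, suffix)


def conjugate_with_copula(word, conj):
    """Conjugate an 'n' or 'adj-na' word by conjugating an affixed だ."""
    copula_form = conjugate("だ", "cop", conj)
    return None if copula_form is None else word + copula_form


if __name__ == "__main__":
    print(conjugate("食べる", "v1", "imperative"))
    print(conjugate("くれる", "v1-s", "imperative"))
    print(conjugate("だ", "cop", "past"))
    print(conjugate("だ", "aux", "past"))
    print(conjugate_with_copula("静か", "past"))
```

In this sketch no word is special-cased in code: くれる gets its short imperative only because its (assumed) tag selects a different row, and a lookup under "aux" returns None because that tag has no rows at all. Whether that convenience justifies single-entry PoS tags is, of course, the question under discussion.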