Re: [edict-jmdict] PoS vs-i issues
On 17 July 2010 02:53, Jean-Luc Léger <reiga@dspnet.fr.eu.org> wrote:
> On Fri, 16 Jul 2010 15:32:39 +1000, Jim Breen <jimbreen@gmail.com> wrote:
> > (a) leave the 愛する brigade as "vs-s";
> > (b) leave the 勉強 class as "vs" (these are the ones that Daijirin tags
> > "(名)スル" and the Japanese NLP people call サ変名詞/verbal noun).
> > There are thousands of entries with this tag;
> > (c) leave "vs-i" on 為る/する, mainly to keep WWWJDIC happy. It's
> > not at all irregular, so I may rebadge it "vs-x".
> > (d) do something different for the expressions with する (XXXにする,
> > XXXがする, XXXをする, etc.) There are about 200 of these, of which 74
> > are currently tagged "vs-i". Again I don't like "vs-i" on them as they are
> > not irregular. Maybe "vs-e" for "expression using する"?
[...]
> For example, take these entries:
>
> 気にする [きにする] /(exp,vs-i)
> 気になる [きになる] /(exp,vi,v5r)
>
> Both should be managed the same way. So having a 'vs-e' tag seems
> illogical.
> My opinion is that 'exp' + a tag describing the conjugation of the final
> word (here a verb), is quite enough.
> In summary, I would use the same tag in c) and d) (call it vs-x if you
> don't like vs-i)
This is a very good point. Thank you Jean-Luc.
Maybe "vs-i" is the way to go for both 為る/する and expressions such as
気にする. If Makino and Tsutsui can call する an irregular verb (which
it is, as it doesn't fit the modern 一段/五段 patterns), then maybe it's
quite appropriate here.
So we leave (a), (b) and (c) above as they are, and go with "vs-i"
for 為る/する and all expressions that use it?
> Tell me if I misunderstood the problem and the need to separate c) and d).
I don't think you did.
> By the way, I see many verbal expressions with only (exp). Do we agree
> that they should also have a vxxx tag?
> If so, I will build a script to extract every one that needs a tag and
> which tag it needs (from the one put on the entry of the verb alone).
> That could be a good occasion to think about a bulk updater for JMDictDB,
> if you don't already have one :D
If anyone can do that sort of extraction, it's JL, who has been a quiet
but significant contributor in the background, and has scripted a number
of checks and cleanups of errant entries.
Something like:
nnnnnnn v5k
mmmmm v1
...
would be a great help. We are not quite into bulk updating scripts
yet, but they are certainly feasible.
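
For what it's worth, here is a rough sketch of the kind of extraction
being discussed, just to make the idea concrete. It is not JL's script
and it does not touch JMdictDB. It assumes the entries are already
available as (seq, headword, tags) tuples; the sequence numbers, the
SAMPLE data and the names suggest_tags and is_conjugation_tag are all
made up for illustration. It looks for entries tagged "exp" with no
conjugation tag, finds the longest stand-alone verb the headword ends
with, and proposes that verb's tag.

```python
# Sketch only: entries are hypothetical (seq, headword, tags) tuples,
# not real JMdict sequence numbers or the JMdictDB schema.
CONJ_PREFIXES = ("v1", "v5", "vk", "vs-")


def is_conjugation_tag(tag):
    # Bare "vs" (the 勉強 class of verbal nouns) is deliberately excluded;
    # "vi"/"vt" are transitivity tags and do not match any prefix.
    return tag.startswith(CONJ_PREFIXES)


def suggest_tags(entries):
    """Return (seq, tag) pairs for exp-only entries ending in a known verb."""
    # Map each stand-alone verb headword to its conjugation tags.
    verbs = {}
    for seq, headword, tags in entries:
        conj = [t for t in tags if is_conjugation_tag(t)]
        if conj and "exp" not in tags:
            verbs[headword] = conj

    suggestions = []
    for seq, headword, tags in entries:
        # Only expressions that lack a conjugation tag need one.
        if "exp" not in tags or any(is_conjugation_tag(t) for t in tags):
            continue
        matches = [v for v in verbs if headword != v and headword.endswith(v)]
        if matches:
            best = max(matches, key=len)  # longest matching final verb
            suggestions.extend((seq, tag) for tag in verbs[best])
    return suggestions


SAMPLE = [
    (1000001, "する", ["vs-i"]),
    (1000002, "なる", ["v5r", "vi"]),
    (1000003, "気にする", ["exp"]),
    (1000004, "気になる", ["exp"]),
    (1000005, "気がつく", ["exp", "v5k"]),  # already tagged, so skipped
]

for seq, tag in suggest_tags(SAMPLE):
    print(seq, tag)
```

Matching on the written form alone is crude: a headword can end in the
same characters as a verb without containing that verb, and homographs
with different conjugations would need a reading check, so the output
would want a manual look before anything is applied.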
Thanks
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne