Re: [edict-jmdict] Parts of speech patterns don't seem to make sense [1 Attachment]
Thanks to Ben for this comment and analysis. I'll admit
PoS allocations have not been well defined or rigorously
policed. It would be a nice project for someone to try and
clean them up, especially the odd-ball ones.
2010/7/27 Ben Bullock <benkasminbullock@gmail.com>:
> I'm trying to normalize the database for my dictionary lookup. I found
> that there are 52 parts of speech which are combined in 359 different
> ways in JMdict.
>
> Now, I don't really get it. For example,
>
> <sense>
> <stagk>得る</stagk>
> <pos>&suf;</pos> ############# &suf;
> <pos>&v1;</pos>
> <pos>&vt;</pos>
>
> but
>
> <sense>
> <pos>&aux-v;</pos>
> <s_inf>usu. 直す</s_inf>
> <gloss>to do over again (after -masu base of verb)</gloss>
> </sense>
>
> Now it looks like words which have the same grammatical function might
> be listed as either &suf; or &aux-v;.
I agree. Many/most of the "suf,v...." probably should be retagged as
aux-v. There are 25 of them.
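For anyone who wants to reproduce this kind of count, here is a rough sketch of how the per-sense PoS combinations could be tallied. It is only an illustration, not the script either of us used. The function names and the tiny SAMPLE document are made up for the example, and the sample's entities expand to the short codes themselves, whereas the real JMdict DTD expands them to longer descriptions. It also ignores any carry-over of PoS from an earlier sense to a later one.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature of JMdict, built from the fragments quoted in this
# thread. Assumption: each entity expands to its own short code.
SAMPLE = """<!DOCTYPE JMdict [
<!ENTITY suf "suf">
<!ENTITY v1 "v1">
<!ENTITY vt "vt">
<!ENTITY aux-v "aux-v">
<!ENTITY adj-i "adj-i">
]>
<JMdict>
<entry>
<sense><pos>&suf;</pos><pos>&v1;</pos><pos>&vt;</pos></sense>
</entry>
<entry>
<sense><pos>&aux-v;</pos><gloss>to do over again</gloss></sense>
</entry>
<entry>
<sense><pos>&suf;</pos><pos>&adj-i;</pos><gloss>cannot ...</gloss></sense>
</entry>
</JMdict>"""


def pos_combinations(xml_text):
    """Count how often each sorted tuple of <pos> values occurs per <sense>."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for sense in root.iter("sense"):
        combo = tuple(sorted(p.text for p in sense.findall("pos")))
        counts[combo] += 1
    return counts


def suffix_verb_senses(counts):
    """Total senses tagged 'suf' together with any tag starting with 'v'."""
    return sum(
        n
        for combo, n in counts.items()
        if "suf" in combo and any(tag.startswith("v") for tag in combo)
    )


if __name__ == "__main__":
    counts = pos_combinations(SAMPLE)
    for combo, n in counts.most_common():
        print(n, ",".join(combo))
    print("suf + verb senses:", suffix_verb_senses(counts))
```

Run against the full file, the same two functions would give the number of distinct combinations and the number of "suf,v...." senses. The startswith("v") test is a crude stand-in for "is a verb tag" and would need checking against the actual entity list.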
> Also there are things like
>
> <keb>得ない</keb>
> </k_ele>
> <r_ele>
> <reb>えない</reb>
> </r_ele>
> ....
> <pos>&suf;</pos>
> <pos>&adj-i;</pos>
> <xref>得る・える</xref>
> <gloss>(after the -masu stem of a verb) unable to...</gloss>
> <gloss>cannot ...</gloss>
> </sense>
>
> so now this (which was a verb) is an adjective and a suffix?
I can see why it is so. 得ない does go on the end of things as a suffix
and it behaves like a 形容詞.
> This seems very arbitrary to me.
Some undoubtedly is.
> I am attaching an analysis of the parts of speech for each sense in
> the JMdict_e file of July 20th. The key is
Thanks. I have saved a copy. I'll put tidying this up on the Wiki to-do
list when it is back.
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne