
Re: [edict-jmdict] Extra fields



I've attached additional PoS below, which are needed if we want to be able to handle archaic Japanese.  Note that there are lots of ni-dan types that need to be added.  Also, there appears to be a [v5z] PoS in EDICT (or at least it's present on the 'advanced search' page of the database).  No words are currently tagged as [v5z], and as far as I know, it is not a valid PoS.  So I think it can be deleted.

Archaic Japanese is definitely not my specialty, so there may be some that I've missed (esp. if there are any irregular variants like 行く).  But these appear to be the majority of remaining PoS according to Koj, Daij, Wiki, etc.  These should be double-checked.  See for example this page (http://ja.wikipedia.org/wiki/%E4%B8%8B%E4%BA%8C%E6%AE%B5%E6%B4%BB%E7%94%A8) and the related links in its sidebar.

Most modern verbs have an archaic equivalent; that is to say, pretty much everything that is [v5k] could also be marked as [v4k], and most [adj-na] entries could also be flagged as [adj-nari].  That seems redundant to me, so I recommend that the archaic forms only be used when a word is strictly archaic and never takes a modern conjugation.


VERBS
四段活用
v4k
v4g
v4s
v4t
v4n
v4h -- already in EDICT
v4b
v4m
v4r -- already in EDICT

上二段活用
v2k-k
v2g-k
v2t-k
v2d-k
v2h-k
v2b-k
v2m-k
v2y-k
v2r-k

下二段活用
v2a-s -- already in EDICT
v2k-s
v2g-s
v2s-s
v2z-s
v2t-s
v2d-s
v2n-s
v2h-s
v2b-s
v2m-s
v2y-s
v2r-s
v2u-s
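
To make the naming pattern in the verb lists above explicit, here is a minimal Python sketch that splits a tag into its conjugation class and row letter. It reflects my reading of the scheme (the digit is the dan count, the letter is the row, and the -k / -s suffix marks 上 (kami) vs. 下 (shimo)); the function and table names are hypothetical and not part of EDICT or any of its tools.

```python
import re

# Assumed reading of the tag scheme in the lists above:
#   v4<row>      yodan (四段) verb
#   v2<row>-k    kami nidan (上二段) verb
#   v2<row>-s    shimo nidan (下二段) verb
CLASS_NAMES = {
    ("4", None): "yodan",
    ("2", "k"): "kami nidan",
    ("2", "s"): "shimo nidan",
}

# grade digit, one row letter, optional -k / -s suffix
TAG_RE = re.compile(r"^v(?P<grade>[24])(?P<row>[a-z])(?:-(?P<half>[ks]))?$")


def parse_verb_tag(tag):
    """Return (class name, row letter), or None if the tag does not fit the scheme."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    name = CLASS_NAMES.get((m.group("grade"), m.group("half")))
    if name is None:
        # e.g. "v2k" without a suffix, or "v4k-s" with one
        return None
    return name, m.group("row")
```

A tag such as v2h-s comes back as ("shimo nidan", "h"), while anything outside the pattern (adjective tags, modern v5 tags) returns None rather than a guess.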

ADJECTIVES
カリ活用
adj-ka, e.g. 多かり

ク活用
adj-ku, e.g. 高し

シク活用
adj-shiku, e.g. 嬉し

ナリ活用
adj-nari -- Archaic/formal form of ~な.  Some archaic adjectives are adj-nari only. But on the rare occasion that I've come across one, I've heretofore submitted it as [adj-na][arch], which is not strictly correct.


Rene



On 2012-04-08, at 7:24 PM, Jim Breen wrote:

 

Greetings,

Having successfully added the "hob" (Hokkaido-ben) tag to the
dialect set, I have a set of other tags which could be added. I'd
like to do them in a batch, as quite a few tables and files have to be
edited. I want to run them past the list now in case any more can be flushed out.

On the list are:

<misc>

"joc" for jocular/humorous words/expressions ("hum" is taken.)

<fld>

"biol" and/or "taxon". I'm not really sure about the usefulness of these,
but it won't do any harm enabling them.

<POS>

v2h-s
v4h
v4r

These are all old verb types for which we have entries pending. I know
there are more possible, but my dim grasp of classical Japanese is not
enough to flesh out the list.

Anyway, comments and additions welcome. I'll try and do the updates
later in the week.

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Webmaster: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne