
Re: [edict-jmdict] Extra fields



Thanks for the list and analysis.

On 10 April 2012 09:59, René Malenfant <rene_malenfant@hotmail.com> wrote:
> I've attached additional PoS below, which are needed if we want to be able to handle archaic Japanese.  Note that there are lots of ni-dan types that need to be added.  Also, there appears to be a [v5z] PoS in EDICT (or at least it's present on the 'advanced search' page of the database).  No words are currently tagged as [v5z], and as far as I know, it is not a valid PoS.  So I think it can be deleted.

We had a discussion about v5z just on 4 years ago  8-)}. I revealed the
customary vagueness about its history and purpose. I have now deleted it
from the database table (it no longer shows on the 'advanced search' page).

> Archaic Japanese is definitely not my specialty, so there may be some that I've missed (esp. if there are any irregular variants like 行く).  But these appear to be the majority of remaining PoS according to Koj, Daij, Wiki, etc.  These should be double-checked.  See for example this page (http://ja.wikipedia.org/wiki/%E4%B8%8B%E4%BA%8C%E6%AE%B5%E6%B4%BB%E7%94%A8) and the related links in its sidebar.

I looked at that page and recoiled a bit in horror. I think I'll charge
ahead with your list and fix any bugs if/when they emerge. (Back when I
was an MBA student 30+ years ago this was called the "chop-and-mop"
approach to organizational change.)

> Most modern verb equivalents have an archaic verb equivalent; that is to say that pretty much everything that is [v5k] could also be marked as [v4k], and most [adj-na] entries could also be flagged as [adj-nari].  That seems redundant to me, so I recommend that the archaic verb forms only be used if it's a strictly archaic word that never uses a modern conjugation.

Very sound advice. I have added a bullet-point to
http://www.edrdg.org/wiki/index.php/Editorial_policy#Part-Of-Speech_.28POS.29_Issues
to reflect this.
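
As a rough illustration of that rule (a sketch only: the function name and
its arguments are hypothetical and are not part of the database or the
wiki text), the choice between a modern tag and its archaic counterpart
could be expressed as:

```python
def pos_tags_for(modern_tag: str, archaic_tag: str,
                 uses_modern_conjugation: bool) -> list[str]:
    """Pick the PoS tag for an entry under the rule discussed above.

    Hypothetical helper: the archaic tag is used only when the word is
    strictly archaic and never takes the modern conjugation. Otherwise
    the modern tag alone is used, and the archaic one is left out as
    redundant.
    """
    if uses_modern_conjugation:
        return [modern_tag]
    return [archaic_tag]


# A word that conjugates as a modern godan verb keeps only [v5k].
print(pos_tags_for("v5k", "v4k", uses_modern_conjugation=True))
# A strictly archaic adjective gets only [adj-nari].
print(pos_tags_for("adj-na", "adj-nari", uses_modern_conjugation=False))
```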

> VERBS

Looks good. For the entity-labels/documentation I'll use:

v4k - "Yodan verb with `ku' ending (archaic)"
and
v2k-k - "Nidan verb (upper class) with `ku' ending (archaic)"
v2k-s - "Nidan verb (lower class) with `ku' ending (archaic)"

Will that work? I could just have "(upper)" instead of "(upper class)".

> ADJECTIVES
> カリ活用
> adj-ka, e.g. 多かり
>
> ク活用
> adj-ku, e.g. 高し
>
> シク活用
> adj-shiku, e.g. 嬉し
>
> ナリ活用
> adj-nari -- Archaic/formal form of ~な.  Some archaic adjectives are adj-nari only. But on the rare occasion that I've come across one, I've heretofore submitted it as [adj-na][arch], which is not strictly correct.

I suggest:

adj-ka - "'kari' adjective (archaic)"
adj-ku - "'ku' adjective (archaic)"
adj-shiku - "'shiku' adjective (archaic)"
adj-nari - "archaic/formal form of na-adjective"
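
To make the proposal easier to check, here are the labels suggested in
this message gathered into one lookup table. This is only a sketch: the
names PROPOSED_POS_LABELS and describe_pos are made up for illustration
and do not reflect how the database or the entity definitions are
actually stored.

```python
# Tag -> label, exactly as proposed in this message (nothing is final).
PROPOSED_POS_LABELS = {
    "v4k": "Yodan verb with `ku' ending (archaic)",
    "v2k-k": "Nidan verb (upper class) with `ku' ending (archaic)",
    "v2k-s": "Nidan verb (lower class) with `ku' ending (archaic)",
    "adj-ka": "'kari' adjective (archaic)",
    "adj-ku": "'ku' adjective (archaic)",
    "adj-shiku": "'shiku' adjective (archaic)",
    "adj-nari": "archaic/formal form of na-adjective",
}


def describe_pos(tag: str) -> str:
    """Return the proposed label for a tag; raise KeyError if it is unknown."""
    return PROPOSED_POS_LABELS[tag]


# List every proposed tag with its label for review.
for tag, label in PROPOSED_POS_LABELS.items():
    print(f"[{tag}] {label}")
```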

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Webmaster: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne