
Re: [edict-jmdict] use of entities and tags in jmdict



[Stuart McGraw ([edict-jmdict] use of entities and tags in jmdict) writes:]
>> Some things I wondered about....
>> - "word usually written using kana alone" is used in both re_inf
>>   (once) and misc (892 times).  Does the lonely re_inf occurrence
>>   belong in misc?

Got it. #1270270. Moved to misc.

>> - "rare" occurs in re_inf (2) and misc (49).  Ditto.

I have removed them both.

>> - "word containing irregular kana usage" occurs in both re_inf
>>   (51 times) and in ke_inf (once).  Similar question.

It's in the アウム真理教 variant of オウム真理教, and it's there
because it refers specifically to the アウム. I guess it can be in the
re_inf instead.
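
For anyone wanting to reproduce counts like the ones quoted above, here is a minimal sketch. It is not Stuart's actual counting program. It assumes the file has been parsed so that entity references are already expanded to their strings, and the two-entry sample, the tag list, and the function names are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample, not real JMdict data. The strings are written out
# in full here; the real file uses entity references for them.
SAMPLE = """<JMdict>
<entry>
  <r_ele><reb>かな</reb><re_inf>word usually written using kana alone</re_inf></r_ele>
  <sense><misc>rare</misc></sense>
</entry>
<entry>
  <r_ele><reb>れい</reb></r_ele>
  <sense><misc>word usually written using kana alone</misc></sense>
</entry>
</JMdict>"""

# Elements whose content is one of the descriptive strings (assumed list).
TAGS = ("ke_inf", "re_inf", "pos", "misc", "field")

def count_by_tag(xml_text, tags=TAGS):
    """Count how often each string occurs inside each of the given tags."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag in tags and elem.text:
            counts[(elem.text.strip(), elem.tag)] += 1
    return counts

def split_across_tags(counts):
    """Return only the strings that turn up in more than one tag."""
    by_string = {}
    for (text, tag), n in counts.items():
        by_string.setdefault(text, {})[tag] = n
    return {s: t for s, t in by_string.items() if len(t) > 1}

if __name__ == "__main__":
    for text, tags in split_across_tags(count_by_tag(SAMPLE)).items():
        print(text, tags)
```

Strings reported by split_across_tags are the candidates for the "does this belong in misc?" kind of question.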

>> - The strings for a number of entities have trailing spaces.
>>   Is there a reason for this?

Only that I must have keyed them in. Occasionally I do a global removal
of stray spaces.
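
A rough sketch of how such a clean-up could be scripted, assuming the entity declarations have the simple form <!ENTITY name "value"> with double-quoted values. The entity names e1 and e2 and the sample text are invented for the example, and this is not the procedure actually used on the file.

```python
import re

# Matches simple declarations of the assumed form <!ENTITY name "value">.
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>')

def entities_with_stray_spaces(dtd_text):
    """Names of entities whose value has leading or trailing whitespace."""
    return [name for name, value in ENTITY_RE.findall(dtd_text)
            if value != value.strip()]

def strip_entity_values(dtd_text):
    """Rewrite each declaration with its value stripped of stray spaces."""
    return ENTITY_RE.sub(
        lambda m: f'<!ENTITY {m.group(1)} "{m.group(2).strip()}">',
        dtd_text,
    )

if __name__ == "__main__":
    # Invented sample: e1 has a trailing space, e2 does not.
    dtd = '<!ENTITY e1 "rare ">\n<!ENTITY e2 "male slang">\n'
    print(entities_with_stray_spaces(dtd))
    print(strip_entity_values(dtd))
```

Note that the rewrite also normalises the spacing between the name and the value to a single space, which may or may not be wanted.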

>> - The following entities are not used in jmdict I think (or maybe 
>>   they were overlooked by my counting program?)  I have indicated
>>   my guess about what tags they will occur in if/when they get 
>>   used.  Are my guesses right?  (I ask because I need to put them
>>   in the right db table.)
>> 
>>     Godan verb (not completely classified)   <pos>
>>     adverbial noun    <pos>

Surplus. I use "n-adv" instead of "adv-n". Removed.

>>     feminine gender   <g_gend attribute>*

Yes, these are attribute values. Removed from the entity list.

>>     grammatical term  <field???> but there is already a "linguistics" field?

I have dropped it for now. I suspect it's not needed.

>>     irregular verb   <pos>
>>     male slang   <misc>
>>     manga slang    <misc>
>>     masculine gender   <g_gend attribute>*

>>     negative (in a negative sentence, or with negative verb)    <pos>
>>     negative verb (when used with)    <pos>

These are not needed. Removed.

>>     neuter gender    <g_gend attribute>*
>>     quod vide (see another entry)    <misc???>

Replaced by the xref. Removed.
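
Checking for declared-but-unused entities like the ones listed above can be done on the raw file text before any parsing. The following is a sketch under the assumption that declarations sit in the internal DTD subset in the form <!ENTITY name "value">. The sample document and its entity values are illustrative only, and the function name is hypothetical.

```python
import re

# General entity declarations (parameter entities, which start with %, are skipped).
DECL_RE = re.compile(r'<!ENTITY\s+([^\s%]\S*)\s+"[^"]*"\s*>')
# Entity references such as &n-adv; anywhere in the text.
REF_RE = re.compile(r'&([A-Za-z_][\w.\-]*);')
# The five entities predefined by XML itself.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def unused_entities(xml_text):
    """Declared entity names that are never referenced, in declaration order."""
    declared = DECL_RE.findall(xml_text)
    used = set(REF_RE.findall(xml_text)) - PREDEFINED
    return [name for name in declared if name not in used]

if __name__ == "__main__":
    # Illustrative sample: adv-n is declared but only n-adv is referenced.
    sample = """<!DOCTYPE JMdict [
<!ENTITY n-adv "adverbial noun (fukushitekimeishi)">
<!ENTITY adv-n "adverbial noun">
]>
<JMdict><entry><sense><pos>&n-adv;</pos></sense></entry></JMdict>"""
    print(unused_entities(sample))
```

Working on the raw text matters here because an XML parser normally expands the references, after which the entity names are no longer visible.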

Also:

>> Two more things to add to that list
>> - "gikun (meaning) reading" in re_inf (32 times) and misc (1 time)

Fixed.

>> - "idiomatic expression" in <pos> (2 times) and <misc> (5 times)

Moved both to <misc>

>>   * -- g_gend attribute not used yet.

No, it was a bit cheaty putting it in. I wrote a paper a couple
of years ago on JMdict, and realised I needed proper genders on the
French, German, etc. examples. So I added the attribute, but the markup has
never happened. For the WaDokuJT stuff, I think it may be extractable
from the German glosses.

>> ant: 6
>> audit: 8237
>> bibl: 0

Interesting. I must mull over it.

Thanks

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学