Re: [edict-jmdict] use of entities and tags in jmdict
[Stuart McGraw ([edict-jmdict] use of entities and tags in jmdict) writes:]
>> Some things I wondered about....
>> - "word usually written using kana alone" is used in both re_inf
>> (once) and misc (892 times). Does the lonely re_inf occurrence
>> belong in misc?
Got it. #1270270. Moved to misc.
>> - "rare" occurs in re_inf (2) and misc (49). Ditto.
I have removed them both.
>> - "word containing irregular kana usage" occurs in both re_inf
>> (51 times) and in ke_inf (once). Similar question.
It's in the アウム真理教 variant of オウム真理教, and it's there
because it refers specifically to the アウム. I guess it can be in the
re_inf instead.
>> - The strings for a number of entities have trailing spaces.
>> Is there a reason for this?
Only that I must have keyed them in. Occasionally I do a global removal
of stray spaces.
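A global clean-up of this kind could look roughly like the Python sketch below. It is a hypothetical helper, not the tool used here. It assumes each declaration has the form <!ENTITY name "text">, with the text in double quotes and no embedded quote characters. It strips leading and trailing spaces from each replacement string and reports which entities it changed.

```python
import re

# Matches a DTD entity declaration such as <!ENTITY uk "word usually ...">.
# Groups: (1) everything up to the opening quote, (2) the entity name,
# (3) the quoted replacement text, (4) the closing quote and bracket.
ENTITY_DECL = re.compile(r'(<!ENTITY\s+([\w.-]+)\s+")([^"]*)(">)')


def strip_entity_spaces(dtd_text):
    """Hypothetical helper: trim spaces around every entity replacement
    string. Returns the cleaned text and the names of entities changed."""
    changed = []

    def fix(match):
        value = match.group(3)
        cleaned = value.strip()
        if cleaned != value:
            changed.append(match.group(2))
        # Rebuild the declaration, keeping its original spacing and quoting.
        return match.group(1) + cleaned + match.group(4)

    return ENTITY_DECL.sub(fix, dtd_text), changed


sample = '<!ENTITY uk "word usually written using kana alone ">\n<!ENTITY rare "rare">\n'
fixed, changed = strip_entity_spaces(sample)
print(changed)
```

Running it on the sample reports only `uk`, because `rare` has no stray spaces.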
>> - The following entities are not used in jmdict I think (or maybe
>> they were overlooked by my counting program?) I have indicated
>> my guess about what tags they will occur in if/when they get
>> used. Are my guesses right? (I ask because I need to put them
>> in the right db table.)
>>
>> Godan verb (not completely classified) <pos>
>> adverbial noun <pos>
Surplus. I use "n-adv" instead of "adv-n". Removed.
>> feminine gender <g_gend attribute>*
Yes, these are attribute values. Removed from the entity list.
>> grammatical term <field???> but there already is a "linguistics" field?.
I have dropped it for now. I suspect it's not needed.
>> irregular verb <pos>
>> male slang <misc>
>> manga slang <misc>
>> masculine gender <g_gend attribute>*
>> negative (in a negative sentence, or with negative verb) <pos>
>> negative verb (when used with) <pos>
These are not needed. Removed.
>> neuter gender <g_gend attribute>*
>> quod vide (see another entry) <misc???>
Replaced by the xref. Removed.
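The counting program mentioned above is not shown in the thread. The Python sketch below is a hypothetical stand-in for that kind of tally. Most XML parsers expand internal entity references into their text, so it scans the raw, unparsed XML instead. It assumes each tagged element holds a single reference such as <misc>&uk;</misc>. The element list is limited to the ones discussed here. The sketch reports entities that appear under more than one element, and declared entities that are never used.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: one entity reference per element, e.g. <misc>&uk;</misc>.
TAGGED_ENTITY = re.compile(r"<(re_inf|ke_inf|misc|pos|field)>&([\w.-]+);</\1>")


def count_entity_usage(xml_text):
    """Count (element, entity) pairs in raw JMdict-style XML text."""
    return Counter(TAGGED_ENTITY.findall(xml_text))


def entities_in_several_elements(counts):
    """Map each entity used under more than one element to its per-element counts."""
    by_entity = defaultdict(dict)
    for (element, entity), n in counts.items():
        by_entity[entity][element] = n
    return {entity: tags for entity, tags in by_entity.items() if len(tags) > 1}


def unused_entities(declared, counts):
    """Return declared entity names that never occur in the counts, sorted."""
    used = {entity for _, entity in counts}
    return sorted(set(declared) - used)


sample = (
    "<entry><r_ele><reb>x</reb><re_inf>&uk;</re_inf></r_ele>"
    "<sense><pos>&n;</pos><misc>&uk;</misc><misc>&uk;</misc></sense></entry>"
)
counts = count_entity_usage(sample)
print(entities_in_several_elements(counts))
print(unused_entities(["uk", "n", "rare"], counts))
```

On the sample, `uk` is flagged as occurring in both re_inf (once) and misc (twice), and `rare` is reported as unused. This mirrors the kind of result discussed in this thread.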
Also:
>> Two more things to add to that list
>> - "gikun (meaning) reading" in re_inf (32 times) and misc (1 time)
Fixed.
>> - "idiomatic expression" in <pos> (2 times) and <misc> (5 times)
Moved both to <misc>
>> * -- g_gend attribute not used yet.
No, it was a bit cheaty putting it in. I wrote a paper a couple
of years ago on JMdict, and realised I needed proper genders on the
French, German, etc. examples. So I added the attribute, but the markup has
never happened. For the WaDokuJT stuff, I think it may be extractable
from the German glosses.
>> ant: 6
>> audit: 8237
>> bibl: 0
Interesting. I must mull it over.
Thanks
Jim
--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学