
Re: Names: JMdict or ENAMDICT



Responding to Stuart's questions:

> Stuart McGraw tried to post the following, but it failed to get through:

> Is the culling going to be that names removed from jmdict will go into
> jmnedict (if not there already)?

Yes.

> Or will some jmdict names just be
> removed altogether (I mean valid names, not mistakes)?

No way.

> I think you're saying it should be the case that any name in jmdict
> is also in jmnedict (jmdict names are a proper subset of jmnedict),
> yes?

Yes, very much.

> Would it be useful to apply the jmnedict "misc" tags ("person", "place",
> etc), for names to the name entries in jmdict?  (In the database tags
> are common between both dictionaries so applying them to the jmdict
> entries could be automated I think -- but the jmdict dtd would need to
> be updated to account for them).

It might be useful, but it needs a bit of thought and planning. Some of the
things in JMdict, e.g. names of books of the Bible, are quite OK tagged as
nouns, but the JMnedict version needs a bit more, as everything in
that collection is a noun. The XML-to-EDICT conversions would also need
to be updated for the extra entity codes.

Jim

> ------------------------------------------------------
> On 6 September 2013 16:30, Jim Breen <jimbreen@gmail.com> wrote:
>> Marcus has just suggested the deletion of a number of JMdict
>> entries relating to newspaper, etc. names, and raised the issue
>> that we don't have a clear policy on which names go in JMdict
>> and which belong in ENAMDICT/JMnedict.
>>
>> I'm quite aware that such a policy statement is overdue by many
>> years, and it's probably time to address it. I'd like to open it up for
>> discussion.
>>
>> [While in an ideal world it might be better to have everything back
>> together again, that would be a huge step and a lot of groundwork
>> would be needed first, as the compilation principles are a bit
>> different. I think it's better for now to keep them apart, to have most
>> named-entity entries in ENAMDICT, and to keep a defined subset in
>> JMdict as well, as there are apps that use that file alone, and users
>> can reasonably expect to find the sorts of names they'd find in GG5, etc.
>> (e.g. ニューヨーク,東京, etc.)]
>>
>> I propose that the following named entities be included in JMdict (and,
>> if they are not there already, it's OK to add them):
>>
>> -names of Japanese prefectures
>> -names of major Japanese cities (perhaps a population threshold)
>> -names of Japanese regions (近畿, 北陸, etc.)
>> -names of countries and their capital cities
>> -names of other significant cities
>>
>> and possibly:
>>
>> -names of states, provinces, etc. plus their capital cities
>> -names of very important individuals (criteria?)
>>
>> I'm inclined to stop there, but I realise that doesn't cover anything like
>> all the names currently in JMdict. So in addition I propose that:
>>
>> -names currently in JMdict will not be removed unless they are
>> obscure and/or inappropriate (e.g. ワシントンポスト can probably
>> stay, but I'm not sure about ハンデルスブラット)
>>
>> -sets of related names currently in JMdict can be completed if
>> they are incomplete (I'm thinking of the books of the Bible where
>> there are a lot, but I'm not sure all are there.)
>>
>> I realise this is ducking away from the edge case issue, but I
>> think we should be pragmatic. Bach, Mozart and Beethoven have
>> been included for decades, and I'd be uncomfortable chopping them
>> out (they are in GG5, of course). It's odd we have Manchester and
>> Birmingham, but not Lyons and Marseilles.
>>
>> Possibly the grandfather clauses (above) remove the need to
>> mention states/provinces/capitals. We have Newfoundland,
>> Saskatchewan and Quebec, but not some other Canadian provinces.
>> I don't think we have any Brazilian or German states.
>>
>> Anyway, let's discuss this. For now, I suggest the proposed
>> deletions wait until a policy is settled.
>>
>> Cheers
>>
>> Jim
>>
>> --
>> Jim Breen
>> Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University