Marcus has just suggested deleting a number of JMdict
entries for the names of newspapers, etc., and raised the issue
that we don't have a clear policy on which names go in JMdict
and which belong in ENAMDICT/JMnedict.
I'm quite aware that such a policy statement is overdue by many
years, and it's probably time to address it. I'd like to open it up for
discussion.
[While in an ideal world it might be better to have everything back
together again, that would be a huge step and a lot of groundwork
would be needed first, as the compilation principles are a bit
different. I think it's better for now to keep them apart, to have most
named-entity entries in ENAMDICT, and to keep a defined subset in
JMdict as well, as there are apps that use that file alone, and users
can reasonably expect to find the sorts of names they'd find in GG5, etc.
(e.g. ニューヨーク, 東京)]
I propose that the following named entities be included in JMdict (and
if they are not there already, it's OK to add them):
-names of Japanese prefectures
-names of major Japanese cities (perhaps a population threshold)
-names of Japanese regions (近畿, 北陸, etc.)
-names of countries and their capital cities
-names of other significant cities
and possibly:
-names of states, provinces, etc. plus their capital cities
-names of very important individuals (criteria?)
I'm inclined to stop there, but I realise that doesn't cover anything like
all the names currently in JMdict. So in addition I propose that:
-names currently in JMdict will not be removed unless they are
obscure and/or inappropriate (e.g. ワシントンポスト can probably
stay, but I'm not sure about ハンデルスブラット)
-sets of related names currently in JMdict can be completed if
they are incomplete (I'm thinking of the books of the Bible where
there are a lot, but I'm not sure all are there.)
I realise this is ducking away from the edge case issue, but I
think we should be pragmatic. Bach, Mozart and Beethoven have
been included for decades, and I'd be uncomfortable chopping them
out (they are in GG5, of course). It's odd that we have Manchester and
Birmingham, but not Lyons and Marseilles.
Possibly the grandfather clauses (above) remove the need to
mention states/provinces/capitals. We have Newfoundland,
Saskatchewan and Quebec, but not some other Canadian provinces.
I don't think we have any Brazilian or German states.
Anyway, let's discuss this. And for now I suggest letting the
proposed deletions wait until a policy is settled.
Cheers
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University