
Re: [edict-jmdict] Names: JMdict or ENAMDICT



I've attached some thoughts below the fold.


Cheers,

Rene


On 2013-09-06, at 12:30 AM, Jim Breen <jimbreen@*********> wrote:

 

-names of Japanese prefectures

Agreed.

-names of major Japanese cities (perhaps a population threshold)

I recommend setting the criterion as being a 'designated city':
https://en.wikipedia.org/wiki/City_designated_by_government_ordinance#List_of_designated_cities
Perhaps also including 'core cities' if you are feeling generous:
https://en.wikipedia.org/wiki/Core_cities_of_Japan#List_of_core_cities

Together with Tokyo, those should cover all the major Japanese city names anyone is likely to come across.

-names of Japanese regions (近畿, 北陸, etc.)
-names of countries and their capital cities
-names of other significant cities

Yes.

and possibly:

-names of states, provinces, etc. plus their capital cities

I don't think it's necessary.  The list is too expansive, unless we specifically limit it to states/provinces of English-speaking countries (it is a J-E dictionary after all).

-names of very important individuals (criteria?)

Deities and other major religious figures or characters (e.g., Mohammad, Gautama, Virgin Mary), perhaps not from all religions but definitely from the Abrahamic faiths and Japanese religions.  A select number of extremely important historical figures known by pretty much everyone worldwide (Gandhi, Einstein, Darwin, Confucius, Hitler, Shakespeare, Beethoven, etc.), chosen by judgment.  Perhaps also the names of the current Japanese emperor, Japanese prime minister, U.S. president and other leaders of English-speaking (and/or G8) nations.

-names currently in JMdict will not be removed unless they are
obscure and/or inappropriate (e.g. ワシントンポスト can probably
stay, but I'm not sure about ハンデルスブラット)

I'd say that consistency of inclusion criteria should trump the grandfather clause.

-sets of related names currently in JMdict can be completed if
they are incomplete (I'm thinking of the books of the Bible where
there are a lot, but I'm not sure all are there.)

Speaking of which, the criteria above omit other non-person, non-place proper nouns (e.g., company names, book names, trademarks).  Thus far, the names of most religious texts have found their way into EDICT itself.  I guess most others should be banished to ENAMDICT unless they're things that have found their way into English like 'Pokemon'.