
Re: [edict-jmdict] Policy on names




On Jul 11, 2010, at 3:20 PM, Jim Breen wrote:

The questions are:
(a) what names should go in JMdict/EDICT? I think it
should be restricted to:
(i) names of countries;
(ii) names of capital cities and a few other significant
cities
(iii) a few very famous historical people
(iv) major international organizations.

Greyer areas are things like the names of
states/provinces, etc.

(b) should names that are in JMdict also be in
ENAMDICT/JMNEdict? I think they should.

Does this make sense? Any alternative suggestions?

Should we be moving eventually to integrate the
two files?

In my mind, the greatest dictionary in the Universe is the Oxford unabridged, and any effort which eventually, though not necessarily immediately, leads to the Japanese equivalent is worthy, even if that is long after we are worm meal.  With that in mind, I feel we are a sort of coterie of naturalists, observing what exists in the natural world of a language in our time and recording and describing it.  For that reason I think that all the words discovered should, of course, be documented somewhere.  The distinction between EDICT and ENAMDICT is artificial, and I would argue temporary, and ONLY useful now because of the current deficit of non-names.  It is perhaps necessary for the reasons pointed out about the effect that has on searches, etc.

I personally believe, therefore, that all entries should be ONE great corpus, but there should be a tag of some sort that allows names to be easily ignored during searches.
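
To make the tagging idea concrete, here is a rough sketch in Python of how a single corpus with a name tag might be searched. Everything in it is an assumption for illustration only: the Entry structure, the "name" tag, the search function and the three sample entries are all made up, and none of it reflects the actual JMdict or ENAMDICT formats.

```python
from dataclasses import dataclass, field

# Hypothetical tag marking an entry as a proper name (not a real JMdict tag).
NAME_TAG = "name"


@dataclass(frozen=True)
class Entry:
    """One record in the single, merged corpus (illustrative structure only)."""
    headword: str
    gloss: str
    tags: frozenset = field(default_factory=frozenset)


# Made-up sample data: one entry tagged as a name, two ordinary nouns.
CORPUS = [
    Entry("日本", "Japan", frozenset({NAME_TAG})),
    Entry("日本語", "Japanese (language)"),
    Entry("日本酒", "sake; Japanese rice wine"),
]


def search(corpus, prefix, include_names=False):
    """Return entries whose headword starts with prefix.

    Entries carrying NAME_TAG are skipped unless include_names is True.
    """
    return [
        entry
        for entry in corpus
        if entry.headword.startswith(prefix)
        and (include_names or NAME_TAG not in entry.tags)
    ]


if __name__ == "__main__":
    for entry in search(CORPUS, "日本"):
        print(entry.headword, "-", entry.gloss)
```

With a tag like this, an ordinary search skips the names by default, and passing include_names=True searches the whole corpus.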

Subset dictionaries CAN ALWAYS be generated from the master dictionary, just as there is an almost infinite variety of smaller "Oxford" dictionaries created from the unabridged.  I believe that process should be on the developers' shoulders.  I favor an EVENTUAL but not immediate merger.  A merger, in my mind, would be appropriate at a time when the number of common nouns, for example, begins to exceed the number of personal and place names.  I do believe that could happen if the language were studied and catalogued in a more orderly and strategic fashion.  I try, and often fail, but nevertheless try, to enter a genus a day, be it fish or lizards.  There are 25,000 species of known, and therefore named, fish.  I suspect half of that number may have Japanese names.  Spread that around botany, which is WAY bigger, and zoology, and we're gonna rack up a lot of nouns.

That's my current thinking, subject to change as time rolls forward.