
Re: [edict-jmdict] Names of chemicals



Please forgive me for agreeing with Jim Rose on this one, but he is indeed correct when he says:
 
"The goal of this nomenclature is to know the physical structure of the molecule just by knowing the name... but again, we're talking about infinity.  It wouldn't make any sense at all to try and list "every chemical name", because such a thing doesn't exist.  Knowing the nomenclature system, in this case the Japanized version, is what would be useful."
 
Adding basic chemicals and chemistry jargon (mostly English, nicely mispronounced via katakana) might be useful but, as Jim Rose stated so well, an exhaustive list is impossible. In organic chemistry, for example, the naming of complex structures follows a set of rules that I still remember in English but have no clue about in Japanese. I suspect the Japanese rules follow the English ones, as tends to be the case.
 
In short, better to leave such "naming" to technical dictionaries, lest we bloat Edict.

 
On 2/5/07, Jim Breen <Jim.Breen@**********************> wrote:

On Tue, Feb 06, 2007 at 08:58:01AM +0900, Michael Engel wrote:
> To decide which chemical names are useful for EDICT, is probably more work.

Indeed.

> What I would like to see, is a discussion how specialized word lists should
> find their way into EDICT - we face the same problem with our HanDeDict in
> the not so far future.

The same goes for several subject or domain-specific lists. For example
I have the "compdic" file which has about 15,000 entries. Of these about
3,000 are already in JMdict/EDICT, although sometimes they have a non-computer
gloss. One task I was working on a year or so ago was to get the compdic
file into a state where it could be merged into JMdict/EDICT with the
ex-compdic entries flagged, e.g. as <field>&comp;</field>. I had in mind
being able to regenerate a compdic-like subset as required.

The process is not actually that simple. For example 翻字 appears
identically in both files. It certainly is appropriate to a computer
glossary, but is also more general. In the end, I decided that really
everything in the compdic could well be regarded as of general use,
provided it is marked as having an application to computing.
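
The merge-and-flag idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual JMdict tooling: the `Entry` class, the `merge_domain_list` and `extract_subset` functions, the matching on (headword, reading), and the sample readings and glosses are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, much-simplified dictionary entry."""
    headword: str
    reading: str
    glosses: list[str]
    fields: set[str] = field(default_factory=set)  # e.g. {"comp"}


def merge_domain_list(general: dict, domain: list[Entry], tag: str) -> dict:
    """Merge a domain list into the general dictionary, flagging each entry.

    Assumption: entries match when headword and reading are identical.
    """
    for item in domain:
        key = (item.headword, item.reading)
        existing = general.get(key)
        if existing is None:
            # New entry: carry it over, marked with the domain tag.
            general[key] = Entry(item.headword, item.reading,
                                 list(item.glosses), {tag})
        else:
            # Already present: keep it general, add the tag and any new glosses.
            existing.fields.add(tag)
            for gloss in item.glosses:
                if gloss not in existing.glosses:
                    existing.glosses.append(gloss)
    return general


def extract_subset(general: dict, tag: str) -> list[Entry]:
    """Regenerate a domain-only subset (a compdic-like file) from the merge."""
    return [e for e in general.values() if tag in e.fields]


# Illustrative data only, not taken from the real files.
general = {("翻字", "ほんじ"): Entry("翻字", "ほんじ", ["transliteration"])}
compdic = [
    Entry("翻字", "ほんじ", ["transliteration"]),
    Entry("ハードディスク", "ハードディスク", ["hard disk"]),
]
merge_domain_list(general, compdic, "comp")
comp_subset = extract_subset(general, "comp")
```

In this sketch an entry such as 翻字 that appears identically in both files stays a single general entry and simply gains the `comp` flag, so the subset can be regenerated at any time by filtering on that flag.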

I think Francis Bond agreed that the entries from his "lingdic" could
be rolled in too.

OTOH, while it makes sense to manage the huge enamdict file using the
same or similar software and databases, there is a good case for
being able to split them pretty cleanly apart. I can envisage
a marking system where some name entries are name-only (never get carried
into a derived EDICT form) and others are name-general (get carried
into both the EDICT form and the ENAMDICT form).
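
A marking system of that kind might look something like the sketch below. The `Scope` values, the `Record` type, and the sample classifications are assumptions made purely for illustration; nothing here reflects an existing JMdict or ENAMDICT format.

```python
from enum import Enum
from typing import NamedTuple


class Scope(Enum):
    """Hypothetical marking for each entry in a shared database."""
    GENERAL = "general"            # ordinary word, EDICT only
    NAME_ONLY = "name-only"        # never carried into a derived EDICT form
    NAME_GENERAL = "name-general"  # carried into both EDICT and ENAMDICT


class Record(NamedTuple):
    headword: str
    gloss: str
    scope: Scope


def derive_edict(records):
    """Entries for the EDICT form: general words plus name-general names."""
    return [r for r in records
            if r.scope in (Scope.GENERAL, Scope.NAME_GENERAL)]


def derive_enamdict(records):
    """Entries for the ENAMDICT form: every name, whatever its marking."""
    return [r for r in records
            if r.scope in (Scope.NAME_ONLY, Scope.NAME_GENERAL)]


# Illustrative records; the scope assigned to each is an assumption.
records = [
    Record("辞書", "dictionary", Scope.GENERAL),
    Record("東京", "Tokyo", Scope.NAME_GENERAL),
    Record("田中", "Tanaka (surname)", Scope.NAME_ONLY),
]
edict_form = derive_edict(records)
enamdict_form = derive_enamdict(records)
```

Keeping the marking on the entry itself means both derived files come from one database, yet can still be split cleanly apart.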

Anyway, I too would like some discussion on this.

Cheers

Jim