
Re: [edict-jmdict] Higher-level taxa in JMdict



G'day,

I don't see that there is any harm in having the taxonomic names in --- disk space is cheap, and the names are very specific, so don't get in the way when you are searching. 

On 19 July 2010 06:53, René Malenfant <rene_malenfant@***********> wrote:
 

As we were recently discussing which people and place names 'deserve' to end up in JMdict, I think we need some kind of policy for deciding whether to include taxa above the species level.


As far as I'm concerned, all species with a common name belong in JMdict itself, but most higher-level taxa (e.g., genus, family) usually ~do not~.  I see including all the ~科 and ~属 forms as no different than including all the possible ~的 entries (something we currently do not allow).  And while you might say that it would be useful to include the ~科 and ~属 entries for an E-J search, I have five general points against their inclusion.

  1. It would arguably be useful to also include the countless ~的 entries so that people would be able to search for "-ic" and "-ive" forms of many English words, etc., but we've still decided as a general rule to exclude all ~的 entries that cannot be found in a kokugo dictionary.

I would also include the  ~的 entries if it were up to me :-).
 
  2. Kokugo dictionaries omit the vast majority of genus and family names for a very simple reason: their construction is almost always derived from the name of a member.  For instance, the family "Myxinidae" is called "メクラウナギ科", which simply means "hagfish family".  You don't find "メクラウナギ科" in a kokugo dictionary for precisely the same reason that you wouldn't find "hagfish family" in any English dictionary: it's derivative and obvious.

Mid-level dictionaries don't include them for reasons of space (which are not an issue for us).  Large dictionaries do include them as they are not predictable --- most genera and families include many common fish, and it is not always obvious which one will be chosen to name the larger group.
 
  3. "Myxinidae" is not English; it's Latin.  English dictionaries generally do not include Latin names (with exceptions for extremely important cases like "Anopheles" and "Culex", or where no other word exists for its member species, as in "Stegosaurus").  Last I checked, there wasn't a Latin project for JMdict, so if you're searching for "Myxinidae", you're looking up the wrong language.  Such entries belong in JMnedict or---even better---a specialized J-E taxonomy dictionary, perhaps one that could be run as a partner project to JMdict.  (And it might not be hard to start such a thing up automatically using Wikipedia and the UJSSB database here: http://research2.kahaku.go.jp/ujssb/)

I would consider Myxinidae to be English --- it is the name of a family of fish.   It is no less English than "television", just less common. 
 
  4. Not a single instance of actual usage can be attributed to many of these taxa names.  They simply aren't used outside of one place: dedicated taxonomic dictionaries (something that edict is not).  For instance, インドシュモクザメ属 (submitted last month by Jim Rose) gets only 38 hits, and as far as I can tell, they're all just species lists, etc.  As expected, most of these hits already include the Latin name Eusphyra, because the Latin names are the accepted standards used worldwide.  Even in Japan.

My son's Japanese animal books had the Japanese name exclusively.  Larger books we borrowed from the library had both. 

What search engine are you using?  I get 8,790 hits for インドシュモクザメ属 on Google, and some of them use it in running text.  E.g., http://ecolumn.net/shark_attack.htm.   It is definitely being used.  Note that this page does not include the taxonomic name Eusphyra.


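2001年8月5日、鳥取市白兎海水浴場にシュモクザメなど20匹ものサメが沖合に現れ、海水浴客1000人が避難。
5m級のサメも確認されており、ホホジロザメだとみられている。

参考「シュモクザメ」
分類:メジロザメ目>シュモクザメ科>インドシュモクザメ属、シュモクザメ属
シュモクザメ科に分類される2属9種の総称であり、左の画像のように集団化する事が知られている。

[Translation: On 5 August 2001, as many as 20 sharks, including hammerheads, appeared off Hakuto Beach in Tottori City, and 1,000 bathers were evacuated.  Sharks in the 5 m class were also confirmed and are believed to be great white sharks.  Reference, "hammerhead shark": Classification: Carcharhiniformes > Sphyrnidae > インドシュモクザメ属 (Eusphyra), シュモクザメ属 (Sphyrna).  A collective term for the 9 species in 2 genera classified in the family Sphyrnidae; they are known to form schools, as in the image at left.]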
 
  5. We attempt to discern which person, place and organization names are important enough to include in JMdict and which belong in JMnedict instead. I don't see any reason to bend those guidelines and allow the names of all ranked taxa, especially considering that (unlike personal names) taxa names sometimes become obsolete and therefore require maintenance.

I have no strong opinion as to whether the names should be in JMdict or JMnedict (let Jim decide).  I would like to see the two merged eventually.  I think it does make sense to tag them (e.g., with {taxo} or, even better, with <xref type="taxo">Eusphyra</xref>, linking to a canonical name in some external reference; Wikispecies, perhaps?  I have no idea).

 
So I recommend that all ~属, ~科, etc. entries that cannot be found in any of the major kokugo dictionaries (or whose non-Latin foreign-language equivalents cannot be found in a non-specialists' dictionary) be excluded from JMdict.

I didn't find any of these arguments convincing.  I say let's keep adding taxonomic names (but tag them).   If they really bother you, you can then strip them out :-).
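
To make the "tag them, then strip them out" idea concrete, here is a rough Python sketch.  It rests on assumptions: the XML below is a simplified, made-up fragment rather than the real JMdict structure, the "taxo" tag is only the proposal from this thread and does not exist in the dictionary, and strip_tagged is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: real JMdict entries carry more structure,
# and the "taxo" tag is only a proposal, not an existing JMdict tag.
SAMPLE = """\
<JMdict>
  <entry>
    <keb>サメ</keb>
    <gloss>shark</gloss>
  </entry>
  <entry>
    <keb>ヌタウナギ科</keb>
    <misc>taxo</misc>
    <gloss>hagfish family</gloss>
  </entry>
  <entry>
    <keb>メクラウナギ科</keb>
    <misc>taxo</misc>
    <misc>obs</misc>
    <gloss>hagfish family</gloss>
  </entry>
</JMdict>
"""


def strip_tagged(root, tag):
    """Remove every <entry> that has a <misc> child equal to `tag`.

    Returns the number of entries removed.
    """
    doomed = [
        entry
        for entry in root.findall("entry")
        if any(misc.text == tag for misc in entry.findall("misc"))
    ]
    for entry in doomed:
        root.remove(entry)
    return len(doomed)


root = ET.fromstring(SAMPLE)
removed = strip_tagged(root, "taxo")
remaining = [entry.findtext("keb") for entry in root.findall("entry")]
print(removed, remaining)
```

Anyone who does not want the taxonomic names could run a filter along these lines over their own copy, while everyone else keeps the full set.  The same call with "obs" in place of "taxo" would drop only the entries marked obsolete.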

>And as a fine example of my last point, Wikipedia has just reminded me that メクラウナギ科 is being phased out and replaced with ヌタウナギ科 because めくら is politically incorrect.  There are much stricter rules about when Latin names can be changed, which is why they're preferred for taxonomy.

People often read books that are out of date --- I would like to see Edict keep obsolete translations (again preferably tagged as such).  If I am trying to read an article on hagfish from the last millennium, I would like to be able to translate the name used then.


--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University