Reading combinations (Was: Re: [edict-jmdict] ENAMDICT/JMNEdict)
2011/2/24 Francis Bond <bond@ieee.org>:
> 2011/2/24 Jim Breen <jimbreen@gmail.com>:
>> 2011/2/19 Francis Bond <bond@ieee.org>:
>>> .... I am
>>> still hoping we will split apart the extreme katakana variants.
>>> ID:2553880 クロトガリザメ 黒尖鮫 silky shark
>>> ID:2553880 シルキーシャーク 黒尖鮫 silky shark
>>
>> From time to time I regret having let all those go through. On reflection,
>> I think entries such as 2553880 should be split into
>> 黒尖鮫/くろとがりざめ/クロトガリザメ and
>> シルキーシャーク/シルキー・シャーク with appropriate xrefs.
I have been thinking more about this in the last couple of days, as it
goes to the heart of what makes an entry. I think the 2-out-of-3
rule (which I sort-of did by the seat of the pants in the early days,
then actually wrote down when I was writing the COLING workshop
paper on JMdict in 2004) has served pretty well when there is kanji
in the entry.
Things are a bit muddier when there are no kanji, and two quite different
words mean the same thing. Consider the entry:
アメリカナヌカザメ;カリフォルニアスウェルシャーク /(n) swellshark
(Cephaloscyllium ventriosum, species of catshark in the Eastern Pacific)/
Both the Japanese words mean the same, but should they be in the same
entry? On reflection, my view is they should not. I think for kana-only entries,
multiple words in the kana field should really only occur when they
are variants of each other, e.g. ダイヤモンド and ダイアモンド. In cases
such as the swellshark above, where the words themselves are quite
different, they should be split and have cross-references.
The other issue is mixing up common names, such as the シルキーシャーク
above, with formal names (黒尖鮫/くろとがりざめ/クロトガリザメ). I think
this is really the same problem. Words like シルキーシャーク should go off to
their own entries (with xrefs) and the kana/reading part of entries
such as 黒尖鮫 should be limited to the hiragana/katakana versions
of that word.
If there is general agreement with this, I'll embed it in the Editorial
Policy on the Wiki, and start splitting a few entries.
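
As a rough illustration of the split proposed above, here is a minimal
sketch in Python. It is not part of any JMdict tooling; the function
names (to_hiragana, normalise, split_readings) are hypothetical, and it
assumes that two readings belong to the same entry only when they differ
in script (hiragana vs. katakana) or in the use of the middle dot.
Spelling variants such as ダイヤモンド/ダイアモンド are NOT recognised by this
rule and would still need an editor's judgement to keep together.

```python
# Sketch only: group the readings of one entry so that each group holds
# kana renderings of the same word. Names here are hypothetical.

MIDDLE_DOT = "\u30fb"  # the "・" used in シルキー・シャーク


def to_hiragana(text):
    """Map katakana (U+30A1..U+30F6) to hiragana; leave other characters alone."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def normalise(reading):
    """Assumed comparison key: drop middle dots, then fold katakana to hiragana."""
    return to_hiragana(reading.replace(MIDDLE_DOT, ""))


def split_readings(readings):
    """Return groups of readings that share a key, in order of first appearance."""
    groups = {}
    for reading in readings:
        groups.setdefault(normalise(reading), []).append(reading)
    return list(groups.values())
```

Under those assumptions, calling split_readings on the four readings of
2553880 (くろとがりざめ, クロトガリザメ, シルキーシャーク, シルキー・シャーク)
gives two groups, one per proposed entry, and the two swellshark names
come out as two single-member groups. Cross-references between the
resulting entries would still have to be added by hand.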
Cheers
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Vice-president: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne