Re: [edict-jmdict] keb vs reb (again)
On 14 August 2010 19:01, Jean-Luc Léger <reiga@dspnet.fr.eu.org> wrote:
> On Sat, 14 Aug 2010 00:13:58 -0600, Stuart McGraw <smcg4191@frii.com>
> wrote:
>> The problem was that entry 2529240 has a kanji of "・" (KATAKANA
>> MIDDLE DOT, U+30FB). When someone tried to edit it and create a
>> reading restriction to it, JMdictDB rejected it because it only
>> allows restrictions to kanji strings, and it considers "・" to
>> be a reading string. The reason it considers it a reading
>> string is because of this discussion in 2009:
>>
>> In edict list message of 2009-02-26 Jim Breen wrote,
>> (http://tech.groups.yahoo.com/group/edict-jmdict/message/3348)
>>>[...]
>>> The time has come to get a bit stricter on this,
>>> so I propose to state some rules for what can go
>>> in the "reading" fields. They are:
>>>
>>> - kana
>>> - the kana-related specials: ー ヽ ヾ ゝ ゞ
>>> - ・
>>> - 〜
>>>
>>> All the others, such as double-width alphanumerics are invalid
>>> in readings, but can appear in the "kanji" part. Also, anything
>>> that fits that set above has to go in the reading part.
>>>[...]
>>
>> So my question is, is entry 2529240 in error or have the rules
>> changed, and if the latter, what are the new rules?
>
> I don't think that rule was meant to restrict anything from the kanji
> part.
Correct. I was trying to pin down what can be in a reading; not define
what is valid in a kanji part. And anything valid in a kanji part MUST be
valid in a "re_restr".
> In fact, from my point of view, non kana characters (ヽ ヾ ゝ ゞ ・ 〜) should
> appear in the reading field only if they already are in the kanji field
> (or if there is no kanji field, of course (or it is a re_nokanji reb ^^;))
Hmmmm. No. ヽ ヾ ゝ ゞ are kana repetition symbols (mercifully
little-used.) In theory there could be a kanjiless entry with the reading field
of "ふぶ;ふゞ". (And I have seen 花吹雪 have its reading described as はなふゞき.)
> So the rule for the kanji part should be :
> - any character can go there
Any double-width, yes. But sparingly, please.
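
For anyone checking this in code, the rules as clarified in this thread might be sketched roughly as below. This is only an illustration in Python, not JMdictDB's actual implementation: the function names are hypothetical, and the code point ranges used for "kana" (hiragana U+3041 to U+3096, katakana U+30A1 to U+30FA) are an assumption, since the rule itself only says "kana".

```python
# Non-kana characters allowed in a reading, per the 2009 rule quoted above.
READING_SPECIALS = set("ーヽヾゝゞ・〜")


def is_kana(ch):
    """Assumed definition of 'kana': hiragana and katakana letters only."""
    cp = ord(ch)
    return 0x3041 <= cp <= 0x3096 or 0x30A1 <= cp <= 0x30FA


def is_valid_reading(text):
    """A reading may contain only kana and the listed special characters."""
    return bool(text) and all(
        is_kana(ch) or ch in READING_SPECIALS for ch in text
    )


def is_valid_restriction(target, kanji_strings):
    """A restriction is valid if it names one of the entry's kanji strings.

    The target is deliberately NOT rejected for also being a valid
    reading string, so a kanji part of "・" can be restricted to.
    """
    return target in kanji_strings


print(is_valid_reading("はなふゞき"))          # True
print(is_valid_reading("ＡＢＣ"))              # False
print(is_valid_restriction("・", ["・"]))      # True
```

The point of the second function is that the reading rule constrains readings only; it is never used to decide what may appear in the kanji part or in a restriction.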
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne