
Re: [edict-jmdict] New top kanji forms for numbers



On Sat, 1 Jun 2019 at 19:49, Anton Tagunov anton.tagunov@gmail.com
[edict-jmdict] <edict-jmdict@yahoogroups.com> wrote:
> Still.. doesn't this make 100 _both_ the primary form and the main translation?

Not quite. The 100 in the kanji/surface-form part of the dictionary entry is
made up of "double-width" (全角) characters, of the sort one often finds in
Japanese text. The 100 in the meanings part of the entry is made up of plain
old ASCII/ISO-8859-1/etc. numerics. Obviously they mean the same thing (as
does 百), but data-wise they are different. As I see it there are advantages
in having the 全角 numerics included in the surface-form collection. It
certainly helps automatic text glossing, etc.
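
To make the "data-wise they are different" point concrete, here is a minimal
Python sketch. It is only an illustration, not anything taken from the JMdict
tooling: the helper name to_halfwidth is made up for this example, and it
assumes Unicode NFKC normalization is an acceptable way to fold full-width
digits down to their ASCII equivalents.

```python
import unicodedata

# Full-width (全角) digits one, zero, zero: U+FF11 U+FF10 U+FF10.
fullwidth = "\uff11\uff10\uff10"
# The same number written with plain ASCII digits.
ascii_digits = "100"


def to_halfwidth(text: str) -> str:
    """Fold full-width characters to their compatibility equivalents (NFKC).

    Hypothetical helper for illustration; a glosser that lacks full-width
    surface forms in its dictionary would need some step like this.
    """
    return unicodedata.normalize("NFKC", text)


# The two strings are different code points, so a naive lookup treats them
# as unrelated keys.
print(fullwidth == ascii_digits)                 # False
print([hex(ord(c)) for c in fullwidth])          # ['0xff11', '0xff10', '0xff10']

# After normalization the full-width form compares equal to the ASCII one.
print(to_halfwidth(fullwidth) == ascii_digits)   # True

# The kanji form is untouched by NFKC; it is a separate surface form.
print(to_halfwidth("百"))                        # 百
```

With the full-width forms listed as surface forms in the entry itself, a
glosser can match them directly and does not need a folding step of this kind.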

> In the meantime I feel rather happy to be using an older version of the dictionary mapping 100 to 百. Of course I am
> aware they are rarely used, but they are glyphs I need to learn..

Can't you just ignore them? I can't see why you'd need to map them - the
dictionary entry is effectively saying that semantically they're the same
thing.

Jim


> On Sat, 1 Jun 2019, 01:58 Jim Breen jimbreen@gmail.com [edict-jmdict], <edict-jmdict@yahoogroups.com> wrote:
>>
>>
>>
>> Sorry for the slow response. Marcus Richert has been trying to send to the group about this
>> but Yahoo has been rejecting his emails. I had the same issue with another list a few days
>> back.
>>
>> The 全角 numerics appear to be the most common surface forms these days, at least in WWW pages
>> but probably elsewhere too. We're tagging them "by hand", as they don't show up in the older
>> ranking metrics.
>>
>> Jim
>>
>>
>> On Thu, 30 May 2019 at 06:24, Chris Vasselli clindsay@gmail.com [edict-jmdict] <edict-jmdict@yahoogroups.com> wrote:
>>>
>>>
>>>
>>> Hi everybody,
>>>
>>> I noticed recently a bunch of entries for numbers have been getting updated with a new top kanji form using the full-width arabic numeral representation. For example, the top kanji form for 百 is now 100.
>>>
>>> I’m not necessarily against this change, but I was curious to hear the reason for it.  I’m not completely sure if as a Japanese learner you looked up ひゃく or “one hundred” in a dictionary, you’d want to see 100 as the primary form, I’m guessing you’d want to see 百? Of course, if 100 is truly more common, then maybe that’s the appropriate form to show, I’m not sure. Just wanted to bring it up for discussion.
>>>
>>> Also, in the above case the 百 form is still marked with the [ichi1,news1,nf01] tags, which I believe is supposed to indicate that that’s the most common form. But the 100 entry is the first one in the list. So it seems slightly ambiguous to me which is being indicated as the most common form.
>>>
>>> Chris
>>>
>>>
>>
>>
>> --
>> Jim Breen
>> Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
>> http://www.jimbreen.org/                                 http://nihongo.monash.edu/
>>
>>
>
>
> 



-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/