
Re: [edict-jmdict] New top kanji forms for numbers



It seems like one of those tricky cases: if you think of JMDict as a dictionary in itself, it might not make sense for 100 to come first. But if you think of it as a data source for a bunch of different applications, not all of which are necessarily dictionaries, it seems like the more common form (100) should come first, so that the data is represented consistently. Putting 百 first might be fixing an application-layer problem with a data-layer solution that might not be appropriate for all applications. In other words, maybe it should be left to applications to special-case these words if they feel that 百 should be displayed first?

Of course, the counter-argument to that is that most applications probably won’t do any special-casing. And any time you’re expecting most users of the data to have to override specific entries, that has a bit of a smell to it.
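To make the application-layer option concrete, here is a minimal sketch of what such special-casing might look like. It is only an illustration: the function name `display_form` and the plain list-of-strings view of an entry's kanji forms are assumptions made for this example, not part of JMdict or of any existing application.

```python
# Hypothetical sketch: an application that prefers a kanji-only form for display.
# ASCII digits plus the full-width digits U+FF10..U+FF19.
DIGITS = set("0123456789") | {chr(c) for c in range(0xFF10, 0xFF1A)}


def display_form(kanji_forms):
    """Pick the form to show first from an entry's forms (in dictionary order).

    Returns the first form that contains no Arabic digits; if every form
    contains digits, falls back to the dictionary's own first form.
    Returns None for an empty list.
    """
    for form in kanji_forms:
        if not any(ch in DIGITS for ch in form):
            return form
    return kanji_forms[0] if kanji_forms else None


# Example: the dictionary order stays untouched; only the display choice changes.
print(display_form(["100", "百"]))
print(display_form(["100万", "百万", "1000000"]))
```

The point of the sketch is that the data keeps its "most common form first" ordering, and each application that wants 百 on top has to carry an override like this itself.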

Just throwing some thoughts out there. For what it’s worth, at the end of the day I’d probably lean a little bit towards putting 百 before 100.

Chris
On Jun 2, 2019, 2:41 -0400, Marcus Richert superbrightfuture@gmail.com [edict-jmdict] wrote:

I don't know that it's particularly important, but putting the most common surface form first is how we typically do things. The implication of putting 百 before 100 would be that it's the most common surface form, which would be misleading. Context deciding which surface form should be used isn't unique to the numerals, either, and 何百 is already an entry of its own (we have 216 entries containing the 百 kanji). Also, the single-digit numeral entries still lead with kanji, and things like ひゃくまん currently lead with 100万 before 百万 and 1000000.

On Sun, Jun 2, 2019, 01:39 Robin Scott robinandrewscott@********* [edict-jmdict] <edict-jmdict@***************> wrote:
 

I don't think it's particularly important to indicate that "100" is more common than 百. It's context dependent, anyway. For instance, you'd write 何百 but not 何100.

To me, it looks very odd having "100" before 百. I'm in favour of putting 百 first.

Robin

On Sat, Jun 1, 2019 at 10:59 AM Marcus Richert superbrightfuture@********* [edict-jmdict] <edict-jmdict@***************> wrote:
 

Chris, thanks for bringing this up. I'm responsible for most of those edits but I was a little sloppy with the prio-tagging. Normally we put [spec1] tags on kanji/surface forms that are more common than same-entry kanji that have other prio-tags (news1, etc.) but I didn't for most of these numerals. I will go through them and fix this so it's less ambiguous which surface form is the most common.
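The kind of ambiguity described here could be checked for mechanically. The following is a rough sketch under stated assumptions: entries are modelled as a list of `(form, priority_tags)` pairs in entry order, and the name `is_ambiguous` and the simplified rule (first form untagged while a later form carries any priority tag) are hypothetical, not an existing JMdict tool or the project's actual validation logic.

```python
# Hypothetical sketch: flag entries whose leading form has no priority tag
# while a later form in the same entry does.
def is_ambiguous(forms):
    """forms: list of (text, set_of_priority_tags) in entry order.

    Returns True when the first form carries no priority tags but at least
    one later form does, so it is unclear which form is meant to be the
    most common one.
    """
    if not forms:
        return False
    _, first_tags = forms[0]
    later_tagged = any(tags for _, tags in forms[1:])
    return not first_tags and later_tagged


# Before the fix: 100 leads the entry but only 百 is tagged.
before = [("100", set()), ("百", {"ichi1", "news1", "nf01"})]
# After the fix: the leading form carries spec1.
after = [("100", {"spec1"}), ("百", {"ichi1", "news1", "nf01"})]

print(is_ambiguous(before), is_ambiguous(after))
```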

Anton, 百 and other kanji are still included and prio-tagged in each numeric entry (where it makes sense), only they now come after other, more common forms. I wouldn't ever recommend using an older version of the dictionary files, as they are continuously improved and updated. Sure, we translate "100" as "100", but we're also specifying how it's pronounced and making it clear which way is the most common way to represent these numerals in Japanese.

Best,
Marcus

On Sat, Jun 1, 2019 at 11:50 AM Anton Tagunov anton.tagunov@********* [edict-jmdict] <edict-jmdict@***************> wrote:
 

You = gods, me = worshiper :)

Still.. doesn't this make 100 _both_ the primary form and the main translation?

Effectively translating 100 to 100? :)

In the meantime I feel rather happy to be using an older version of the dictionary mapping 100 to 百. Of course I am aware they are rarely used, but they are glyphs I need to learn..

Thx,
learner

On Sat, 1 Jun 2019, 01:58 Jim Breen jimbreen@********* [edict-jmdict], <edict-jmdict@***************> wrote:
 

Sorry for the slow response. Marcus Richert has been trying to send to the group about this
but Yahoo has been rejecting his emails. I had the same issue with another list a few days
back.

The 全角 numerics appear to be the most common surface forms these days, at least in WWW pages
but probably elsewhere too. We're tagging them "by hand", as they don't show up in the older
ranking metrics.

Jim


On Thu, 30 May 2019 at 06:24, Chris Vasselli clindsay@********* [edict-jmdict] <edict-jmdict@***************> wrote:


Hi everybody,

I noticed recently a bunch of entries for numbers have been getting updated with a new top kanji form using the full-width arabic numeral representation. For example, the top kanji form for  is now 100.

I’m not necessarily against this change, but I was curious to hear the reason for it. I’m not sure that, as a Japanese learner looking up ひゃく or “one hundred” in a dictionary, you’d want to see 100 as the primary form; I’m guessing you’d want to see 百. Of course, if 100 is truly more common, then maybe that’s the appropriate form to show. Just wanted to bring it up for discussion.

Also, in the above case the 百 form is still marked with the [ichi1,news1,nf01] tags, which I believe is supposed to indicate that that’s the most common form. But the 100 entry is the first one in the list. So it seems slightly ambiguous to me which is being indicated as the most common form.

Chris




--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/                                 http://nihongo.monash.edu/