Re: [edict-jmdict] [rare] tag for obscure kanji?



There are two issues here:
- the possible tagging of rare kanji/surface forms;
- extending the current on/off "uk" into something a bit more informative.

At present there is the "iK" tag to indicate that one or more kanji in the term
are plain wrong (often a look-alike), and "oK" for when it contains an old/obsolete kanji.
I can't see a problem in adding something like an "rK" to indicate that the kanji form
is quite rare. The "米突" for メートル is a good example.

I agree that the single "uk" tag for the whole entry is a rather blunt instrument. It's
fine for 米突/メートル (0.1:99.9), but in cases like 喋る/しゃべる, while it is not wrong, it
obscures the fact that these days 喋る is almost as common as しゃべる (4:5).
I wouldn't want to go overboard with tags, but it could be significantly improved by
adding one more tag, e.g. "ofk" (often kana), with the tags applied according to the
following kanji/kana ratios:
- 100:0 down to 60:40 - no tag
- between 60:40 and 40:60 - "ofk"
- 40:60 to 0:100 - the current "uk"
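
As a rough sketch of how those bands might be applied mechanically (the function
name and the treatment of the exact 60:40 and 40:60 boundaries are my own
assumptions here, not anything in JMdict), something like this Python would do:

```python
from typing import Optional

# Thresholds are the kanji share of total usage, taken from the 60:40 and
# 40:60 ratios proposed above. Which band the exact boundary values fall
# into is an assumption; the proposal does not say.
NO_TAG_ABOVE = 0.60
UK_BELOW = 0.40


def suggest_kana_tag(kanji_count: float, kana_count: float) -> Optional[str]:
    """Return None (no tag), "ofk" or "uk" for a kanji:kana usage ratio.

    Hypothetical helper for illustration only.
    """
    total = kanji_count + kana_count
    if total <= 0:
        raise ValueError("need at least one observed use")
    kanji_share = kanji_count / total
    if kanji_share > NO_TAG_ABOVE:
        return None   # mostly written in kanji
    if kanji_share >= UK_BELOW:
        return "ofk"  # roughly even split: "often kana"
    return "uk"       # mostly written in kana


print(suggest_kana_tag(0.1, 99.9))  # 米突/メートル -> uk
print(suggest_kana_tag(4, 5))       # 喋る/しゃべる -> ofk
```

The counts can be raw corpus hits or percentages, since only the ratio matters.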

Just under 8k of the 182k entries in JMdict have "uk" tags, so it's not a big change.
A bit of sampling indicates that maybe 10-15% of the current "uk" entries could go to
"ofk", and possibly the same number of untagged entries would get it too.

Early days yet, but they're my thoughts.

Jim


On Thu, 7 Mar 2019 at 12:55, Marcus Richert superbrightfuture@********* [edict-jmdict] <edict-jmdict@***************> wrote:


I noticed in the English Wiktionary's メートル entry that they have a [rare] tag (or a "hand-written note", really) for the kanji "米突". Might we want to implement something like that in jmdict as well? I feel like it's sometimes a bit of a problem, esp. with the [uk] entries, where there's a large difference between what the takeaway should be for the [uk] in 喋る and the [uk] in 西班牙. The former is still very often written with kanji, while the latter is hardly ever used, or only used in historical texts. I think this presents a problem for learners, as it's hard to determine whether or not you should bother learning a kanji associated with these words. It'd be easier if all [ateji]-tagged kanji were obscure, but they aren't, of course, and there are probably other cases where the only kanji reading is rare or obscure.

(I lifted the examples from this discussion on reddit: https://www.reddit.com/r/LearnJapanese/comments/8mdih8/in_your_experience_is_jisho_generally_accurate/)

The same tag could also be applied to readings, in theory, though I feel it's less of a problem with them as long as you assume the first reading is basically the one you should remember. (just like this is less of an issue for obscure kanji when there's a common kanji listed before it)

Marcus




--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/                                 http://nihongo.monash.edu/