I thought about this some more. An [ofk] tag alone wouldn't entirely solve this problem, though. Take 眼鏡:
Google N-grams:
眼鏡 2228370
めがね 1206845
メガネ 3949144

眼鏡 is quite common indeed, but it still only occurs in 30% of the cases. So sure, you could expand [ofk] to mean 30-70% instead, or even 20-80%, but... I feel it'd be more elegant to keep entries like this one as they are and not introduce more granular frequency tags, and instead only mark the really rare kanji as [rK]. That should be enough to let students know they don't need to learn it, and to let dictionary apps know they shouldn't display it prominently.

[uk] would then mean "usually kana, but not always, unless the kanji is marked [rare]" instead of the current "usually kana, and possibly essentially never kanji".

Marcus

On Sat, Mar 9, 2019 at 11:49 AM Marcus Richert <superbrightfuture@*********> wrote:

An [ofk] tag would be useful too, but I kind of like the simplicity of just tagging things [uk] if that's how it appears 50%+ of the time.

On Thu, Mar 7, 2019 at 10:21 PM Jim Breen jimbreen@********* [edict-jmdict] <edict-jmdict@***************> wrote:

There are two issues here:
- the possible tagging of rare kanji/surface forms;
- extending the current on/off "uk" into something a bit more informative.

At present there is the "iK" tag to indicate that one or more kanji in the term is plain wrong (often a look-alike), and "oK" for when it contains an old/obsolete kanji.

I can't see a problem in adding something like an "rK" to indicate that the kanji form is quite rare. The "米突" for メートル is a good example.

I agree that the one "uk" tag for the whole entry is a rather blunt instrument. It's fine for 米突/メートル (0.1:99.9), but in cases like 喋る/しゃべる, while it is not wrong, it obscures the fact that these days 喋る is almost as common as しゃべる (4:5).

I wouldn't want to go overboard with tags, but it could be significantly improved by adding an additional tag, e.g. "ofk" (often kana), with the tags used in the following kanji/kana ratios:
- up to 60:40 - no tag
- between 60:40 and 40:60 - "ofk"
- 40:60 to 0:100 - the current "uk"

Just under 8k of the 182k entries in JMdict have "uk" tags, so it's not a big change. A bit of sampling indicates that maybe 10-15% of the current "uk" entries could go to "ofk", and possibly the same number of untagged entries would get it too.

Early days yet, but they're my thoughts.

Jim

On Thu, 7 Mar 2019 at 12:55, Marcus Richert superbrightfuture@********* [edict-jmdict] <edict-jmdict@***************> wrote:
I noticed in the English Wiktionary's メートル entry that they have a [rare] tag (or a "hand-written note", really) for the kanji "米突". Might we want to implement something like that in JMdict as well? I feel like it's sometimes a bit of a problem, esp. with the [uk] entries, where there's a large difference between what the takeaway should be for the [uk] in 喋る and the [uk] in 西班牙. The former is still very often written with kanji, while the latter is hardly ever used, or only used in historical texts. I think this presents a problem for learners, as it's hard to determine whether or not you should bother learning the kanji associated with these words. It'd be easier if all [ateji]-tagged kanji were obscure, but they aren't, of course, and there are probably other cases where the only kanji reading is rare or obscure.

(I lifted the examples from this discussion on reddit: https://www.reddit.com/r/LearnJapanese/comments/8mdih8/in_your_experience_is_jisho_generally_accurate/)

The same tag could also be applied to readings, in theory, though I feel it's less of a problem with them as long as you assume the first reading is basically the one you should remember (just like this is less of an issue for obscure kanji when there's a common kanji listed before it).

Marcus
--Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/ http://nihongo.monash.edu/
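To make the two proposals in this thread concrete, here is a rough Python sketch that applies them to the figures quoted above: Marcus's Google N-gram counts for 眼鏡 and Jim's ratios for 喋る (4:5) and 米突 (0.1:99.9). The function names are made up for illustration, the handling of the exact 60:40 and 40:60 boundaries is an assumption, and RARE_KANJI_SHARE (1%) is a hypothetical cutoff, since nobody in the thread pins down what "really rare" means. It's only an illustration, not a description of any existing JMdict tooling.

```python
from __future__ import annotations

# Hypothetical cutoff for [rK]; the thread doesn't define "really rare".
RARE_KANJI_SHARE = 0.01


def kanji_share(kanji_count: float, kana_counts: list[float]) -> float:
    """Fraction of all occurrences that use the kanji form."""
    return kanji_count / (kanji_count + sum(kana_counts))


def jim_bands(share: float) -> str | None:
    """Jim's proposed bands on the kanji:kana ratio.

    100:0 up to 60:40 -> no tag, 60:40 to 40:60 -> "ofk",
    40:60 to 0:100 -> "uk". Which band owns the exact
    boundary values is an assumption.
    """
    if share >= 0.6:
        return None
    if share > 0.4:
        return "ofk"
    return "uk"


def majority_tags(share: float) -> list[str]:
    """Marcus's alternative: "uk" when kana is used 50%+ of the time,
    plus "rK" when the kanji form is very rare."""
    tags = []
    if 1 - share >= 0.5:
        tags.append("uk")
    if share < RARE_KANJI_SHARE:
        tags.append("rK")
    return tags


examples = {
    # Google N-gram counts quoted above: 眼鏡 vs めがね + メガネ
    "眼鏡": kanji_share(2228370, [1206845, 3949144]),
    # Ratios from Jim's message
    "喋る": kanji_share(4, [5]),
    "米突": kanji_share(0.1, [99.9]),
}

for word, share in examples.items():
    print(f"{word}: kanji {share:.1%}, bands -> {jim_bands(share)}, "
          f"50% rule -> {majority_tags(share)}")
```

With these inputs, 喋る lands in [ofk] under Jim's bands but [uk] under a plain 50% rule, 眼鏡 is [uk] under both, and only 米突 picks up [rK].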