There are two issues here:
- the possible tagging of rare kanji/surface forms;
- extending the current on/off "uk" into something a bit more informative.
At present there is the "iK" tag to indicate that one or more kanji in the term
is plain wrong (often a look-alike), and "oK" for when it contains an old/obsolete kanji.
I can't see a problem in adding something like an "rK" to indicate that the kanji form
is quite rare. The "米突" for メートル is a good example.
I agree that the one "uk" tag for the whole entry is a rather blunt instrument. It's
fine for 米突/メートル (0.1:99.9). In cases like 喋る/しゃべる the tag is not wrong, but it
obscures the fact that these days 喋る is almost as common as しゃべる (4:5).
I wouldn't want to go overboard with tags, but the situation could be significantly
improved by adding one more tag, e.g. "ofk" (often kana), with the tags applied
according to the following kanji:kana ratios:
- 100:0 down to 60:40 - no tag
- between 60:40 and 40:60 - "ofk"
- 40:60 to 0:100 - the current "uk"
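
As a rough illustration of how those bands might be applied to a pair of corpus
counts, here is a minimal Python sketch. The function name usage_tag is made up for
this example and is not part of any JMdict tooling. The list above doesn't say which
band the exact 60:40 and 40:60 points fall into, so the sketch assumes 60:40 gets no
tag and 40:60 gets "uk".

```python
def usage_tag(kanji_count, kana_count):
    """Return None, "ofk" or "uk" for a kanji:kana usage ratio.

    Hypothetical helper, for illustration only. Boundary handling is an
    assumption: exactly 60:40 -> no tag, exactly 40:60 -> "uk".
    """
    total = kanji_count + kana_count
    if total <= 0:
        raise ValueError("need at least one occurrence to compute a ratio")
    # Compare by cross-multiplication rather than computing a fraction.
    if kanji_count * 10 >= 6 * total:   # kanji share of 60% or more
        return None
    if kanji_count * 10 <= 4 * total:   # kanji share of 40% or less
        return "uk"
    return "ofk"                        # strictly between 40% and 60%


if __name__ == "__main__":
    # The two examples from this message.
    print("喋る/しゃべる:", usage_tag(4, 5))
    print("米突/メートル:", usage_tag(0.1, 99.9))
```

With these assumptions 喋る/しゃべる (4:5) lands in the "ofk" band and 米突/メートル
(0.1:99.9) stays "uk".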
Just under 8k of the 182k entries in JMdict (about 4%) have "uk" tags, so it's not a
big change. A bit of sampling indicates that maybe 10-15% of the current "uk" entries
(roughly 800-1,200) could go to "ofk", and possibly a similar number of currently
untagged entries would get it too.
Early days yet, but those are my thoughts.
Jim