
Re: [edict-jmdict] Meaning of [iK] (irregular kanji)? (+ Non-Joyo readings)

To: edict-jmdict@***************
Subject: Re: [edict-jmdict] Meaning of [iK] (irregular kanji)? (+ Non-Joyo *readings*)
From: Jim Breen <jimbreen@*********>
Date: Fri, 4 Nov 2011 12:41:32 +1100

I'm going to top-post (sorry) and try and keep it short.

In general I like the idea of making extra information
about the status of readings in words available. It comes
down to what information exists, how and where to
record it, and how to display it in a dictionary client without
creating a lot of visual clutter. The colours used for the kanji
in WWWJDIC are an example of something that was easy to do,
as the information is readily available and there was no clutter.

With readings things get messier. For a start there is no real canon
of what is and is not an approved/recognized/etc. reading.
Virtually every 漢和字典 has a different take on them. For 常用漢字
there is a list of "standard" readings (see
http://www.csse.monash.edu.au/~jwb/jouyoureadings.html) but
these really just mean that in textbooks where a word uses a
reading not on the list, it should be written out in kana. AFAIK
there is no "grading" of these readings, apart from the grades
associated with the first 1,000 or so 教育 kanji.

Then there's the question of how to display this information in a
meaningful way. Consider 飲む, which Nils mentions below. The
WWWJDIC display starts: 飲む(P); 呑む; 飮む(oK); 服む(iK) 【のむ】
with 呑 and 飮 in green showing they are non-常用.
What can be said meaningfully about the の.む reading? It's
a recognized 訓読み of 飲, 呑 and 飮, and for 飲 it's even on
the standard reading list. For 服 it's not recognized at all.
How can that be shown without turning a reasonably succinct
entry display into a mess?
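
As a rough illustration of what a client has to work with, here is a
minimal Python sketch that splits the display line quoted above into
surface forms, their tags, and the reading. The function name
parse_headword is hypothetical, and the format it expects is inferred
only from that one example line, not from any WWWJDIC or EDICT2
specification.

```python
import re

# Display line copied from the message above.
LINE = "飲む(P); 呑む; 飮む(oK); 服む(iK) 【のむ】"

def parse_headword(line):
    """Split a headword line into [(surface, [tags])] and [readings].

    Assumes the layout 'form(tag); form; ... 【reading; ...】' seen in
    the single example; real lines may need more care.
    """
    m = re.match(r"^(.*?)\s*【(.*?)】\s*$", line)
    if m is None:
        raise ValueError("no 【...】 reading bracket found")
    forms = []
    for chunk in m.group(1).split(";"):
        chunk = chunk.strip()
        # Tags are the parenthesised markers such as (P), (oK), (iK).
        tags = re.findall(r"\(([^)]+)\)", chunk)
        surface = re.sub(r"\([^)]*\)", "", chunk).strip()
        forms.append((surface, tags))
    readings = [r.strip() for r in m.group(2).split(";")]
    return forms, readings

forms, readings = parse_headword(LINE)
for surface, tags in forms:
    print(surface, tags)
print(readings)
```

Any per-reading status would have to be attached to each
(surface, reading) pair that comes out of this, which is where the
display starts to get crowded.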

There is a way of getting this information, and it's a click away.
Clicking on "[Examine] the kanji ...." takes you to the kanji
details, which really have most of the extra information.
The only thing lacking is the status of the のむ reading of 飲.
One thing I'd like to do eventually is get that into KANJIDIC in
some way. I want to hold off until maintenance of KANJIDIC
gets into an online database. The present kanji database
system is reasonably complex, and further categorization of
readings is something I simply cannot address in its current
form.

What I could do without too much hoopla is to draw on the
list of "standard" readings of 常用漢字 and coax WWWJDIC
into highlighting them in the kanji display (e.g. putting them in red).
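
A minimal sketch of that highlighting step, in Python. Everything
here is an assumption for illustration: the STANDARD_READINGS table is
a tiny hand-typed sample rather than the real list, the reading lists
passed in are illustrative, the function names are made up, and
asterisks stand in for the red colouring. It is not how WWWJDIC does
or would do it.

```python
# Illustrative sample only: a hand-typed subset, NOT the full list of
# "standard" readings. Kun readings use the dotted okurigana style.
STANDARD_READINGS = {
    "飲": {"イン", "の.む"},
    "服": {"フク"},
}

def mark_standard(kanji, readings, standard=STANDARD_READINGS):
    """Return (reading, is_standard) pairs for one kanji.

    A kanji missing from the table (e.g. a non-常用 character) simply
    gets no standard readings.
    """
    approved = standard.get(kanji, set())
    return [(r, r in approved) for r in readings]

def render(kanji, readings):
    """Render readings on one line, wrapping standard ones in asterisks
    (a stand-in for showing them in red)."""
    return " ".join(
        f"*{r}*" if ok else r for r, ok in mark_standard(kanji, readings)
    )

print(render("飲", ["イン", "オン", "の.む"]))
print(render("呑", ["トン", "ドン", "の.む"]))
```

The lookup is per kanji, so the same の.む reading comes out
highlighted for 飲 and plain for 呑.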

Getting onto "popularity" briefly, note that the "P" in EDICT2 is not
really a popularity flag on surface forms; it's an attempt to flag the 20k
or so most common entries and, where there is a choice of surface form
or reading, to show which one is the common one. It is derived from slightly
more fine-grained data in JMdict. For the general "for the masses" interfaces
I wouldn't suggest going beyond it. Finer-grained details should be available,
but at the price of a click or two to look at the underlying data (at present
you can go off to the database by clicking on "Edit", but perhaps there should
be a full view, taking you to something like:
http://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&e=1076749
for the hard-core people).

Well, I failed to be brief.

Jim

On 3 November 2011 21:53, Nils Roland Barth <jmdict.nbarth@xoxy.net> wrote:
> Hi Stuart, (and all)
>
> Concrete question/proposal:
> * Could we mark non-Joyo *readings*?
> (Currently non-Joyo *characters* are marked in purple 人名用
>  or green 表外字 – this is v. useful.)
>
> As René notes, this is done via a triangle in some dictionaries,
> and presumably this could be determined automatically,
> assuming we have a list of Joyo readings.
>
> This would be useful in (automatically) flagging potentially
> confusing readings.
>
> As a first step, we’d need to assign grades to *readings*
> in the kanji dic (currently it has grades for *characters*,
> and sorts readings by on/kun/name, but doesn’t grade readings).
>
>
> Together with figuring out what reading is implied by a spelling,
> this should allow us to automatically mark:
> * non-Joyo readings (needs grading)
> * non-standard readings (specific category of [iK]) (doable already)
> …and also presumably:
> * non-standard okurigana usage?
> (Presumably able to be done automatically.)
>
> I’m not proposing this be done now – this is a lot of work – but it
> would be an interesting project longer-term.
>
>
> <snip: Stuart explains (P)>
>
> Thanks for clearing that up – so:
> * (P) is “Popularity” (at the level of a given spelling),
>  determined by given sources
>  (hence not editable)
>  It’s about absolute popularity of words (concretely,
>  of given strings of characters), not relative popularities
>  of spellings.
>
> * [iK] is (as René and Jim wrote) for *errors* or irregular
>  uses, determined manually by referring to dictionaries
> (This can be semi-automated by using list of readings
>  in the kanji dict to find pronunciations that don’t work,
>  but semantic errors/mismatches require human judgment.)
>
> As a concrete example of an “expressive” [iK],
> writing のむ（飲む／呑む） as 服む (for taking medicine)
> (which I added and marked as [iK]) seems to fit:
> Both JMdict and 広辞苑 list only the 音読み 「フク」
> and no 訓読み so (by this criterion) it’s [iK]
> b/c it’s not a valid reading.
>
>
> More detailed thoughts on popularity:
>
> Spellings should be ordered in list of popularity, but
> beyond that there’s no indication of *how* much more
> popular a spelling is.
>
>
> For example, crudely Googling for lemma forms of おもう　思う yields:
> * 思う 1,320,000,000
> * 想う    27,000,000
> * 念う        58,700
>
> I.e., there’s about 2 or 3 orders of magnitude (1.7 & 2.7)
> difference in frequency of these spellings, which I’d
> summarize as:
> * 思う is the standard spelling
> * 想う is a reasonably common variant
>  (e.g., typical native speakers would recognize and may use it)
> * 念う is pretty uncommon, but accepted
>  (e.g., some native speakers probably wouldn’t recognize it,
>   and would be sensible to use furigana)
>
> This is partly reflected in standards: only 思う is a Joyo reading
> (others marked with triangle in my 大辞泉), but OTOH I don’t
> know how you’d know other than by Googling that 想う is much more
> common than 念う – maybe 漢検 level?
>
> This is admittedly rather fine-grained popularity
> information, and rather laborious to determine and tricky to
> present, but it would be nice to include somehow someday.
>
> Referring to standards (what grade is a reading – e.g.,
> is it in Joyo? What level on the kanken?) gives a clear
> and objective way to do this w/o reinventing the wheel;
> real-world popularity would be interesting but lots of work.
>
>
>  ~nils

-- 
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Webmaster: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne

References:
- Meaning of [iK] (irregular kanji)?
  - From: Nils Roland Barth <jmdict.nbarth@********>
- Re: [edict-jmdict] Meaning of [iK] (irregular kanji)?
  - From: Stuart McGraw <smcg4191@********>
- Re: [edict-jmdict] Meaning of [iK] (irregular kanji)? (+ Non-Joyo *readings*)
  - From: Nils Roland Barth <jmdict.nbarth@********>
