RE: [edict-jmdict] uk tag question

To: <edict-jmdict@***************>
Subject: RE: [edict-jmdict] uk tag question
From: "Stuart McGraw" <smcg4191@********>
Date: Sun, 28 Jan 2007 13:19:14 -0700
Importance: Normal

Jim Breen wrote:
> [Stuart McGraw ([edict-jmdict] uk tag question) writes:]
>[...]
> I didn't know there were any "uk" tags in the re_inf. I just checked, and 
> could only find one - in the new 樺太柳葉魚 entry. I moved it to the
> <misc> with all the others.
> 
> Can you tell me where you saw some of the others?

Sorry, my mistake. There was just one. I was mistakenly
looking at 'uK' tags in r_inf, but I guess that raises the same
question for them. There are 9 uK tags in rinf and 5 in misc.

misc: 1225700,1812570,2077340,2082710,2123440
rinf: 2113750,2114610,2114630,2115990,2118810,
    2119780,2121430,2121440,2128660
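
For anyone who wants to repeat this kind of check against the XML file, here is a minimal sketch. It assumes the usual JMdict element layout (ent_seq, r_ele/re_inf, sense/misc) and assumes the tag values appear as the literal text "uK"; in the real file they are entity references, so the comparison string would need adjusting to whatever the parser expands them to. The sample data and its sequence numbers are made up for illustration, and find_tag is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up miniature sample in a JMdict-like layout; NOT real entries.
SAMPLE = """<JMdict>
<entry><ent_seq>1</ent_seq>
  <r_ele><reb>reading-a</reb><re_inf>uK</re_inf></r_ele>
  <sense><gloss>a</gloss></sense></entry>
<entry><ent_seq>2</ent_seq>
  <r_ele><reb>reading-b</reb></r_ele>
  <sense><misc>uK</misc><gloss>b</gloss></sense></entry>
<entry><ent_seq>3</ent_seq>
  <r_ele><reb>reading-c</reb></r_ele>
  <sense><misc>uk</misc><gloss>c</gloss></sense></entry>
</JMdict>"""

def find_tag(root, tag="uK"):
    """Return the ent_seq values whose re_inf or misc elements carry `tag`."""
    hits = {"re_inf": [], "misc": []}
    for entry in root.iter("entry"):
        seq = int(entry.findtext("ent_seq"))
        for where in hits:
            # Case-sensitive comparison, so "uk" and "uK" stay distinct.
            if any(el.text == tag for el in entry.iter(where)):
                hits[where].append(seq)
    return hits

if __name__ == "__main__":
    print(find_tag(ET.fromstring(SAMPLE)))
```

The comparison is deliberately case-sensitive, since confusing "uk" with "uK" is exactly the mix-up described above.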

>[...]
> >> Also, do entry sequence numbers >9000000 indicate anything
> >> special about those entries?
> 
> As Jean-Luc remarked, they are from the JIS212_containing "edicth" file.
> I have kept these apart for legacy software reasons - a lot of software
> out was written to use EDICT in EUC or Shift_JIS, and hasn't catered
> for JIS212 characters (Shift_JIS can't encode them; EUC can but it's a 
> 3-byte code and needs special handling).
> 
> When we get the database firing, these entries can be rolled into the
> main file, although there needs to be a way of generating a subset
> that involves just JIS208 characters.

The 9x entries are in jmdict, so they are in the database now. So they
should not be exported to your master file when it's generated from the
database?
I was guessing that the seq numbers are probably relatively immutable,
since jmdict users may rely on them to identify the "same" words across
different versions of the jmdict file. So the 9x seq numbers could continue
to identify JIS212 words as now. Can one distinguish between JIS208 and
JIS212 characters based on Unicode code point (other than by using a
character lookup table)?
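
For what it's worth, I am not aware of a code point range that separates the two sets, so the sketch below still relies on a lookup table, just one that is already built into a codec. It assumes Python's standard euc_jp codec, which encodes JIS X 0212 characters as three bytes with a leading 0x8F (the 3-byte form Jim mentions above). The function names jis_charset and needs_jis212 are hypothetical.

```python
def jis_charset(ch):
    """Classify one character by how Python's euc_jp codec encodes it.

    Returns 'single-byte', 'jisx0201-kana', 'jisx0208', 'jisx0212',
    or None if the codec cannot encode the character at all.
    """
    try:
        raw = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return None
    if len(raw) == 1:
        return "single-byte"      # ASCII range
    if raw[0] == 0x8E:
        return "jisx0201-kana"    # SS2 prefix: half-width katakana
    if raw[0] == 0x8F:
        return "jisx0212"         # SS3 prefix: 3-byte JIS X 0212 code
    return "jisx0208"             # plain 2-byte code

def needs_jis212(text):
    """True if any character in `text` falls in JIS X 0212."""
    return any(jis_charset(c) == "jisx0212" for c in text)

if __name__ == "__main__":
    for word in ("樺太柳葉魚", "丂"):
        print(word, needs_jis212(word))
```

A check like needs_jis212 could also be one way to generate the JIS208-only subset at export time, independently of whether the 9x seq numbers are kept.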

Follow-Ups:
- Re: [edict-jmdict] uk tag question
  - From: Jean-Luc Léger <reiga@****************>

References:
- Re: [edict-jmdict] uk tag question
  - From: Jim Breen <Jim.Breen@**********************>
