
RE: [edict-jmdict] Tighter rules for reading fields

To: edict-jmdict@***************
Subject: RE: [edict-jmdict] Tighter rules for reading fields
From: Stuart McGraw <smcg4191@********>
Date: Sat, 28 Feb 2009 18:19:18 -0700

Jim Breen wrote:

2009/2/28 Stuart McGraw <smcg4191@frii.com>:
> I just ran my parser on the current jmdict.xml file and it
> reports the following...
>
> The complaints about reb with '?' are because they are
> character 301C (WAVE DASH) rather than the FF5E (FULLWIDTH TILDE)
> you gave above. (301C doesn't seem to have a representation
> in JIS, which this email is in, hence the '?'.)
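When a mail client shows only '?', one way to tell the two look-alike tildes apart is to print each character's code point and Unicode name. A minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical, not part of the parser discussed here:

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

for line in describe("\u301c\uff5e"):
    print(line)
# U+301C WAVE DASH
# U+FF5E FULLWIDTH TILDE
```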

I'm not sure I can do a lot about that one. It's a known
round-trip problem between JIS and Unicode. See:
http://en.wikipedia.org/wiki/Unicode#Mapping_to_legacy_character_sets
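The round-trip problem can be reproduced with Python's built-in codecs: the same Shift_JIS byte pair 0x81 0x60 (JIS X 0208 row 1, cell 33) decodes to U+301C under the standard-based `shift_jis` table but to U+FF5E under Microsoft's `cp932` table. A minimal sketch:

```python
# Decode the same two bytes with two different vendor tables.
raw = b"\x81\x60"  # Shift_JIS encoding of JIS X 0208 row 1, cell 33

standard = raw.decode("shift_jis")  # table based on the JIS X 0208 standard
microsoft = raw.decode("cp932")     # Microsoft's Windows code page 932 table

print(f"shift_jis -> U+{ord(standard):04X}")   # U+301C WAVE DASH
print(f"cp932     -> U+{ord(microsoft):04X}")  # U+FF5E FULLWIDTH TILDE

# Going back the other way, the strict codec has no slot for U+FF5E,
# so text produced via cp932 does not round-trip through shift_jis.
try:
    microsoft.encode("shift_jis")
except UnicodeEncodeError:
    print("U+FF5E cannot be encoded as shift_jis")
```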


Thanks for the reference.  I think I have a (still a
little fuzzy) understanding of the issue, but I can see that
you want to maintain JIS X 0208 compatibility in the reb/keb
elements to the maximum extent possible, and that using
U+FF5E would break that. I was just confused because I saw
a U+FF5E in your email.

The only other thing I wondered about is whether it would
make sense to allow a small set of punctuation characters
in the reb, in case jmdict gets more phrases or expressions.
(I noticed that one of the previously reported warnings was
for an entry containing a comma, though it's gone now.)
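A reb check with a small punctuation allowlist might look roughly like the sketch below. The kana ranges come from the Unicode Hiragana and Katakana blocks; the allowlist itself (、 ・ 〜) and the names `is_kana`, `DEFAULT_EXTRA` and `bad_reb_chars` are hypothetical, since the actual set would be a JMdict policy decision:

```python
# Code point ranges from the Unicode Hiragana (U+3040-309F) and Katakana
# (U+30A0-30FF) blocks.
def is_kana(ch: str) -> bool:
    cp = ord(ch)
    return (
        0x3041 <= cp <= 0x3096     # hiragana ぁ..ゖ
        or 0x309D <= cp <= 0x309E  # hiragana iteration marks ゝ ゞ
        or 0x30A1 <= cp <= 0x30FA  # katakana ァ..ヺ
        or 0x30FC <= cp <= 0x30FE  # prolonged sound mark ー, iteration marks ヽ ヾ
    )

# Hypothetical allowlist: 、 (U+3001), ・ (U+30FB), 〜 (U+301C WAVE DASH).
DEFAULT_EXTRA = frozenset("\u3001\u30fb\u301c")

def bad_reb_chars(reb: str, extra: frozenset = DEFAULT_EXTRA) -> list:
    """Return the characters in reb that are neither kana nor allowlisted."""
    return [ch for ch in reb if not (is_kana(ch) or ch in extra)]

print(bad_reb_chars("コーヒー"))   # []
print(bad_reb_chars("あ、い"))     # []
print(bad_reb_chars("げん,ぜん"))  # [',']
```

Keeping the allowlist as a parameter means the parser can stay strict by default (pass an empty set) while the permitted punctuation is agreed on.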

[...]
I think all those should be OK now. Please check them tomorrow
when the next version goes out.


Seq 1262730: Conflicting pri value 'nf41' in reading げんぜん, kanji 厳然
Seq 1274190: Conflicting pri value 'nf17' in reading こうぜん, kanji 公然
Seq 1376200: Conflicting pri value 'nf32' in reading せいぜん, kanji 整然
Seq 1475790: Conflicting pri value 'nf22' in reading ばくぜん, kanji 漠然
Seq 2405880: lsource has attribute(s)  but no text
 Not sure if the above is an error or intended.
Seq 2415870: keb text '?' not kanji.
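Two of the checks in the report above (a keb with no kanji, and an lsource with attributes but no text) can be sketched with Python's standard `xml.etree.ElementTree`. This runs on a small made-up sample rather than the real jmdict.xml, and the kanji test (CJK Unified Ideographs, Extension A, plus 々) is an assumption about what the parser counts as kanji. The empty-lsource case is reported only as a warning, since it may be intentional:

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Made-up two-entry sample; not real JMdict data.
SAMPLE = """<JMdict>
<entry><ent_seq>1</ent_seq>
  <k_ele><keb>漠然</keb></k_ele>
  <r_ele><reb>ばくぜん</reb></r_ele>
  <sense><lsource xml:lang="ger"/><gloss>example</gloss></sense>
</entry>
<entry><ent_seq>2</ent_seq>
  <k_ele><keb>かな</keb></k_ele>
  <r_ele><reb>かな</reb></r_ele>
  <sense><gloss>example</gloss></sense>
</entry>
</JMdict>"""

def has_kanji(text: str) -> bool:
    # CJK Unified Ideographs, Extension A, and the iteration mark 々 (U+3005).
    return any(
        0x4E00 <= ord(c) <= 0x9FFF or 0x3400 <= ord(c) <= 0x4DBF or c == "\u3005"
        for c in text
    )

def check(root: ET.Element) -> list:
    warnings = []
    for entry in root.iter("entry"):
        seq = entry.findtext("ent_seq")
        for keb in entry.iter("keb"):
            if not has_kanji(keb.text or ""):
                warnings.append(f"Seq {seq}: keb text '{keb.text}' not kanji.")
        for ls in entry.iter("lsource"):
            if ls.attrib and not (ls.text or "").strip():
                attrs = " ".join(
                    f'{k.replace(XML_NS, "xml:")}="{v}"' for k, v in ls.attrib.items()
                )
                warnings.append(f"Seq {seq}: lsource has attribute(s) {attrs} but no text")
    return warnings

for w in check(ET.fromstring(SAMPLE)):
    print(w)
```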
