
Re: [edict-jmdict] Re: mysql limitations



[wmaton ([edict-jmdict] Re: mysql limitations) writes:]
>> > (FWIW, according to
>> > http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
>> > Postgresql supports 4-byte utf-8.)
>> 
>> I think I raised this issue previously, and someone responded that there
>> were something like only 200 entries affected - but still, let's do
>> this right at the start....

Just pointing out that the "200 entries" are in the JIS X 0213 part of
kanjidic2.xml, i.e. the "new" kanji that joined Unicode from JIS X 0213. 
There is no 4-byte utf-8 problem with either the JMdict/EDICT content or 
the JMNedict/ENAMDICT content, although it is possible that some odd 
JIS X 0213 kanji might get into the latter file at some stage.
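
For anyone who wants to check a particular file, here is a minimal sketch
in Python 3 (my choice of language, and the function names are made up for
illustration). It relies only on the fact that a character needs 4 bytes
in utf-8 exactly when its code point is above U+FFFF.

```python
def four_byte_chars(text):
    """Return the sorted list of distinct characters in text that need
    4 bytes in UTF-8, i.e. those with code points above U+FFFF."""
    return sorted({ch for ch in text if ord(ch) > 0xFFFF})


def count_affected(lines):
    """Count the lines that contain at least one such character."""
    return sum(1 for line in lines if four_byte_chars(line))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the file to check as the first argument.
    with open(sys.argv[1], encoding="utf-8") as f:
        print(count_affected(f), "lines contain 4-byte utf-8 characters")
```

This counts lines rather than dictionary entries, so for an XML file with
entries spread over several lines it gives only a rough indication.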

I would not suggest letting the "4-byte utf-8 problem" drive the choice of
RDBMS.

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学