Re: Re: Re: [edict-jmdict] Re: Choosing a database backend
On 9/28/06, Ben Bullock <benkasminbullock@gmail.com> wrote:
On 28/09/06, Kim Ahlström <kim.ahlstrom@gmail.com> wrote:
>
> When the database is latin1 and you feed it utf8, everything goes as
> planned and I get the same result as you. But when the database is
> utf8 I get the 302 discarded literals. Which is consistent with what
> David said earlier in the thread.
Yeah, but it isn't consistent with me, because I'm using kanjidic2.xml
and feeding it into the database as utf8 and I'm not getting those
discarded literals that you describe. Thus, it's either a bug in your
software or mine, or possibly a version incompatibility.
According to the documentation for MySQL 5.1 that I quoted in my
original message, it should not work. I think you can test your data
by using "length" and "char_length".
select length( kanji_field ) from kanji_table where ...
This should return "3" or "4" depending on the data.
select char_length( kanji_field ) from kanji_table where ...
This should return "1".
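To make the difference between the two queries concrete, here is a minimal Python sketch of what they measure: the byte count of the UTF-8 encoding versus the number of characters. The helper name utf8_lengths and the two sample kanji are illustrative assumptions, not taken from anyone's actual database.

```python
def utf8_lengths(text):
    """Return (byte_length, char_length) for text.

    byte_length is the size of the UTF-8 encoding, which is what
    length() reports for a utf8 column; char_length is the number
    of code points, which is what char_length() reports.
    """
    return len(text.encode("utf-8")), len(text)


# Sample characters chosen for illustration only.
bmp_kanji = "\u6f22"            # a kanji inside the Basic Multilingual Plane
supplementary = "\U0002000b"    # a kanji outside the BMP

print(utf8_lengths(bmp_kanji))       # 3 bytes, 1 character
print(utf8_lengths(supplementary))   # 4 bytes, 1 character
```

If a stored literal reports a char_length other than 1, or comes back empty, that would suggest it was mangled or dropped somewhere between the client and the table.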
You haven't said what version of MySQL you're using, so that's the
first place to start. You haven't mentioned what language your
software uses, but there are all kinds of possibilities. The language
I'm using, Perl, has all kinds of snags in its utf8 handling. If your
program is in Perl you could send it to me and I'll take a look. I'm
still struggling with Python though, unfortunately.
I know that with Perl's DBD::mysql one has to manually tag utf8 data
as such. There are rumours that the new DBI2 will fix that and have it
automatically infer the encoding from the database. How does this
work in Python?
Again it's not a general problem - it's just your problem with your
database at the moment, and it could well be caused by problems with
your software. You need to prove that the discarded literals are being
discarded by MySQL, and since this isn't happening to me, I'm
unconvinced.
Well, your experience goes against the newest MySQL manual's claims.
Though, documentation in general does not have a good reputation for
being in sync with the real world.
--
David,