I'm not sure this is supposed to work yet.... I write utf-8 into mysql -
that seems to work. but when I read it back, not only do I get
a "classical perl string" instead of an utf8 string, but also some
of the bits are broken:
he string i write into the db is 6 characters long:
"A\N{greek:beta}cd\N{cyrillic:e}f"
character unicode utf8
hex binary
A 0041 01000001
? 03B2 1100111010110010
c 0063 01100011
d 0064 01100100
? 044D 1101000110001101
f 0066 01100110
what i get back from the db is
A 01000001
? 11001110
? 10110010
c 01100011
d 01100100
? 11010001
? 00111111
f 01100110
look at the penultimate byte! Is that supposed to happen?
PS.
I compiled everything myself, and used the following versions:
perl(5.8) + dbi(1.37) + dbd::mysql(2.1026) + mysql(4.1.0-alpha)
Here's the code I use:
$dbh = DBI->connect("DBI:mysql:test", "someuser", "somepass");
$uni = "A\N{greek:beta}cd\N{cyrillic:e}f";
$dbh->do("INSERT INTO unitest VALUES (NULL,?,?)", undef, $uni, $uni)
or print $dbh->errstr;
$id = $dbh->{insertid};
$sth = $dbh->prepare("SELECT * FROM unitest WHERE ID = ?");
$sth->execute( $id );
@row = $sth->fetchrow_array();
The table is a utf-8 table:
CREATE TABLE unitest (
ID mediumint(9) NOT NULL auto_increment,
vc varchar(80) default NULL,
t text,
PRIMARY KEY (ID)
) TYPE=MyISAM CHARSET=utf8;
PPS.: I keep a journal about my adventures with perl+mysql+unicode
at http://perlwelt.horus.at/Beispiele/Magic/PerlUnicodeMysql/
--
Brigitte 'I never met a chocolate I didnt like' Jellinek
bje...@horus.at http://www.horus.at/~bjelli
http://perlwelt.horus.at http://www.perlmonks.org/index.pl?node=bjelli
Thanks.
Tim.
On Tue, Jun 17, 2003 at 11:36:13AM +0100, Tim Bunce answered:
> Please try using another tool (eg mysqldump) to read the data from
> the db to check if the problem is related to inserting the data
> and/or retrieving it. Also try not using placeholders.
ok, i tried both: mysqldump confirms: the problem is not
in inserting, but in reading from the db.
To investigate this further, I wrote one-character strings into the db:
foreach (0x21 ... 0x7e, 0xa1 ... 0xff )
{
$char = pack "U", $_;
$dbh->do("INSERT INTO unitest VALUES (NULL,'$char','$char')")
or print $dbh->errstr;
$id = $dbh->{insertid};
$sth = $dbh->prepare("SELECT * FROM unitest WHERE ID = ?");
$sth->execute( $id );
@row = $sth->fetchrow_array();
# ....
}
I find that in the range 0xc0 to 0xdf the last three bits get set to
001:
write 11000011100 = \x{00C0}
read 11000011001
....
write 11000011100 = \x{00DF}
read 11000011001
The same holds for range 0x010d to 0x011f, range 0x0140 to 0x015f, range
0x1213 to 0x181f, range 0x1840 to 0x185f, and so on.
ps.
i use perl(5.8) + dbi(1.37) + dbd::mysql(2.1026) + mysql(4.1.0-alpha)
pps.
my table is
CREATE TABLE unitest (
ID mediumint(9) NOT NULL auto_increment,
vc varchar(80) default NULL,
t text,
PRIMARY KEY (ID)
) TYPE=MyISAM CHARSET=utf8;
wrong! actually, the problem is in the inserting.
I'm now inserting it two ways:
1) from perl directly via DBI
2) perl -> file with sql -> mysql client
the file with the sql statements is still ok. if I look at the
data with the mysql-commandline client, version 1 and version 2 look the
same:
+------+------+------+-------+
| ID | code | vc | count |
+------+------+------+-------+
| 3058 | 010D | Ä? | 3 |
| 3167 | 010D | Ä? | 3 |
+------+------+------+-------+
when I read it out using perl+dbi, i get for example
for the unicode-character \x{010D}, which should be c with a reverse ^ on top,
the bits 1100010000111111
instead of 1100010010001101
ps.
i still use perl(5.8) + dbi(1.37) + dbd::mysql(2.1026) + mysql(4.1.0-alpha)
pps.
my table for this example is
CREATE TABLE chartab (
ID mediumint(9) NOT NULL auto_increment,
code varchar(10) default NULL,
vc varchar(10) default NULL,
count int(8),
PRIMARY KEY (ID)
) TYPE=MyISAM CHARSET=utf8;
--
Dipl.Ing. Brigitte Jellinek Horus GmbH
http://www.horus.at/team/bjelli.html Internet Beratung + Schulung
bje...@horus.at Web Design + Software Engineering
The problem is caused by mysql, not by perl.
Maybe I'm doing something really stupit, but it seems
that my mysql-server doesn't accept utf8 as utf8. In the
most basic version, using a utf-8 file with sql commands:
mysql --default-character-set=utf8 test < test-insert.sql
my mysql server still interprets the utf8 strings in the file
as latin1 (and mangles all the non-latin1-charachters).
I tried inserting
SET CHARACTER SET utf8;
into the file, but that didn't help.
I had to change all occurences of literal utf8 strings in my SQL
statements, like so:
INSERT INTO sometable VALUES (....,CONVERT(_utf8'blabla' USING utf8), ...);
where 'blabla' is a string that might contain utf8.
the same for SELECT:
SELECT * FROM sometabel
WHERE somecolumn LIKE CONVERT(_utf8'%blabla%' USING utf8);
I now use this workaround through DBI. the data I get back is
not flagged as utf8, but it is correct utf8, so I can decode it
without a problem:
# to search for books with a title containging $char:
my $sql = qq{
SELECT * FROM buch WHERE Originaltitel LIKE
CONVERT(_utf8 '%$char%' USING utf8)
};
my $sth = $dbh->prepare($sql);
$sth->execute();
my (@r, $i);
while( @r = $sth->fetchrow_array() ) {
$_ = decode("utf8", $_) foreach @r;
print "<p>@r</p>";
}
It works. I'm glad. Thanks for the help.
Brigitte
p.s.
you can find the details at
http://perlwelt.horus.at/Beispiele/Magic/PerlUnicodeMysql/
--
Brigitte 'I never met a chocolate I didnt like' Jellinek
bje...@horus.com http://www.horus.com/~bjelli/
http://perlwelt.horus.at http://www.perlmonks.org/index.pl?node=bjelli
Paul Dubois has written some very good books on MySQL
with sections on DBI use - I think Paul is on this list -
I'm hoping Paul will respond with what he knows about
utf8 use in MySQL.
--
Hardy Merrill
Red Hat, Inc.
My suggestion is that if you're using 4.1.0, wait for 4.1.1 and
try it again. There are lots of character-set-handling related
changes, and the problem you're seeing may well be fixed. I know
that some of the new work involves transmission of information
between client and server.
If you *are* using 4.1.1 from the development tree (and thus the
problem remains unsolved), then please report your findings to
bugs.mysql.com so that the problem can be addressed. Thanks!