Binding strings with utf-8 octet sequences with 3.x

Daniel Verite

Jan 4, 2016, 2:15:02 PM
to dbd...@perl.org
Hi,

I'm wondering about a change in 3.x concerning UTF-8.

When comparing the behaviors of the DBD::Pg version that ships with
Ubuntu 14.04 (libdbd-pg-perl 2.19.3-2) against a self-compiled 3.5.3,
I notice that the bind parameters now are interpreted differently,
independently of pg_enable_utf8's value.

For instance, consider the following code, run against a UTF-8
database:

$dbh->do("SET client_encoding TO UTF8");
$dbh->{pg_enable_utf8}=0;
binmode(STDOUT);

$p = "\xc3\xa9"; # U+00E9 (é) as a UTF-8 octet sequence
$sth = $dbh->prepare("SELECT ?,length(?),octet_length(?)",
                     {pg_server_prepare=>0});
$sth->execute($p,$p,$p);
@r = $sth->fetchrow_array;
printf "v%s, sending %s, getting back: %s %s %s\n",
       $DBD::Pg::VERSION, $p, @r;

With DBD::Pg 2.19.3 the client output is:
v2.19.3, sending é, getting back: é 1 2
and the server log (log_statement=all), seen on a utf-8 terminal:
statement: SELECT 'é',length('é'),octet_length('é')
This is fine and what I expect.

With DBD::Pg 3.5.3, the client output is:
v3.5.3, sending é, getting back: é 2 4
server log (log_statement=all), seen on a utf-8 terminal
statement: SELECT 'é',length('é'),octet_length('é')

My expectation was that 3.x would behave like 2.x with
the above code, especially when pg_enable_utf8 is 0.
It seems that with the newer version, it results
in double encoding the parameter, as shown by
the character and octet lengths at the server end.
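
To illustrate what "double encoding" means here, a minimal sketch that
uses only the core Encode module and does not touch DBD::Pg at all. It
assumes (this is not confirmed from the driver source) that the two octets
are treated as two Latin-1 characters and encoded to UTF-8 a second time
before being sent. The variable names $sent and $chars are only for
illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The two octets that encode U+00E9 in UTF-8; the utf-8 flag is not set.
my $p = "\xc3\xa9";

# Assumption: each octet is taken as a Latin-1 character (U+00C3, U+00A9)
# and the string is encoded to UTF-8 once more.
my $sent = encode('UTF-8', $p);

# What a UTF-8 server would count as characters in the received octets.
my $chars = length(decode('UTF-8', $sent));

printf "original: %d octets, utf-8 flag %s\n",
    length($p), utf8::is_utf8($p) ? "on" : "off";
printf "re-encoded: %d characters, %d octets (%s)\n",
    $chars, length($sent),
    join(' ', map { sprintf '%02x', ord } split //, $sent);
```

This prints 2 octets with the flag off for the original string, then
2 characters and 4 octets (c3 83 c2 a9) for the re-encoded one, which
matches the "2 4" reported by length() and octet_length() above.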

Anyway, is the above output the expected behavior?

And is there a way to make it just pass through the parameters
that don't have the utf-8 flag set?


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Daniel Verite

Jan 25, 2016, 10:15:01 AM
to dbd...@perl.org
Daniel Verite wrote:

> With DBD::Pg 3.5.3, the client output is:
> v3.5.3, sending é, getting back: é 2 4
> server log (log_statement=all), seen on a utf-8 terminal
> statement: SELECT 'é',length('é'),octet_length('é')
>
> My expectation was that 3.x would behave like 2.x with
> the above code, especially when pg_enable_utf8 is 0.
> It seems that with the newer version, it results
> in double encoding the parameter, as shown by
> the character and octet lengths at the server end.

It appears that my report is essentially the same as:
https://rt.cpan.org/Public/Bug/Display.html?id=103137

which also comes with a proposed bugfix, but it's been
left unprocessed so far.