Characterset problem


Dr. Michael L. Dowling

Apr 19, 2014, 9:31:12 AM
to dbd...@perl.org
Hello everyone!

I'm new to this group so I hope I don't begin with any faux pas.

I have a problem and don't seem to be able to make any headway with it.

I have been using Arch Linux and Perl DBI/DBD for Postgresql quite
happily now for years. My database contains German special characters
and I have configured Arch Linux to use UTF-8. Everything was fine
until Arch upgraded to DBD 3.0. DBI/DBD still worked well for INSERT
statements, but the results of SELECT statements are now ISO-8859 8 bit
characters. This has not changed after the release of DBD version 3.1.
The simple work-around is to re-install the Arch package
perl-dbd-pg-2.19.3-2-x86_64.pkg.tar.xz; that restores the correct
behaviour. This is why I concluded that the problem probably lies with
DBD and not DBI for Postgresql.

In my select statements I even have explicitly:

SET CLIENT_ENCODING TO 'UTF8';

I would be grateful for any pointers.

As a Kiwi in Germany I have a rather odd locale, namely:

LANG=en_NZ.UTF-8
LC_CTYPE=en_NZ.UTF-8
LC_NUMERIC="en_NZ.UTF-8"
LC_TIME=en_DK.UTF-8
LC_COLLATE="en_NZ.UTF-8"
LC_MONETARY=en_IE.UTF-8
LC_MESSAGES="en_NZ.UTF-8"
LC_PAPER="en_NZ.UTF-8"
LC_NAME="en_NZ.UTF-8"
LC_ADDRESS="en_NZ.UTF-8"
LC_TELEPHONE="en_NZ.UTF-8"
LC_MEASUREMENT="en_NZ.UTF-8"
LC_IDENTIFICATION="en_NZ.UTF-8"
LC_ALL=

Yours,
Mike Dowling

--
Dr. Michael L. Dowling
Gaußstr. 27
38106 Braunschweig
Germany

Greg Sabino Mullane

Apr 19, 2014, 5:12:10 PM
to Mike.D...@t-online.de, dbd...@perl.org, dbd...@perl.org

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> statements, but the results of SELECT statements are now ISO-8859 8 bit
> characters.

Hi. It's hard to say without more data. Setting client_encoding to UTF-8
is a great first step. What makes you think the strings are returned
as ISO-8859? Can you show us a test script and its output, or better yet,
a self-contained test script we can run?

- --
Greg Sabino Mullane gr...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201404191711
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlNS5oQACgkQvJuQZxSWSsjzhACg0fsZDmosWhPcOenDROAto6kk
WSwAoJWHtkDSEVcv0DlsFCl61M2VqrMu
=J2G6
-----END PGP SIGNATURE-----


Dr. Michael L. Dowling

Apr 21, 2014, 5:04:03 AM
to Greg Sabino Mullane, dbd...@perl.org
Hello Greg!

Thanks for the reply.

On Sat, Apr 19, 2014 at 09:12:10PM -0000, Greg Sabino Mullane wrote:
>
> > statements, but the results of SELECT statements are now ISO-8859 8 bit
> > characters.
>
> Hi. It's hard to say without more data. Setting client_encoding to UTF-8
> is a great first step. What makes you think the strings are returned
> as ISO-8859? Can you show us a test script and its output, or better yet,
> a self-contained test script we can run?

I have created a minimal example. I have created a new database TEST
with a single table with a single row that contains German special
characters using UTF-8. I used Emacs and the shell package to make a
transcript as to how I did this; see attached file "create".

I have made a dump of this database and have attached it as test.db.

$ file test.db

yields

UTF-8 Unicode text.

The binary editor bvi also shows that these are two-byte encodings.

I then created a minimal perl script that reads from this table and
writes to STDOUT. I have attached this script as well; test.pl.

Redirecting the output from test.pl to a file and then running the file
utility on that file, one gets

ISO-8859 text, with no line terminators

bvi shows single-byte German national characters.
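
The difference between the two files can be illustrated without a database. The following is only a minimal sketch: the sample string "Gauß" and the helper hex_bytes are hypothetical and are not taken from the attached files. It encodes the same text once as UTF-8 and once as ISO-8859-1 and prints the resulting bytes in hex, which is roughly what a binary editor would show.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample value ("Gauß"), written with an escape so the
# example does not depend on the encoding of this source file.
my $text = "Gau\x{df}";

# Return the bytes of a string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

my $utf8_hex   = hex_bytes( encode('UTF-8',      $text) );
my $latin1_hex = hex_bytes( encode('ISO-8859-1', $text) );

print "UTF-8:      $utf8_hex\n";    # the sharp s takes two bytes: c3 9f
print "ISO-8859-1: $latin1_hex\n";  # the sharp s takes one byte:  df
```

A file containing the first byte sequence would be the "two-byte" case, the second the "single-byte" case.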

Again, I'd be very grateful for any pointers on this. I suspect that my
configuration here could be responsible as obviously nobody else has
squealed. I changed to UTF-8 only about a year ago, and perhaps I
missed something. Yet for the life of me I know not what. As I said in
my original post, things got broken with DBD version 3.0.

What could have changed with 3.0 that could possibly have caused this?

Cheers,

Mike Dowling

PS
During the week I have no access to my email; only from Fridays to
Sundays, so my responses may be slow.
Attachments: create, test.pl, test.db

Alvar Freude

Apr 21, 2014, 5:51:58 AM
to Mike.D...@t-online.de, Greg Sabino Mullane, dbd...@perl.org
Hi,

On 21.04.2014 at 11:04, Dr. Michael L. Dowling <Mike.D...@t-online.de> wrote:

> Redirecting the output from test.pl to a file and then using the file
> binary again, one gets
>
> $ ISO-8859 text, with no line terminators
>
> The bvi indicates single byte German national characters.
>
> Again, I'd be very grateful for any pointers on this.

You have to tell Perl that it should print everything in UTF-8. Without this, Perl prints it in Latin-1, regardless of whether it is correct internally or not :-(

e.g.

binmode( STDOUT, ":utf8" );
binmode( STDERR, ":utf8" );
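
The effect can be seen without a database by printing to an in-memory filehandle. This is just a sketch under assumptions: the sample string "Grüße" and the helper bytes_written are hypothetical, and the layer used here is ":encoding(UTF-8)", the stricter, validating relative of ":utf8", rather than ":utf8" itself.

```perl
use strict;
use warnings;

# A decoded character string ("Grüße"), written with escapes so the
# example does not depend on the encoding of this source file.
my $text = "Gr\x{fc}\x{df}e";

# Print a string to an in-memory filehandle and return the bytes that
# were written, as hex. $layer is an optional PerlIO layer.
sub bytes_written {
    my ($string, $layer) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    binmode $fh, $layer if defined $layer;
    print {$fh} $string;
    close $fh;
    return join ' ', map { sprintf '%02x', ord } split //, $buffer;
}

my $without_layer = bytes_written($text);
my $with_layer    = bytes_written($text, ':encoding(UTF-8)');

print "no layer:         $without_layer\n";  # one byte per character (Latin-1)
print ":encoding(UTF-8): $with_layer\n";     # u-umlaut and sharp s take two bytes each
```

Without a layer the umlaut and the sharp s come out as single Latin-1 bytes, which matches what the file utility reported; with the layer they are written as two-byte UTF-8 sequences.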


http://perldoc.perl.org/perlunicode.html
http://perlgeek.de/de/artikel/charsets-unicode
http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html


Ciao
Alvar
