
Tk::Entry and unicode/UTF8 file name problems


Ulli Horlacher

Sep 13, 2008, 8:02:18 AM
I have a (test) file named 'xöx' (German Umlaut in the middle).
My locale is set to ISO8859-15

No problem with normal perl I/O, for example:

perl -e '$file = shift; open F,$file or die $!; print "$file ok\n"' xöx
xöx ok

But when I query the file name via Tk::Entry I get a Unicode/UTF-8 string
on which open fails:

use Tk;

$T = MainWindow->new();

$entry = $T->Entry();
$button = $T->Button(-text => 'open', -command => \&openfile);
$entry->pack();
$button->pack();
MainLoop;

sub openfile {
    $file = $entry->get;    # string exactly as Tk returns it
    open F,$file or die "cannot open $file - $!\n";
    close F;
    print "ok\n";
}

Tk::Error: cannot open xÃÂöx - No such file or directory
Tk callback for .button
Tk::__ANON__ at /usr/lib/perl5/Tk.pm line 247
Tk::Button::butUp at /usr/lib/perl5/Tk/Button.pm line 111
<ButtonRelease-1>
(command bound to event)


Why does Tk ignore my locale settings and return a file name in the wrong
character set encoding?
How can I force Tk to respect my locale settings?

--
Ullrich Horlacher Informationssysteme und Serverbetrieb
Rechenzentrum E-Mail: horl...@rus.uni-stuttgart.de
Universitaet Stuttgart Tel: ++49-711-685-65868
Allmandring 30 Fax: ++49-711-682357
70550 Stuttgart (Germany) WWW: http://www.rus.uni-stuttgart.de/

zentara

Sep 13, 2008, 10:03:35 AM
On Sat, 13 Sep 2008 12:02:18 +0000 (UTC), Ulli Horlacher
<fram...@rus.uni-stuttgart.de> wrote:

>I have a (test) file named 'xöx' (German Umlaut in the middle).
>My locale is set to ISO8859-15
>
>No problem with normal perl I/O, for example:
>
>perl -e '$file = shift; open F,$file or die $!; print "$file ok\n"' xöx
>xöx ok
>
>But when I query the file name via Tk::Entry I get an unicode/UTF8 string
>on which open fails:
>
>$T = MainWindow->new();
>
>$entry = $T->Entry();
>$button = $T->Button(-text => 'open', -command => \&openfile);
>$entry->pack();
>$button->pack();
>MainLoop;
>
>sub openfile {
> $file = $entry->get;

use Encode;
$file = decode('utf8', $file);

# explanation:
# this decode('utf8', ...) call is used so that filenames containing
# non-ASCII (Unicode) characters work properly

use Encode;
opendir my $dh, $path or warn "Error: $!";
my @files = grep !/^\.\.?$/, readdir $dh;
closedir $dh;
# @files = map{ "$path/".$_ } sort @files;
#$_ = decode( 'utf8', $_ ) for ( @files );
@files = map { decode( 'utf8', "$path/".$_ ) } sort @files;


>How can I force Tk to respect my locale settings?

I'm no unicode expert, (why can't everyone use ascii US :-) ?)
I've seen this used:

use POSIX qw(strftime setlocale LC_ALL LC_CTYPE);
my ($loc) = POSIX::setlocale( &POSIX::LC_ALL, "de_AT" );

#or sometimes
$ENV{LANG} = "de_AT";

zentara


--
I'm not really a human, but I play one on earth.
http://zentara.net/Remember_How_Lucky_You_Are.html

Ulli Horlacher

Sep 13, 2008, 10:29:29 AM
zentara <zen...@highstream.net> wrote:

> >sub openfile {
> > $file = $entry->get;
> use Encode;
> $file = decode('utf8', $file);

Makes it even worse:

framstag@juhu:~/bin: xx.pl x鰔
Wide character in print at /home/framstag/bin/xx.pl line 29.
x锟#x
Tk::Error: cannot open x寐驴寐#x - No such file or directory


> >How can I force Tk to respect my locale settings?
>
> I'm no unicode expert, (why can't everyone use ascii US :-) ?)

Because not everyone writes in english? :-)


> I've seen this used:
>
> use POSIX qw(strftime setlocale LC_ALL LC_CTYPE);
> my ($loc) = POSIX::setlocale( &POSIX::LC_ALL, "de_AT" );

This will break on non-POSIX systems like Windows.
Besides, my locale settings ARE correct. It is Tk that ignores
them and forces Unicode. But I have no Unicode file names.

zentara

Sep 14, 2008, 8:20:24 AM
On Sat, 13 Sep 2008 14:29:29 +0000 (UTC), Ulli Horlacher
<fram...@rus.uni-stuttgart.de> wrote:

>zentara <zen...@highstream.net> wrote:

>> >How can I force Tk to respect my locale settings?
>>
>> I'm no unicode expert, (why can't everyone use ascii US :-) ?)

>This will break on non-POSIX system like Windows.


>Besides this, my locale settings ARE correct. It is Tk which is ignoring
>it and forces unicode. But I have no unicode file names.

There are some smart unicode experts at http://www.perlmonks.org
I suggest you ask there.

There was some weird encoding problem with your script, I couldn't even
run it, and open an image. Maybe it's some UTF-16 problem caused by
windows?

Ulli Horlacher

Sep 16, 2008, 10:08:07 AM
I have tracked down the charset problem :

$T = MainWindow->new();

$entry = $T->Entry()->pack();

$file = shift || '';

warn "before entry: $file\n";
$entry->insert(0,$file);
warn "after entry: $file\n";


And when I run this code:


framstag@diaspora: openfile.pl xöx.avi
before entry: xöx.avi
after entry: xöx.avi


Tk::Entry modifies its arguments!
It recodes them to UTF-8 even though my locale setting is ISO-8859-1!

Just another annoying Tk bug...
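
One possible reading of this output, offered as an assumption rather than a confirmed diagnosis: Tk may upgrade the scalar's internal representation to UTF-8 in place. That leaves the string's characters unchanged but changes the bytes that byte-oriented operations can end up seeing. The sketch below reproduces the effect without Tk, using the core utf8::upgrade and utf8::downgrade functions; the sub names upgraded_copy and downgraded_copy are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $str whose internal
# representation has been upgraded to UTF-8 (UTF-8 flag set).
sub upgraded_copy {
    my ($str) = @_;
    my $copy = $str;
    utf8::upgrade($copy);
    return $copy;
}

# Hypothetical helper: return a byte-string copy, or undef if the
# string holds characters above 0xFF and cannot be downgraded.
sub downgraded_copy {
    my ($str) = @_;
    my $copy = $str;
    return utf8::downgrade($copy, 1) ? $copy : undef;
}

my $file = "x\xF6x.avi";              # Latin-1 bytes, as on an ISO-8859-1 system
my $up   = upgraded_copy($file);

printf "flag before: %d, after: %d\n",
    (utf8::is_utf8($file) ? 1 : 0), (utf8::is_utf8($up) ? 1 : 0);
printf "same characters: %s\n", ($file eq $up ? "yes" : "no");

my $back = downgraded_copy($up);
printf "downgraded copy has flag: %d\n", (utf8::is_utf8($back) ? 1 : 0);
```

If this reading applies, calling utf8::downgrade($file, 1) on the variable after $entry->insert might restore the byte form before the name is handed to open.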

Lamprecht

Sep 16, 2008, 2:04:18 PM
to Ulli Horlacher
Ulli Horlacher wrote:
> I have tracked down the charset problem :
>
> $T = MainWindow->new();
>
> $entry = $T->Entry()->pack();
>
> $file = shift || '';
>
$file = decode("utf8", $file);#???

> warn "before entry: $file\n";
> $entry->insert(0,$file);
> warn "after entry: $file\n";
>
>
> And when I run this code:
>
>
> framstag@diaspora: openfile.pl xöx.avi
> before entry: xöx.avi
> after entry: xöx.avi
>
>
> Tk:Entry modifies its arguments!
> It recodes it to UTF8 though my locale setting is ISO-8859-1!
>
> Just another annoying Tk bug...
>
>

Does that help?

Cheers,
Christoph

--
use Tk;use Tk::GraphItems;$c=tkinit->Canvas->pack;push@i,Tk::GraphItems->
TextBox(text=>$_,canvas=>$c,x=>$x+=70,y=>100)for(Just=>another=>Perl=>Hacker);
Tk::GraphItems->Connector(source=>$i[$_],target=>$i[$_+1])for(0..2);
$c->repeat(30,sub{$_->move(0,4*cos($d+=3.16))for(@i)});MainLoop

Ulli Horlacher

Sep 16, 2008, 5:08:01 PM
Lamprecht <ch.l.ngr...@online.de> wrote:
> Ulli Horlacher wrote:
> > I have tracked down the charset problem :
> >
> > $T = MainWindow->new();
> >
> > $entry = $T->Entry()->pack();
> >
> > $file = shift || '';
> >
> $file = decode("utf8", $file);#???

This does a recode from UTF-8 to UTF-8, which does not make much sense :-)

> > Tk:Entry modifies its arguments!
> > It recodes it to UTF8 though my locale setting is ISO-8859-1!

The bug is in Tk and my workaround (for UNIX) is:

$file = locale($file);

(...)

sub locale {
    my $string = shift;
    my $ctype = $ENV{LC_CTYPE} || $ENV{LANG} || $ENV{LC_ALL};

    if ($ctype) {
        # match both "utf8" and "UTF-8" spellings
        if ($ctype =~ /utf-?8/i) {
            return $string;
        } elsif (grep { $ctype =~ /^$_$/i } Encode->encodings()) {
            return encode($ctype,$string);
        } else {
            return encode('iso-8859-1',$string);
        }
    }

    return $string;
}
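
The encoding lookup in this workaround could be made a little more forgiving. The sketch below is one possible variant, not a tested replacement: it assumes locale names of the form language_TERRITORY.charset@modifier, pulls out the charset part, and asks Encode::find_encoding whether it knows that name. The sub names locale_charset and to_locale_bytes are hypothetical, and the ISO-8859-1 fallback simply mirrors the code above.

```perl
use strict;
use warnings;
use Encode ();

# Extract the charset from a locale name such as "de_DE.ISO-8859-15@euro".
# Returns undef when the name carries no charset part (e.g. "C" or "de_DE").
sub locale_charset {
    my ($ctype) = @_;
    return undef unless defined $ctype;
    return $ctype =~ /\.([^.@]+)/ ? $1 : undef;
}

# Encode a character string into the bytes of the locale's charset.
# Falls back to ISO-8859-1 when the charset is missing or unknown to Encode.
sub to_locale_bytes {
    my ($string, $ctype) = @_;
    my $charset = locale_charset($ctype);
    my $enc = defined $charset ? Encode::find_encoding($charset) : undef;
    $enc ||= Encode::find_encoding('iso-8859-1');
    return $enc->encode($string);
}

# In POSIX, LC_ALL overrides LC_CTYPE, which overrides LANG.
my $ctype = $ENV{LC_ALL} || $ENV{LC_CTYPE} || $ENV{LANG};
my $bytes = to_locale_bytes("x\x{F6}x", $ctype);
printf "%d bytes for locale %s\n",
    length($bytes), defined $ctype ? $ctype : '(unset)';
```

Going through find_encoding rather than comparing against Encode->encodings() lets Encode's own alias table decide which charset names are acceptable.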

Lamprecht

Sep 17, 2008, 3:32:22 AM
Ulli Horlacher wrote:

> Lamprecht <ch.l.ngr...@online.de> wrote:
>> Ulli Horlacher wrote:
>>> I have tracked down the charset problem :
>>>
>>> $T = MainWindow->new();
>>>
>>> $entry = $T->Entry()->pack();
>>>
>>> $file = shift || '';
>>>
>> $file = decode("utf8", $file);#???
>
> This does a recode from UTF-8 to UTF-8, which makes not much sense :-)
>
>
Hi,

It sets the UTF-8 flag on $file. But my locale is set to de_DE.UTF-8, so this
might be the reason why it worked for me...

Cheers, Christoph
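
The two outcomes reported in this thread are consistent with how decode treats its input, which a short sketch can show. It assumes the argument arrives as raw bytes in the locale's charset: UTF-8 bytes decode cleanly, while a lone Latin-1 byte such as 0xF6 is not valid UTF-8 and gets replaced by the substitution character U+FFFD. The sub name describe is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode $bytes as UTF-8 and list the resulting
# code points, so the effect is visible regardless of the terminal.
sub describe {
    my ($bytes) = @_;
    # Default CHECK value: malformed input is substituted, not fatal.
    my $chars = decode('utf8', $bytes);
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $chars;
}

print "UTF-8 input:   ", describe("x\xC3\xB6x"), "\n";   # as on a de_DE.UTF-8 system
print "Latin-1 input: ", describe("x\xF6x"),     "\n";   # as on an ISO-8859-1 system
```

On an ISO-8859-1 system the bytes would more plausibly be turned into characters with decode('iso-8859-1', $file) before being handed to Tk.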

Ulli Horlacher

Sep 17, 2008, 1:11:47 PM
Lamprecht <ch.l.ngr...@online.de> wrote:

> >>> $file = shift || '';
> >>>
> >> $file = decode("utf8", $file);#???
> >
> > This does a recode from UTF-8 to UTF-8, which makes not much sense :-)
>

> It sets the UTF-8 flag on $file.

Yes. But $file already has the UTF-8 flag, because it comes from @ARGV


> But my locale is set to de_DE.UTF-8 So this might be the reason, why it
> worked for me...

As I wrote: Tk functions like Entry transcode variables in the caller's
context into UTF-8 and do not look at the locale or the UTF-8 flag.

If your system is native UTF-8, then this makes no difference, but my
system uses ISO-8859-1.

Ok, I have to write another perl/tk bugreport...

zentara

Sep 18, 2008, 7:41:54 AM
On Wed, 17 Sep 2008 17:11:47 +0000 (UTC), Ulli Horlacher
<fram...@rus.uni-stuttgart.de> wrote:

>Lamprecht <ch.l.ngr...@online.de> wrote:
>
>> >>> $file = shift || '';
>> >>>
>> >> $file = decode("utf8", $file);#???
>> >
>> > This does a recode from UTF-8 to UTF-8, which makes not much sense :-)
>>
>> It sets the UTF-8 flag on $file.
>
>Yes. But $file has already the UTF-8 flag, because it comes from @ARGV
>
>
>> But my locale is set to de_DE.UTF-8 So this might be the reason, why it
>> worked for me...
>
>As I wrote: Tk functions like Entry transcodes variables in the caller
>context into UTF-8 and do not look at the locale or UTF-flag.
>
>If your system is native UTF-8, then this makes no difference, but my
>system has ISO-8859-1
>
>Ok, I have to write another perl/tk bugreport...

First try reading http://perlmonks.org?node_id=698074

and here is something I don't quite understand, but may help
at the top of your script.
Put:
$Tk::encodeFallback = 1;

#########################################
from old postings:
>> The problem occurs if I have ISO8859-1 characters (German umlauts)
>> written to the widget; anyway, the characters are displayed correctly.

>just found it on the web:

in Tk.pm:

$Tk::encodeFallback = Encode::FB_PERLQQ;

which means: PERLQQ + LEAVE_SRC (=0x108) => causes the behaviour,
but changed to:

$Tk::encodeFallback = Encode::PERLQQ;

which means: PERLQQ (=0x100) => does it fine !

Now I just have to find out for myself which is the better way to patch
that:
in Tk.pm or in each particular script using Tk::Text.

#########################################
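
For reference, the two values discussed in the old posting are bit masks defined by the Encode module. The sketch below only inspects those constants and shows what the PERLQQ fallback does when encode meets a character it cannot map; it makes no claim about how Tk itself uses $Tk::encodeFallback.

```perl
use strict;
use warnings;
use Encode ();

# FB_PERLQQ is the PERLQQ fallback combined with the LEAVE_SRC bit.
printf "PERLQQ    = 0x%04x\n", Encode::PERLQQ();
printf "LEAVE_SRC = 0x%04x\n", Encode::LEAVE_SRC();
printf "FB_PERLQQ = 0x%04x\n", Encode::FB_PERLQQ();

# With LEAVE_SRC set, encode() leaves the source string untouched.
my $src   = "x\x{F6}x";
my $ascii = Encode::encode('ascii', $src, Encode::FB_PERLQQ());
print "escaped: $ascii\n";    # the o-umlaut comes out as a \x{...} escape
```

The Encode documentation notes that a non-zero CHECK value without LEAVE_SRC may modify the source string in place, which is the practical difference between the two settings.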

Ulli Horlacher

Sep 22, 2008, 12:40:00 PM
zentara <zen...@highstream.net> wrote:

> >As I wrote: Tk functions like Entry transcodes variables in the caller
> >context into UTF-8 and do not look at the locale or UTF-flag.
>

This explains that Tk expects UTF-8 strings, but not why it transcodes
variables in the caller's context.

> and here is something I don't quite understand, but may help
> at the top of your script.
> Put:
> $Tk::encodeFallback = 1;
>
> #########################################
> from old postings:
> >> The problem occures, if I have ISO8859-1 characters (german umlauts)
> >> written to the widget, anyway, the characters are displayed correctly.
>
> >just found it on the web:
>
> in Tk.pm:
>
> $Tk::encodeFallback = Encode::FB_PERLQQ;
>
> which means: PERLQQ + LEAVE_SRC (=0x108) => causes the behaviour,
> but changed to:
>
> $Tk::encodeFallback = Encode::PERLQQ;
>
> which means: PERLQQ (=0x100) => does it fine !

I have:

$Tk::encodeFallback = 1;
$file = shift;
$file_entry = $TOP->Entry(@relief);
warn $file;
$file_entry->insert(0,$file);
warn $file;

and when I run my program:

framstag@diaspora:/scr/samba: schwuppdiwupp.pl xöx.avi
xöx.avi at ./schwuppdiwupp.pl line 115.
xöx.avi at ./schwuppdiwupp.pl line 117.


As you see: Tk still modifies the variables in the caller context. And
this is wrong.

I tried it also with:

$Tk::encodeFallback = Encode::FB_PERLQQ;

$Tk::encodeFallback = Encode::PERLQQ;

- no difference.

zentara

Sep 23, 2008, 10:02:18 AM
On Mon, 22 Sep 2008 16:40:00 +0000 (UTC), Ulli Horlacher
<fram...@rus.uni-stuttgart.de> wrote:

>zentara <zen...@highstream.net> wrote:
>
>> >As I wrote: Tk functions like Entry transcodes variables in the caller
>> >context into UTF-8 and do not look at the locale or UTF-flag.
>>
>> First try reading http://perlmonks.org?node_id=698074
>
>This explains that Tk awaits UTF-8 strings but not why it transcodes
>variables in the caller context.

> $Tk::encodeFallback = Encode::FB_PERLQQ;


> $Tk::encodeFallback = Encode::PERLQQ;
>
>- no difference.

Just for comparison, try Gtk2 and see if it works for you.
That will tell you if it's Tk or your system. Maybe switch to Gtk2
if it works for you?

#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use Encode;
use Gtk2 '-init';

#make some big fonts
Gtk2::Rc->parse_string(<<__);

style "normal" {
font_name ="serif 30"
}

style "my_entry" {
font_name ="sans 25"
text[NORMAL] = "#FF0000"
}

widget "*" style "normal"
widget "*Entry*" style "my_entry"
__

my $window = Gtk2::Window->new;
$window->set_title("Hello world");
$window->signal_connect( destroy => sub { exit } );

my $vbox = Gtk2::VBox->new();
$vbox->set( "border_width" => 10 );
$window->add($vbox);

#some text with umlauts, accents, etc
my $text = encode( 'utf8','ועפשץי');
$text .= chr(916); # add a greek delta to see if chr works

my $label = Gtk2::Label->new($text);
$vbox->pack_start( $label, 0, 0, 5 ); # expand?, fill?, padding

my $entry = Gtk2::Entry->new();
$vbox->pack_start( $entry, 0, 0, 5 );
$entry->set_text($text);

my $button = Gtk2::Widget->new( "Gtk2::Button", label => "Quit" );
$button->signal_connect( clicked => sub{exit} );

$vbox->pack_start( $button, 0, 0, 5 );

$window->show_all();

Gtk2->main;
__END__
