BEGIN CODE >>
#!/usr/bin/perl
use warnings;
use strict;
#use utf8;
use Encode;
# enabling 'use utf8' above causes the literal characters below to be
# printed in Latin-1 encoding
my %table = (
# spanish
# hexadecimal UTF-8 => actual UTF-8
'0xc381' => chr(hex('c3')) . chr(hex('81')), # 'Á',
'0xc389' => encode("utf8", "\x{00c9}"), # 'É',
'0xc38d' => 'Í',
'0xc393' => 'Ó',
'0xc391' => 'Ñ',
'0xc39a' => 'Ú',
'0xc3a1' => 'á',
'0xc3a9' => 'é',
'0xc3ad' => 'í',
'0xc3b3' => 'ó',
'0xc3b1' => 'ñ',
'0xc3ba' => 'ú',
);
foreach (sort keys %table) {
print "$_ = $table{$_}\n";
}
<< END CODE
When the 'use utf8' line is commented out, the script outputs the UTF-8
characters correctly. However, when the utf8 pragma is enabled, the
characters that are hard-coded into the hash as UTF-8 literals (that is,
all of them except Á and É) are printed in Latin-1. To my understanding,
in Perl 5.8.x the only effect of the utf8 pragma is to tell the parser
that literals and variable names in the source may contain UTF-8 encoded
characters. In practice, however, the utf8 pragma is affecting the
script's output.
I have tested the script on Mac OS X 10.3.8 with Perl 5.8.1 and on
Fedora Core (not sure which version) running Perl 5.8.3.
Can anyone explain why the utf8 pragma affects the output of the script?
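
In case it helps narrow things down, here is a minimal sketch that makes
the difference visible as hex bytes. It rests on an assumption I have not
confirmed for the script above: that under 'use utf8' a literal such as
'Í' is stored as the single character U+00CD rather than as the two bytes
c3 8d. The decode() call only simulates that, so the sketch does not
depend on how the file is saved, and printed_bytes() is a helper name
made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# 'Í' as it would be stored WITHOUT 'use utf8': two raw bytes.
my $bytes = "\xc3\x8d";

# 'Í' as it is ASSUMED to be stored WITH 'use utf8': one character,
# U+00CD. decode() stands in for the pragma here.
my $chars = decode('UTF-8', $bytes);

# Hypothetical helper: print $string to an in-memory filehandle opened
# with $mode and return the bytes that were written, as hex.
sub printed_bytes {
    my ($mode, $string) = @_;
    my $buffer = '';
    open(my $fh, $mode, \$buffer)
        or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh);
    return join(' ', map { sprintf('%02x', ord($_)) } split(//, $buffer));
}

# Plain handle, like STDOUT with no encoding layer.
my $raw_from_bytes = printed_bytes('>', $bytes);
my $raw_from_chars = printed_bytes('>', $chars);

# Handle with an explicit UTF-8 encoding layer.
my $utf8_from_chars = printed_bytes('>:encoding(UTF-8)', $chars);

printf "length as bytes:           %d\n", length($bytes);
printf "length as characters:      %d\n", length($chars);
printf "bytes on plain handle:     %s\n", $raw_from_bytes;
printf "chars on plain handle:     %s\n", $raw_from_chars;
printf "chars on UTF-8 handle:     %s\n", $utf8_from_chars;
```

If the assumption holds, the character string comes out of the plain
handle as the single byte cd, which is Í in Latin-1 and would match what
I am seeing. Putting an encoding layer on STDOUT, for example with
binmode(STDOUT, ':utf8'), might then be one way to get UTF-8 output with
the pragma enabled, but I have not verified that on either machine.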