This is a bug report for perl from ma...@kasei.com,
generated with the help of perlbug 1.34 running under perl v5.8.0.
-----------------------------------------------------------------
When running in a UTF-8 environment, Locale::Language doesn't load cleanly:
LANG=en_GB.utf8 perl -we 'use Locale::Language'
Malformed UTF-8 character (unexpected end of string) at /usr/share/perl/5.8.0/Locale/Language.pm line 115, <DATA> line 109.
Malformed UTF-8 character (unexpected end of string) at /usr/share/perl/5.8.0/Locale/Language.pm line 117, <DATA> line 109.
Malformed UTF-8 character (unexpected non-continuation byte 0x6c, immediately after start byte 0xe5) in lc at /usr/share/perl/5.8.0/Locale/Language.pm line 117, <DATA> line 109.
Malformed UTF-8 character (unexpected end of string) at /usr/share/perl/5.8.0/Locale/Language.pm line 115, <DATA> line 178.
Malformed UTF-8 character (unexpected end of string) at /usr/share/perl/5.8.0/Locale/Language.pm line 117, <DATA> line 178.
Malformed UTF-8 character (unexpected non-continuation byte 0x6b, immediately after start byte 0xfc) in lc at /usr/share/perl/5.8.0/Locale/Language.pm line 117, <DATA> line 178.
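To read the warnings above: each one names a start byte (0xe5, 0xfc) and the byte that followed it (0x6c, 0x6b). Below is a minimal sketch of how a lead byte announces the length of its sequence and why the following byte does not fit. The helper names expected_length and is_continuation are made up for this illustration, and the table follows the classic 1 to 6 byte layout, treating 0xFE and 0xFF as invalid. That is a simplification, not a description of Perl's internals.

```perl
use strict;
use warnings;

# Number of bytes a UTF-8 sequence should have, judged by its first byte.
# Returns 0 when the byte cannot start a sequence. (Hypothetical helper,
# simplified to the classic 1-6 byte layout.)
sub expected_length {
    my ($byte) = @_;
    return 1 if $byte < 0x80;    # 0xxxxxxx: ASCII
    return 0 if $byte < 0xC0;    # 10xxxxxx: continuation byte, not a start
    return 2 if $byte < 0xE0;    # 110xxxxx
    return 3 if $byte < 0xF0;    # 1110xxxx
    return 4 if $byte < 0xF8;    # 11110xxx
    return 5 if $byte < 0xFC;    # 111110xx
    return 6 if $byte < 0xFE;    # 1111110x
    return 0;                    # 0xFE, 0xFF: treated as invalid here
}

# A continuation byte always has the bit pattern 10xxxxxx.
sub is_continuation {
    my ($byte) = @_;
    return ( $byte & 0xC0 ) == 0x80;
}

# The two byte pairs reported in the warnings: Latin-1 a-ring followed
# by "l", and Latin-1 u-diaeresis followed by "k".
for my $pair ( [ 0xE5, 0x6C ], [ 0xFC, 0x6B ] ) {
    my ( $start, $next ) = @$pair;
    printf "start 0x%02x expects %d bytes; next byte 0x%02x is %sa continuation byte\n",
        $start, expected_length($start), $next,
        is_continuation($next) ? '' : 'not ';
}
```

Both Latin-1 bytes look like the start of a multi-byte sequence, but the plain ASCII letter after them cannot continue one, which matches the "unexpected non-continuation byte" wording.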
The fix:
--- lib/Locale/Language.pm.orig 2002-09-19 15:17:16.000000000 +0200
+++ lib/Locale/Language.pm 2002-09-19 15:17:41.000000000 +0200
@@ -231,7 +231,7 @@
my:Burmese
na:Nauru
-nb:Norwegian Bokmål
+nb:Norwegian Bokmal
nd:Ndebele, North
ne:Nepali
ng:Ndonga
@@ -300,7 +300,7 @@
uz:Uzbek
vi:Vietnamese
-vo:Volapük
+vo:Volapuk
wo:Wolof
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl v5.8.0:
Configured by Debian Project at Sat Sep 14 18:17:32 UTC 2002.
Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
Platform:
osname=linux, osvers=2.4.19, archname=i386-linux-thread-multi
uname='linux cyberhq 2.4.19 #1 smp sun aug 4 11:30:45 pdt 2002 i686 unknown unknown gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8.0 -Darchlib=/usr/lib/perl/5.8.0 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.0 -Dsitearch=/usr/local/lib/perl/5.8.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.0 -Dd_dosuid -des'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O3',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='2.95.4 20011002 (Debian prerelease)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=/lib/libc-2.2.5.so, so=so, useshrplib=true, libperl=libperl.so.5.8.0
gnulibc_version='2.2.5'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.0:
/home/marty/Perl
/etc/perl
/usr/local/lib/perl/5.8.0
/usr/local/share/perl/5.8.0
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.8.0
/usr/share/perl/5.8.0
/usr/local/lib/site_perl
.
---
Environment for perl v5.8.0:
HOME=/home/marty
LANG=en_GB.UTF-8
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/marty/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/sbin:/usr/sbin:/usr/local/sbin
PERLLIB=/home/marty/Perl
PERL_BADLANG (unset)
SHELL=/bin/bash
The DATA section in Locale::Language contains 2 Latin-1 characters.
When Perl 5.8 is running in a UTF-8 locale it expects the DATA to be UTF-8,
so it dies when it finds a malformed character.
Adding 'use bytes' to Locale::Language would stop the death, but the
included non-ASCII characters don't work properly on non-Latin1 systems.
So I think it is better to replace the 2 problem characters with ASCII.
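As a rough illustration of the point above (a sketch, not part of the patch), the following checks whether a byte string is well-formed UTF-8 using the Encode module's strict decoder. The helper name is_valid_utf8 is made up for this example. The Latin-1 form of the name fails the check, while the UTF-8 form and the ASCII replacement both pass.

```perl
use strict;
use warnings;
use Encode ();

# Return 1 if $bytes is well-formed UTF-8, 0 otherwise.
# (Hypothetical helper for illustration only.)
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    my $ok = eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK() ); 1 };
    return $ok ? 1 : 0;
}

my $latin1 = "Norwegian Bokm\xE5l";        # 0xE5: Latin-1 a-ring, as in DATA
my $utf8   = "Norwegian Bokm\xC3\xA5l";    # the same name encoded as UTF-8
my $ascii  = "Norwegian Bokmal";           # the replacement used in the patch

for my $s ( $latin1, $utf8, $ascii ) {
    printf "%-7s %s\n",
        ( is_valid_utf8($s) ? 'valid' : 'invalid' ),
        join ' ', map { sprintf '%02x', ord } split //, $s;
}
```

Pure ASCII is valid in Latin-1, UTF-8 and EUC-JP alike, which is the reason for preferring it here.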
Here's my suggested patch. I've tried to ensure I've included the
actual Latin1 characters in this email, but as I don't use a Latin1
system they will probably be converted when I send this: sorry.
--- lib/Locale/Language.pm.orig 2002-09-19 15:17:16.000000000 +0200
+++ lib/Locale/Language.pm 2002-09-19 15:17:41.000000000 +0200
@@ -231,7 +231,7 @@
my:Burmese
na:Nauru
-nb:Norwegian Bokmål
+nb:Norwegian Bokmal
nd:Ndebele, North
ne:Nepali
ng:Ndonga
@@ -300,7 +300,7 @@
uz:Uzbek
vi:Vietnamese
-vo:Volapük
+vo:Volapuk
wo:Wolof
--- ./lib/Locale/Codes/t/languages.t.orig 2002-09-19 15:17:16.000000000 +0200
+++ ./lib/Locale/Codes/t/languages.t 2002-09-19 15:17:16.000000000 +0200
@@ -47,7 +47,7 @@
'code2language("nd") eq "Ndebele, North"',
'code2language("ng") eq "Ndonga"',
'code2language("nn") eq "Norwegian Nynorsk"',
- 'code2language("nb") eq "Norwegian Bokmål"',
+ 'code2language("nb") eq "Norwegian Bokmal"',
'code2language("ny") eq "Chichewa; Nyanja"',
'code2language("oc") eq "Occitan (post 1500)"',
'code2language("os") eq "Ossetian; Ossetic"',
--
Marty
Why not use a \x{00xx} escape? It would be more robust for patching as well,
as mailers (including mine) are variously mangling these diffs.
Nick Ing-Simmons
http://www.ni-s.u-net.com/
The Language.pm file contains the Latin1 characters in the DATA section
so I can't use the escape sequences there. But the other reason was
more important to me: the Latin1 characters cause bad things to happen
when used in a non-Latin1 environment; in EUC-JP for example, they
either don't display at all, or they merge with the next character and
display some obscure kanji.
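
To illustrate the first point (a sketch only, not code from the module): a \x{..} escape is interpreted inside a double-quoted Perl string, but a line read from a data section is plain text, so the escape would arrive as six literal characters. The in-memory filehandle below stands in for the real DATA handle, and the variable names are made up for this example.

```perl
use strict;
use warnings;

# In a double-quoted literal the escape becomes one character (U+00E5).
my $literal = "nb:Norwegian Bokm\x{e5}l";

# Text in a data section is not parsed as Perl code. The single-quoted
# string simulates such a line: the backslash sequence stays as written.
my $data_text = 'nb:Norwegian Bokm\x{e5}l' . "\n";
open my $fh, '<', \$data_text or die "cannot open in-memory handle: $!";
my $line = <$fh>;
close $fh;
chomp $line;

printf "literal:   %d characters\n", length($literal);
printf "data line: %d characters\n", length($line);
```

The two lengths differ because the data line still carries the unexpanded escape text, so the module would have to post-process each line to make escapes work there.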
--
Marty
Thanks, applied as change #17927.
Sending the patch as an attachment, either instead of or as well as
the inline version, is usually the best way to ensure the integrity
of the patch when you are unsure what your mailer will do to it.
Hugo