CVS dpkg flaws

YAEGASHI Takeshi

Dec 31, 2001, 6:25:00 AM
Hi,

CVS dpkg calls setlocale(LC_CTYPE, "C"); in several places, and this
seems to turn Japanese characters in dpkg's and dselect's output into
"?" (question marks) under a Japanese locale (LANG=ja_JP.eucJP).

# LANG=C dpkg -i ../dpkg_1.10_i386.deb
(Reading database ... 60158 files and directories currently installed.)
Preparing to replace dpkg 1.10 (using ../dpkg_1.10_i386.deb) ...
install-info(/usr/info/Guidelines): no backup file /var/backups/infodir.bak available, giving up.
dpkg: warning - old pre-removal script returned error exit status 1
dpkg - trying script from the new package instead ...
dpkg: ... it looks like that went OK.
Unpacking replacement dpkg ...
Setting up dpkg (1.10) ...

# LANG=ja_JP.eucJP dpkg -i ../dpkg_1.10_i386.deb
(???????????????... ?? 60158 ???????????????????????????)
dpkg 1.10 ?(../dpkg_1.10_i386.deb ?)???????????????...
install-info(/usr/info/Guidelines): no backup file /var/backups/infodir.bak available, giving up.
dpkg: ?? - ?? pre-removal ????? ????????? 1 ???????
dpkg - ??????????????????????????...
dpkg: ... OK ??????
dpkg ????????????...
dpkg (1.10) ???????? ...

How should I fix it? (Yes, I can get proper Japanese output by setting
OUTPUT_CHARSET=EUC-JP.)


Separately, here is a simple bug fix (a missing comma):

Index: scripts/dpkg-architecture.pl
===================================================================
RCS file: /cvs/dpkg/dpkg/scripts/dpkg-architecture.pl,v
retrieving revision 1.24
diff -u -r1.24 dpkg-architecture.pl
--- scripts/dpkg-architecture.pl 2001/10/21 17:58:07 1.24
+++ scripts/dpkg-architecture.pl 2001/12/31 10:54:26
@@ -63,7 +63,7 @@
's390', 's390-linux',
'ia64', 'ia64-linux',
'openbsd-i386', 'i386-openbsd',
- 'freebsd-i386', 'i386-freebsd'
+ 'freebsd-i386', 'i386-freebsd',
'darwin-powerpc', 'powerpc-darwin',
'darwin-i386', 'i386-darwin');

--
YAEGASHI Takeshi <t...@keshi.org> <tak...@yaegashi.jp> <yaeg...@dodes.org>


--
To UNSUBSCRIBE, email to debian-dp...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Fumitoshi UKAI

Dec 31, 2001, 9:20:38 AM
Hi,

At Mon, 31 Dec 2001 20:24:03 +0900,
YAEGASHI Takeshi wrote:

> CVS dpkg calls setlocale(LC_CTYPE, "C"); in several places, and this
> seems to turn Japanese characters in dpkg's and dselect's output into
> "?" (question marks) under a Japanese locale (LANG=ja_JP.eucJP).

setlocale(LC_CTYPE, "C") breaks gettext() encoding conversion features.
Wichert, why did you add this?

main/main.c, for instance:
| revision 1.44
| date: 2001/07/16 13:21:27; author: wakkerma; state: Exp; lines: +1 -0
| Force LC_CTYPE to C

Thanks,
Fumitoshi UKAI

Fumitoshi UKAI

Dec 31, 2001, 9:31:48 AM
At Mon, 31 Dec 2001 15:24:32 +0100,

Wichert Akkerman wrote:
> > setlocale(LC_CTYPE, "C") breaks gettext() encoding conversion features.
> > Wichert, why did you add this?
>
> ChangeLog says:
>
> Mon Jul 16 15:20:07 CEST 2001 Wichert Akkerman <wakk...@debian.org>
>
> * main/main.c, main/query.c, dselect/main.cc: use C locale for LC_CTYPE
> so we can be sure packagename and version comparisons work as expected
>
> Why would setting LC_CTYPE break output routines?

Because, AFAIK, gettext() converts each message to the encoding given by
the locale configuration. Where that conversion fails, it outputs the
character as `?'. Since Japanese characters cannot be converted to ASCII,
every message is rendered as `?'.

See also "Charset conversion" in the gettext info manual.
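
To illustrate the point above: the codeset that message conversion targets can be inspected with nl_langinfo(CODESET), which follows the LC_CTYPE category. The following is a minimal sketch, not code from dpkg or gettext; the helper name current_ctype_codeset() is hypothetical. On glibc, the C locale typically reports an ASCII codeset (such as ANSI_X3.4-1968), which cannot represent Japanese text.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from dpkg or gettext): copy the codeset name of
 * the current LC_CTYPE locale into buf.
 * Returns 0 on success, -1 if buf is too small to hold the name. */
int current_ctype_codeset(char *buf, size_t len)
{
    const char *cs = nl_langinfo(CODESET); /* depends on LC_CTYPE */

    if (strlen(cs) + 1 > len)
        return -1;
    strcpy(buf, cs);
    return 0;
}
```

Calling this once after setlocale(LC_ALL, "") and again after setlocale(LC_CTYPE, "C") should show the codeset changing from the user's (for example EUC-JP) to the C locale's.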

Wichert Akkerman

Dec 31, 2001, 9:50:28 AM
Previously Fumitoshi UKAI wrote:
> Because, AFAIK, gettext() converts each message to the encoding given by
> the locale configuration. Where that conversion fails, it outputs the
> character as `?'.

Oh bugger, so gettext has overloaded the meaning of LC_CTYPE to make
it also the indicator for the output character set. From the setlocale
manpage:

LC_CTYPE
for regular expression matching, character classification,
conversion, case-sensitive comparison, and wide character
functions.

I would consider this a very nasty bug in gettext.

Can you try the (untested) patch below? That tries to work around this
problem by setting and resetting LC_CTYPE in the version comparison
routines.

Wichert.

diff -wur ../dpkg/ChangeLog dpkg/ChangeLog
--- ../dpkg/ChangeLog Mon Dec 31 15:29:56 2001
+++ dpkg/ChangeLog Mon Dec 31 15:48:16 2001
@@ -1,3 +1,10 @@
+Mon Dec 31 15:47:13 CET 2001 Wichert Akkerman <wakk...@debian.org>
+
+ * lib/vercmp.c: Set and restore LC_CTYPE locale
+ * main/main.c, main/query.c, dselect/main.cc: do not force LC_CTYPE to
+ C since that breaks gettext (aargh), but trust the version comparison
+ routines to do the right thing instead.
+
Mon Dec 31 15:25:46 CET 2001 Wichert Akkerman <wakk...@debian.org>

* scripts/dpkg-architecture.pl: fix syntax error
diff -wur ../dpkg/dselect/main.cc dpkg/dselect/main.cc
--- ../dpkg/dselect/main.cc Tue Jul 31 12:34:08 2001
+++ dpkg/dselect/main.cc Mon Dec 31 15:47:02 2001
@@ -470,7 +470,6 @@
char *home, *homerc;

setlocale(LC_ALL, "");
- setlocale(LC_CTYPE, "C");
bindtextdomain(PACKAGE, LOCALEDIR);
textdomain(PACKAGE);

diff -wur ../dpkg/lib/vercmp.c dpkg/lib/vercmp.c
--- ../dpkg/lib/vercmp.c Tue Jul 17 00:10:21 2001
+++ dpkg/lib/vercmp.c Mon Dec 31 15:46:38 2001
@@ -115,19 +115,46 @@
const struct versionrevision *ref,
enum depverrel verrel) {
int r;
- if (verrel == dvr_none) return 1;
+ int ret;
+ char* pl;
+
+ pl=setlocale(LC_CTYPE, "C"); /* TODO: check for errors */
+ if (verrel == dvr_none)
+ ret= 1;
+ else {
r= versioncompare(it,ref);
switch (verrel) {
- case dvr_earlierequal: return r <= 0;
- case dvr_laterequal: return r >= 0;
- case dvr_earlierstrict: return r < 0;
- case dvr_laterstrict: return r > 0;
- case dvr_exact: return r == 0;
- default: internerr("unknown verrel");
+ case dvr_earlierequal:
+ ret= r <= 0;
+ break;
+ case dvr_laterequal:
+ ret= r >= 0;
+ break;
+ case dvr_earlierstrict:
+ ret= r < 0;
+ break;
+ case dvr_laterstrict:
+ ret= r > 0;
+ break;
+ case dvr_exact:
+ ret= r == 0;
+ break;
+ default:
+ internerr("unknown verrel");
+ }
}
- return 0;
+
+ setlocale(LC_CTYPE, pl); /* TODO: check for errors */
+ return ret;
}

int versionsatisfied(struct pkginfoperfile *it, struct deppossi *against) {
- return versionsatisfied3(&it->version,&against->version,against->verrel);
+ int ret;
+ char* pl;
+
+ pl=setlocale(LC_CTYPE, "C"); /* TODO: check for errors */
+ ret= versionsatisfied3(&it->version,&against->version,against->verrel);
+ setlocale(LC_CTYPE, pl); /* TODO: check for errors */
+
+ return ret;
}
diff -wur ../dpkg/main/main.c dpkg/main/main.c
--- ../dpkg/main/main.c Sun Oct 21 23:55:23 2001
+++ dpkg/main/main.c Mon Dec 31 15:46:48 2001
@@ -550,7 +550,6 @@
char *home, *homerc;

setlocale(LC_ALL, "");
- setlocale(LC_CTYPE, "C");
bindtextdomain(PACKAGE, LOCALEDIR);
textdomain(PACKAGE);

diff -wur ../dpkg/main/query.c dpkg/main/query.c
--- ../dpkg/main/query.c Wed Sep 12 16:54:41 2001
+++ dpkg/main/query.c Mon Dec 31 15:47:04 2001
@@ -534,7 +534,6 @@
static void (*actionfunction)(const char *const *argv);

setlocale(LC_ALL, "");
- setlocale(LC_CTYPE, "C");
bindtextdomain(PACKAGE, LOCALEDIR);
textdomain(PACKAGE);
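
A note on the save/restore idea in the patch above: per the C standard, a successful setlocale() call with a non-NULL locale argument returns the name of the locale just set, so keeping the result of setlocale(LC_CTYPE, "C") would not capture the previous setting. Below is a minimal sketch of one way to save and restore, assuming the previous name is first queried with a NULL argument and copied (the returned string may be overwritten by later calls). The names ctype_push_c() and ctype_pop() are hypothetical and are not dpkg functions.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of dpkg): switch LC_CTYPE to "C" and return
 * a heap copy of the previous locale name, or NULL on failure. */
char *ctype_push_c(void)
{
    const char *cur = setlocale(LC_CTYPE, NULL); /* NULL only queries */
    char *saved;

    if (cur == NULL)
        return NULL;
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, cur); /* copy: later setlocale() calls may reuse the buffer */

    if (setlocale(LC_CTYPE, "C") == NULL) {
        free(saved);
        return NULL;
    }
    return saved;
}

/* Hypothetical helper: restore the locale name returned by ctype_push_c()
 * and free it. Returns 0 on success, -1 on failure. */
int ctype_pop(char *saved)
{
    int ok;

    if (saved == NULL)
        return -1;
    ok = setlocale(LC_CTYPE, saved) != NULL;
    free(saved);
    return ok ? 0 : -1;
}
```

Any message translated between the two calls would still be converted for the C codeset, so error reporting through gettext() would presumably have to wait until after ctype_pop().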

--
_________________________________________________________________
/wic...@wiggy.net This space intentionally left occupied \
| wic...@deephackmode.org http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0 2805 3CB8 9250 2FA3 BC2D |

Fumitoshi UKAI

Dec 31, 2001, 10:06:29 AM
At Mon, 31 Dec 2001 15:50:09 +0100,
Wichert Akkerman wrote:

> > Because, AFAIK, gettext() converts each message to the encoding given by
> > the locale configuration. Where that conversion fails, it outputs the
> > character as `?'.
>
> Oh bugger, so gettext has overloaded the meaning of LC_CTYPE to make
> it also the indicator for the output character set. From the setlocale
> manpage:
>
> LC_CTYPE
> for regular expression matching, character classification,
> conversion, case-sensitive comparison, and wide character
> functions.
>
> I would consider this a very nasty bug in gettext.
>
> Can you try the (untested) patch below? That tries to work around this
> problem by setting and resetting LC_CTYPE in the version comparison
> routines.

I haven't tested it, but why not set and restore the LC_CTYPE locale
in lib/vercmp.c:verrevcmp()?

versionsatisfied() calls versionsatisfied3(), and versionsatisfied3()
calls versioncompare(). All of these calls end up in verrevcmp(), so
I think putting the set/restore there is simpler.
Anyway, your patch may break some error messages, since internerr()
uses gettext().

Alternatively, how about implementing locale-insensitive
isdigit() and isalpha(), given that we only use the US-ASCII charset?
They aren't difficult to implement, are they?
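
A minimal sketch of what such locale-insensitive checks could look like, assuming an ASCII-compatible execution character set in which the letters are contiguous. The names ascii_isdigit() and ascii_isalpha() are hypothetical and are not taken from dpkg.

```c
/* Hypothetical ASCII-only replacements for isdigit() and isalpha().
 * They never consult the current locale, so the result is the same
 * under LANG=C and LANG=ja_JP.eucJP. */

/* Nonzero only for the ten ASCII digits '0'..'9'. */
int ascii_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/* Nonzero only for ASCII letters 'a'..'z' and 'A'..'Z'.
 * Assumes the letters are contiguous, as they are in ASCII. */
int ascii_isalpha(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
```

Unlike the libc functions, these are also safe to call with negative values or bytes above 0x7F, which simply yield 0.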

Happy New Year! (from Japan)
Fumitoshi UKAI

Wichert Akkerman

Dec 31, 2001, 10:12:28 AM
Previously Fumitoshi UKAI wrote:
> I haven't tested it, but why not set and restore the LC_CTYPE locale
> in lib/vercmp.c:verrevcmp()?

Optimization: versioncompare() can call verrevcmp() twice, and this way
we only set the locale once. Since verrevcmp() is static, we know
nothing else can call it, so we are safe.

> Anyway, your patch may break some error messages, since internerr()
> uses gettext().

Ah, bugger. That is easily fixed though. Then again, doing a quick
grep I see there are a bunch more isdigit() calls throughout the source
that should probably get similar protection.

> Alternatively, how about implementing locale-insensitive
> isdigit() and isalpha(), given that we only use the US-ASCII charset?
> They aren't difficult to implement, are they?

It's quite trivial in fact, but it would be annoying to have to
reimplement such a completely trivial bit of libc just to deal
with a design bug in gettext :(

However, it seems we have no choice but to do that anyway. I'll
whip up a patch and commit it to CVS.

Wichert.

