
string module


Phil Hystad

May 19, 2002, 8:43:16 PM
I have python on linux (suse) and the string.uppercase and string.lowercase
values are a bit strange.

For example, string.uppercase has the following values:

ABCDEFGHIJKLMNOPQRSTUVWXYZ\xC0\xC1\xC2....\xCde


And, lowercase has the same problem....plus, the string.letters has the
values concatenated and there are two sets of these escaped hex codes.

What is this all about? I was expecting just the 26 letters. The length is
not correct either:

>>> len(string.uppercase)
56
>>>

So, it reports a length of 56 instead of 26.

I checked locale and it is correct for me (english).

Is this a garbled module or a feature?

phil

marduk

May 19, 2002, 9:23:28 PM
On Sun, 2002-05-19 at 19:43, Phil Hystad wrote:
> I have python on linux (suse) and the string.uppercase and string.lowercase
> values are a bit strange.
>
> For example, string.uppercase has the following values:
>
> ABCDEFGHIJKLMNOPQRSTUVWXYZ\xC0\xC1\xC2....\xCde
>

[...]

> Is this a garbled module or a feature?

It's Unicode.

--m



Phil Hystad

May 19, 2002, 10:34:33 PM

"marduk" <mar...@python.net> wrote in message
news:1021857810.1956.15.camel@berkowitz...

> On Sun, 2002-05-19 at 19:43, Phil Hystad wrote:
> > I have python on linux (suse) and the string.uppercase and string.lowercase
> > values are a bit strange.
> >
> > For example, string.uppercase has the following values:
> >
> > ABCDEFGHIJKLMNOPQRSTUVWXYZ\xC0\xC1\xC2....\xCde
> >
>
> [...]
>
> > Is this a garbled module or a feature?
>
> It's Unicode.
>
> --m
>

Unicode?

Certainly it is not Unicode. These are characters, not 16-bit
elements of Unicode.


Fernando Pérez

May 20, 2002, 1:34:08 AM
Phil Hystad wrote:

> For example, string.uppercase has the following values:
>
> ABCDEFGHIJKLMNOPQRSTUVWXYZ\xC0\xC1\xC2....\xCde
>

>>> import string
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde'
>>> print string.uppercase
ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ

The difference between repr() and str().
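
A minimal sketch of that distinction, written for Python 3 as an assumption (the thread uses Python 2.2, where string.uppercase was an 8-bit string). A bytes object stands in for the old 8-bit string, and Latin-1 is assumed as the display encoding:

```python
# Bytes 0xC0..0xC2 stand in for the first accented letters from the output above.
raw = b"XYZ\xc0\xc1\xc2"

# repr() escapes every non-ASCII byte, which is what the bare prompt echo shows.
escaped = repr(raw)

# Decoding with the assumed Latin-1 encoding gives the characters a terminal would draw.
displayed = raw.decode("latin-1")

print(escaped)                                    # the escaped form
print(displayed == "XYZ\u00c0\u00c1\u00c2")       # the same data, seen as letters
print(len(raw))                                   # length counts bytes, not escapes
```

Either way the data is identical; only the rendering differs, so the escapes by themselves do not indicate a corrupted module.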

cheers,

f.

Fernando Pérez

May 20, 2002, 1:51:08 AM
Fernando Pérez wrote:

>> For example, string.uppercase has the following values:
>>
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ\xC0\xC1\xC2....\xCde
>>
>
> >>> import string
> >>> string.uppercase
> 'ABCDEFGHIJKLMNOPQRSTUVWXYZ\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde'
> >>> print string.uppercase
> ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
>

But now I'm totally confused. The above works at a python prompt, but the
simple program

import string
print string.__file__
print string.uppercase

executed at the system prompt gives:

[~]> python t.py
/usr/lib/python2.2/string.pyc
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Huh??? How can the python shell get a different value? I checked in the
interactive prompt and string.__file__ does point to the same file as
indicated above. So how in the world do I get the extra accented chars shoved
into string.uppercase? The relevant section in /usr/lib/python2.2/string.py:

# Some strings for ctype-style character classification
whitespace = ' \t\n\r\v\f'
lowercase = 'abcdefghijklmnopqrstuvwxyz'
uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
letters = lowercase + uppercase

I'd very much appreciate someone who could explain who is putting the extra
chars into these module constants when at the python prompt.

Cheers,

f.

Tim Peters

May 20, 2002, 2:13:14 AM
[Phil Hystad]

> For example, string.uppercase has the following values:
>
> ABCDEFGHIJKLMNOPQRSTUVWXYZ\xC0\xC1\xC2....\xCde

Here's the C code that creates this:

    n = 0;
    for (c = 0; c < 256; c++) {
        if (isupper(c))
            buf[n++] = c;
    }

So you get whatever your platform C's isupper() function says is an
uppercase letter; this may vary according to locale and platform C bugs.
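
As a rough illustration, the same loop can be emulated in Python 3. This is only a sketch: the two predicates below are hypothetical stand-ins for what a C library's isupper() might report under the "C" locale and under a Latin-1 locale, and the Latin-1 one borrows Python's Unicode tables as an approximation.

```python
def classify_uppercase(is_upper):
    """Collect every 8-bit code that the given predicate calls uppercase."""
    buf = []
    for c in range(256):
        if is_upper(c):
            buf.append(c)
    return bytes(buf)


def ascii_isupper(c):
    # Stand-in for isupper() in the "C" locale: ASCII A-Z only.
    return 0x41 <= c <= 0x5A


def latin1_isupper(c):
    # Stand-in for isupper() in a Latin-1 locale: read code c as a Latin-1
    # character and ask Python's Unicode database (an approximation).
    return chr(c).isupper()


print(len(classify_uppercase(ascii_isupper)))   # 26
print(len(classify_uppercase(latin1_isupper)))  # 56
```

The second count matches the length of 56 reported at the start of the thread: 26 ASCII letters plus the 30 accented capitals between \xc0 and \xde (skipping \xd7, the multiplication sign).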

If you're married to ASCII, use string.ascii_lowercase,
string.ascii_uppercase, and string.ascii_letters instead.
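
A short sketch of those locale-independent constants, shown with Python 3 (an assumption; they were introduced in Python 2.2 and still exist today):

```python
import string

# These constants are fixed and do not change with the locale.
print(string.ascii_uppercase)
print(len(string.ascii_uppercase))
print(string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase)
```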

Fernando Pérez

May 20, 2002, 4:03:33 PM
Fredrik Lundh wrote:

>> Why the difference?
>
> the readline library might be messing up the locale.

Indeed! Thanks for the information, I was really puzzled. Here's what happens:

[~/test]> python
Python 2.2 (#1, Feb 24 2002, 16:21:58)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)] on linux-i386
Type "help", "copyright", "credits" or "license" for more information.
>>> import string,locale
>>> print string.__file__
/usr/lib/python2.2/string.pyc
>>> print locale.getlocale()
['en_US', 'ISO8859-1']
>>> print string.uppercase
ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ

The strings.py file contains the same statements as above:

[~/test]> python strings.py
/usr/lib/python2.2/string.pyc
(None, None)
ABCDEFGHIJKLMNOPQRSTUVWXYZ


So indeed, since I'm loading readline at the interactive prompt, there's an
under the hood resetting of the locale. Ah, the beauty of perl-like silent,
implicit global changes ;)
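
Because the locale is process-wide state, any library that calls setlocale() changes it for everyone. A minimal sketch, assuming Python 3's locale module, of how to inspect LC_CTYPE and pin it to the portable "C" locale for a while (current_ctype is a hypothetical helper name):

```python
import locale


def current_ctype():
    """Return the LC_CTYPE setting now in effect (a query only, nothing changes)."""
    return locale.setlocale(locale.LC_CTYPE)


original = current_ctype()
try:
    # Force the "C" locale, which every platform provides.
    locale.setlocale(locale.LC_CTYPE, "C")
    forced = current_ctype()
finally:
    # Put back whatever was in effect so the change does not leak.
    locale.setlocale(locale.LC_CTYPE, original)

print(forced)
```

Checking the setting once right after startup and again after importing a suspect module is one way to see whether that module altered it.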

Thanks for the clarification!

f.

Michael Hudson

May 21, 2002, 6:48:43 AM
Fernando Pérez <fper...@yahoo.com> writes:

> Fredrik Lundh wrote:
>
> >> Why the difference?
> >
> > the readline library might be messing up the locale.
>
> Indeed! Thanks for the information, I was really puzzled. Here's
> what happens:

[...]

> So indeed, since I'm loading readline at the interactive prompt, there's an
> under the hood resetting of the locale. Ah, the beauty of perl-like silent,
> implicit global changes ;)

Now try playing with signals at the interactive prompt[1]
<wink/frown>. I wouldn't want to live without readline, but sometimes
I do wish it wouldn't bugger around with stuff my application cares
about so much.

Cheers,
M.

[1] As in:
http://mail.python.org/pipermail/python-dev/2001-November/018420.html

--
MARVIN: Oh dear, I think you'll find reality's on the blink again.
-- The Hitch-Hikers Guide to the Galaxy, Episode 12

Fernando Pérez

May 21, 2002, 12:13:02 PM
Michael Hudson wrote:

>> So indeed, since I'm loading readline at the interactive prompt, there's an
>> under the hood resetting of the locale. Ah, the beauty of perl-like silent,
>> implicit global changes ;)
>
> Now try playing with signals at the interactive prompt[1]
> <wink/frown>. I wouldn't want to live without readline, but sometimes
> I do wish it wouldn't bugger around with stuff my application cares
> about so much.
>

Agreed. There's not much I can do about readline (I'm not going to rewrite the
whole thing!) but I did rewrite a lot of rlcompleter and other things which
import readline (like pdb) just so they wouldn't muck around globally so
much! In that sense rlcompleter is very poorly designed: there should be an
explicit need to call a global initializer so that submodules which import it
don't damage your global readline namespace handling (like pdb does).

Oh well, one of these days I'll have to wrap all these things as patches and
send them in.

f.

Michael Hudson

May 21, 2002, 1:26:41 PM
Fernando Pérez <fper...@yahoo.com> writes:

> Michael Hudson wrote:
>
> >> So indeed, since I'm loading readline at the interactive prompt, there's an
> >> under the hood resetting of the locale. Ah, the beauty of perl-like silent,
> >> implicit global changes ;)
> >
> > Now try playing with signals at the interactive prompt[1]
> > <wink/frown>. I wouldn't want to live without readline, but sometimes
> > I do wish it wouldn't bugger around with stuff my application cares
> > about so much.
> >
>
> Agreed. There's not much I can do about readline (I'm not going to
> rewrite the whole thing!)

Well, as you probably know, that's what *I* did...

pyrepl hasn't seen much of my time of late, sadly.

Cheers,
M.

--
The only problem with Microsoft is they just have no taste.
-- Steve Jobs, (From _Triumph of the Nerds_ PBS special)
and quoted by Aahz Maruch on comp.lang.python

holger krekel

May 21, 2002, 4:25:47 PM
Michael Hudson wrote:
> Fernando Pérez <fper...@yahoo.com> writes:
>
> > Michael Hudson wrote:
> >
> > >> So indeed, since I'm loading readline at the interactive prompt, there's an
> > >> under the hood resetting of the locale. Ah, the beauty of perl-like silent,
> > >> implicit global changes ;)
> > >
> > > Now try playing with signals at the interactive prompt[1]
> > > <wink/frown>. I wouldn't want to live without readline, but sometimes
> > > I do wish it wouldn't bugger around with stuff my application cares
> > > about so much.
> > >
> >
> > Agreed. There's not much I can do about readline (I'm not going to
> > rewrite the whole thing!)
>
> Well, as you probably know, that's what *I* did...
>
> pyrepl hasn't seen much of my time of late, sadly.

i just gave it a test. very interesting! it's a pity i am always
using vi-bindings (even in xemacs :-).

btw, Fernando already knows that i rewrote the rlcompleter module
to be a lot more comfortable. It works very differently from
the old rlcompleter by tokenizing/parsing/evaluating subexpressions.
My current development version (requires readline :-) is here:

http://home.trillke.net/~hpk/rlcompleter2.py

just import it on a pure python installation (with readline available)
and play around :-)

If you want to continue development i'd definitely try to integrate my
rlcompleter2 module into pyrepl (and learn some emacs-bindings again :-).

cheers,

holger


Michael Hudson

May 22, 2002, 5:52:20 AM
holger krekel <py...@devel.trillke.net> writes:

> Michael Hudson wrote:
> > Fernando Pérez <fper...@yahoo.com> writes:

[readline gripes]

> > > Agreed. There's not much I can do about readline (I'm not going to
> > > rewrite the whole thing!)
> >
> > Well, as you probably know, that's what *I* did...
> >
> > pyrepl hasn't seen much of my time of late, sadly.
>
> i just gave it a test. very interesting! it's a pity i am always
> using vi-bindings (even in xemacs :-).

Hey, do you want to write vi-bindings for pyrepl? It should be
possible (it was meant to be, any way). I'm addicted to emacs-mode
bindings so it's hard for me to tell if I'm getting it right (though I
am getting better at using vi as an editor).

> btw, Fernando already knows that i rewrote the rlcompleter module
> to be a lot more comfortable. It works very differently from
> the old rlcompleter by tokenizing/parsing/evaluating subexpressions.
> My current development version (requires readline :-) is here:
>
> http://home.trillke.net/~hpk/rlcompleter2.py
>
> just import it on a pure python installation (with readline available)
> and play around :-)

Wow, that's a piece of work!

> If you want to continue development i'd definitely try to integrate my
> rlcompleter2 module into pyrepl (and learn some emacs-bindings again :-).

I would imagine that it would be *much* easier to do the kind of
things you've done in rlcompleter2 for pyrepl, owing to it being
"Python all the way down". Though you might need to worry about
multiple lines and stuff.

Hmm, so many things to worry about, so little time...

Cheers,
M.

--
I sense much distrust in you. Distrust leads to cynicism, cynicism
leads to bitterness, bitterness leads to the Awareness Of True
Reality which is referred to by those-who-lack-enlightenment as
"paranoia". I approve. -- David P. Murphy, alt.sysadmin.recovery

holger krekel

May 22, 2002, 6:55:22 AM
Michael Hudson wrote:
> > btw, Fernando already knows that i rewrote the rlcompleter module
> > to be a lot more comfortable. It works very differently from
> > the old rlcompleter by tokenizing/parsing/evaluating subexpressions.
> > My current development version (requires readline :-) is here:
> >
> > http://home.trillke.net/~hpk/rlcompleter2.py
> >
> > just import it on a pure python installation (with readline available)
> > and play around :-)
>
> Wow, that's a piece of work!

thanks! it's even better if you apply the one-liner readline-patch 558432 on
sourceforge :-)



> > If you want to continue development i'd definitely try to integrate my
> > rlcompleter2 module into pyrepl (and learn some emacs-bindings again :-).
>
> I would imagine that it would be *much* easier to do the kind of
> things you've done in rlcompleter2 for pyrepl, owing to it being
> "Python all the way down".

right. that's why i like your approach and code.

> Though you might need to worry about
> multiple lines and stuff.

yes, this is actually a *non-trivial* problem. I have some
ideas how to solve this for some cases, though.

Solving these issues would yield the basis for a truly
interactive development style.

> Hmm, so many things to worry about, so little time...

true. but as long as the things i worry about are interesting
i don't care too much :-)

cheers,

holger


Fernando Pérez

May 22, 2002, 12:48:23 PM
holger krekel wrote:

>> > My current development version (requires readline :-) is here:
>> >
>> > http://home.trillke.net/~hpk/rlcompleter2.py
>> >
>> > just import it on a pure python installation (with readline available)
>> > and play around :-)
>>
>> Wow, that's a piece of work!
>
> thanks! it's even better if you apply the one-liner readline-patch 558432 on
> sourceforge :-)

By the way Holger, have you submitted this patch to the main python-dev
people? I _hate_ the normal readline behavior which forces me to backspace
every time, I just didn't know fixing it was this simple. But C things that
don't become part of the standard distro get ignored by 99% of people, since
that is the 99% which doesn't build its own python. For pure python modules
it's easier to distribute a replacement, but since this is a C patch I hope it
becomes part of the standard distribution.

I'll test the rlcompleter2/ipython merge as soon as I can.

Cheers,
f

holger krekel

May 22, 2002, 1:27:53 PM
Fernando Pérez wrote:
> holger krekel wrote:
>
> >> > My current development version (requires readline :-) is here:
> >> >
> >> > http://home.trillke.net/~hpk/rlcompleter2.py
> >> >
> >> > just import it on a pure python installation (with readline available)
> >> > and play around :-)
> >>
> >> Wow, that's a piece of work!
> >
> > thanks! it's even better if you apply the one-liner readline-patch 558432 on
> > sourceforge :-)
>
> By the way Holger, have you submitted this patch to the main python-dev
> people?

the given number *is* the patch number on sourceforge. I don't know whether
i should assign it to somebody or just wait?

> I _hate_ the normal readline behavior which forces me to backspace
> every time, I just didn't know fixing it was this simple.

I don't think that *anybody* loves this 'space'.
It also contradicts the coding-style as described in PEP 8.

> But C things that don't become part of the standard distro get ignored by 99% of people, since
> that is the 99% which doesn't build its own python. For pure python modules
> it's easier to distribute a replacement, but since this is a C patch I hope it
> becomes part of the standard distribution.

hope so too.



> I'll test the rlcompleter2/ipython merge as soon as I can.

We probably have to discuss the filename-completion thingie a bit...

cheers,

holger

