[RELEASED] Python 3.1 final

Benjamin Peterson

Jun 27, 2009, 17:12:10

to Python Dev, python-ann...@python.org, pytho...@python.org

On behalf of the Python development team, I'm thrilled to announce the first
production release of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of the features and
changes that Python 3.0 introduced. For example, the new I/O system has been
rewritten in C for speed. File system APIs that use unicode strings now handle
paths with undecodable bytes in them. Other features include an ordered
dictionary implementation, a condensed syntax for nested with statements, and
support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1,
see http://docs.python.org/3.1/whatsnew/3.1.html or Misc/NEWS in the Python
distribution.
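
As a rough illustration of two of the features listed above (this is not part of the original announcement), the sketch below uses `collections.OrderedDict` and the condensed syntax for nested `with` statements. The variable names and sample data are made up for the example.

```python
from collections import OrderedDict
import io

# An ordered dictionary remembers the order in which keys were inserted.
d = OrderedDict()
d["first"] = 1
d["second"] = 2
d["third"] = 3
keys = list(d)

# Condensed form of two nested "with" statements: both context managers
# are entered on one line and exited in reverse order.
with io.StringIO("a\nb\n") as src, io.StringIO() as dst:
    for line in src:
        dst.write(line.upper())
    result = dst.getvalue()
```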

To download Python 3.1 visit:

http://www.python.org/download/releases/3.1/

The 3.1 documentation can be found at:

http://docs.python.org/3.1

Bugs can always be reported to:

http://bugs.python.org

Enjoy!

--
Benjamin Peterson
Release Manager
benjamin at python.org
(on behalf of the entire python-dev team and 3.1's contributors)

Nobody

Jun 28, 2009, 04:58:14

On Sat, 27 Jun 2009 16:12:10 -0500, Benjamin Peterson wrote:

> Python 3.1 focuses on the stabilization and optimization of the features and
> changes that Python 3.0 introduced. For example, the new I/O system has been
> rewritten in C for speed. File system APIs that use unicode strings now
> handle paths with undecodable bytes in them.

That's a significant improvement. It still decodes os.environ and sys.argv
before you have a chance to call sys.setfilesystemencoding(), but it
appears to be recoverable (with some effort; I can't find any way to re-do
the encoding without manually replacing the surrogates).

However, sys.std{in,out,err} are still created as text streams, and AFAICT
there's nothing you can do about this from within your code.

All in all, Python 3.x still has a long way to go before it will be
suitable for real-world use.

"Martin v. Löwis"

Jun 28, 2009, 08:36:37

> That's a significant improvement. It still decodes os.environ and sys.argv
> before you have a chance to call sys.setfilesystemencoding(), but it
> appears to be recoverable (with some effort; I can't find any way to re-do
> the encoding without manually replacing the surrogates).

See PEP 383.

> However, sys.std{in,out,err} are still created as text streams, and AFAICT
> there's nothing you can do about this from within your code.

That's intentional, and not going to change. You can access the
underlying byte streams if you want to, as you could already in 3.0.

Regards,
Martin

P.S. Please identify yourself on this newsgroup.

Benjamin Peterson

Jun 28, 2009, 11:22:15

to pytho...@python.org

Nobody <nobody <at> nowhere.com> writes:
> All in all, Python 3.x still has a long way to go before it will be
> suitable for real-world use.

Such as?

Scott David Daniels

Jun 28, 2009, 11:41:47

Nobody wrote:
> On Sat, 27 Jun 2009 16:12:10 -0500, Benjamin Peterson wrote: <announcement of 3.1>
>
> That's a significant improvement....

> All in all, Python 3.x still has a long way to go before it will be
> suitable for real-world use.

Fortunately, I have assiduously avoided the real world, and am happy to
embrace the world from our 'bot overlords.

Congratulations on another release from the hydra-like world of
multi-head development.

--Scott David Daniels
Scott....@Acm.Org

Paul Moore

Jun 28, 2009, 11:45:51

to Martin v. Löwis, pytho...@python.org

2009/6/28 "Martin v. Löwis" <mar...@v.loewis.de>:

>> However, sys.std{in,out,err} are still created as text streams, and AFAICT
>> there's nothing you can do about this from within your code.
>
> That's intentional, and not going to change. You can access the
> underlying byte streams if you want to, as you could already in 3.0.

I had a quick look at the documentation, and couldn't see how to do
this. It's the first time I'd read the new IO module documentation, so
I probably missed something obvious. Could you explain how I get the
byte stream underlying sys.stdin? (That should give me enough to find
what I was misunderstanding in the docs).

Thanks,
Paul.

Piet van Oostrum

Jun 28, 2009, 12:09:47

>>>>> Paul Moore <p.f....@gmail.com> (PM) wrote:

>PM> 2009/6/28 "Martin v. Löwis" <mar...@v.loewis.de>:

>>>> However, sys.std{in,out,err} are still created as text streams, and AFAICT
>>>> there's nothing you can do about this from within your code.
>>>
>>> That's intentional, and not going to change. You can access the
>>> underlying byte streams if you want to, as you could already in 3.0.

>PM> I had a quick look at the documentation, and couldn't see how to do
>PM> this. It's the first time I'd read the new IO module documentation, so
>PM> I probably missed something obvious. Could you explain how I get the
>PM> byte stream underlying sys.stdin? (That should give me enough to find
>PM> what I was misunderstanding in the docs).

http://docs.python.org/3.1/library/sys.html#sys.stdin
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org

Christian Heimes

Jun 28, 2009, 12:09:57

to Paul Moore, pytho...@python.org, "Martin v. Löwis"

Paul Moore wrote:

> 2009/6/28 "Martin v. Löwis" <mar...@v.loewis.de>:

>>> However, sys.std{in,out,err} are still created as text streams, and AFAICT
>>> there's nothing you can do about this from within your code.
>> That's intentional, and not going to change. You can access the
>> underlying byte streams if you want to, as you could already in 3.0.
>

> I had a quick look at the documentation, and couldn't see how to do
> this. It's the first time I'd read the new IO module documentation, so
> I probably missed something obvious. Could you explain how I get the
> byte stream underlying sys.stdin? (That should give me enough to find
> what I was misunderstanding in the docs).

You've missed the most obvious place to look for the feature -- the
documentation of sys.stdin :)

http://docs.python.org/3.0/library/sys.html#sys.stdin

>>> import sys
>>> sys.stdin
<io.TextIOWrapper object at 0x7f65df915050>
>>> sys.stdin.buffer
<io.BufferedReader object at 0x7f65df90bdd0>
>>> sys.stdin.read(1)

'\n'
>>> sys.stdin.buffer.read(1)

b'\n'

Christian
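
For readers who want to go one step further than the interactive session above, here is a minimal sketch of copying raw bytes through the `buffer` attribute without any decoding. The helper name `copy_bytes` and the in-memory stand-ins for `sys.stdin` and `sys.stdout` are assumptions made for the example; in a real script one would presumably call `copy_bytes(sys.stdin, sys.stdout)`.

```python
import io


def copy_bytes(text_in, text_out, chunk_size=4096):
    """Copy bytes between the byte streams underlying two text streams.

    Hypothetical helper: nothing is decoded, so undecodable input
    passes through unchanged.
    """
    src = text_in.buffer
    dst = text_out.buffer
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total


# In-memory stand-ins for sys.stdin and sys.stdout. The input is not
# valid UTF-8, which would make a text-mode read fail.
fake_in = io.TextIOWrapper(io.BytesIO(b"\xff\xfe raw bytes\n"), encoding="utf-8")
fake_out = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
copied = copy_bytes(fake_in, fake_out)
```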

Paul Moore

Jun 28, 2009, 12:18:37

to Christian Heimes, pytho...@python.org, Martin v. Löwis

2009/6/28 Christian Heimes <li...@cheimes.de>:
> Paul Moore wrote:

>> I had a quick look at the documentation, and couldn't see how to do
>> this. It's the first time I'd read the new IO module documentation, so
>> I probably missed something obvious. Could you explain how I get the
>> byte stream underlying sys.stdin? (That should give me enough to find
>> what I was misunderstanding in the docs).
>
> You've missed the most obvious place to look for the feature -- the
> documentation of sys.stdin :)
>
> http://docs.python.org/3.0/library/sys.html#sys.stdin
>
> >>> import sys
> >>> sys.stdin
> <io.TextIOWrapper object at 0x7f65df915050>
> >>> sys.stdin.buffer
> <io.BufferedReader object at 0x7f65df90bdd0>
> >>> sys.stdin.read(1)
>
> '\n'
> >>> sys.stdin.buffer.read(1)

Thanks. Like you say, the obvious place I didn't think of... :-) (I'd
have experimented, but this PC doesn't have Python 3 installed at the
moment :-()

The "buffer" attribute doesn't seem to be documented in the docs for
the io module. I'm guessing that the TextIOBase class should have a
note that you get at the buffer through the "buffer" attribute?

Paul.

Nobody

Jun 28, 2009, 12:27:52

Such as not trying to shoe-horn every byte string it encounters into
Unicode. Some of them really are *just* byte strings.

Benjamin Peterson

Jun 28, 2009, 13:24:11

to pytho...@python.org

Nobody <nobody <at> nowhere.com> writes:
>
> Such as not trying to shoe-horn every byte string it encounters into
> Unicode. Some of them really are *just* byte strings.

You're certainly allowed to convert them back to byte strings if you want.

Terry Reedy

Jun 28, 2009, 13:31:50

to pytho...@python.org

Let's ignore the disinformation. So false it is hardly worth refuting.

Benjamin Peterson

Jun 28, 2009, 13:34:34

to pytho...@python.org

Paul Moore <p.f.moore <at> gmail.com> writes:

> The "buffer" attribute doesn't seem to be documented in the docs for
> the io module. I'm guessing that the TextIOBase class should have a
> note that you get at the buffer through the "buffer" attribute?

Good point. I've now documented it, and the "raw" attribute of BufferedIOBase.
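
To make the layering behind those two attributes concrete, here is a small sketch that opens a temporary file in text mode and walks down from the text layer to the raw file object. The temporary file and the variable names are assumptions for the example; the same attributes are what the thread uses on `sys.stdin`.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as text_stream:
        buffered = text_stream.buffer   # byte stream under the text layer
        raw = buffered.raw              # unbuffered file object under that
        layers = (type(text_stream), type(buffered), type(raw))
finally:
    os.remove(path)
```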

Aahz

Jun 28, 2009, 14:43:19

In article <mailman.2254.1246209...@python.org>,

Yes, but do you get back the original byte strings? Maybe I'm missing
something, but my impression is that this is still an issue for the email
module as well as command-line arguments and environment variables.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha

Benjamin Peterson

Jun 28, 2009, 15:21:49

to pytho...@python.org

Aahz <aahz <at> pythoncraft.com> writes:
> Yes, but do you get back the original byte strings? Maybe I'm missing
> something, but my impression is that this is still an issue for the email
> module as well as command-line arguments and environment variables.

The email module is, yes, broken. You can recover the bytestrings of
command-line arguments and environment variables.

Nobody

Jun 28, 2009, 16:54:34

1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

Most of the issues can be worked around by calling
sys.setfilesystemencoding('iso-8859-1') at the start of the program, but
sys.argv and os.environ have already been converted by this point.

Nobody

Jun 28, 2009, 17:01:34

On Sun, 28 Jun 2009 13:31:50 -0400, Terry Reedy wrote:

>>> Nobody <nobody <at> nowhere.com> writes:
>>>> All in all, Python 3.x still has a long way to go before it will be
>>>> suitable for real-world use.
>>> Such as?
>>
>> Such as not trying to shoe-horn every byte string it encounters into
>> Unicode. Some of them really are *just* byte strings.
>
> Let's ignore the disinformation.

Translation: let's ignore anything which falsifies the assumptions.

> So false it is hardly worth refuting.

Your copy of Trolling by Numbers must be getting pretty dog-eared by now.

Benjamin Peterson

Jun 28, 2009, 17:25:13

to pytho...@python.org

Nobody <nobody <at> nowhere.com> writes:

>
> On Sun, 28 Jun 2009 19:21:49 +0000, Benjamin Peterson wrote:
>
> >> Yes, but do you get back the original byte strings? Maybe I'm missing
> >> something, but my impression is that this is still an issue for the email
> >> module as well as command-line arguments and environment variables.
> >
> > The email module is, yes, broken. You can recover the bytestrings of
> > command-line arguments and environment variables.
>
> 1. Does Python offer any assistance in doing so, or do you have to
> manually convert the surrogates which are generated for unrecognised bytes?

fs_encoding = sys.getfilesystemencoding()
bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]

>
> 2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

What's a non-invertible encoding? I can't find a reference to the term.
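
The round trip described in this message can be sketched end to end as follows. The sample byte string and the helper name `argv_as_bytes` are assumptions for illustration; the `"surrogateescape"` error handler is the one PEP 383 defines.

```python
import sys


def argv_as_bytes(argv, encoding=None):
    """Re-encode already-decoded arguments back to bytes (hypothetical helper)."""
    enc = encoding or sys.getfilesystemencoding()
    return [arg.encode(enc, "surrogateescape") for arg in argv]


# A Latin-1 file name that is not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding maps the undecodable byte 0xE9 to the lone surrogate U+DCE9 ...
text = raw.decode("utf-8", "surrogateescape")

# ... and encoding with the same handler restores the original byte.
restored = argv_as_bytes([text], encoding="utf-8")
```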

Hallvard B Furuseth

Jun 28, 2009, 17:34:10

Different ISO-2022 strings can map to the same Unicode string.
Thus you can convert back to _some_ ISO-2022 string, but it won't
necessarily match the original.

--
Hallvard
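
A small sketch of the point above, using Python's `iso2022_jp` codec as a stand-in for ISO-2022 in general (the sample strings are assumptions): a redundant escape sequence that selects ASCII changes the bytes but not the decoded text, so re-encoding cannot know which byte string it started from.

```python
plain = b"abc"
# ESC ( B designates ASCII, which is already the initial state,
# so the escape sequence is redundant.
with_escape = b"\x1b(Babc"

decoded_plain = plain.decode("iso2022_jp")
decoded_escape = with_escape.decode("iso2022_jp")

# Both inputs decode to the same text; encoding yields only one of them.
reencoded = decoded_escape.encode("iso2022_jp")
```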

"Martin v. Löwis"

Jun 28, 2009, 17:50:44

> 2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

ISO-2022 cannot be used as a system encoding.

Please do read the responses I write, and please do identify yourself.

Regards,
Martin

Gerhard Häring

Jun 28, 2009, 18:23:04

to pytho...@python.org

+1 QOTW

-- Gerhard

Nobody

Jun 29, 2009, 06:33:55

On Sun, 28 Jun 2009 21:25:13 +0000, Benjamin Peterson wrote:

>> > The email module is, yes, broken. You can recover the bytestrings of
>> > command-line arguments and environment variables.
>>
>> 1. Does Python offer any assistance in doing so, or do you have to
>> manually convert the surrogates which are generated for unrecognised bytes?
>
> fs_encoding = sys.getfilesystemencoding()
> bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]

This results in an internal error:

>>> "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/bytesobject.c:3182: bad argument to internal function

[FWIW, the error corresponds to _PyBytes_Resize, which has a
cautionary comment almost as large as the code.]

The documentation gives the impression that "surrogateescape" is only
meaningful for decoding.

>> 2. How do you do this for non-invertible encodings (e.g. ISO-2022)?
>
> What's a non-invertible encoding? I can't find a reference to the term.

One where different inputs can produce the same output.
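
For reference, the behaviour PEP 383 intends for the failing expression above is that each escaped surrogate turns back into the byte it stands for. The sketch below assumes an interpreter in which the reported SystemError has been fixed; the variable names are made up.

```python
# Every possible byte value survives a decode/encode round trip when both
# directions use the "surrogateescape" error handler.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("ascii", "surrogateescape")
round_tripped = as_text.encode("ascii", "surrogateescape")

# The expression from this message, on an interpreter without the bug.
escaped = "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
```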

Nobody

Jun 29, 2009, 07:02:20

On Sun, 28 Jun 2009 14:36:37 +0200, Martin v. Löwis wrote:

>> That's a significant improvement. It still decodes os.environ and sys.argv
>> before you have a chance to call sys.setfilesystemencoding(), but it
>> appears to be recoverable (with some effort; I can't find any way to re-do
>> the encoding without manually replacing the surrogates).
>
> See PEP 383.

Okay, that's useful, except that it may have some bugs:

>>> r = "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/bytesobject.c:3182: bad argument to internal function

Trying a few random test cases suggests that the ratio of valid to invalid
bytes has an effect. Strings which consist mostly of invalid bytes trigger
the error, those which are mostly valid don't.

The error corresponds to _PyBytes_Resize(), which has the following
words of caution in a preceding comment:

/* The following function breaks the notion that strings are immutable:
it changes the size of a string. We get away with this only if there
is only one module referencing the object. You can also think of it
as creating a new string object and destroying the old one, only
more efficiently. In any case, don't use this if the string may
already be known to some other part of the code...
Note that if there's not enough memory to resize the string, the original
string object at *pv is deallocated, *pv is set to NULL, an "out of
memory" exception is set, and -1 is returned. Else (on success) 0 is
returned, and the value in *pv may or may not be the same as on input.
As always, an extra byte is allocated for a trailing \0 byte (newsize
does *not* include that), and a trailing \0 byte is stored.
*/

Assuming that this gets fixed, it should make most of the problems with
3.0 solvable. OTOH, it wouldn't have killed them to have added e.g.
sys.argv_bytes and os.environ_bytes.

>> However, sys.std{in,out,err} are still created as text streams, and AFAICT
>> there's nothing you can do about this from within your code.
>
> That's intentional, and not going to change. You can access the
> underlying byte streams if you want to, as you could already in 3.0.

Okay, I've since been pointed to the relevant information (I was looking
under "File Objects"; I didn't think to look at "sys").

Antoine Pitrou

Jun 29, 2009, 07:41:11

to pytho...@python.org

Nobody <nobody <at> nowhere.com> writes:
>

> This results in an internal error:
>
> > "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> SystemError: Objects/bytesobject.c:3182: bad argument to internal function

Please report a bug on http://bugs.python.org

As for a bytes version of sys.argv and os.environ, you're welcome to propose a
patch (this would be a separate issue on the aforementioned issue tracker).

Thanks

Antoine.

Hallvard B Furuseth

Jun 29, 2009, 07:57:49

Nobody <nob...@nowhere.com> writes:
>On Sun, 28 Jun 2009 14:36:37 +0200, Martin v. Löwis wrote:
>> See PEP 383.
>
> Okay, that's useful, except that it may have some bugs:

> (...)

> Assuming that this gets fixed, it should make most of the problems with
> 3.0 solvable. OTOH, it wouldn't have killed them to have added e.g.
> sys.argv_bytes and os.environ_bytes.

That's hopeless to keep track of across modules if something modifies
sys.argv or os.environ.

If the current scheme for recovering the original bytes proves
insufficient, what could work is a string type which can have an
attribute with the original bytes (if the source was bytes). And/or
sys.argv and os.environ maintaining the correspondence when feasible.

Anyway, I haven't looked at whether any of this is a problem, so don't
mind me:-) As long as it's definitely possible to tell python once
and for all not to apply locales and string conversions, instead of
having to keep track of an ever-expanding list of variables to tame
its bytes->character conversions (as happened with Emacs).

--
Hallvard

Paul Moore

Jun 29, 2009, 08:05:51

to Antoine Pitrou, pytho...@python.org

2009/6/29 Antoine Pitrou <soli...@pitrou.net>:

> As for a bytes version of sys.argv and os.environ, you're welcome to propose a
> patch (this would be a separate issue on the aforementioned issue tracker).

But please be aware that such a proposal would have to consider:

1. That on Windows, the native form is the character version, and the
bytes version would have to address all the same sorts of encoding
issues that the OP is complaining about in the character versions. [1]

2. That the proposal address the question of how to write portable,
robust, code (given that choosing argv vs argv_bytes based on
sys.platform is unlikely to count as a good option...)

3. Why defining your own argv_bytes as argv_bytes =
[a.encode("iso-8859-1", "surrogateescape") for a in sys.argv] is
insufficient (excluding issues with bugs, which will be fixed
regardless) for the occasional cases where it's needed.

Before writing the proposal, the OP should probably review the
extensive discussions which can be found in the python-dev archives.
It would be wrong for people reading this thread to think that the
implemented approach is in any sense a "quick fix" - it's certainly a
compromise (and no-one likes all aspects of any compromise!) but it's
one made after a lot of input from people with widely differing
requirements.

Paul.

[1] And my understanding, from the PEP, is that even on POSIX, the
argv and environ data is intended to be character data, even though
the native C APIs expose a byte-oriented interface. So conceptually,
character format is "correct" on POSIX as well... (But I don't write
code for POSIX systems, so I'll leave it to the POSIX users to debate
this point further).

Nobody

Jun 29, 2009, 11:16:32

On Mon, 29 Jun 2009 13:57:49 +0200, Hallvard B Furuseth wrote:

>> Okay, that's useful, except that it may have some bugs:
>> (...)
>> Assuming that this gets fixed, it should make most of the problems with
>> 3.0 solvable. OTOH, it wouldn't have killed them to have added e.g.
>> sys.argv_bytes and os.environ_bytes.
>
> That's hopeless to keep track of across modules if something modifies
> sys.argv or os.environ.

Oh, I wasn't suggesting that they should be updated. Just that there
should be some way to get at the original data.

The mechanism used in 3.1 is sufficient. I'm mostly concerned that it's
*possible* to recover the data; convenience is of secondary importance.

Calling sys.setfilesystemencoding('iso-8859-1') right at the start of the
code eliminates most of the issues. It's just the stuff which happens
before the first line of code is executed (sys.argv, os.environ, sys.stdin
etc) which was problematic.

[BTW, it isn't just Python that has problems. The directory where I was
performing tests happened to be an svn checkout. A subsequent "svn update"
promptly crapped out because I'd left behind a file whose name wasn't
valid ASCII.]

Nobody

Jun 29, 2009, 11:35:46

On Mon, 29 Jun 2009 11:41:11 +0000, Antoine Pitrou wrote:

> Nobody <nobody <at> nowhere.com> writes:
>>
>> This results in an internal error:
>>
>> > "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> SystemError: Objects/bytesobject.c:3182: bad argument to internal function
>
> Please report a bug on http://bugs.python.org

Done.

> As for a bytes version of sys.argv and os.environ, you're welcome to propose a
> patch (this would be a separate issue on the aforementioned issue tracker).

Assuming that the above bug gets fixed, it isn't really necessary. In
particular, maintaining bytes/string versions in the presence of updates
is likely to be more trouble than it's worth.

Nobody

Jun 29, 2009, 12:17:17

On Mon, 29 Jun 2009 13:05:51 +0100, Paul Moore wrote:

>> As for a bytes version of sys.argv and os.environ, you're welcome to
>> propose a patch (this would be a separate issue on the aforementioned
>> issue tracker).
>
> But please be aware that such a proposal would have to consider:
>
> 1. That on Windows, the native form is the character version, and the
> bytes version would have to address all the same sorts of encoding
> issues that the OP is complaining about in the character versions. [1]

A bytes version doesn't make sense on Windows (at least, not on the
NT-based versions, and the DOS-based branch isn't worth bothering about,
IMHO).

Also, Windows *needs* to deal with characters due to the
fact that filenames, environment variables, etc are case-insensitive.

> 2. That the proposal address the question of how to write portable,
> robust, code (given that choosing argv vs argv_bytes based on
> sys.platform is unlikely to count as a good option...)

There is a tension here between robustness and portability. In my
situation, robustness means getting the "unadulterated" data. I can always
adulterate it myself if I need to.

> 3. Why defining your own argv_bytes as argv_bytes =
> [a.encode("iso-8859-1", "surrogateescape") for a in sys.argv] is
> insufficient (excluding issues with bugs, which will be fixed
> regardless) for the occasional cases where it's needed.

Other than the bug, it appears to be sufficient. I don't need to support
a locale where nl_langinfo(CODESET) is ISO-2022 (I *do* need to support
lossless round-trip of ISO-2022 filenames, possibly stored in argv and
maybe even in environ, but that's a different matter; the code only
really needs to run with LANG=C).

> [1] And my understanding, from the PEP, is that even on POSIX, the
> argv and environ data is intended to be character data, even though
> the native C APIs expose a byte-oriented interface. So conceptually,
> character format is "correct" on POSIX as well... (But I don't write
> code for POSIX systems, so I'll leave it to the POSIX users to debate
> this point further).

Even if it's "intended" to be character data, it isn't *required* to be.
In particular, it's not required to be in the locale's encoding.

A common example of what I need to handle is:

find /www ... -print0 | xargs -0 myscript

where the filenames can be in a wide variety of different encodings
(sometimes even within a single directory).
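
A minimal sketch of what such a `myscript` might look like. It assumes a POSIX system and a Python version later than the one discussed in this thread: `os.fsencode` and `os.fsdecode` were added in Python 3.2, and the helper name `raw_names` is made up for the example.

```python
import os
import sys


def raw_names(argv):
    """Return the arguments after the script name as bytes (hypothetical helper).

    os.fsencode applies the file system encoding with the
    "surrogateescape" handler on POSIX, so undecodable bytes come back
    unchanged whatever encoding the file name was written in.
    """
    return [os.fsencode(arg) for arg in argv[1:]]


if __name__ == "__main__":
    for raw in raw_names(sys.argv):
        # File system functions accept bytes paths on POSIX.
        size = os.path.getsize(raw) if os.path.exists(raw) else None
        sys.stdout.buffer.write(raw + b"\t" + str(size).encode("ascii") + b"\n")
```

Writing the result to `sys.stdout.buffer` rather than `sys.stdout` keeps the file names as bytes all the way through, so no encoding is ever assumed for them.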