
"More About Unicode in Python 2 and 3"


Mark Lawrence

Jan 5, 2014, 8:14:24 AM
to pytho...@python.org
http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/

Please don't shoot the messenger :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Ned Batchelder

Jan 5, 2014, 8:22:38 AM
to pytho...@python.org
On 1/5/14 8:14 AM, Mark Lawrence wrote:
> http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/
>
> Please don't shoot the messenger :)
>

With all of the talk about py 2 vs. py3 these days, this is the blog
post that I think deserves the most real attention. I haven't had to do
the kind of coding that Armin is talking about, but I've heard more than
one person talk about the difficulty of it in Python 3.

If anyone wants Python 3 uptake improved, the best thing would be to
either explain to Armin how he missed the easy way to do what he wants
(seems unlikely), or advocate to the core devs why they should change
things to improve this situation.

--
Ned Batchelder, http://nedbatchelder.com

Chris Angelico

Jan 5, 2014, 8:34:50 AM
to pytho...@python.org
On Mon, Jan 6, 2014 at 12:14 AM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/
>
> Please don't shoot the messenger :)

Most of that is tiresome reiterations of the same arguments ("It
worked fine, there were just a few problems" - which means that you
haven't thought through text vs bytes properly; the switch to Py3
highlights a problem that was already there, which means that Py3
showed up what was already a problem - sounds a bit like Romans 7 to
me), plus complaints that have been heard elsewhere, like the
encode/decode methods and the removal of codecs that aren't
str<->bytes. (Don't know if that one will ever be resolved, but it's
not enough to say that Python 3 "got it wrong". As we've seen from
3.3, there has been a progressive improvement in compatibility between
Py2 and Py3. Maybe 3.5 will recreate some of these things people are
moaning about the lack of, which would then prove that the Py3 model
isn't fundamentally flawed by their loss. Anyhow.)

But this bit looks odd:

"""
For instance passing a urllib request object to Flask's JSON parse
function breaks on Python 3 but works on Python 2 as a result of this:

>>> from urllib.request import urlopen
>>> r = urlopen('https://pypi.python.org/pypi/Flask/json')
>>> from flask import json
>>> json.load(r)
Traceback (most recent call last):
  File "decoder.py", line 368, in raw_decode
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: No JSON object could be decoded
"""

Why is a StopIteration bubbling up? (I don't have Flask, so I can't
verify this.) Is it as simple as "this should be raising from None",
or is there something else going on?

ChrisA

Chris Angelico

Jan 5, 2014, 8:55:03 AM
to pytho...@python.org
On Mon, Jan 6, 2014 at 12:22 AM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> If anyone wants Python 3 uptake improved, the best thing would be to either
> explain to Armin how he missed the easy way to do what he wants (seems
> unlikely), or advocate to the core devs why they should change things to
> improve this situation.

I'm not sure that there is an "easy way". See, here's the deal. If all
your data is ASCII, you can shut your eyes to the difference between
bytes and text and Python 2 will work perfectly for you. Then some day
you'll get a non-ASCII character come up (or maybe you'll get all of
Latin-1 "for free" and it's when you get a non-Latin-1 character -
same difference), and you start throwing in encode() and decode()
calls in places. But you feel like you're fixing little problems with
little solutions, so it's no big deal.

Making the switch to Python 3 forces you to distinguish bytes from
text, even when that text is all ASCII. Suddenly that's a huge job, a
huge change through all your code, and it's all because of this switch
to Python 3. The fact that you then get the entire Unicode range "for
free" doesn't comfort people who are dealing with URLs and are
confident they'll never see anything else (if they *do* see anything
else, it's a bug at the far end). Maybe it's the better way, but like
trying to get people to switch from MS Word onto an open system, it's
far easier to push for Open Office than for LaTeX. Getting your head
around a whole new way of thinking about your data is work, and people
want to be lazy. (That's not a bad thing, by the way. Laziness means
schedules get met.)

So what can be done about it? Would it be useful to have a type that
represents an ASCII string? (Either 'bytes' or something else, it
doesn't matter what.) I'm inclined to say no, because as of the
current versions, encoding/decoding UTF-8 has (if I understand
correctly) been extremely optimized in the specific case of an
all-ASCII string; so the complaint that there's no "string formatting
for bytes" could be resolved by simply decoding to str, then encoding
to bytes. I'd look on that as having two costs, a run-time performance
cost and a code readability cost, and then look at reducing each of
them - but without blurring the bytes/text distinction. Yes, that
distinction is a cost. It's like any other mental cost, and it just
has to be paid. The only way to explain it is that Py2 has the "cost
gap" between ASCII (or Latin-1) and the rest of Unicode, but Py3 puts
that cost gap before ASCII, and then gives you all of Unicode for the
same low price (just $19.99 a month, you won't even notice the
payments!).

Question, to people who have large Py2 codebases that manipulate
mostly-ASCII text. How bad would it be to your code to do this:

# Py2: build a URL
url = "http://my.server.name/%s/%s" % (path, fn)

# Py3: build a URL as bytes
def B(s):
    if isinstance(s, str): return s.encode()
    return s.decode()

url = B(B(b"http://my.server.name/%s/%s") % (path, fn))

? This little utility function lets you do the formatting as text
(let's assume the URL pattern comes from somewhere else, or you'd just
strip off the b'' prefix), while still mostly working with bytes. Is
it an unacceptable level of code clutter?
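
A minimal runnable sketch of the idea above, assuming Python 3. One wrinkle it has to deal with: if path or fn arrive as bytes, "%s" % b"x" would embed the repr b'x' in the result, so the sketch decodes the arguments as well as the pattern. The build_url helper and the ASCII default are illustrative assumptions, not part of any library:

```python
def B(s, encoding="ascii"):
    """Flip between text and bytes: str -> bytes, bytes -> str."""
    if isinstance(s, str):
        return s.encode(encoding)
    return s.decode(encoding)


def build_url(pattern, *parts):
    """Format a bytes pattern with bytes or str parts and return bytes."""
    # Decode everything to text, format as text, then encode once at the end.
    text_parts = tuple(p if isinstance(p, str) else B(p) for p in parts)
    return B(B(pattern) % text_parts)


url = build_url(b"http://my.server.name/%s/%s", b"docs", "index.html")
print(url)
```

Whether wrapping every formatting call this way counts as acceptable clutter is exactly the question being asked; the sketch only shows that the round trip itself is mechanical.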

ChrisA

Antoine Pitrou

Jan 5, 2014, 9:37:17 AM
to pytho...@python.org
On Sun, 05 Jan 2014 08:22:38 -0500
Ned Batchelder <n...@nedbatchelder.com> wrote:
> On 1/5/14 8:14 AM, Mark Lawrence wrote:
> > http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/
> >
> > Please don't shoot the messenger :)
> >
>
> With all of the talk about py 2 vs. py3 these days, this is the blog
> post that I think deserves the most real attention. I haven't had to do
> the kind of coding that Armin is talking about, but I've heard more than
> one person talk about the difficulty of it in Python 3.
>
> If anyone wants Python 3 uptake improved, the best thing would be to
> either explain to Armin how he missed the easy way to do what he wants
> (seems unlikely), or advocate to the core devs why they should change
> things to improve this situation.

Sometimes the best way to "advocate to the core devs" is to do part of
the work, though.

There are several people arguing for %-formatting or .format() on
bytes, but that still lacks a clear description of which formatting
codes would be supported, with which semantics.
(see e.g. http://bugs.python.org/issue3982)

As for the rest of Armin's rant, well, it's a rant. "In some cases
Python 3 is a bit less practical than Python 2" doesn't equate to
"Python 3 is broken and 2.8 should be released instead".

Regards

Antoine.


Terry Reedy

Jan 5, 2014, 4:10:01 PM
to pytho...@python.org
On 1/5/2014 8:14 AM, Mark Lawrence wrote:
> http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/

I disagree with the following claims:

"Looking at that you can see that Python 3 removed something: support
for non Unicode data text. "

I believe 2.7 str text methods like .upper only supported ascii. General
non-unicode bytes text support would require an encoding as an attribute
of the bytes text object. Python never had that.

"Python 3 essentially removed the byte-string type which in 2.x was
called str."

Python 3 renamed unicode as str and str as bytes. Bytes have essentially
all the text methods of 2.7 str. Compare dir(str) in 2.7 and dir(bytes)
in 3.x. The main change of the class itself is that indexing and
iteration yield ints i, 0 <= i < 256.
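
A short sketch of the behaviour described above, assuming Python 3 (the variable names are illustrative only):

```python
data = b"abc"

# Many text-style methods still exist on bytes and act on ASCII letters.
upper = data.upper()

# Indexing and iteration yield ints in Python 3 ...
first = data[0]
as_ints = list(data)

# ... while slicing yields bytes, so a one-byte slice can be compared
# against a bytes literal.
first_slice = data[0:1]

print(upper, first, as_ints, first_slice)
```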

"all text operations now are only defined for Unicode strings."

?? Text methods are still defined on (ascii) bytes. It is true that one
text operation -- string formatting -- no longer is (and there is an issue
about that). But one is not all. There is also still discussion about
within-class transforms, but they are still possible, even if not with
the codecs module.

I suspect there are other basic errors, but I mostly quit reading at
this point.

--
Terry Jan Reedy

Terry Reedy

Jan 5, 2014, 4:17:25 PM
to pytho...@python.org
On 1/5/2014 8:14 AM, Mark Lawrence wrote:
> http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/

I meant to mention in my previous reply that Armin authored PEP 414,
Explicit Unicode Literal for Python 3.3, which brought back the u''
prefix. So it is not the case that core devs pay no attention to Armin
when he engages us on an 'improve 3.x' basis.

--
Terry Jan Reedy

Serhiy Storchaka

Jan 5, 2014, 5:28:41 PM
to pytho...@python.org
05.01.14 15:34, Chris Angelico написав(ла):
> Why is a StopIteration bubbling up? (I don't have Flask, so I can't
> verify this.) Is it as simple as "this should be raising from None",
> or is there something else going on?

Yes, it is. Stdlib json module uses "from None".

Serhiy Storchaka

Jan 5, 2014, 5:32:47 PM
to pytho...@python.org
I wonder why nobody complains about the absence of implicit conversion
between int and str. In PHP you can write 2 + "3" and get 5, but in
Python this is an error. So sad!

Steven D'Aprano

Jan 5, 2014, 5:56:27 PM
Chris Angelico wrote:

> But this bit looks odd:
>
> """
> For instance passing a urllib request object to Flask's JSON parse
> function breaks on Python 3 but works on Python 2 as a result of this:
>
> >>> from urllib.request import urlopen
> >>> r = urlopen('https://pypi.python.org/pypi/Flask/json')
> >>> from flask import json
> >>> json.load(r)
> Traceback (most recent call last):
>   File "decoder.py", line 368, in raw_decode
> StopIteration
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: No JSON object could be decoded
> """

I'm not sure about the "works on Python 2" part. Is Armin just complaining
about the StopIteration being visible in Python 3 but hidden in Python 2? I
don't have Flask installed, and am not going to install it just for this.


> Why is a StopIteration bubbling up? (I don't have Flask, so I can't
> verify this.) Is it as simple as "this should be raising from None",
> or is there something else going on?

Remember that "raise Spam from None" only works from Python 3.3 onwards.
Personally, I think that releasing nested tracebacks before having a way to
suppress the display was a strategic blunder, but it's fixed now, at least
for those who can jump straight to 3.3 and not bother supporting 3.1 and
3.2.
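
For anyone who has not met the syntax, a minimal sketch of what "raise ... from None" does, assuming Python 3.3 or later. The first_item function is a made-up stand-in, not Flask's or the json module's actual code:

```python
def first_item(iterable):
    """Return the first item, or raise ValueError if there is none."""
    try:
        return next(iter(iterable))
    except StopIteration:
        # "from None" suppresses display of the StopIteration in the
        # traceback, so only the ValueError is shown.
        raise ValueError("no items to decode") from None


try:
    first_item([])
except ValueError as exc:
    caught = exc

print(caught.__cause__, caught.__suppress_context__)
```

Without the "from None", the traceback would show the StopIteration first, followed by "During handling of the above exception, another exception occurred".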


--
Steven

Ben Finney

Jan 5, 2014, 6:30:16 PM
to pytho...@python.org
Chris Angelico <ros...@gmail.com> writes:

> Maybe it's the better way, but like trying to get people to switch
> from MS Word onto an open system, it's far easier to push for Open
> Office than for LaTeX.

If you're going to be pushing people to a free software system,
OpenOffice is no longer the one to choose; its owners several years ago
shunted it to a dead end where very little active development can
happen, and its development community have moved to more productive
ground.

Rather, the same code base has since 2010 been actively developed as
LibreOffice <URL:http://libreoffice.org/>, and it is now showing far
more improvement and document compatibility as a result.

In short: Everything that was good about OpenOffice is now called
LibreOffice, which had to change its name only because the owners of
that name refused to let it go.

> Getting your head around a whole new way of thinking about your data
> is work, and people want to be lazy. (That's not a bad thing, by the
> way. Laziness means schedules get met.)

Right. I think shifting people to LibreOffice is an excellent and
realistic step toward increasing people's software and data freedom.

--
\ “It is far better to grasp the universe as it really is than to |
`\ persist in delusion, however satisfying and reassuring.” —Carl |
_o__) Sagan |
Ben Finney

Emile van Sebille

Jan 5, 2014, 6:31:28 PM
to pytho...@python.org
I'd want my implicit conversion of 2 + '3' to get '23'

That's why it's not there...

Emile

Ethan Furman

Jan 5, 2014, 6:45:51 PM
to pytho...@python.org
On 01/05/2014 03:31 PM, Emile van Sebille wrote:
> On 01/05/2014 02:32 PM, Serhiy Storchaka wrote:
>> I wonder why nobody complains about the absence of implicit conversion
>> between int and str. In PHP you can write 2 + "3" and get 5, but in
>> Python this is an error. So sad!
>
>
> I'd want my implicit conversion of 2 + '3' to get '23'

Huh. And here I thought 'twenty-three' was the right answer! ;)

--
~Ethan~

Chris Angelico

Jan 5, 2014, 7:04:37 PM
to pytho...@python.org
On Mon, Jan 6, 2014 at 9:56 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
>> Why is a StopIteration bubbling up? (I don't have Flask, so I can't
>> verify this.) Is it as simple as "this should be raising from None",
>> or is there something else going on?
>
> Remember that "raise Spam from None" only works from Python 3.3 onwards.
> Personally, I think that releasing nested tracebacks before having a way to
> suppress the display was a strategic blunder, but it's fixed now, at least
> for those who can jump straight to 3.3 and not bother supporting 3.1 and
> 3.2.

Fair enough. If it's a problem, I'm sure Flask could do something like
(untested):

error = False
try:
    next(whatever)
except StopIteration:
    error = True
if error: raise ValueError("...")

which would work across all. But that's assuming that it really is
just a small matter of traceback ugliness. The post implies that it's
a lot worse than that.

ChrisA

Chris Angelico

Jan 5, 2014, 7:09:07 PM
to pytho...@python.org
I quite like 2+"3" being "23", as it simplifies a lot of string
manipulation. But there's another option: 2+"3456" could be "56". That
one makes even more sense... doesn't it? I mean, C does it so it must
make sense...

ChrisA

Chris Angelico

Jan 5, 2014, 7:23:02 PM
to pytho...@python.org
On Mon, Jan 6, 2014 at 10:30 AM, Ben Finney <ben+p...@benfinney.id.au> wrote:
> Chris Angelico <ros...@gmail.com> writes:
>
>> Maybe it's the better way, but like trying to get people to switch
>> from MS Word onto an open system, it's far easier to push for Open
>> Office than for LaTeX.
>
> If you're going to be pushing people to a free software system,
> OpenOffice is no longer the one to choose; its owners several years ago
> shunted it to a dead end where very little active development can
> happen, and its development community have moved to more productive
> ground.

Handwave, handwave. The FOSS office suite that comes conveniently in
the Debian repositories. It was OO a while ago, it's now LO, but same
difference. If LO ever renames and becomes FreeOffice or ZOffice or
anything else under the sun, it would still be the easier option for
MS Word users to switch to.

(And actually, I haven't been pushing people off MS Word so much as
off DeScribe Word Processor. But since most people here won't have
heard of that, I went for the more accessible analogy.)

>> Getting your head around a whole new way of thinking about your data
>> is work, and people want to be lazy. (That's not a bad thing, by the
>> way. Laziness means schedules get met.)
>
> Right. I think shifting people to LibreOffice is an excellent and
> realistic step toward increasing people's software and data freedom.

Yeah. Which is why I do it. But the other night, my mum was trying to
lay out her book in LO, and was having some problems with the system
of having each chapter in a separate file. (Among other things, styles
weren't shared across them all, so a tweak to a style means opening up
every chapter and either doing a parallel edit or figuring out how to
import styles.) So yes, it's a realistic and worthwhile step, but it's
not a magic solution to all problems. She doesn't have time to learn a
whole new system. Maybe - in the long term - LaTeX would actually save
her time, but it's certainly a much harder 'sell' than LO.

ChrisA

Ethan Furman

Jan 5, 2014, 7:13:32 PM
to pytho...@python.org
On 01/05/2014 02:56 PM, Steven D'Aprano wrote:
>
> Remember that "raise Spam from None" only works from Python 3.3 onwards.
> Personally, I think that releasing nested tracebacks before having a way to
> suppress the display was a strategic blunder [...]

I would just call it really really annoying. ;) On the upside adding 'from None' was a small enough project that I was
able to get it going, and that was the bridge to eventually becoming a dev. :D

--
~Ethan~

Dan Stromberg

Jan 5, 2014, 7:39:55 PM
to Serhiy Storchaka, Python List
On Sun, Jan 5, 2014 at 2:32 PM, Serhiy Storchaka <stor...@gmail.com> wrote:
> I wonder why nobody complains about the absence of implicit conversion
> between int and str. In PHP you can write 2 + "3" and get 5, but in Python
> this is an error. So sad!

I like Python strongly typed, thank you very much. Please don't break it.

Not raising an exception when implicitly converting types tends to
lead to hard-to-track-down bugs.

Ned Batchelder

Jan 5, 2014, 8:16:47 PM
to pytho...@python.org
On 1/5/14 8:22 AM, Ned Batchelder wrote:
> On 1/5/14 8:14 AM, Mark Lawrence wrote:
>> http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/
>>
>> Please don't shoot the messenger :)
>>
>
> With all of the talk about py 2 vs. py3 these days, this is the blog
> post that I think deserves the most real attention. I haven't had to do
> the kind of coding that Armin is talking about, but I've heard more than
> one person talk about the difficulty of it in Python 3.
>
> If anyone wants Python 3 uptake improved, the best thing would be to
> either explain to Armin how he missed the easy way to do what he wants
> (seems unlikely), or advocate to the core devs why they should change
> things to improve this situation.
>

OK, let's see what we got from three core developers on this list:

- Antoine dismissed the post as "a rant".

- Terry took issue with three claims made, and ended with, "I suspect
there are other basic errors, but I mostly quit reading at this point."

- Serhiy made a sarcastic comment comparing Python 3's bytes/unicode
handling with Python 2's int/str handling, implying that since int/str
wasn't a problem, then bytes/unicode isn't either.

This is discouraging. Armin is a prolific and well-known contributor to
a number of very popular packages. He's devoted a great deal of time to
the Python ecosystem, including writing the PEP that got u"" literals
back in Python 3.3. If he's having trouble with Python 3, it's a
serious problem.

You can look through his problems and decide that he's "wrong," or that
he's "ranting," but that doesn't change the fact that Python 3 is
encountering friction. What happens when a significant fraction of your
customers are "wrong"?

Core developers: I thank you for the countless hours you have devoted to
building all of the versions of Python. I'm sure in many ways it's a
thankless task. But you have a problem. What's the point in being
right if you end up with a product that people don't use?

If Armin, with all of his skills and energy, is having problems using
your product, then there's a problem. Compounding that problem is the
attitude that dismisses him as wrong.

Kenneth Reitz's reaction to Armin's blog post was: "A fantastic article
about why Python 2 is a superior programming language for my personal
use case." https://twitter.com/kennethreitz/status/419889312935993344

So now we have two revered developers vocally having trouble with Python
3. You can dismiss their concerns as niche because it's only network
programming, but that would be a mistake. Given the centrality of
network programming in today's world, and the dominance these two
developers have in building libraries to solve networking problems, I
think someone should take their concerns seriously.

Maybe there are core developers who are trying hard to solve the
problems Kenneth and Armin are facing. It would be great if that work
was more visible. I don't see it, and apparently Armin doesn't either.

Ethan Furman

Jan 5, 2014, 7:46:58 PM
to pytho...@python.org
On 01/05/2014 05:14 AM, Mark Lawrence wrote:
>
> Please don't shoot the messenger :)

While I don't agree with his assessment of Python 3 in total, I definitely feel his pain with regards to bytestrings in
Py3 -- because they don't exist. 'bytes' /looks/ like a bytestring, but really it's just a bunch of integers:

--> b'abc'
b'abc'
--> b'abc'[1]
98

Maybe for 3.5 somebody *cough* will make a bytestring type for those of us who have to support the lower-level protocols...

--
~Ethan~

*Cast your vote over on Python Ideas!

Chris Angelico

Jan 5, 2014, 8:48:03 PM
to pytho...@python.org
On Mon, Jan 6, 2014 at 12:16 PM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> So now we have two revered developers vocally having trouble with Python 3.
> You can dismiss their concerns as niche because it's only network
> programming, but that would be a mistake.

IMO, network programming (at least on the internet) is even more Py3's
domain (pun not intended).

1) The internet is global. You WILL come across other languages, other
scripts, everything.

2) In most cases, everything is clearly either text or binary, and
usually text has an associated (and very clear) encoding (eg HTTP
headers). If it's not explicitly given, the RFCs will often stipulate
what the encoding should be. It's pretty easy, you don't have to go
"Is this Latin-1? Maybe CP-1252? Could it be something else?".

3) The likelihood is high that you'll be working with someone else's
code at the other end. Ties in with #2 - this is why the specs are so
carefully written. Getting these things right is incredibly important.

If I'm writing something that might have to work with anything from
anywhere, I want a system that catches potential errors earlier rather
than later. I don't want to write interpolated SQL that works
perfectly until Mr O'Hara tries to sign up (or, worse, young Robert
whose sister is named "Help I'm trapped in a driver's license
factory"); I want to get it right from the start. Yes, that means more
work to get "Hello, World" going. Yes, it means that I need to get my
head around stuff that I didn't think I'd have to. (One time I
implemented Oauth manually rather than using a library - the immediate
reason was some kind of issue with the library, but I was glad I did,
because it meant I actually understood what was going on; came in
handy about two weeks later when the far end had a protocol problem.)

Most of the complaints about Py3 are "it's harder to get something
started (or port from Py2)". My answer is that it's easier to get
something finished.

ChrisA

Roy Smith

Jan 5, 2014, 8:56:51 PM
In article <mailman.4995.1388972...@python.org>,
Chris Angelico <ros...@gmail.com> wrote:

> One time I implemented Oauth manually rather than using a library

Me too. You have my sympathy. What a mess.

Ned Batchelder

Jan 5, 2014, 9:17:05 PM
to pytho...@python.org
I like all of this logic, it makes sense to me. But Armin and Kenneth
have more experience than I do actually writing networking software.
They are both very smart and very willing to do a ton of work. And both
are unhappy. I don't know how to square that with the logic that makes
sense to me.

And no amount of logic about why Python 3 is better is going to solve
the problem of the two of them being unhappy. They are speaking from
experience working with the actual product.

I'm not trying to convince anyone that Python 3 is good or bad. I'm
talking about our approach to unhappy and influential customers.

Mark Janssen

Jan 5, 2014, 9:25:37 PM
to Ned Batchelder, Python List
>> Most of the complaints about Py3 are "it's harder to get something
>> started (or port from Py2)". My answer is that it's easier to get
>> something finished.
>
> I like all of this logic, it makes sense to me. But Armin and Kenneth have
> more experience than I do actually writing networking software. They are
> both very smart and very willing to do a ton of work. And both are unhappy.
> I don't know how to square that with the logic that makes sense to me.
>
> And no amount of logic about why Python 3 is better is going to solve the
> problem of the two of them being unhappy. They are speaking from experience
> working with the actual product.

+1, well-said.

I hope you'll see my comments on the thread on the "bytestring type".
This issue also goes back to the schism in 2004 from the VPython folks
over floating point. Again the ***whole*** issue is ignoring the
relationship between your abstractions and your concrete architectural
implementations. I honestly think Python3 will have to be regressed
despite all the circle jerking about how "everyone's moving to Python
3 now". I see how I was inadequately explaining the whole issue by
using high-level concepts like "models of computation", but the
comments on the aforementioned thread go right down to the heart of
the issue.

markj

Dan Stromberg

Jan 5, 2014, 9:37:40 PM
to Ethan Furman, Python List
On Sun, Jan 5, 2014 at 4:46 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> While I don't agree with his assessment of Python 3 in total, I definitely
> feel his pain with regards to bytestrings in Py3 -- because they don't
> exist. 'bytes' /looks/ like a bytestring, but really it's just a bunch of
> integers:
>
> --> b'abc'
> b'abc'
> --> b'abc'[1]
> 98
>
> Maybe for 3.5 somebody *cough* will make a bytestring type for those of us
> who have to support the lower-level protocols...

I don't see anything wrong with the new bytes type, including the
example above. I wrote a backup program that used bytes or str's (3.x
or 2.x respectively), and they both worked fine for that. I had to
code around some limited number of surprises, but they weren't
substantive problems, they were just differences.

The argument seems to be "3.x doesn't work the way I'm accustomed to,
so I'm not going to use it, and I'm going to shout about it until
others agree with me." And yes, I read Armin's article - it was
pretty long....

Also, I never once wrote a program to use 2.x's unicode type. I
always used str. It was important to make str handle unicode, to get
people (like me!) to actually use unicode.

Two modules helped me quite a bit with backshift, the backup program I
mentioned:
http://stromberg.dnsalias.org/~dstromberg/backshift/documentation/html/python2x3-module.html
http://stromberg.dnsalias.org/~dstromberg/backshift/documentation/html/bufsock-module.html

python2x3 is tiny, and similar in spirit to the popular six module.

bufsock is something I wrote years ago that enables consistent I/O on
sockets, files or file descriptors; 2.x or 3.x.

HTH

Ethan Furman

Jan 5, 2014, 9:23:57 PM
to pytho...@python.org
On 01/05/2014 05:48 PM, Chris Angelico wrote:
> On Mon, Jan 6, 2014 at 12:16 PM, Ned Batchelder <n...@nedbatchelder.com> wrote:
>> So now we have two revered developers vocally having trouble with Python 3.
>> You can dismiss their concerns as niche because it's only network
>> programming, but that would be a mistake.
>
> IMO, network programming (at least on the internet) is even more Py3's
> domain (pun not intended).

The issue is not how to handle text, the issue is how to handle ascii when it's in a bytes object.

Using my own project [1] as a reference: good ol' dbf files -- character fields, numeric fields, logic fields, time
fields, and of course the metadata that describes these fields and the dbf as a whole. The character fields I turn into
unicode, no sweat. The metadata fields are simple ascii, and in Py2 something like `if header[FIELD_TYPE] == 'C'` did
the job just fine. In Py3 that compares an int (67) to the unicode letter 'C' and returns False. For me this is simply
a major annoyance, but I only have a handful of places where I have to deal with this. Dealing with protocols where
bytes is the norm and embedded ascii is prevalent -- well, I can easily imagine the nightmare.

The most unfortunate aspect is that even if we did "fix" it in 3.5, it wouldn't help anybody who has to support
multiple versions... unless, of course, a backport could also be made.

--
~Ethan~

Chris Angelico

Jan 5, 2014, 9:55:34 PM
to pytho...@python.org
On Mon, Jan 6, 2014 at 1:23 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> The metadata fields are simple ascii, and in Py2 something like `if
> header[FIELD_TYPE] == 'C'` did the job just fine. In Py3 that compares an
> int (67) to the unicode letter 'C' and returns False. For me this is simply
> a major annoyance, but I only have a handful of places where I have to deal
> with this. Dealing with protocols where bytes is the norm and embedded
> ascii is prevalent -- well, I can easily imagine the nightmare.

It can't be both things. It's either bytes or it's text. If it's text,
then decoding it as ascii will give you a Unicode string; if it's
small unsigned integers that just happen to correspond to ASCII
values, then I would say the right thing to do is integer constants -
or, in Python 3.4, an integer enumeration:

>>> socket.AF_INET
<AddressFamily.AF_INET: 2>
>>> socket.AF_INET == 2
True

I'm not sure what FIELD_TYPE of 'C' means, but my guess is that it's a
CHAR field. I'd just have that as the name, something like:

CHAR = b'C'[0]

if header[FIELD_TYPE] == CHAR:
    # handle char field

If nothing else, this would reduce the number of places where you
actually have to handle this. Plus, the code above will work on many
versions of Python (I'm not sure how far back the b'' prefix is
allowed - probably 2.6).
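
To make the comparison concrete, here is a rough sketch assuming Python 3. The header layout and the FIELD_TYPE offset are invented for illustration and are not taken from the real dbf format or from Ethan's code:

```python
# Hypothetical field descriptor: an 11-byte, NUL-padded name followed by
# a 1-byte type code.
header = b"NAME".ljust(11, b"\x00") + b"C"
FIELD_TYPE = 11  # assumed offset of the type byte, for illustration only

# Option 1: compare the int from indexing against an int constant.
CHAR = b"C"[0]  # 67 on Python 3
is_char_by_index = header[FIELD_TYPE] == CHAR

# Option 2: take a one-byte slice, which stays bytes, and compare it
# to a bytes literal.
is_char_by_slice = header[FIELD_TYPE:FIELD_TYPE + 1] == b"C"

# The pitfall from the quoted post: an int never equals a str.
naive = header[FIELD_TYPE] == "C"

print(is_char_by_index, is_char_by_slice, naive)
```

The slice form has the side benefit of reading the same way on Python 2, where a one-character slice of a str compares equal to b'C'.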

ChrisA

Roy Smith

Jan 5, 2014, 11:05:01 PM
In article <mailman.4998.1388975...@python.org>,
Mark Janssen <dreamin...@gmail.com> wrote:

> I honestly think Python3 will have to be regressed despite all the
> [obscenity elided] about how "everyone's moving to Python 3 now".

This forum has seen a lot of honest disagreement about issues, sometimes
hotly debated. That's OK. Sometimes the discussion has not been
completely professional, which is less than wonderful, but everything
can't always be wonderful.

There is absolutely no reason, however, to resort to profanity. That's
just unacceptable.

Roy Smith

Jan 5, 2014, 11:24:11 PM
In article <mailman.5001.1388976...@python.org>,
Chris Angelico <ros...@gmail.com> wrote:

> It can't be both things. It's either bytes or it's text.

I've never used Python 3, so forgive me if these are naive questions.
Let's say you had an input stream which contained the following hex
values:

$ hexdump data
0000000 d7 a8 a3 88 96 95

That's EBCDIC for "Python". What would I write in Python 3 to read that
file and print it back out as utf-8 encoded Unicode?

Or, how about a slightly different example:

$ hexdump data
0000000 43 6c 67 75 62 61

That's "Python" in rot-13 encoded ascii. How would I turn that into
cleartext Unicode in Python 3?

Terry Reedy

Jan 5, 2014, 11:26:14 PM
to pytho...@python.org
On 1/5/2014 8:16 PM, Ned Batchelder wrote:

> OK, let's see what we got from three core developers on this list:

To me, the following is a partly unfair summary.

> - Antoine dismissed the post as "a rant".

He called it a rant while acknowledging that there is an unsolved issue
with transforms. Whether he was 'dismissing' it or not, I do not know.
Antoine also noted that there does not seem to be anything new in this
post that Armin has not said before. Without reading in detail, I had
the same impression.

> - Terry took issue with three claims made, and ended with, "I suspect
> there are other basic errors, but I mostly quit reading at this point."

You are discouraged that I quit reading? How much sludge do you expect
me to wade through? If Armin wants my attention (and I do not think he
does), it is *his* responsibility to write in a readable manner.

But I read a bit more and found a 4th claim to 'take issue with' (to be
polite):
"only about 3% of all Python developers using Python 3 properly"
with a link to
http://alexgaynor.net/2014/jan/03/pypi-download-statistics/
The download statistics say nothing about the percent of all Python
developers using Python 3, let alone properly, and Alex Gaynor makes no
such claim as Armin did.

I would not be surprised if a majority of Python users have never
downloaded from pypi. What I do know from reading the catalog-sig (pypi)
list for a couple of years is that there are commercial developers who
use pypi heavily to update 1000s of installations and that they drive
the development of the pypi infrastructure. I strongly suspect that they
strongly skew the download statistics.

Dubious claim 5 is this: "For 97% of us, Python 2 is our beloved world
for years to come". For Armin's narrow circle, that may be true, but I
suspect that more than 3% of Python programmers have never written
Python2 only code.

> - Serhiy made a sarcastic comment comparing Python 3's bytes/unicode
> handling with Python 2's int/str handling, implying that since int/str
> wasn't a problem, then bytes/unicode isn't either.

Serhiy's point was about the expectation of implicit conversion
(int/str) versus (bytes/str) and the complaint about removal of implicit
conversion. I suspect that part of his point is that if we never had
implicit bytes/unicode conversion, it would not be expected.

--
Terry Jan Reedy

Tim Chase

Jan 5, 2014, 11:41:23 PM
to pytho...@python.org
On 2014-01-05 23:24, Roy Smith wrote:
> $ hexdump data
> 0000000 d7 a8 a3 88 96 95
>
> That's EBCDIC for "Python". What would I write in Python 3 to read
> that file and print it back out as utf-8 encoded Unicode?
>
> Or, how about a slightly different example:
>
> $ hexdump data
> 0000000 43 6c 67 75 62 61
>
> That's "Python" in rot-13 encoded ascii. How would I turn that
> into cleartext Unicode in Python 3?


tim@laptop$ python3
Python 3.2.3 (default, Feb 20 2013, 14:44:27)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s1 = b'\xd7\xa8\xa3\x88\x96\x95'
>>> s1.decode('ebcdic-cp-be')
'Python'
>>> s2 = b'\x43\x6c\x67\x75\x62\x61'
>>> from codecs import getencoder
>>> getencoder("rot-13")(s2.decode('utf-8'))[0]
'Python'

-tkc



Roy Smith

Jan 5, 2014, 11:49:47 PM
In article <mailman.5004.1388983...@python.org>,
Thanks. But, I see I didn't formulate my problem statement well. I was
(naively) assuming there wouldn't be a built-in codec for rot-13. Let's
assume there isn't; I was trying to find a case where you had to treat
the data as integers in one place and text in another. How would you do
that?

Chris Angelico

Jan 5, 2014, 11:51:09 PM
to pytho...@python.org
On Mon, Jan 6, 2014 at 3:24 PM, Roy Smith <r...@panix.com> wrote:
> I've never used Python 3, so forgive me if these are naive questions.
> Let's say you had an input stream which contained the following hex
> values:
>
> $ hexdump data
> 0000000 d7 a8 a3 88 96 95
>
> That's EBCDIC for "Python". What would I write in Python 3 to read that
> file and print it back out as utf-8 encoded Unicode?

*deletes the two paragraphs that used to be here* Turns out Python 3
_does_ have an EBCDIC decoder... but it's not called EBCDIC.

>>> b"\xd7\xa8\xa3\x88\x96\x95".decode("cp500")
'Python'

This sounds like a good one for getting an alias, either "ebcdic" or
"EBCDIC". I didn't know that this was possible till I googled the
problem and saw someone else's solution.

To print that out as UTF-8, just decode and then encode:

>>> b"\xd7\xa8\xa3\x88\x96\x95".decode("cp500").encode("utf-8")
b'Python'

In the specific case of files on the disk, you could open them with
encodings specified, in which case you don't need to worry about the
details.

with open("data",encoding="cp500") as infile:
    with open("data_utf8","w",encoding="utf-8") as outfile:
        outfile.write(infile.read())

Of course, this is assuming that Unicode has a perfect mapping for
every EBCDIC character. I'm not familiar enough with EBCDIC to be sure
that that's true, but I strongly suspect it is. And if it's not,
you'll get an exception somewhere along the way, so you'll know
something's gone wrong. (In theory, a "transcode" function might be
able to give you a warning before it even sees your data -
transcode("utf-8", "iso-8859-3") could alert you to the possibility
that not everything in the source character set can be encoded. But
that's a pretty esoteric requirement.)
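
To illustrate the "you'll get an exception somewhere along the way" part, here is a small sketch. The sample text is my own; the two CJK characters were picked only because they certainly have no slot in ISO-8859-3.

```python
# Latin letters plus two CJK characters (sample data chosen for illustration).
text = "Python \u65e5\u672c"

# UTF-8 can represent every Unicode codepoint, so this always succeeds.
utf8_bytes = text.encode("utf-8")

# ISO-8859-3 cannot represent the CJK characters, so strict encoding fails.
failed_at = None
try:
    text.encode("iso-8859-3")
except UnicodeEncodeError as exc:
    failed_at = exc.start
    print("cannot encode character at index", failed_at)

# An explicit error handler trades the exception for data loss.
lossy = text.encode("iso-8859-3", errors="replace")
print(lossy)
```

The strict default is what makes the failure visible; errors="replace" is only reasonable when losing characters is acceptable.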

> Or, how about a slightly different example:
>
> $ hexdump data
> 0000000 43 6c 67 75 62 61
>
> That's "Python" in rot-13 encoded ascii. How would I turn that into
> cleartext Unicode in Python 3?

That's one of the points that's under dispute. Is rot13 a
bytes<->bytes encoding, or is it str<->str, or is it bytes<->str? The
issue isn't clear. Personally, I think it makes good sense as a
str<->str translation, which would mean that the process would be
somewhat thus:

>>> rot13={}
>>> for i in range(13):
...     rot13[65+i]=65+i+13
...     rot13[65+i+13]=65+i
...     rot13[97+i]=97+i+13
...     rot13[97+i+13]=97+i
...

>>> data = b"\x43\x6c\x67\x75\x62\x61" # is there an easier way to turn a hex dump into a bytes literal?
>>> data.decode().translate(rot13)
'Python'

This is treating rot13 as a translation of Unicode codepoints to other
Unicode codepoints, which is different from an encode operation (which
takes abstract Unicode data and produces concrete bytes) or a decode
operation (which does the reverse). But this is definitely a grey
area. It's common for cryptographic algorithms to work with bytes,
meaning that their "decoded" text is still bytes. (Or even less than
bytes. The famous Enigma machines from World War II worked with the 26
letters as their domain and range.) Should the Python codecs module
restrict itself to the job of translating between bytes and str, or is
it a tidy place to put those other translations as well?
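
For what it's worth, the same str<->str table can be built without the loop using str.maketrans. This is just a sketch of that variant; the names ROT13 and rot13 are mine.

```python
import string

lower = string.ascii_lowercase
upper = string.ascii_uppercase

# Map each letter to the one 13 places along, wrapping around within its case.
ROT13 = str.maketrans(
    lower + upper,
    lower[13:] + lower[:13] + upper[13:] + upper[:13],
)


def rot13(text):
    # str -> str: codepoints in, codepoints out, no bytes involved.
    return text.translate(ROT13)


data = b"\x43\x6c\x67\x75\x62\x61"
print(rot13(data.decode("ascii")))
```

Anything not in the table (digits, punctuation, non-ASCII letters) passes through untouched, and applying rot13 twice gives back the original text.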

ChrisA

Chris Angelico

Jan 5, 2014, 11:59:34 PM
to pytho...@python.org
On Mon, Jan 6, 2014 at 3:49 PM, Roy Smith <r...@panix.com> wrote:
> Thanks. But, I see I didn't formulate my problem statement well. I was
> (naively) assuming there wouldn't be a built-in codec for rot-13. Let's
> assume there isn't; I was trying to find a case where you had to treat
> the data as integers in one place and text in another. How would you do
> that?

I assumed that you would have checked that one, and answered
accordingly :) Though I did dig into the EBCDIC part of the question.

My thinking is that, if you're working with integers, you probably
mean either bytes (so encode it before you do stuff - typical for
crypto) or codepoints / Unicode ordinals (so use ord()/chr()). In
other languages there are ways to treat strings as though they were
arrays of integers (lots of C-derived languages treat 'a' as 97 and
"a"[0] as 97 also; some extend this to the full Unicode range), and
even there, I almost never actually use that identity much. There's
only one case that I can think of where I did a lot of
string<->integer-array transmutation, and that was using a diff
function that expected an integer array - if the transformation to and
from strings hadn't been really easy, that function would probably
have been written to take strings.

The Py2 str.translate() method was a little clunky to use, but
presumably fast to execute - you build up a lookup table and translate
through that. The Py3 equivalent takes a dict mapping the from and to
values. Pretty easy to use. And it lets you work with codepoints or
strings, as you please.
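
A sketch of the "integers in one place, text in another" case, assuming a toy single-byte XOR cipher (the function name and the key 0x2A are invented for the example). The arithmetic happens on bytes, and the result only becomes text at the explicit decode step.

```python
def xor_bytes(data, key):
    # Iterating over bytes yields ints in Python 3, so the arithmetic is direct.
    return bytes(b ^ key for b in data)


# Text -> bytes -> integer manipulation.
ciphertext = xor_bytes("Python".encode("ascii"), 0x2A)

# Integer manipulation -> bytes -> text.
plaintext = xor_bytes(ciphertext, 0x2A).decode("ascii")
print(plaintext)

# The codepoint route, for when the integers are Unicode ordinals instead.
ordinals = [ord(c) for c in "abc"]
print(ordinals, "".join(chr(n) for n in ordinals))
```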

ChrisA

Ethan Furman

Jan 6, 2014, 3:40:28 AM
to pytho...@python.org
On 01/05/2014 06:23 PM, Ethan Furman wrote:
>
> Using my own project [1] as a reference

[1] https://pypi.python.org/pypi/dbf

Tim Chase

Jan 6, 2014, 6:49:28 AM
to pytho...@python.org
On 2014-01-06 15:51, Chris Angelico wrote:
> >>> data = b"\x43\x6c\x67\x75\x62\x61" # is there an easier way to
> >>> turn a hex dump into a bytes literal?

Depends on how you source them:


# space separated:
>>> s1 = "43 6c 67 75 62 61"
>>> ''.join(chr(int(pair, 16)) for pair in s1.split())
'Clguba'

# all smooshed together:
>>> s2 = s1.replace(' ','')
>>> s2
'436c67756261'
>>> ''.join(chr(int(s2[i*2:(i+1)*2], 16)) for i in range(len(s2)//2))
'Clguba'

# as \xHH escaped:
>>> s3 = ''.join('\\x'+s2[i*2:(i+1)*2] for i in range(len(s2)//2))
>>> print(s3)
\x43\x6c\x67\x75\x62\x61
>>> b3 = s3.encode('ascii')
>>> print(b3)
b'\\x43\\x6c\\x67\\x75\\x62\\x61'
>>> b3.decode('unicode_escape')
'Clguba'

It might get more complex if you're not just dealing with bytes, or
if you have some other encoding scheme, but "s1" (space-separated, or
some other delimiter such as colon-separated that can be passed
to the .split() call) and "s2" (all smooshed together) are the two I
encounter most frequently.
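
If the goal is an actual bytes object rather than a str, bytes.fromhex() and binascii.unhexlify() may be the shorter route. A sketch for the same inputs (the colon-separated string is an extra example of mine):

```python
import binascii

s1 = "43 6c 67 75 62 61"

# bytes.fromhex accepts spaces between the hex pairs.
spaced = bytes.fromhex(s1)

# Other delimiters have to be stripped first.
colon_separated = bytes.fromhex("43:6c:67:75:62:61".replace(":", ""))

# binascii.unhexlify wants the digits smooshed together.
smooshed = binascii.unhexlify("436c67756261")

print(spaced, colon_separated, smooshed)
```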

-tkc





Ned Batchelder

Jan 6, 2014, 7:39:27 AM
to pytho...@python.org
On 1/5/14 11:26 PM, Terry Reedy wrote:
> On 1/5/2014 8:16 PM, Ned Batchelder wrote:
>
>> OK, let's see what we got from three core developers on this list:
>
> To me, the following is a partly unfair summary.

I apologize, I'm sure there were details I skipped in my short summary.
You are still talking about whether Armin is right, and whether he
writes well, about flaws in his statistics, etc. I'm talking about the
fact that an organization (Python core development) has a product
(Python 3) that is getting bad press. Popular and vocal customers
(Armin, Kenneth, and others) are unhappy. What is being done to make
them happy? Who is working with them? They are not unique, and their
viewpoints are not outliers.

I'm not talking about the technical details of bytes and Unicode. I'm
talking about making customers happy.

Mark Lawrence

Jan 6, 2014, 8:44:41 AM
to pytho...@python.org
On 06/01/2014 12:39, Ned Batchelder wrote:
>
> I'm not talking about the technical details of bytes and Unicode. I'm
> talking about making customers happy.
>

Simply scrap PEP 404 and the currently unhappy customers will be happy
as they'll be free to do all the work they want on Python 2.8, as my
understanding is that the vast majority of the Python core developers
won't do it for them.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Gene Heskett

Jan 6, 2014, 9:32:35 AM
to pytho...@python.org
On Monday 06 January 2014 08:52:42 Ned Batchelder did opine:
[...]
> You are still talking about whether Armin is right, and whether he
> writes well, about flaws in his statistics, etc. I'm talking about the
> fact that an organization (Python core development) has a product
> (Python 3) that is getting bad press. Popular and vocal customers
> (Armin, Kenneth, and others) are unhappy. What is being done to make
> them happy? Who is working with them? They are not unique, and their
> viewpoints are not outliers.
>
> I'm not talking about the technical details of bytes and Unicode. I'm
> talking about making customers happy.

+1 Ned. Quite well said.

And from my lurking here, it's quite plain to me that 3.x python has a
problem with everyday dealing with strings. If it is not solved relatively
quickly, then I expect there will be a fork, a 2.8 by those most heavily
invested. Or an exodus to the next "cool" language.

No language will remain "cool" for long if it cannot simply and dependably
solve the everyday problem of printing the monthly water bill. If it can
be done in assembly, C or even bash, then it should be doable in python
even simpler.

It's nice to be able to abstract the functions so they become one-word macros
that wind up using 2 megs of program memory and 200k of stack to print
Hello World, but I can do that with 3 or 4 lines of assembly on a coco3
running nitros9. Or 3 lines of C. The assembly will use perhaps 20 bytes
of stack, the C version maybe 30. And the assembly will be lightning fast
on a cpu with a less than 2 megahertz clock.

Given that the problem IS understood, a language that can simplify solving
a problem is nice, and will be used. But if the problem is not well
understood, then you can write gigo crap in your choice of languages.

Python is supposed to be a problem solver, not a problem creator.

I'll get me coat. :)

Cheers, Gene
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

We are using Linux daily to UP our productivity - so UP yours!
-- Adapted from Pat Paulsen by Joe Sloan
A pen in the hand of this president is far more
dangerous than 200 million guns in the hands of
law-abiding citizens.

Mark Lawrence

Jan 6, 2014, 9:55:57 AM
to pytho...@python.org
On 06/01/2014 14:32, Gene Heskett wrote:
> On Monday 06 January 2014 08:52:42 Ned Batchelder did opine:
> [...]
>> You are still talking about whether Armin is right, and whether he
>> writes well, about flaws in his statistics, etc. I'm talking about the
>> fact that an organization (Python core development) has a product
>> (Python 3) that is getting bad press. Popular and vocal customers
>> (Armin, Kenneth, and others) are unhappy. What is being done to make
>> them happy? Who is working with them? They are not unique, and their
>> viewpoints are not outliers.
>>
>> I'm not talking about the technical details of bytes and Unicode. I'm
>> talking about making customers happy.
>
> +1 Ned. Quite well said.
>
> And from my lurking here, it's quite plain to me that 3.x python has a
> problem with everyday dealing with strings. If it is not solved relatively
> quickly, then I expect there will be a fork, a 2.8 by those most heavily
> invested. Or an exodus to the next "cool" language.
>

It's not at all plain to me, in fact quite the opposite. Please expand
on these problems for mere mortals such as myself.

Ethan Furman

Jan 6, 2014, 10:10:56 AM
to Python
On 01/05/2014 06:37 PM, Dan Stromberg wrote:
>
> The argument seems to be "3.x doesn't work the way I'm accustomed to,
> so I'm not going to use it, and I'm going to shout about it until
> others agree with me."

The argument is that a very important, if small, subset of data manipulation became very painful in Py3. Not impossible,
and not difficult, but painful because the mental model and the contortions needed to get things to work don't sync up
anymore. Painful because Python is, at heart, a simple and elegant language, but with the use-case of embedded ascii in
binary data that elegance went right out the window.

On 01/05/2014 06:55 PM, Chris Angelico wrote:
>
> It can't be both things. It's either bytes or it's text.

Of course it can be:

0000000: 0372 0106 0000 0000 6100 1d00 0000 0000 .r......a.......
0000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000020: 4e41 4d45 0000 0000 0000 0043 0100 0000 NAME.......C....
0000030: 1900 0000 0000 0000 0000 0000 0000 0000 ................
0000040: 4147 4500 0000 0000 0000 004e 1a00 0000 AGE........N....
0000050: 0300 0000 0000 0000 0000 0000 0000 0000 ................
0000060: 0d1a 0a ...

And there we are, mixed bytes and ascii data. As I said earlier, my example is minimal, but still very frustrating in
that normal operations no longer work. Incidentally, if you were thinking that NAME and AGE were part of the ascii
text, you'd be wrong -- the field names are also encoded, as are the Character and Memo fields.

--
~Ethan~

Chris Angelico

Jan 6, 2014, 10:46:08 AM
to Python
On Tue, Jan 7, 2014 at 2:10 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 01/05/2014 06:55 PM, Chris Angelico wrote:
>>
>>
>> It can't be both things. It's either bytes or it's text.
>
>
> Of course it can be:
>
> 0000000: 0372 0106 0000 0000 6100 1d00 0000 0000 .r......a.......
> 0000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 0000020: 4e41 4d45 0000 0000 0000 0043 0100 0000 NAME.......C....
> 0000030: 1900 0000 0000 0000 0000 0000 0000 0000 ................
> 0000040: 4147 4500 0000 0000 0000 004e 1a00 0000 AGE........N....
> 0000050: 0300 0000 0000 0000 0000 0000 0000 0000 ................
> 0000060: 0d1a 0a ...
>
> And there we are, mixed bytes and ascii data. As I said earlier, my example
> is minimal, but still very frustrating in that normal operations no longer
> work. Incidentally, if you were thinking that NAME and AGE were part of the
> ascii text, you'd be wrong -- the field names are also encoded, as are the
> Character and Memo fields.

That's alternating between encoded text and non-text bytes. Each
individual piece is either text or non-text, not both. The ideal way
to manipulate it would most likely be a simple decode operation that
turns this into (probably) a dictionary, decoding both the
structure/layout and UTF-8 in a single operation. But a less ideal
(and more convenient) solution might be involving what's currently
under discussion elsewhere: a (possibly partial) percent-formatting or
.format() method for bytes.
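
As a rough sketch of that single decode step, here is one 32-byte field descriptor from the dump above turned into a dictionary with struct. The layout (11-byte NUL-padded name, 1-byte type code, 4 skipped bytes, 1-byte length, 1-byte decimal count, 14 skipped bytes) is only my reading of the hex dump, and the names FIELD and decode_field are invented; none of this is taken from the dbf module.

```python
import struct

# Bytes 0x20-0x3f of the dump quoted above: one 32-byte field descriptor.
raw = bytes.fromhex(
    "4e41 4d45 0000 0000 0000 0043 0100 0000"
    "1900 0000 0000 0000 0000 0000 0000 0000"
)

# Assumed layout, inferred from the dump: name, type, (skip 4), length,
# decimal count, (skip 14). "<" disables alignment padding.
FIELD = struct.Struct("<11sc4xBB14x")


def decode_field(descriptor, encoding="ascii"):
    name, type_code, length, decimals = FIELD.unpack(descriptor)
    return {
        # The text pieces are decoded here, once, at the boundary.
        "name": name.rstrip(b"\x00").decode(encoding),
        "type": type_code.decode(encoding),
        # The numeric pieces stay integers.
        "length": length,
        "decimals": decimals,
    }


print(decode_field(raw))
```

After this point the rest of the program compares field["type"] == "C" as ordinary text, and the encoding argument covers the case where the field names are not plain ASCII.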

None of this changes the fact that there are bytes used to
store/transmit stuff, and abstract concepts used to manipulate them.
Just like nobody expects to be able to write a dict to a file without
some form of encoding (pickle, JSON, whatever), you shouldn't expect
to write a character string without first turning it into bytes.

ChrisA

Grant Edwards

Jan 6, 2014, 10:53:58 AM
On 2014-01-06, Chris Angelico <ros...@gmail.com> wrote:

>> Right. I think shifting people to LibreOffice is an excellent and
>> realistic step toward increasing people's software and data freedom.
>
> Yeah. Which is why I do it. But the other night, my mum was trying to
> lay out her book in LO, and was having some problems with the system
> of having each chapter in a separate file. (Among other things, styles
> weren't shared across them all, so a tweak to a style means opening up
> every chapter and either doing a parallel edit or figuring out how to
> import styles.) So yes, it's a realistic and worthwhile step, but it's
> not a magic solution to all problems. She doesn't have time to learn a
> whole new system. Maybe - in the long term - LaTeX would actually save
> her time, but it's certainly a much harder 'sell' than LO.

Yea, I think laying out a book with something like MS Word or
LibreOffice is nuts. Depending on her formatting needs, a
lighter-weight mark-up language (something like asciidoc) might suit:

http://asciidoc.org/
http://en.wikipedia.org/wiki/AsciiDoc

I've used it to write a 150 page manual, and was quite happy with the
results. It produces DocBook XML, PDF, HTML and a few other output
formats (Including, I think, LibreOffice/OpenOffice). It's _much_
easier to get started with than LaTeX. For printing purposes the
quality of the output is no match for TeX -- but it's better than a
"word processor", and it does a very nice job with HTML output.

--
Grant Edwards grant.b.edwards Yow! It's a hole all the
at way to downtown Burbank!
gmail.com

Chris Angelico

Jan 6, 2014, 11:01:16 AM
to pytho...@python.org
On Tue, Jan 7, 2014 at 2:53 AM, Grant Edwards <inv...@invalid.invalid> wrote:
> Yea, I think laying out a book with something like MS Word or
> LibreOffice is nuts. Depending on her formatting needs, a
> lighter-weight mark-up language (something like asciidoc) might suit:
>
> http://asciidoc.org/
> http://en.wikipedia.org/wiki/AsciiDoc
>
> I've used it to write a 150 page manual, and was quite happy with the
> results. It produces DocBook XML, PDF, HTML and a few other output
> formats (Including, I think, LibreOffice/OpenOffice). It's _much_
> easier to get started with than LaTeX. For printing purposes the
> quality of the output is no match for TeX -- but it's better than a
> "word processor", and it does a very nice job with HTML output.

Hmm. Might be useful in some other places. I'm currently trying to
push for a web site design that involves docutils/reStructuredText,
but am flexible on the exact markup system used. My main goal, though,
is to separate content from structure and style - and my secondary
goal is to have everything done as plain text files (apart from actual
images), so the source control diffs are useful :)

ChrisA

Steven D'Aprano

Jan 6, 2014, 11:24:19 AM
Roy Smith wrote:

> In article <mailman.5001.1388976...@python.org>,
> Chris Angelico <ros...@gmail.com> wrote:
>
>> It can't be both things. It's either bytes or it's text.
>
> I've never used Python 3, so forgive me if these are naive questions.
> Let's say you had an input stream which contained the following hex
> values:
>
> $ hexdump data
> 0000000 d7 a8 a3 88 96 95
>
> That's EBCDIC for "Python". What would I write in Python 3 to read that
> file and print it back out as utf-8 encoded Unicode?

There's no one EBCDIC encoding. Like the so-called "extended ASCII"
or "ANSI" encodings that followed, IBM had many different versions of
EBCDIC customised for different machines and markets -- only even more
poorly documented. But since the characters in that are all US English
letters, any EBCDIC dialect ought to do it:

py> b = b'\xd7\xa8\xa3\x88\x96\x95'
py> b.decode('CP500')
'Python'
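
A quick sketch to back up the "any dialect ought to do it" claim, using three of the EBCDIC code pages that ship with Python (cp037, cp500, cp1140). The choice of byte 0x4a for the contrast is mine; it is one of the positions where cp037 and cp500 disagree.

```python
data = b"\xd7\xa8\xa3\x88\x96\x95"

# The basic Latin letters sit at the same positions in these code pages.
dialects = ["cp037", "cp500", "cp1140"]
decoded = {name: data.decode(name) for name in dialects}
print(decoded)

# Punctuation is where the dialects drift apart, so the choice of code
# page matters as soon as the data is more than plain letters.
variants = {name: b"\x4a".decode(name) for name in ("cp037", "cp500")}
print(variants)
```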


To read it from a file:

text = open("somefile", encoding='CP500').read()

And to print out the UTF-8 encoded bytes:

print(text.encode('utf-8'))



> Or, how about a slightly different example:
>
> $ hexdump data
> 0000000 43 6c 67 75 62 61
>
> That's "Python" in rot-13 encoded ascii. How would I turn that into
> cleartext Unicode in Python 3?


In Python 3.3, you can do this:

py> b = b'\x43\x6c\x67\x75\x62\x61'
py> s = b.decode('ascii')
py> print(s)
Clguba
py> import codecs
py> codecs.decode(s, 'rot-13')
'Python'

(This may not work in Python 3.1 or 3.2, since rot13 and assorted other
string-to-string and byte-to-byte codecs were mistakenly removed. I say
mistakenly, not in the sense of "by accident", but in the sense of "it was
an error of judgement". Somebody was under the misapprehension that the
codec machinery could only work on Unicode <-> bytes.)

If you don't want to use the codec, you can do it by hand:

def rot13(astring):
    result = []
    for c in astring:
        i = ord(c)
        if ord('a') <= i <= ord('m') or ord('A') <= i <= ord('M'):
            i += 13
        elif ord('n') <= i <= ord('z') or ord('N') <= i <= ord('Z'):
            i -= 13
        result.append(chr(i))
    return ''.join(result)

But why would you want to do it the slow way?



--
Steven

Antoine Pitrou

Jan 6, 2014, 11:29:01 AM
to pytho...@python.org
Ned Batchelder <ned <at> nedbatchelder.com> writes:
>
> You can look through his problems and decide that he's "wrong," or that
> he's "ranting," but that doesn't change the fact that Python 3 is
> encountering friction. What happens when a significant fraction of your
> customers are "wrong"?

Well, yes, there is some friction and this is quite expectable, when
shipping incompatible changes. Other pieces of software have undergone a
similar process (e.g. Apache 1.x -> Apache 2.x).

(the alternative is to maintain a piece of software that sticks with obsolete
conventions, e.g. emacs)

> Core developers: I thank you for the countless hours you have devoted to
> building all of the versions of Python. I'm sure in many ways it's a
> thankless task. But you have a problem. What's the point in being
> right if you end up with a product that people don't use?

People don't use? According to available figures, there are more downloads of
Python 3 than downloads of Python 2 (Windows installers, mostly):
http://www.python.org/webstats/

The number of Python 3-compatible packages has been showing a constant and
healthy increase for years:
http://dev.pocoo.org/~gbrandl/py3.html

And Dan's survey shows 77% of respondents think Python 3 wasn't a mistake:
https://wiki.python.org/moin/2.x-vs-3.x-survey

> Maybe there are core developers who are trying hard to solve the
> problems Kenneth and Armin are facing. It would be great if that work
> was more visible. I don't see it, and apparently Armin doesn't either.

While this is being discussed:
https://mail.python.org/pipermail/python-dev/2014-January/130923.html

I would still point out that "Kenneth and Armin" are not the whole Python
community. Your whole argument seems to be that a couple "revered" (!!)
individuals should see their complaints taken for granted. I am opposed to
rockstarizing the community.

Their contribution is always welcome, of course.

(as for network programming, the people working on and with asyncio don't
seem to find Python 3 terrible)

Regards

Antoine.


Chris Angelico

Jan 6, 2014, 11:30:17 AM
to pytho...@python.org
On Tue, Jan 7, 2014 at 3:24 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> If you don't want to use the codec, you can do it by hand:
>
> def rot13(astring):
>     result = []
>     for c in astring:
>         i = ord(c)
>         if ord('a') <= i <= ord('m') or ord('A') <= i <= ord('M'):
>             i += 13
>         elif ord('n') <= i <= ord('z') or ord('N') <= i <= ord('Z'):
>             i -= 13
>         result.append(chr(i))
>     return ''.join(result)
>
> But why would you want to do it the slow way?

Eww. I'd much rather use .translate() than that :)

ChrisA

Chris Angelico

Jan 6, 2014, 11:36:51 AM
to pytho...@python.org
On Tue, Jan 7, 2014 at 3:29 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> People don't use? According to available figures, there are more downloads of
> Python 3 than downloads of Python 2 (Windows installers, mostly):
> http://www.python.org/webstats/
>

Unfortunately, that has a massive inherent bias, because there are
Python builds available in most Linux distributions - and stats from
those (like Debian's popcon) will be nearly as useless, because a lot
of them will install one or the other (probably 2.x) without waiting
for the user (so either they'll skew in favour of the one installed,
or in favour of the one NOT installed, because that's the only one
that'll be explicitly requested). It's probably fairly accurate for
Windows stats, though, since most people who want Python on Windows
are going to come to python.org for an installer.

ChrisA

Steven D'Aprano

Jan 6, 2014, 11:43:52 AM
Ethan Furman wrote:

> Using my own project [1] as a reference:  good ol' dbf files -- character
> fields, numeric fields, logic fields, time fields, and of course the
> metadata that describes these fields and the dbf as a whole.  The
> character fields I turn into unicode, no sweat.  The metadata fields are
> simple ascii, and in Py2 something like `if header[FIELD_TYPE] == 'C'` did
> the job just fine.  In Py3 that compares an int (67) to the unicode letter
> 'C' and returns False.  

Why haven't you converted the headers to text too? You're using them as if
they were text. They might happen to merely contain the small subset of
Unicode which matches the ASCII encoding, but that in itself is no good
reason to keep it as bytes. If you want to work with stuff as if it were
text, convert it to text.

If you do have a good reason for keeping them as bytes, say because you need
to do a bunch of bitwise operations on it, it's not that hard to do the job
correctly: instead of defining FIELD_TYPE as 3 (for example), define it as
slice(3,4). Then:

if header[FIELD_TYPE] == b'C':

will work. For sure, this is a bit of a nuisance, and slightly error-prone,
since Python won't complain if you forget the b prefix, it will silently
return False. Which is the right thing to do, inconvenient though it may be
in this case. But it is workable, with a bit of discipline.
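
To make the index-versus-slice difference concrete, here is a small sketch. The header bytes and the offset of 11 are made-up sample values, not the real dbf layout.

```python
# Sample data: an 11-byte NUL-padded name followed by a one-byte type code.
header = b"NAME" + b"\x00" * 7 + b"C"

FIELD_TYPE_INDEX = 11          # plain integer offset
FIELD_TYPE = slice(11, 12)     # one-byte slice at the same offset

by_index = header[FIELD_TYPE_INDEX]   # an int in Python 3
by_slice = header[FIELD_TYPE]         # a length-1 bytes object

print(by_index)            # the integer code of the byte
print(by_slice)            # still bytes, so it compares against b'C'
print(by_slice == b"C")
```

Forgetting the b prefix and writing by_slice == "C" does not raise by default; it just evaluates to False, which is the trap mentioned above. Running Python with the -b option turns that comparison into a warning.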

Or define a helper, and use that:

def eq(byte, char):
    return byte == ord(char)


if eq(header[FIELD_TYPE], 'C'):


Worried about the cost of all those function calls, all those ord()'s? I'll
give you the benefit of the doubt and assume that this is not premature
optimisation. So do it yourself:

C = ord('C') # Convert it once.
if header[FIELD_TYPE] == C: # And use it many times.


[Note to self: when I'm BDFL, encourage much more compile-time
optimisations.]


> For me this is simply a major annoyance, but I
> only have a handful of places where I have to deal with this.  Dealing
> with protocols where bytes is the norm and embedded ascii is prevalent --
> well, I can easily imagine the nightmare.

Is it one of those nightmares where you're being chased down an endless long
corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
terror...


--
Steven

Gene Heskett

Jan 6, 2014, 11:46:18 AM
to pytho...@python.org
On Monday 06 January 2014 11:42:55 Mark Lawrence did opine:
Mortals? Likely nobody here is more acutely aware of his mortality, Mark.

But what is the most common post here asking for help? Tossup as to
whether it's database related, or strings. Most everything else seems to be
a pretty distant 3rd.

Cheers, Gene

Ethan Furman

Jan 6, 2014, 11:28:01 AM
to pytho...@python.org
On 01/06/2014 07:53 AM, Grant Edwards wrote:
>
> Yea, I think laying out a book with something like MS Word or
> LibreOffice is nuts. Depending on her formatting needs, a
> lighter-weight mark-up language (something like asciidoc) might suit:
>
> http://asciidoc.org/
> http://en.wikipedia.org/wiki/AsciiDoc

Thanks for that!

--
~Ethan~

Ethan Furman

Jan 6, 2014, 11:23:15 AM
to pytho...@python.org
On 01/06/2014 07:46 AM, Chris Angelico wrote:
>
> None of this changes the fact that there are bytes used to
> store/transmit stuff, and abstract concepts used to manipulate them.
> Just like nobody expects to be able to write a dict to a file without
> some form of encoding (pickle, JSON, whatever), you shouldn't expect
> to write a character string without first turning it into bytes.

Writing is only half the battle, and not, as it happens, where I experience the pain. This data must also be /read/.
It has been stated many times that the Py2 str became the Py3 bytes, and yet never in Py2 did 'abc'[1] return 98.

--
~Ethan~

Chris Angelico

Jan 6, 2014, 11:54:53 AM
to pytho...@python.org
On Tue, Jan 7, 2014 at 3:43 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
>> For me this is simply a major annoyance, but I
>> only have a handful of places where I have to deal with this. Dealing
>> with protocols where bytes is the norm and embedded ascii is prevalent --
>> well, I can easily imagine the nightmare.
>
> Is it one of those nightmares where you're being chased down an endless long
> corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
> terror...

Uhh, I think you're the only one here who has that nightmare, like
Chris Knight with his sun-god robes and naked women throwing pickles
at him.

ChrisA

Mark Lawrence

Jan 6, 2014, 12:07:00 PM
to pytho...@python.org
On 06/01/2014 16:43, Steven D'Aprano wrote:
> Ethan Furman wrote:
>
>> For me this is simply a major annoyance, but I
>> only have a handful of places where I have to deal with this. Dealing
>> with protocols where bytes is the norm and embedded ascii is prevalent --
>> well, I can easily imagine the nightmare.
>
> Is it one of those nightmares where you're being chased down an endless long
> corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
> terror...
>

Great minds think alike? :)

Mark Lawrence

unread,
Jan 6, 2014, 12:13:34 PM1/6/14
to pytho...@python.org
As the uptake of Python 3 is so poor then that must mean all the problems
being reported are still with Python 2. The solution is to upgrade to
Python 3.3+ and the superb PEP 393 FSR which is faster and uses less
memory. Or is it simply that people are so used to doing things
sloppily with Python 2 that they don't like being forced into doing
things correctly with Python 3?

Steven D'Aprano

unread,
Jan 6, 2014, 12:27:40 PM1/6/14
to
Ethan Furman wrote:

> On 01/05/2014 06:37 PM, Dan Stromberg wrote:
>>
>> The argument seems to be "3.x doesn't work the way I'm accustomed to,
>> so I'm not going to use it, and I'm going to shout about it until
>> others agree with me."
>
> The argument is that a very important, if small, subset of data
> manipulation became very painful in Py3. Not impossible, and not
> difficult, but painful because the mental model and the contortions needed
> to get things to work don't sync up
> anymore. Painful because Python is, at heart, a simple and elegant
> language, but with the use-case of embedded ascii in binary data that
> elegance went right out the window.
>
> On 01/05/2014 06:55 PM, Chris Angelico wrote:
>>
>> It can't be both things. It's either bytes or it's text.
>
> Of course it can be:
>
> 0000000: 0372 0106 0000 0000 6100 1d00 0000 0000 .r......a.......
> 0000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 0000020: 4e41 4d45 0000 0000 0000 0043 0100 0000 NAME.......C....
> 0000030: 1900 0000 0000 0000 0000 0000 0000 0000 ................
> 0000040: 4147 4500 0000 0000 0000 004e 1a00 0000 AGE........N....
> 0000050: 0300 0000 0000 0000 0000 0000 0000 0000 ................
> 0000060: 0d1a 0a ...
>
> And there we are, mixed bytes and ascii data.

Chris didn't say "bytes and ascii data", he said "bytes and TEXT".
Text != "ascii data", and the fact that some people apparently think it
does is pretty much the heart of the problem.

I see no mixed bytes and text. I see bytes. Since the above comes from a
file, it cannot be anything else but bytes. Do you think that a file that
happens to be a JPEG contains pixels? No. It contains bytes which, after
decoding, represents pixels. Same with text, ascii or otherwise.

Now, it is true that some of those bytes happen to fall into the same range
of values as ASCII-encoded text. They may even represent text after
decoding, but since we don't know what the file contents mean, we can't
know that. It might be a mere coincidence that the four bytes starting at
hex offset 40 is the C long 1095189760 which happens to look like "AGE"
with a null at the end. For historical reasons, your hexdump utility
performs that decoding step for you, which is why you can see "NAME"
and "AGE" in the right-hand block, but that doesn't mean the file contains
text. It contains bytes, some of which represents text after decoding.
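
As a rough illustration of that point, here is a sketch that reads the same four bytes two ways. It assumes Python 3 and a big-endian byte order, which is what produces the number 1095189760; the variable names are made up for the example.

```python
raw = bytes.fromhex('41474500')   # the four bytes at hex offset 40

# Interpretation 1: one big-endian integer.
as_int = int.from_bytes(raw, 'big')

# Interpretation 2: NUL-padded ASCII text.
as_text = raw.rstrip(b'\x00').decode('ascii')

print(as_int)    # 1095189760
print(as_text)   # AGE
```

Neither result is "in" the file; both come from choosing a decoding for the same bytes.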

If you (generic you) don't get that, you'll have a bad time. I mean *really*
get it, deep down in the bone. The long, bad habit of thinking of
ASCII-encoded bytes as text is the problem here. The average programmer has
years and years of experience thinking about decoding bytes to numbers and
back (just not by that name), so it doesn't lead to any cognitive
dissonance to think of hex 4147 4500 as either four bytes, two double-byte
ints, or a single four-byte int. But as soon as "text" comes into the
picture, the average programmer has equally many years of thinking that the
byte 41 "just is" the letter "A", and that's simply *wrong*.
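
The three numeric readings mentioned above can be sketched with the struct module. This assumes big-endian layout, and the names are illustrative:

```python
import struct

raw = bytes.fromhex('41474500')   # hex 4147 4500

four_bytes = struct.unpack('>4B', raw)   # four single bytes
two_shorts = struct.unpack('>2H', raw)   # two double-byte unsigned ints
one_int = struct.unpack('>I', raw)       # one four-byte unsigned int

print(four_bytes)   # (65, 71, 69, 0)
print(two_shorts)   # (16711, 17664)
print(one_int)      # (1095189760,)
```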


> As I said earlier, my
> example is minimal, but still very frustrating in
> that normal operations no longer work. Incidentally, if you were thinking
> that NAME and AGE were part of the ascii text, you'd be wrong -- the field
> names are also encoded, as are the Character and Memo fields.

What Character and Memo fields? Are you trying to say that the NAME and AGE
are *not* actually ASCII text, but a mere coincidence, like my example of
1095189760? Or are you referring to the fact that they're actually encoded
as ASCII? If not, I have no idea what you are trying to say.



--
Steven

Steven D'Aprano

unread,
Jan 6, 2014, 12:30:24 PM1/6/14
to
Gene Heskett wrote:

> And from my lurking here, it's quite plain to me that 3.x python has a
> problem with everyday dealing with strings.

I've been using Python 3.x since Python 3.1 came out, and I haven't come
across any meaningful problems with the everyday dealing with strings.
Quite the opposite -- I never quite understood the difference between text
strings and byte strings until I started using Python 3.

Perhaps you would care to explain what these everyday problems are that you
have seen?


--
Steven

Steven D'Aprano

unread,
Jan 6, 2014, 12:50:18 PM1/6/14
to
Ned Batchelder wrote:

> You are still talking about whether Armin is right, and whether he
> writes well, about flaws in his statistics, etc.  I'm talking about the
> fact that an organization (Python core development) has a product
> (Python 3) that is getting bad press.  Popular and vocal customers
> (Armin, Kenneth, and others) are unhappy.  What is being done to make
> them happy?  Who is working with them?  They are not unique, and their
> viewpoints are not outliers.
>
> I'm not talking about the technical details of bytes and Unicode.  I'm
> talking about making customers happy.

Oh? How much did Armin pay for his Python support? If he didn't pay, he's
not a customer. He's a user.

When something gets bad press, the normal process is to first determine just
how justified that bad press is. (Unless, of course, you're more interested
in just *covering it up* than fixing the problem.) The best solutions are:

- if the bad press is justified, admit it, and fix the problems;

- if the bad press is not justified, try to educate Armin (and others) so
they stop blaming Python for their own errors; try to counter their bad
press with good press; or ignore it, knowing that the internet is
notoriously fickle and in a week people will be hating on Go, or Ruby
instead.

But I get the idea from your post that you don't want to talk about the
technical details of bytes and Unicode, and by extension, whether Python 3
is better or worse than Python 2. That makes it impossible to determine how
valid the bad press is, which leaves us hamstrung. Our only responses are:

- Patronise him. "Yes yes, you poor little thing, we feel your pain. But
what can we do about it?"

- Abuse him and hope he shuts up.

- Give in to his (and by extension, everyone else's) complaints, whether
justified or not, and make Python worse.

- Counter his bad press with good press, and come across as arrogant idiots
by denying actual real problems (if any).

- Wait for the Internet to move on.



--
Steven

Ethan Furman

unread,
Jan 6, 2014, 1:34:10 PM1/6/14
to pytho...@python.org
On 01/06/2014 09:27 AM, Steven D'Aprano wrote:
> Ethan Furman wrote:
>
> Chris didn't say "bytes and ascii data", he said "bytes and TEXT".
> Text != "ascii data", and the fact that some people apparently think it
> does is pretty much the heart of the problem.

The heart of a different problem, not this one. The problem I refer to is that many binary formats have well-defined
ascii-encoded text tidbits. These tidbits were quite easy to work with in Py2, not difficult but not elegant in Py3,
and even worse if you have to support both 2 and 3.


> Now, it is true that some of those bytes happen to fall into the same range
> of values as ASCII-encoded text. They may even represent text after
> decoding, but since we don't know what the file contents mean, we can't
> know that.

Of course we can -- we're the programmer, after all. This is not a random bunch of bytes but a well defined format for
storing data.

> It might be a mere coincidence that the four bytes starting at
> hex offset 40 is the C long 1095189760 which happens to look like "AGE"
> with a null at the end. For historical reasons, your hexdump utility
> performs that decoding step for you, which is why you can see "NAME"
> and "AGE" in the right-hand block, but that doesn't mean the file contains
> text. It contains bytes, some of which represents text after decoding.

As it happens, 'NAME' and 'AGE' are encoded, and will be decoded. They could just as easily have contained tildes,
accents, umlauts, and other strange (to me) characters. It's actually the 'C' and the 'N' that bug me (like I said, my
example is minimal, especially compared to a network protocol).

And you're right -- it is easy to say FIELD_TYPE = slice(15,16), and it was also easy to say FIELD_TYPE = 15, but there
is a critical difference -- can you spot it?

..
..
..
In case you didn't: both work in Py2, only the slice version works (correctly) in Py3, but the worst part is why do I
have to use a slice to take a single byte when a simple index should work? Because the bytes type lies. It shows, for
example, b'\r\n\x12\x08N\x00' but when I try to access that N to see if this is a Numeric field I get:

--> b'\r\n\x12\x08N\x00'[4]
78

This is a cognitive dissonance that one does not expect in Python.
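
A short sketch of the difference, using the bytes literal above. The constant names are hypothetical and the comments describe Python 3 behaviour:

```python
record = b'\r\n\x12\x08N\x00'

FIELD_TYPE_INDEX = 4             # hypothetical: plain index
FIELD_TYPE_SLICE = slice(4, 5)   # hypothetical: one-byte slice

# On Python 3 the index yields the int 78, which never equals b'N'.
index_matches = record[FIELD_TYPE_INDEX] == b'N'

# The slice yields b'N' on both Python 2 and Python 3.
slice_matches = record[FIELD_TYPE_SLICE] == b'N'

# Comparing against the integer value works on Python 3 only.
int_matches = record[FIELD_TYPE_INDEX] == ord('N')

print(index_matches, slice_matches, int_matches)   # False True True
```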


> If you (generic you) don't get that, you'll have a bad time. I mean *really*
> get it, deep down in the bone. The long, bad habit of thinking of
> ASCII-encoded bytes as text is the problem here.

Different problem. The problem here is that bytes and byte literals don't compare equal.

> the average programmer has equally many years of thinking that the
> byte 41 "just is" the letter "A", and that's simply *wrong*.

Agreed. But byte 41 != b'A', and that is equally wrong.


>> As I said earlier, my
>> example is minimal, but still very frustrating in
>> that normal operations no longer work. Incidentally, if you were thinking
>> that NAME and AGE were part of the ascii text, you'd be wrong -- the field
>> names are also encoded, as are the Character and Memo fields.
>
> What Character and Memo fields? Are you trying to say that the NAME and AGE
> are *not* actually ASCII text, but a mere coincidence, like my example of
> 1095189760? Or are you referring to the fact that they're actually encoded
> as ASCII? If not, I have no idea what you are trying to say.

Yes, NAME and AGE are *not* ASCII text, but latin-1 encoded. The C and the N are ASCII, meaningful as-is. The actual
data stored in a Character (NAME in this case) or Memo (not shown) field would also be latin-1 encoded. (And before you
ask, the encoding is stored in the file header.)
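
To make that concrete, here is a sketch that pulls the name and type out of the first field descriptor in the hexdump. The layout (name in bytes 0-10, type code in byte 11) is an assumption based on dBase-style files and is not spelled out in this thread; the slice names are hypothetical.

```python
# Bytes 0x20-0x3f of the hexdump, copied as shown.
descriptor = bytes.fromhex(
    '4e41 4d45 0000 0000 0000 0043 0100 0000'
    '1900 0000 0000 0000 0000 0000 0000 0000'
)

# Assumed layout: NUL-padded name in bytes 0-10, type code in byte 11.
NAME = slice(0, 11)
FIELD_TYPE = slice(11, 12)

encoding = 'latin-1'   # a real reader would take this from the file header

# The name is text, so it gets decoded; the type code stays as bytes.
field_name = descriptor[NAME].rstrip(b'\x00').decode(encoding)
field_type = descriptor[FIELD_TYPE]

print(field_name)            # NAME
print(field_type == b'C')    # True
```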

--
~Ethan~

Mark Janssen

unread,
Jan 6, 2014, 2:21:44 PM1/6/14
to Ethan Furman, Python
> The argument is that a very important, if small, subset of data manipulation
> became very painful in Py3. Not impossible, and not difficult, but painful
> because the mental model and the contortions needed to get things to work
> don't sync up anymore.

You are confused. Please see my reply to you on the bytestring type thread.

> Painful because Python is, at heart, a simple and
> elegant language, but with the use-case of embedded ascii in binary data
> that elegance went right out the window.

It went out the window only because the Object model with the
type/class unification was wrong. It was fine before.

Mark

>> It can't be both things. It's either bytes or it's text.
>
> Of course it can be:
>
> 0000000: 0372 0106 0000 0000 6100 1d00 0000 0000 .r......a.......
> 0000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 0000020: 4e41 4d45 0000 0000 0000 0043 0100 0000 NAME.......C....
> 0000030: 1900 0000 0000 0000 0000 0000 0000 0000 ................
> 0000040: 4147 4500 0000 0000 0000 004e 1a00 0000 AGE........N....
> 0000050: 0300 0000 0000 0000 0000 0000 0000 0000 ................
> 0000060: 0d1a 0a ...
>
> And there we are, mixed bytes and ascii data.

No, you are printing a debug output which shows both. That's called CHEATING.

Mark

Mark Janssen

unread,
Jan 6, 2014, 2:30:55 PM1/6/14
to Ethan Furman, Python List
>> Chris didn't say "bytes and ascii data", he said "bytes and TEXT".
>> Text != "ascii data", and the fact that some people apparently think it
>> does is pretty much the heart of the problem.
>
> The heart of a different problem, not this one. The problem I refer to is
> that many binary formats have well-defined ascii-encoded text tidbits.

Really? If people are using binary with "well-defined ascii-encoded
tidbits", they're doing something wrong. Perhaps you think escape
characters "\n" are "well defined tidbits", but YOU WOULD BE WRONG.
The purpose of binary is to keep things raw. WTF? You guys are so
strange.

>
>> If you (generic you) don't get that, you'll have a bad time. I mean
>> *really*
>> get it, deep down in the bone. The long, bad habit of thinking of
>> ASCII-encoded bytes as text is the problem here.

I think the whole forking community is confused at because of your own
arrogance. Foo(l)s.

markj

Mark Lawrence

unread,
Jan 6, 2014, 2:36:22 PM1/6/14
to pytho...@python.org
On 06/01/2014 19:30, Mark Janssen wrote:
>>> Chris didn't say "bytes and ascii data", he said "bytes and TEXT".
>>> Text != "ascii data", and the fact that some people apparently think it
>>> does is pretty much the heart of the problem.
>>
>> The heart of a different problem, not this one. The problem I refer to is
>> that many binary formats have well-defined ascii-encoded text tidbits.
>
> Really? If people are using binary with "well-defined ascii-encoded
> tidbits", they're doing something wrong. Perhaps you think escape
> characters "\n" are "well defined tidbits", but YOU WOULD BE WRONG.
> The purpose of binary is to keep things raw. WTF? You guys are so
> strange.
>
>>
>>> If you (generic you) don't get that, you'll have a bad time. I mean
>>> *really*
>>> get it, deep down in the bone. The long, bad habit of thinking of
>>> ASCII-encoded bytes as text is the problem here.
>
> I think the whole forking community is confused at because of your own
> arrogance. Foo(l)s.
>
> markj
>

Looks like another bad batch, time to change your dealer again.

Mark Janssen

unread,
Jan 6, 2014, 2:44:21 PM1/6/14
to Mark Lawrence, Python List
> Looks like another bad batch, time to change your dealer again.

??? Strange, when the debate hits bottom, accusations about doing
drugs come up. This is like the third reference (and I don't even
drink alcohol).

mark

Terry Reedy

unread,
Jan 6, 2014, 3:14:23 PM1/6/14
to pytho...@python.org
On 1/6/2014 9:32 AM, Gene Heskett wrote:

> And from my lurking here, it's quite plain to me that 3.x python has a
> problem with everyday dealing with strings.

Strings of what? And what specific 'everyday' problem are you referring to?

--
Terry Jan Reedy

Serhiy Storchaka

unread,
Jan 6, 2014, 3:20:15 PM1/6/14
to pytho...@python.org
06.01.14 06:51, Chris Angelico написав(ла):
>>>> data = b"\x43\x6c\x67\x75\x62\x61" # is there an easier way to turn a hex dump into a bytes literal?

>>> bytes.fromhex('43 6c 67 75 62 61')
b'Clguba'
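
The same call as a small script, assuming Python 3. The spaced and compact forms of the hex string give the same result:

```python
hexdump = '43 6c 67 75 62 61'

# Spaces between byte pairs are ignored by bytes.fromhex.
data = bytes.fromhex(hexdump)

# The compact form without spaces works as well.
same = bytes.fromhex('436c67756261')

print(data)           # b'Clguba'
print(data == same)   # True
```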


Serhiy Storchaka

unread,
Jan 6, 2014, 3:21:45 PM1/6/14
to pytho...@python.org
06.01.14 06:41, Tim Chase написав(ла):
>>>> from codecs import getencoder
>>>> getencoder("rot-13")(s2.decode('utf-8'))[0]
> 'Python'

codecs.decode(s2.decode(), 'rot13')
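
A runnable sketch of the one-call form. It assumes s2 holds the bytes b'Clguba' seen elsewhere in the thread (its definition is not shown here), and it uses the codec name 'rot_13', which should be accepted by Python 3.2 and later. Note that codecs.decode takes the object first and the codec name second.

```python
import codecs

s2 = b'Clguba'   # assumed value

# bytes -> str first, then the str -> str rot13 transform.
text = s2.decode('utf-8')
decoded = codecs.decode(text, 'rot_13')

print(decoded)   # Python
```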


Serhiy Storchaka

unread,
Jan 6, 2014, 3:31:23 PM1/6/14
to pytho...@python.org
06.01.14 15:44, Mark Lawrence написав(ла):
> Simply scrap PEP 404 and the currently unhappy customers will be happy
> as they'll be free to do all the work they want on Python 2.8, as my
> understanding is that the vast majority of the Python core developers
> won't do it for them.

It's not necessary. You are free to make a fork and call it Qython 2.8.

Antoine Pitrou

unread,
Jan 6, 2014, 3:32:50 PM1/6/14
to pytho...@python.org
Chris Angelico <rosuav <at> gmail.com> writes:
>
> On Tue, Jan 7, 2014 at 3:29 AM, Antoine Pitrou <solipsis <at> pitrou.net>
Agreed, but it's enough to rebut the claim that "people don't use
Python 3". More than one million Python 3.3 downloads per month under
Windows is a very respectable number (no 2.x release seems to reach
that level).

Regards

Antoine.


Tim Chase

unread,
Jan 6, 2014, 3:42:18 PM1/6/14
to pytho...@python.org
On 2014-01-06 22:20, Serhiy Storchaka wrote:
> >>>> data = b"\x43\x6c\x67\x75\x62\x61" # is there an easier way to
> >>>> turn a hex dump into a bytes literal?
>
> >>> bytes.fromhex('43 6c 67 75 62 61')
> b'Clguba'

Very nice new functionality in Py3k, but 2.x doesn't seem to have such
a method. :-(
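
One possible approach that should run on both 2.x and 3.x is binascii.unhexlify. It does not skip whitespace, so this sketch strips the spaces first:

```python
import binascii

hexdump = '43 6c 67 75 62 61'

# Remove the spaces, then convert; encoding to ASCII first keeps the
# argument a byte string on both Python 2 and Python 3.
compact = hexdump.replace(' ', '').encode('ascii')
data = binascii.unhexlify(compact)

print(data == b'Clguba')   # True
```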

-tkc



Mark Lawrence

unread,
Jan 6, 2014, 3:41:54 PM1/6/14
to pytho...@python.org
You plural is fine, you singular simply doesn't apply, it'll stop
raining in the UK first :)

Mark Lawrence

unread,
Jan 6, 2014, 3:47:11 PM1/6/14
to pytho...@python.org
Seems like another mistake, that'll have to be regressed to make sure
there is Python 2 and Python 3 compatibility, which can then be
reintroduced into Python 2.8 so that it gets back into Python 3.

Terry Reedy

unread,
Jan 6, 2014, 3:49:53 PM1/6/14
to pytho...@python.org
On 1/6/2014 8:44 AM, Mark Lawrence wrote:
> On 06/01/2014 12:39, Ned Batchelder wrote:
>>
>> I'm not talking about the technical details of bytes and Unicode. I'm
>> talking about making customers happy.
>>
>
> Simply scrap PEP 404

Not necessary.

> and the currently unhappy customers will be happy
> as they'll be free to do all the work they want on Python 2.8,

They are already free to do so, as long as they do not call the result
'Python 2.8'.

> as my
> understanding is that the vast majority of the Python core developers
> won't do it for them.

Which is what some of them want and why they will never be happy.



--
Terry Jan Reedy

Terry Reedy

unread,
Jan 6, 2014, 3:53:32 PM1/6/14
to pytho...@python.org
On 1/6/2014 10:10 AM, Ethan Furman wrote:

> The argument is that a very important, if small, subset of data
> manipulation became very painful in Py3. Not impossible, and not
> difficult, but painful because the mental model and the contortions
> needed to get things to work don't sync up anymore.

Thank you for a succinct summary. I presume you are referring in part to
bytes manipulations that would be easier with bytes.format. In
http://bugs.python.org/issue3982
Guido gave approval in principle to a minimal new method a year ago. The
proponents failed to build on that to get anything in 3.4. Finally,
however, Victor Stinner has written a PEP
http://www.python.org/dev/peps/pep-0460/
so something might happen for 3.5.
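
For later readers, a sketch of the kind of bytes manipulation at issue. It assumes Python 3.5 or newer, where %-interpolation for bytes was eventually added (through PEP 461 rather than PEP 460; no bytes.format method was added). The header text is made up for illustration.

```python
body = b'hello'

# Assumes Python 3.5+: bytes supports %-interpolation.
header = b'Content-Length: %d\r\n\r\n' % len(body)
message = header + body

# The equivalent on Python 3.0-3.4: format as str, then encode and join.
header_old = b'Content-Length: ' + str(len(body)).encode('ascii') + b'\r\n\r\n'

print(header == header_old)   # True
```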

--
Terry Jan Reedy

Mark Lawrence

unread,
Jan 6, 2014, 4:02:15 PM1/6/14
to pytho...@python.org
I find all this intriguing. People haven't found time to migrate from
Python 2 to Python 3, but now intend finding time to produce a fork of
Python 2 which will ease the migration to Python 3. Have I got that
correct?

Ned Batchelder

unread,
Jan 6, 2014, 4:14:51 PM1/6/14
to pytho...@python.org
On 1/6/14 2:30 PM, Mark Janssen wrote:
>>> Chris didn't say "bytes and ascii data", he said "bytes and TEXT".
>>> Text != "ascii data", and the fact that some people apparently think it
>>> does is pretty much the heart of the problem.
>>
>> The heart of a different problem, not this one. The problem I refer to is
>> that many binary formats have well-defined ascii-encoded text tidbits.
>
> Really? If people are using binary with "well-defined ascii-encoded
> tidbits", they're doing something wrong. Perhaps you think escape
> characters "\n" are "well defined tidbits", but YOU WOULD BE WRONG.
> The purpose of binary is to keep things raw. WTF? You guys are so
> strange.
>
>>
>>> If you (generic you) don't get that, you'll have a bad time. I mean
>>> *really*
>>> get it, deep down in the bone. The long, bad habit of thinking of
>>> ASCII-encoded bytes as text is the problem here.
>
> I think the whole forking community is confused at because of your own
> arrogance. Foo(l)s.
>
> markj
>

If you want to participate in this discussion, do so. Calling people
strange, arrogant, and fools with no technical content is just rude.
Typing "YOU WOULD BE WRONG" in all caps doesn't count as technical content.

--
Ned Batchelder, http://nedbatchelder.com

Gene Heskett

unread,
Jan 6, 2014, 4:17:06 PM1/6/14
to pytho...@python.org
On Monday 06 January 2014 16:16:13 Terry Reedy did opine:
Strings start a new thread here at nominally weekly intervals. Seems to me
that might be usable info.

Cheers, Gene
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

Tip the world over on its side and everything loose will land in Los
Angeles.
-- Frank Lloyd Wright
A pen in the hand of this president is far more
dangerous than 200 million guns in the hands of
law-abiding citizens.

Mark Janssen

unread,
Jan 6, 2014, 4:23:08 PM1/6/14
to Ned Batchelder, Python List
>> Really? If people are using binary with "well-defined ascii-encoded
>> tidbits", they're doing something wrong. Perhaps you think escape
>> characters "\n" are "well defined tidbits", but YOU WOULD BE WRONG.
>> The purpose of binary is to keep things raw. WTF?

> If you want to participate in this discussion, do so. Calling people
> strange, arrogant, and fools with no technical content is just rude. Typing
> "YOU WOULD BE WRONG" in all caps doesn't count as technical content.

Ned -- IF

Ned Batchelder

unread,
Jan 6, 2014, 4:32:01 PM1/6/14
to pytho...@python.org
On 1/6/14 12:50 PM, Steven D'Aprano wrote:
> Ned Batchelder wrote:
>
>> You are still talking about whether Armin is right, and whether he
>> writes well, about flaws in his statistics, etc. I'm talking about the
>> fact that an organization (Python core development) has a product
>> (Python 3) that is getting bad press. Popular and vocal customers
>> (Armin, Kenneth, and others) are unhappy. What is being done to make
>> them happy? Who is working with them? They are not unique, and their
>> viewpoints are not outliers.
>>
>> I'm not talking about the technical details of bytes and Unicode. I'm
>> talking about making customers happy.
>
> Oh? How much did Armin pay for his Python support? If he didn't pay, he's
> not a customer. He's a user.

I use the term "customer" in the larger sense of, "someone using your
product that you are trying to please." I'd like to think that an open
source project with only users would treat them as customers. Not in
the sense of a legal obligation in exchange for money, but in the sense
that the point of the work is to please them.

>
> When something gets bad press, the normal process is to first determine just
> how justified that bad press is. (Unless, of course, you're more interested
> in just *covering it up* than fixing the problem.) The best solutions are:
>
> - if the bad press is justified, admit it, and fix the problems;
>
> - if the bad press is not justified, try to educate Armin (and others) so
> they stop blaming Python for their own errors; try to counter their bad
> press with good press; or ignore it, knowing that the internet is
> notoriously fickle and in a week people will be hating on Go, or Ruby
> instead.
>
> But I get the idea from your post that you don't want to talk about the
> technical details of bytes and Unicode, and by extension, whether Python 3
> is better or worse than Python 2. That makes it impossible to determine how
> valid the bad press is, which leaves us hamstrung. Our only responses are:
>
> - Patronise him. "Yes yes, you poor little thing, we feel your pain. But
> what can we do about it?"
>
> - Abuse him and hope he shuts up.
>
> - Give in to his (and by extension, everyone else's) complaints, whether
> justified or not, and make Python worse.
>
> - Counter his bad press with good press, and come across as arrogant idiots
> by denying actual real problems (if any).
>
> - Wait for the Internet to move on.
>

I was only avoiding talking about Unicode vs bytes because I'm not the
one who needs a better way to do it, Armin and Kenneth are. You seem to
be arguing from the standpoint of, "I've never had problems, so there
are no problems."

I suspect an undercurrent here is also the difference between writing
Python 3 code, and writing code that can run on both Python 2 and 3.

In my original post, I provided two possible responses, one of which
you've omitted: work with Armin to explain the easier way that he has
missed. It sounds like you think there isn't an easier way, and that's
OK? I would love to see a Python 3 advocate work with Armin or Kenneth
on the code that's caused them such pain, and find a way to make it good.

It's clear from other discussions happening elsewhere that there is the
possibility of improving the situation, for example PEP 460 proposing
"bytes % args" and "bytes.format(args)". That's good.

Mark Janssen

unread,
Jan 6, 2014, 4:32:35 PM1/6/14
to Ned Batchelder, Python List
Ned -- IF YOU'RE A REAL PERSON -- you will see that several words
prior to that declaration, you'll find (or be able to arrange) the
proposition: "Escape characters are well-defined tidbits of binary
data is FALSE".

Now that is a technical point that i'm saying is simply the "way
things are" coming from the mass of experience held by the OS
community and the C programming community which is responsible for
much of the world's computer systems. Do you have an argument against
it, or do you piss off and argue against anything I say?? Perhaps I
said it too loudly, and I take responsibility for that, but don't
claim I'm not making a technical point which seems to be at the heart
of all the confusion regarding python/python3 and str/unicode/bytes.

mark

Mark Lawrence

unread,
Jan 6, 2014, 4:33:54 PM1/6/14
to pytho...@python.org
On 06/01/2014 21:17, Gene Heskett wrote:
> On Monday 06 January 2014 16:16:13 Terry Reedy did opine:
>
>> On 1/6/2014 9:32 AM, Gene Heskett wrote:
>>> And from my lurking here, it's quite plain to me that 3.x python has a
>>> problem with everyday dealing with strings.
>>
>> Strings of what? And what specific 'everyday' problem are you referring
>> to?
>
> Strings start a new thread here at nominally weekly intervals. Seems to me
> that might be usable info.
>
> Cheers, Gene
>

That strikes me as being as useful as "The PEP 393 FSR is completely
wrong but I'm not going to tell you why" approach.

Ned Batchelder

unread,
Jan 6, 2014, 4:40:47 PM1/6/14
to pytho...@python.org
On 1/6/14 11:29 AM, Antoine Pitrou wrote:
> Ned Batchelder <ned <at> nedbatchelder.com> writes:
>>
>> You can look through his problems and decide that he's "wrong," or that
>> he's "ranting," but that doesn't change the fact that Python 3 is
>> encountering friction. What happens when a significant fraction of your
>> customers are "wrong"?
>
> Well, yes, there is some friction and this is quite expectable, when
> shipping incompatible changes. Other pieces of software have undergone a
> similar process (e.g. Apache 1.x -> Apache 2.x).
>
> (the alternative is to maintain a piece of software that sticks with obsolete
> conventions, e.g. emacs)
>
>> Core developers: I thank you for the countless hours you have devoted to
>> building all of the versions of Python. I'm sure in many ways it's a
>> thankless task. But you have a problem. What's the point in being
>> right if you end up with a product that people don't use?
>
> People don't use? According to available figures, there are more downloads of
> Python 3 than downloads of Python 2 (Windows installers, mostly):
> http://www.python.org/webstats/
>
> The number of Python 3-compatible packages has been showing a constant and
> healthy increase for years:
> http://dev.pocoo.org/~gbrandl/py3.html
>
> And Dan's survey shows 77% of respondents think Python 3 wasn't a mistake:
> https://wiki.python.org/moin/2.x-vs-3.x-survey
>
>> Maybe there are core developers who are trying hard to solve the
>> problems Kenneth and Armin are facing. It would be great if that work
>> was more visible. I don't see it, and apparently Armin doesn't either.
>
> While this is being discussed:
> https://mail.python.org/pipermail/python-dev/2014-January/130923.html
>
> I would still point out that "Kenneth and Armin" are not the whole Python
> community.

I never said they were the whole community, of course. But they are not
outliers either. By your own statistics above, 23% of respondents think
Python 3 was a mistake. Armin and Kenneth are just two very visible people.

> Your whole argument seems to be that a couple "revered" (!!)
> individuals should see their complaints taken for granted. I am opposed to
> rockstarizing the community.

I'm not creating rock stars. I'm acknowledging that these two people
are listened to by many others. It sounds like part of your effort to
avoid rockstars is to ignore any one person's specific feedback? I must
be misunderstanding what you mean.

>
> Their contribution is always welcome, of course.
>
> (as for network programming, the people working on and with asyncio don't
> seem to find Python 3 terrible)

Some people don't have problems. That doesn't mean that other people
don't have problems.

You are being given detailed specific feedback from intelligent
dedicated customers that many people listen to, and who are building
important components of the ecosystem, and your response is, "sorry, you
are wrong, it will be fine if I ignore you." That's disheartening.

>
> Regards
>
> Antoine.

Devin Jeanpierre

unread,
Jan 6, 2014, 4:43:06 PM1/6/14
to Mark Lawrence, comp.lang.python
On Mon, Jan 6, 2014 at 1:02 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> I find all this intriguing. People haven't found time to migrate from
> Python 2 to Python 3, but now intend finding time to produce a fork of
> Python 2 which will ease the migration to Python 3. Have I got that
> correct?

Keeping old, unsupported (by upstream) things up-to-date is a common
operation (e.g. this is what Red Hat does for an entire operating
system). It might take a few hours to backport a module or bugfix you
want, but updating an entire million-LOC codebase would take
significantly longer. Plus, if a benefit of backporting things is an
easier eventual migration to 3.x, it's killing two birds with one
stone.

At any rate it's not a possibility to sneer at and suggest is
improbable or a waste of time. It is a rational outcome for a codebase
of a large enough size.

-- Devin

Ned Batchelder

unread,
Jan 6, 2014, 4:42:01 PM1/6/14
to pytho...@python.org
On 1/6/14 4:33 PM, Mark Lawrence wrote:
> On 06/01/2014 21:17, Gene Heskett wrote:
>> On Monday 06 January 2014 16:16:13 Terry Reedy did opine:
>>
>>> On 1/6/2014 9:32 AM, Gene Heskett wrote:
>>>> And from my lurking here, it's quite plain to me that 3.x python has a
>>>> problem with everyday dealing with strings.
>>>
>>> Strings of what? And what specific 'everyday' problem are you referring
>>> to?
>>
>> Strings start a new thread here at nominally weekly intervals. Seems
>> to me
>> that might be usable info.
>>
>> Cheers, Gene
>>
>
> That strikes me as being as useful as "The PEP 393 FSR is completely
> wrong but I'm not going to tell you why" approach.
>

Please stop baiting people.

Terry Reedy

unread,
Jan 6, 2014, 5:07:16 PM1/6/14
to pytho...@python.org
On 1/6/2014 7:39 AM, Ned Batchelder wrote:

> You are still talking about whether Armin is right, and whether he
> writes well, about flaws in his statistics, etc.

That is how *I* decide whether someone is worth attending to. He failed.

> I'm talking about the fact that an organization

of volunteers

> (Python core development) has a product

given away for free, with a liberal license that allows derivative products

> (Python 3) that is getting bad press.

Inevitable and nothing new.

> (Armin, Kenneth, and others) are unhappy.

There are many unhappy people in the world. Some will be unhappy no
matter what.

> What is being done to make them happy?

Huh? What are they doing to make core developers happy?

> Who is working with them?

You? Really the wrong question. Which of 'them' is working with us -- in
a respectful manner -- through established means? (See my response to
Ethan about what 'unhappy customers' failed to do for a year.)

> I'm talking about making customers happy.

Python has 'customers' around the world. I am more concerned
with helping poor kids in Asia, Africa, and Latin America than with
well-off professional developers in Latin-alphabet regions.

A certain person is unhappy with a feature of 3.3+. When we fixed the
first ostensible problem he identified, without his help, he found other
reasons to be unhappy with the feature. When we voluntarily fix more of
the ostensible problems with Python 3, which we will, without help from
most of the 'unhappy customers', I expect that some of them will also
continue to be unhappy customers. Some of them are opposed to the
fundamental changes in Python 3 and will never be happy with it.

--
Terry Jan Reedy

Mark Lawrence

unread,
Jan 6, 2014, 5:08:27 PM1/6/14
to pytho...@python.org
What are you on about? The comment has been made that "it's quite plain
to me that 3.x python has a problem with everyday dealing with strings".
Myself, Terry Reedy and Steven D'Aprano have all commented on this,
asking for more data. We've been given nothing, which is precisely what
our resident unicode expert has given us.

Antoine Pitrou

unread,
Jan 6, 2014, 5:16:22 PM1/6/14
to pytho...@python.org
Ned Batchelder <ned <at> nedbatchelder.com> writes:
>
>
> I never said they were the whole community, of course. But they are not
> outliers either. By your own statistics above, 23% of respondents think
> Python 3 was a mistake. Armin and Kenneth are just two very visible
> people.

Indeed, they are two very visible people.

> I'm not creating rock stars. I'm acknowledging that these two people
> are listened to by many others. It sounds like part of your effort to
> avoid rockstars is to ignore any one person's specific feedback? I must
> be misunderstanding what you mean.

I am not trying to ignore "any one person's specific feedback". I am
ignoring your claim that we should give Armin's blog posts an
extraordinary importance because he is "revered".

Speaking of which, posting blog articles is not the preferred way to
give feedback. There are ample community resources for that. I am
irritated that we are apparently supposed to be monitoring blog posts,
Twitter feeds and whatnot for any sign of dissent, and immediately react
to a criticism that wasn't even voiced directly to us.

> You are being given detailed specific feedback from intelligent
> dedicated customers that many people listen to,

Could you please stop talking about customers? We are not selling
Python to anyone (*). Writing open source software as a volunteer is
not supposed to be a sacrificial activity where we will bow with
extreme diligence to the community's every outburst. Please try to
respect us.

((*) Wikipedia: "A customer (sometimes known as a client, buyer, or
purchaser) is the recipient of a good, service, product, or idea,
obtained from a seller, vendor, or supplier for a monetary or other
valuable consideration")

Regards

Antoine.


Ned Batchelder

unread,
Jan 6, 2014, 5:22:34 PM1/6/14
to pytho...@python.org
On 1/6/14 5:08 PM, Mark Lawrence wrote:
> On 06/01/2014 21:42, Ned Batchelder wrote:
>> On 1/6/14 4:33 PM, Mark Lawrence wrote:
>>> On 06/01/2014 21:17, Gene Heskett wrote:
>>>> On Monday 06 January 2014 16:16:13 Terry Reedy did opine:
>>>>
>>>>> On 1/6/2014 9:32 AM, Gene Heskett wrote:
>>>>>> And from my lurking here, it's quite plain to me that 3.x python has a
>>>>>> problem with everyday dealing with strings.
>>>>>
>>>>> Strings of what? And what specific 'everyday' problem are you
>>>>> referring to?
>>>>
>>>> Strings start a new thread here at nominally weekly intervals. Seems
>>>> to me that might be usable info.
>>>>
>>>> Cheers, Gene
>>>>
>>>
>>> That strikes me as being as useful as "The PEP 393 FSR is completely
>>> wrong but I'm not going to tell you why" approach.
>>>
>>
>> Please stop baiting people.
>>
>
> What are you on about? The comment has been made that "it's quite plain
> to me that 3.x python has a problem with everyday dealing with strings".
> Myself, Terry Reedy and Steven D'Aprano have all commented on this,
> asking for more data. We've been given nothing, which is precisely what
> our resident unicode expert has given us.
>

I'm on about your comment being a gratuitous jab at someone who isn't
even participating in the thread. Stop it.

Ned Batchelder

unread,
Jan 6, 2014, 5:25:15 PM1/6/14
to pytho...@python.org
I do respect you, and all the core developers. As I've said elsewhere
in the thread, I greatly appreciate everything you do. I dedicate a
great deal of time and energy to the Python community, primarily because
of the amazing product that you have all built.

I've made my point as best as I can, I'll stop now.

>
> ((*) Wikipedia: "A customer (sometimes known as a client, buyer, or
> purchaser) is the recipient of a good, service, product, or idea,
> obtained from a seller, vendor, or supplier for a monetary or other
> valuable consideration")
>
> Regards
>
> Antoine.
>
>


Mark Lawrence

unread,
Jan 6, 2014, 5:30:14 PM1/6/14
to pytho...@python.org
Your arrogance really has no bounds. If you'd have done the job that you
should have done in the first place and stopped that blithering idiot 16
months ago, we wouldn't still be putting up with him now. To top that,
you're now defending "customers" when you should be saying quite clearly
that PEP 404 stands as is and THERE WILL BE NO PYTHON 2.8. Have I made
my message perfectly clear?

And as I started this thread, I'll say what I please, throwing my toys
out of my pram in just the same way that your pal Armin is currently doing.

Antoine Pitrou

unread,
Jan 6, 2014, 5:35:02 PM1/6/14
to pytho...@python.org

Mark Lawrence <breamoreboy <at> yahoo.co.uk> writes:
> [...]
>
> And as I started this thread, I'll say what I please, throwing my toys
> out of my pram in just the same way that your pal Armin is currently doing.

I'll join Ned here: please stop it. You are doing a disservice to
everyone.

Thanks in advance

Antoine.


Nicholas Cole

unread,
Jan 6, 2014, 5:41:26 PM1/6/14
to Python
I hardly know which of the various threads on this topic to reply to!

No one is taking Python 2.7 away from anyone.  It is going to be on the net for years to come.  Goodness! I expect if I wanted to go and download Python 1.5 I could find it easily enough.

Like everyone else, when Python 3 came out I was nervous.  A lot of my code broke - but it broke for a good reason.  I had been cavalier about strings and ASCII and bytes.  A lot of my code was working by accident rather than by design, or because my users had never fed it anything that would make it fall over.  Of course, my first reaction was a defensive one, but once I had got over that and got my head around Python 3's view of the world, I was pleased I had.  I find writing in Python 3 leads to more robust code.  I like the way it forces me to do the right thing, and I like the way it raises errors if I try to get away with something I shouldn't. Going back to Python 2 now feels a bit like stepping back to the seductive and permissive hell of PHP in some ways!  If I could be sure that I was coding just for me and not having to support things still running on Python 2, I would move to Python 3.3 and not look back.  Except, yes, there are still libraries that haven't made the change... blast!
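
A minimal sketch of the kind of error meant here, assuming Python 3 and a made-up sample string (nothing below comes from a real project): mixing text and bytes fails immediately instead of working "by accident" until non-ASCII input arrives.

```python
# Illustrative only: the sample string and variable names are assumptions.
text = "caf\u00e9"            # str: a sequence of Unicode code points
raw = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text

try:
    text + raw                # Python 3 refuses to guess an encoding
    mix_error = None
except TypeError as exc:
    mix_error = exc

print("mixing str and bytes:", mix_error)

# The explicit round trip is what Python 3 asks for instead.
round_trip = raw.decode("utf-8")
print(round_trip == text, len(text), len(raw))
```

The two lengths differ (4 code points, 5 bytes) because the accented character takes two bytes in UTF-8, which is exactly the distinction that implicit conversion hides.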

Python 2.7 is there if your software was written to run on the 2 series.  I am sure it will be distributed with major operating systems (as the default or as an option) for some time.  I am totally unpersuaded by the argument that 'back porting' more and more into Python 2 will ease the transition.  I think it will just use up developer time, and delay further the day when releasing new code for Python 3 only becomes not only reasonable but the natural and default choice.

I am really glad to see that at least one distribution of Linux is moving to Python 3 as the default.  I'd much rather see developer time spent improving Python 3 than managing a transition.
  
I realised when Python 3.0 came out that eventually I would have to move to Python 3.  I spent the next release in a state of denial.  But I had years to get used to it, and I'm glad I have.  It "feels" more robust.  Of course, I haven't ported every little program: but no one is forcing me to!

All of these threads are written as if everyone's code is about to be broken.  It isn't.  But if you want the new features, you need to make a move, and it is probably time to write all new code in Python 3. If there's a dependency holding you back, then there will be a Python 2 interpreter around to run your code.  That all seems pretty reasonable and straightforward to me.

Nicholas

Terry Reedy

unread,
Jan 6, 2014, 5:53:55 PM1/6/14
to pytho...@python.org
On 1/6/2014 11:29 AM, Antoine Pitrou wrote:

> People don't use? According to available figures, there are more downloads of
> Python 3 than downloads of Python 2 (Windows installers, mostly):
> http://www.python.org/webstats/

While I would like the claim to be true, I do not see 2 versus 3
downloads on that page. Did you mean another link?

> The number of Python 3-compatible packages has been showing a constant and
> healthy increase for years:
> http://dev.pocoo.org/~gbrandl/py3.html

This looks like the beginning of a sigmoid adoption curve. I do not
expect to see a peak of 100% unless and until old, dead, Python 2 only
projects get purged (in some future decade, if ever).

--
Terry Jan Reedy

Mark Janssen

unread,
Jan 6, 2014, 5:56:10 PM1/6/14
to Ned Batchelder, Python List
>> I would still point out that "Kenneth and Armin" are not the whole Python
>> community.
>
> I never said they were the whole community, of course. But they are not
> outliers either. [...]
>
>> Your whole argument seems to be that a couple "revered" (!!)
>> individuals should see their complaints taken for granted. I am opposed to
>> rockstarizing the community.
>
> I'm not creating rock stars. I'm acknowledging that these two people are
> listened to by many others. It sounds like part of your effort to avoid
> rockstars is to ignore any one person's specific feedback? I must be
> misunderstanding what you mean.

To Ned's defense, it doesn't always work to treat everyone in the
community as equal. That's not to say that those two examples are the
most important, but some people work on core aspects of the field
which are critical for everything else to work properly. Without
diving into it, one can't say whether Ned's intuition is wrong or not.

markj

Mark Lawrence

unread,
Jan 6, 2014, 5:57:45 PM1/6/14
to pytho...@python.org
I will for you as I have great respect for the amount of work that I've
seen you do over the years that I've been using Python.

Chris Angelico

unread,
Jan 6, 2014, 6:03:27 PM1/6/14
to Python List
On Tue, Jan 7, 2014 at 8:32 AM, Mark Janssen <dreamin...@gmail.com> wrote:
>>> Really? If people are using binary with "well-defined ascii-encoded
>>> tidbits", they're doing something wrong. Perhaps you think escape
>>> characters "\n" are "well defined tidbits", but YOU WOULD BE WRONG.
>>> The purpose of binary is to keep things raw. WTF?
>>
>> If you want to participate in this discussion, do so. Calling people
>> strange, arrogant, and fools with no technical content is just rude. Typing
>> "YOU WOULD BE WRONG" in all caps doesn't count as technical content.
>
> Ned -- [chomp verbiage]

Mark, please watch your citations. Several (all?) of your posts in
this thread have omitted the line(s) at the top saying who you're
quoting. Have a look at my post here, and then imagine how confused
Mark Lawrence would be if I hadn't made it clear that I wasn't
addressing him.

Thanks!

ChrisA

Mark Lawrence

unread,
Jan 6, 2014, 6:02:05 PM1/6/14
to pytho...@python.org
The first sentence of the blog post that gives this thread its title reads: "It's
becoming increasingly harder to have reasonable discussions about the
differences between Python 2 and 3 because one language is dead and the
other is actively developed". Funny really as I see bug fixes going
into Python 2.7 on a daily basis so I can only assume that their
definition of dead is different to mine and presumably yours.

Chris Angelico

unread,
Jan 6, 2014, 6:06:01 PM1/6/14
to pytho...@python.org
On Tue, Jan 7, 2014 at 7:42 AM, Tim Chase <pytho...@tim.thechases.com> wrote:
> On 2014-01-06 22:20, Serhiy Storchaka wrote:
>> >>>> data = b"\x43\x6c\x67\x75\x62\x61" # is there an easier way to
>> >>>> turn a hex dump into a bytes literal?
>>
>> >>> bytes.fromhex('43 6c 67 75 62 61')
>> b'Clguba'
>
> Very nice new functionality in Py3k, but 2.x doesn't seem to have such
> a method. :-(

Thanks, Serhiy. Very nice new functionality indeed, and not having it
in 2.x isn't a problem to me. That's exactly what I was looking for -
it doesn't insist on (or complain about) separators between bytes.
(Though the error from putting a space _inside_ a byte is a little
confusing. But that's trivial.)
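
A short sketch of the behaviour described above, assuming Python 3 for `bytes.fromhex`. The `binascii.unhexlify` line is one possible fallback for 2.x, not something proposed in this thread, and the variable names are made up for illustration.

```python
import binascii

hex_dump = "43 6c 67 75 62 61"

# Spaces between bytes are optional for bytes.fromhex (Python 3).
spaced = bytes.fromhex(hex_dump)
compact = bytes.fromhex("436c67756261")

# binascii.unhexlify exists in both 2.x and 3.x, but it does not skip
# whitespace, so the spaces are stripped first.
portable = binascii.unhexlify(hex_dump.replace(" ", "").encode("ascii"))

# A space *inside* a byte (one hex digit on each side) is rejected.
try:
    bytes.fromhex("4 3")
    split_byte_error = None
except ValueError as exc:
    split_byte_error = exc

print(spaced, compact, portable)
print("space inside a byte:", split_byte_error)
```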

ChrisA

Antoine Pitrou

unread,
Jan 6, 2014, 6:06:57 PM1/6/14
to pytho...@python.org
Terry Reedy <tjreedy <at> udel.edu> writes:
>
> On 1/6/2014 11:29 AM, Antoine Pitrou wrote:
>
> > People don't use? According to available figures, there are more
> > downloads of Python 3 than downloads of Python 2 (Windows installers,
> > mostly):
> > http://www.python.org/webstats/
>
> While I would like the claim to be true, I do not see 2 versus 3
> downloads on that page. Did you mean another link?

Just click on a recent month, scroll down to the "Total URLs By kB"
table, and compute the sum of the largest numbers for each Python
version.

Regards

Antoine.


Ben Finney

unread,
Jan 6, 2014, 6:14:55 PM1/6/14
to pytho...@python.org
Mark Lawrence <bream...@yahoo.co.uk> writes:

> Your arrogance really has no bounds. If you'd have done the job that
> you should have done in the first place and stopped that blithering
> idiot 16 months ago, we wouldn't still be putting up with him now.

That is a misdirection; Ned's request that you stop bad behaviour
(baiting known trolls into responding on a thread) is unrelated to
Armin's bad behaviour. Don't use the bad behaviour of others to justify
yours.

> And as I started this thread, I'll say what I please, throwing my toys
> out of my pram in just the same way that your pal Armin is currently
> doing.

We expect everyone to behave well, regardless of whether they start a
thread or not. Don't use the bad behaviour of others to justify yours.

We know you're better than this, Mark. Please stop.

--
\ “To have the choice between proprietary software packages, is |
`\ being able to choose your master. Freedom means not having a |
_o__) master.” —Richard M. Stallman, 2007-05-16 |
Ben Finney

Chris Angelico

unread,
Jan 6, 2014, 6:15:23 PM1/6/14
to pytho...@python.org
On Tue, Jan 7, 2014 at 7:32 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> Chris Angelico <rosuav <at> gmail.com> writes:
>>
>> On Tue, Jan 7, 2014 at 3:29 AM, Antoine Pitrou <solipsis <at> pitrou.net> wrote:
>> > People don't use? According to available figures, there are more
>> > downloads of Python 3 than downloads of Python 2 (Windows installers,
>> > mostly):
>> > http://www.python.org/webstats/
>> >
>>
>> Unfortunately, that has a massive inherent bias, because there are
>> Python builds available in most Linux distributions - and stats from
>> those (like Debian's popcon) will be nearly as useless, because a lot
>> of them will install one or the other (probably 2.x) without waiting
>> for the user (so either they'll skew in favour of the one installed,
>> or in favour of the one NOT installed, because that's the only one
>> that'll be explicitly requested). It's probably fairly accurate for
>> Windows stats, though, since most people who want Python on Windows
>> are going to come to python.org for an installer.
>
> Agreed, but it's enough to rebut the claim that "people don't use
> Python 3". More than one million Python 3.3 downloads per month under
> Windows is a very respectable number (no 2.x release seems to reach
> that level).

Sure. The absolute number is useful; I just don't think the relative
number is - you started by talking about there being "more downloads
of Python 3 than downloads of Python 2", and it's that comparison that
I think is unfair. But the absolute numbers are definitely
significant. I'm not quite sure how to interpret the non-link lines in
[1] but I see the month of December showing roughly 1.2 million Python
3.3.3 downloads for Windows - interestingly, split almost fifty-fifty
between 64-bit and 32-bit installs - and just over one million Python
2.7.6 installs. That's one month, two million installations of Python.
That's 73,021 *per day* for the month of December. That is a lot of
Python.
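
For anyone checking the arithmetic, a rough sketch follows. The exact monthly total is not quoted above, so the script back-calculates it from the per-day figure; the rounded inputs (1.2 million and 1.0 million) are approximations of "roughly" and "just over", not numbers read off the stats page.

```python
DAYS_IN_DECEMBER = 31

# Per-day figure quoted in the post.
per_day = 73021

# Monthly total implied by that figure.
implied_total = per_day * DAYS_IN_DECEMBER
print("implied monthly total:", implied_total)

# Same calculation from the rounded numbers in the post (assumed values).
approx_total = 1200000 + 1000000
approx_per_day = approx_total / float(DAYS_IN_DECEMBER)
print("per day from rounded figures: %.0f" % approx_per_day)
```

The implied total comes out a little above the "two million" headline, which fits the "roughly" and "just over" wording.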

ChrisA

[1] http://www.python.org/webstats/usage_201312.html#TOPURLS