
Blog "about python 3"


Mark Lawrence

Dec 30, 2013, 2:41:44 PM
to pytho...@python.org
http://alexgaynor.net/2013/dec/30/about-python-3/ may be of interest to
some of you.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Steven D'Aprano

Dec 30, 2013, 3:49:16 PM
On Mon, 30 Dec 2013 19:41:44 +0000, Mark Lawrence wrote:

> http://alexgaynor.net/2013/dec/30/about-python-3/ may be of interest to
> some of you.

I don't know whether to thank you for the link, or shout at you for
sending eyeballs to look at such a pile of steaming bullshit.

I'd like to know where Alex gets the idea that the transition of Python 2
to 3 was supposed to be a five year plan. As far as I know, it was a ten
year plan, and we're well ahead of expectations of where we would be at
this point in time. People *are* using Python 3, the major Linux distros
are planning to move to Python 3, the "Python Wall Of Shame" stopped
being a wall of shame a long time ago (I think it was a year ago? or at
least six months ago). Alex's article is, basically, FUD.

More comments will have to follow later.


--
Steven

Mark Lawrence

Dec 30, 2013, 4:29:00 PM
to pytho...@python.org

http://nuitka.net/posts/re-about-python-3.html is a response.

Ethan Furman

Dec 30, 2013, 5:38:36 PM
to pytho...@python.org
On 12/30/2013 01:29 PM, Mark Lawrence wrote:
> On 30/12/2013 20:49, Steven D'Aprano wrote:
>> On Mon, 30 Dec 2013 19:41:44 +0000, Mark Lawrence wrote:
>>
>>> http://alexgaynor.net/2013/dec/30/about-python-3/ may be of interest to
>>> some of you.
>>
>> I don't know whether to thank you for the link, or shout at you for
>> sending eyeballs to look at such a pile of steaming bullshit.
>
> http://nuitka.net/posts/re-about-python-3.html is a response.

Wow -- another steaming pile! Mark, are you going for a record? ;)

--
~Ethan~

Chris Angelico

Dec 30, 2013, 8:09:39 PM
to pytho...@python.org
On Tue, Dec 31, 2013 at 9:38 AM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 12/30/2013 01:29 PM, Mark Lawrence wrote:
>>
>> On 30/12/2013 20:49, Steven D'Aprano wrote:
>>>
>>> On Mon, 30 Dec 2013 19:41:44 +0000, Mark Lawrence wrote:
>>>
>>>> http://alexgaynor.net/2013/dec/30/about-python-3/ may be of interest to
>>>> some of you.
>>>
>>>
>>> I don't know whether to thank you for the link, or shout at you for
>>> sending eyeballs to look at such a pile of steaming bullshit.
>>
>>
>> http://nuitka.net/posts/re-about-python-3.html is a response.
>
>
> Wow -- another steaming pile! Mark, are you going for a record? ;)

Does this steam?

http://rosuav.blogspot.com/2013/12/about-python-3-response.html

ChrisA

Mark Lawrence

Dec 30, 2013, 11:38:52 PM
to pytho...@python.org
On 30/12/2013 22:38, Ethan Furman wrote:
> On 12/30/2013 01:29 PM, Mark Lawrence wrote:
>> On 30/12/2013 20:49, Steven D'Aprano wrote:
>>> On Mon, 30 Dec 2013 19:41:44 +0000, Mark Lawrence wrote:
>>>
>>>> http://alexgaynor.net/2013/dec/30/about-python-3/ may be of interest to
>>>> some of you.
>>>
>>> I don't know whether to thank you for the link, or shout at you for
>>> sending eyeballs to look at such a pile of steaming bullshit.
>>
>> http://nuitka.net/posts/re-about-python-3.html is a response.
>
> Wow -- another steaming pile! Mark, are you going for a record? ;)
>
> --
> ~Ethan~

Merely pointing out the existence of these little gems in order to find
out people's feelings about them. You never know, we might even end up
with a thread whereby the discussion is Python, the whole Python and
nothing but the Python.

Chris Angelico

Dec 30, 2013, 11:44:32 PM
to pytho...@python.org
On Tue, Dec 31, 2013 at 3:38 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> You never know, we might even end up with a thread whereby the discussion is
> Python, the whole Python and nothing but the Python.

What, on python-list??! [1] That would be a silly idea. We should
avoid such theories with all vigor.

ChrisA

[1] In C, that would be interpreted as "What, on python-list|" and
would confuse everyone.

Ethan Furman

Dec 30, 2013, 11:33:25 PM
to comp.lang.python
On 12/30/2013 08:25 PM, Devin Jeanpierre wrote:
> On Mon, Dec 30, 2013 at 2:38 PM, Ethan Furman <et...@stoneleaf.us> wrote:
>> Wow -- another steaming pile! Mark, are you going for a record? ;)
>
> Indeed. Every post that disagrees with my opinion and understanding of
> the situation is complete BS and a conspiracy to spread fear,
> uncertainty, and doubt. Henceforth I will explain few to no specific
> disagreements, nor will I give anyone the benefit of the doubt,
> because that would be silly.

Couldn't have said it better myself! Well, except for the "my opinion" part -- obviously it's not my opinion, but
reality! ;)

--
~Ethan~

Mark Lawrence

Dec 30, 2013, 11:59:35 PM
to pytho...@python.org
On 31/12/2013 01:09, Chris Angelico wrote:
> On Tue, Dec 31, 2013 at 9:38 AM, Ethan Furman <et...@stoneleaf.us> wrote:
>> On 12/30/2013 01:29 PM, Mark Lawrence wrote:
>>>
>>> On 30/12/2013 20:49, Steven D'Aprano wrote:
>>>>
>>>> On Mon, 30 Dec 2013 19:41:44 +0000, Mark Lawrence wrote:
>>>>
>>>>> http://alexgaynor.net/2013/dec/30/about-python-3/ may be of interest to
>>>>> some of you.
>>>>
>>>>
>>>> I don't know whether to thank you for the link, or shout at you for
>>>> sending eyeballs to look at such a pile of steaming bullshit.
>>>
>>>
>>> http://nuitka.net/posts/re-about-python-3.html is a response.
>> Wow -- another steaming pile! Mark, are you going for a record? ;)
>
> Does this steam?
>
> http://rosuav.blogspot.com/2013/12/about-python-3-response.html

I'd have said restrained.

Mark Lawrence

Dec 31, 2013, 3:22:23 AM
to pytho...@python.org
On 30/12/2013 22:38, Ethan Furman wrote:
> On 12/30/2013 01:29 PM, Mark Lawrence wrote:
>> On 30/12/2013 20:49, Steven D'Aprano wrote:
>>> On Mon, 30 Dec 2013 19:41:44 +0000, Mark Lawrence wrote:
>>>
>>>> http://alexgaynor.net/2013/dec/30/about-python-3/ may be of interest to
>>>> some of you.
>>>
>>> I don't know whether to thank you for the link, or shout at you for
>>> sending eyeballs to look at such a pile of steaming bullshit.
>>
>> http://nuitka.net/posts/re-about-python-3.html is a response.
>
> Wow -- another steaming pile! Mark, are you going for a record? ;)
>
> --
> ~Ethan~

I wasn't, but I am now:
http://blog.startifact.com/posts/alex-gaynor-on-python-3.html. It says,
"The Python core developers somewhat gleefully slammed the door shut on
Python 2.8 back in 2011, though.", which refers to PEP 404, which I
mentioned a month or so ago.

Steven D'Aprano

Dec 31, 2013, 4:04:14 AM
Steven D'Aprano wrote:

> On Mon, 30 Dec 2013 19:41:44 +0000, Mark Lawrence wrote:
>
>> http://alexgaynor.net/2013/dec/30/about-python-3/ may be of interest to
>> some of you.
[...]
> I'd like to know where Alex gets the idea that the transition of Python 2
> to 3 was supposed to be a five year plan. As far as I know, it was a ten
> year plan,

I haven't been able to find anything in writing from Guido or the core
developers stating that the transition period was expected to be ten years,
although I haven't looked very hard. I strongly recall it being discussed,
so unless you want to trawl the python-dev mailing list, you'll just have
to take my word on it *wink*

PEP 3000 makes it clear that Guido van Rossum expected the transition period
to be longer than usual:

[quote]
I expect that there will be parallel Python 2.x and 3.x releases
for some time; the Python 2.x releases will continue for a longer
time than the traditional 2.x.y bugfix releases. Typically, we
stop releasing bugfix versions for 2.x once version 2.(x+1) has
been released. But I expect there to be at least one or two new 2.x
releases even after 3.0 (final) has been released, probably well
into 3.1 or 3.2.
[end quote]

http://www.python.org/dev/peps/pep-3000/

A five year transition period, as suggested by Alex Gaynor, simply makes no
sense. Normal support for a single release is four or five years, e.g.
Python 2.4 and 2.5 release schedules:

* Python 2.4 alpha-1 was released on July 9 2004; the final security
update was December 19 2008;

* Python 2.5 alpha-1 was released on April 5 2006; the final security
update was May 26 2011.

(Dates may be approximate, especially the alpha dates. I'm taking them from
PEP 320 and 356.)

Now in fairness, Guido's comment about "well into 3.1 or 3.2" turned out to
be rather naive in retrospect. 3.4 alpha has been released, and support for
2.7 is expected to continue for *at least* two more years:

http://www.python.org/dev/peps/pep-0373/#maintenance-releases

which means that 2.7 probably won't become unmaintained until 3.5 is out. In
hindsight, this is probably a good thing. The early releases of 3.x made a
few mistakes, and it's best to skip them and go straight to 3.3 or better,
for example:

- 3.0 was buggy enough that support for it was dropped almost immediately;

- built-in function callable() is missing from 3.1;

- 3.1 and 3.2 both have exception chaining, but there's no way to
suppress the chained exceptions until 3.3;

- 3.1 and 3.2 don't allow u'' strings for compatibility with 2.x.
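
To make the last two points concrete, a minimal sketch (runs on 3.3, but
not on 3.1 or 3.2):

s = u'café'  # u'' literals come back in 3.3 (PEP 414)

try:
    try:
        {}['missing']
    except KeyError:
        # 'from None' suppresses the chained exception context (PEP 409)
        raise ValueError('lookup failed') from None
except ValueError as e:
    print(e.__suppress_context__)  # True: the KeyError isn't displayed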


The 2.8 un-release schedule goes into more detail about the transition, and
why there won't be an official 2.8 blessed by the core developers:

http://www.python.org/dev/peps/pep-0404/

(Python is open source -- nothing is stopping people from forking the
language or picking up support for 2.7. I wonder how many Python3 naysayers
volunteer to support 2.x once the core devs drop it?)

As of June this year, over 75% of the top fifty projects hosted on PyPI
supported Python 3:

http://py3ksupport.appspot.com/

and the Python Wall Of Shame turned majority green, becoming the Wall Of
Superpowers, at least six months ago. (Back in June, I noted that it had
changed colour some time ago.) Alex's claim that "almost no code is written
for Python 3" is, well, I'll be kind and describe it as counter-factual.

Alex points out that the companies he is speaking to have no plans to
migrate to Python 3. Well, duh. In my experience, most companies don't even
begin planning to migrate until six months *after* support has ended for
the systems they rely on. (Perhaps a tiny exaggeration, but not much.)

I won't speak for the Windows or Mac world, but in the Linux world, Python 3
usage depends on the main Linux distros. Yes, ArchLinux has been Python 3
for years now, but ArchLinux is bleeding edge. Fedora is likely to be the
first mainstream distro to move to Python 3:

https://fedoraproject.org/wiki/Changes/Python_3_as_Default

Once Fedora moves, I expect Ubuntu will follow. Debian, Centos and RedHat
will probably be more conservative, but they *will* follow, at their own
pace. What are the alternatives? They're not going to drop Python, nor are
they going to take over support of 2.x forever. (RedHat/Centos are still
supporting Python 2.4 and possibly even 2.3, at least in name, but I
haven't seen any security updates come through Centos for a long time.)
Once the system Python is Python 3, the default, no-brainer choice for most
Python coding will be Python 3.

Alex says:

[quote]
Why aren't people using Python 3?

First, I think it's because of a lack of urgency. Many years ago,
before I knew how to program, the decision to have Python 3
releases live in parallel to Python 2 releases was made. In
retrospect this was a mistake, it resulted in a complete lack
of urgency for the community to move, and the lack of urgency
has given way to lethargy.
[end quote]


That's only a mistake if you think that migrating to Python 3 needs to be
*urgent*. But it isn't. It's actually a *good thing* if people and
companies can take their sweet time migrating to Python 3. Let's jump back
five years ago. Had Guido announced that Python 2.6 would be followed by
Python 3.0, with little in the way of a transition, people would have been
faced with three unpalatable choices:

1) dig your heels in and stay with Python 2.6 without upstream support or
security updates;

2) migrate to an essentially untested, backwards-incompatible version of
Python which is likely to be buggy (as indeed Python 3.0 turned out
to be);

3) or migrate to some other language.


None of these are good choices, but the difference between 2 and 3 would
have been small: with none of the big frameworks and libraries like numpy
or Zope migrated to Python 3, moving to a completely new language would
have seemed like a reasonable choice. Choose between getting stuck half-way
through a transition because only some of the libraries you need have been
ported, versus moving to a new language where you know (or at least *think*
you know) the risks?

Rushing the transition like Alex suggests would have killed Python.

Instead, companies will move when they're ready. Yes, there will be a
deadline, probably in 2017 or 2018, but in the meantime early adopters get
to file back the sharp corners and Python 3.x gets many years to iron out
the bugs. Back when Python 3 was just started, nobody thought that it would
be plausible to have a single code base handle 2.x and 3.x. It turns out
that for many projects, that's actually easier than trying to migrate code
with 2to3. That's the sort of thing which early adopters discovered.
Slow-coaches will find it easier later on because we've had so many years
for the transition.
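
For illustration only, a rough sketch of what that single code base style
tends to look like (the shims here are made up, not from any particular
project):

from __future__ import print_function, unicode_literals

import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # this builtin only exists on Python 2
else:
    text_type = str

def ensure_text(value, encoding='utf-8'):
    # Decode bytes to text; pass text through unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

print(ensure_text(b'works on both'), text_type.__name__)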

Alex thinks that the reason people haven't migrated to 3.x is because of a
lack of urgency. And his solution to this lack of urgency is to *reduce*
the level of urgency by another 4 or 5 years by adding a version 2.8. And
no doubt when that fails to work, he'll suggest giving them another 4 or 5
years with version 2.9.

"Hurry up, we're waiting! If you don't hurry up, we'll wait longer!"

I don't think Alex has thought his plan through, unless his plan is actually
to kill Python 3.


--
Steven

Steven D'Aprano

Dec 31, 2013, 4:53:25 AM
Mark Lawrence wrote:

> http://blog.startifact.com/posts/alex-gaynor-on-python-3.html.

I quote:

"...perhaps a brave group of volunteers will stand up and fork Python 2, and
take the incremental steps forward. This will have to remain just an idle
suggestion, as I'm not volunteering myself."

I expect that as excuses for not migrating get fewer, and the deadline for
Python 2.7 end-of-life starts to loom closer, more and more haters^W
Concerned People will whine about the lack of version 2.8 and ask for
*somebody else* to fork Python.

I find it, hmmm, interesting, that so many of these Concerned People who say
that they're worried about splitting the Python community[1] end up
suggesting that we *split the community* into those who have moved forward
to Python 3 and those who won't.





[1] As if the community is a single amorphous group. It is not. It is made
up of web developers using Zope or Django, and scientists using scipy, and
linguists using NLTK, and system administrators using nothing but the
stdlib, and school kids learning how to program, and professionals who know
seventeen different programming languages, and EVE Online gamers using
Stackless, and Java guys using Jython, and many more besides, most of whom
are sure that their little tiny part of the Python ecosystem is
representative of everyone else when in fact they hardly talk at all.

--
Steven

Devin Jeanpierre

Dec 30, 2013, 11:25:50 PM
to Ethan Furman, comp.lang.python
On Mon, Dec 30, 2013 at 2:38 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> Wow -- another steaming pile! Mark, are you going for a record? ;)

Indeed. Every post that disagrees with my opinion and understanding of
the situation is complete BS and a conspiracy to spread fear,
uncertainty, and doubt. Henceforth I will explain few to no specific
disagreements, nor will I give anyone the benefit of the doubt,
because that would be silly.

-- Devin

Antoine Pitrou

Dec 31, 2013, 9:13:58 AM
to pytho...@python.org
Steven D'Aprano <steve+comp.lang.python <at> pearwood.info> writes:
>
> I expect that as excuses for not migrating get fewer, and the deadline for
> Python 2.7 end-of-life starts to loom closer, more and more haters^W
> Concerned People will whine about the lack of version 2.8 and ask for
> *somebody else* to fork Python.
>
> I find it, hmmm, interesting, that so many of these Concerned People who say
> that they're worried about splitting the Python community[1] end up
> suggesting that we *split the community* into those who have moved forward
> to Python 3 and those who won't.

Indeed. This would be extremely destructive (not to mention alienating the
people doing *actual* maintenance and enhancements on Python-and-its-stdlib,
of which at least 95% are committed to the original plan for 3.x to slowly
supersede 2.x).

Regards

Antoine.


Roy Smith

Dec 31, 2013, 10:41:27 AM
In article <mailman.4753.1388499...@python.org>,
 Antoine Pitrou <soli...@pitrou.net> wrote:

I'm using 2.7 in production. I realize that at some point we'll need to
upgrade to 3.x. We'll keep putting that off as long as the "effort +
dependencies + risk" metric exceeds the "perceived added value" metric.

I can't imagine the migration will happen in 2014. Maybe not even in
2015. Beyond that, my crystal ball only shows darkness. But, in any
case, going with a fork of 2.x has zero appeal. Given the choice
between effort + risk to move forward vs. effort + risk to move
sideways, I'll move forward every time.

To be honest, the "perceived added value" in 3.x is pretty low for us.
What we're running now works. Switching to 3.x isn't going to increase
our monthly average users, or our retention rate, or decrease our COGS,
or increase our revenue. There's no killer features we need. In
summary, the decision to migrate will be driven more by risk aversion,
when the risk of staying on an obsolete, unsupported platform, exceeds
the risk of moving to a new one. Or, there will be some third-party
module that we must have which is no longer supported on 2.x.

If I were starting a new project today, I would probably start it in 3.x.

Chris Angelico

Dec 31, 2013, 10:54:32 AM
to pytho...@python.org
On Wed, Jan 1, 2014 at 2:41 AM, Roy Smith <r...@panix.com> wrote:
> To be honest, the "perceived added value" in 3.x is pretty low for us.
> What we're running now works. Switching to 3.x isn't going to increase
> our monthly average users, or our retention rate, or decrease our COGS,
> or increase our revenue. There's no killer features we need. In
> summary, the decision to migrate will be driven more by risk aversion,
> when the risk of staying on an obsolete, unsupported platform, exceeds
> the risk of moving to a new one. Or, there will be some third-party
> module that we must have which is no longer supported on 2.x.

The biggest killer feature for most deployments is likely to be that
Unicode "just works" everywhere. Any new module added to Py3 can be
back-ported to Py2 (with some amount of work - might be trivial, might
be a huge job), and syntactic changes are seldom a "killer feature",
but being able to depend on *every single library function* working
perfectly with the full Unicode range means you don't have to test
every branch of your code.
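
For illustration, a tiny sketch of the "just works" point on Python 3:

s = 'naïve café \u2603'      # a snowman, well outside ASCII
print(s.upper())             # case mapping works across the whole range
print(len(s))                # counts characters, not bytes
print(s.encode('utf-8'))     # the text/bytes boundary is explicit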

If that's not going to draw you, then yeah, there's not a lot to
justify switching. You won't get more users, it'll increase your costs
(though by a fixed amount, not an ongoing cost), and old code is
trustworthy code, new code is bug city.

> If I were starting a new project today, I would probably start it in 3.x.

And that's the right attitude (though I would drop the "probably").
Eventually it'll become more critical to upgrade (once Py2 security
patches stop coming through, maybe), and when that day does finally
come, you'll be glad you have just your 2013 codebases rather than the
additional ones dating from 2014 and on until whatever day that is.
The past is Py2; the future is Py3. In between, use whichever one
makes better business sense.

ChrisA

Mark Lawrence

Dec 31, 2013, 10:55:40 AM
to pytho...@python.org
On 31/12/2013 15:41, Roy Smith wrote:
> In article <mailman.4753.1388499...@python.org>,
> Antoine Pitrou <soli...@pitrou.net> wrote:
>
>> Steven D'Aprano <steve+comp.lang.python <at> pearwood.info> writes:
>>>
>>> I expect that as excuses for not migrating get fewer, and the deadline for
>>> Python 2.7 end-of-life starts to loom closer, more and more haters^W
>>> Concerned People will whine about the lack of version 2.8 and ask for
>>> *somebody else* to fork Python.
>>>
>>> I find it, hmmm, interesting, that so many of these Concerned People who say
>>> that they're worried about splitting the Python community[1] end up
>>> suggesting that we *split the community* into those who have moved forward
>>> to Python 3 and those who won't.
>>
>> Indeed. This would be extremely destructive (not to mention alienating the
>> people doing *actual* maintenance and enhancements on Python-and-its-stdlib,
>> of which at least 95% are committed to the original plan for 3.x to slowly
>> supersede 2.x).
>>
>> Regards
>>
>> Antoine.
>
> I'm using 2.7 in production. I realize that at some point we'll need to
> upgrade to 3.x. We'll keep putting that off as long as the "effort +
> dependencies + risk" metric exceeds the "perceived added value" metric.
>

Do you use any of the features that were backported from 3.x to 2.7, or
could you have stayed with 2.6 or an even older version?

Robin Becker

Jan 2, 2014, 12:36:58 PM
to pytho...@python.org
On 31/12/2013 15:41, Roy Smith wrote:
> I'm using 2.7 in production. I realize that at some point we'll need to
> upgrade to 3.x. We'll keep putting that off as long as the "effort +
> dependencies + risk" metric exceeds the "perceived added value" metric.
>
We too are using python 2.4 - 2.7 in production. Different clients migrate at
different speeds.

>
> To be honest, the "perceived added value" in 3.x is pretty low for us.
> What we're running now works. Switching to 3.x isn't going to increase
> our monthly average users, or our retention rate, or decrease our COGS,
> or increase our revenue. There's no killer features we need. In
> summary, the decision to migrate will be driven more by risk aversion,
> when the risk of staying on an obsolete, unsupported platform, exceeds
> the risk of moving to a new one. Or, there will be some third-party
> module that we must have which is no longer supported on 2.x.
>

+1

> If I were starting a new project today, I would probably start it in 3.x.
+1

I just spent a large amount of effort porting reportlab to a version which works
with both python2.7 and python3.3. I have a large number of functions etc which
handle the conversions that differ between the two pythons.

For fairly sensible reasons we changed the internal default to use unicode
rather than bytes. After doing all that and making the tests compatible etc etc
I have a version which runs in both and passes all its tests. However, for
whatever reason the python 3.3 version runs slower

2.7 Ran 223 tests in 66.578s

3.3 Ran 223 tests in 75.703s

I know some of these tests are fairly variable, but even for simple things like
paragraph parsing 3.3 seems to be slower. Since both use unicode internally it
can't be that can it, or is python 2.7's unicode faster?

So far the superiority of 3.3 escapes me, but I'm tasked with enjoying this
process so I'm sure there must be some new 'feature' that will help. Perhaps
'yield from' or 'raise from None' or .......

In any case I think we will be maintaining python 2.x code for at least another
5 years; the version gap is then a real hindrance.
--
Robin Becker

David Hutto

Jan 2, 2014, 1:25:28 PM
to Robin Becker, python-list
Just because it's 3.3 doesn't matter...the main interest is in compatibility. Secondly, you used just one piece of code, which could be a fluke; try others, and check the PEP. You need to realize that even the older versions are being worked on, and they have to be refined. So if you have a problem, my suggestion would be to use the older version and import from the future.




Terry Reedy

Jan 2, 2014, 1:37:16 PM
to pytho...@python.org
On 1/2/2014 12:36 PM, Robin Becker wrote:

> I just spent a large amount of effort porting reportlab to a version
> which works with both python2.7 and python3.3. I have a large number of
> functions etc which handle the conversions that differ between the two
> pythons.

I imagine that this was not fun.

[For those who do not know, reportlab produces pdf documents.]

> For fairly sensible reasons we changed the internal default to use
> unicode rather than bytes.

Do you mean 'from __future__ import unicode_literals'?

Am I correct in thinking that this change increases the capabilities of
reportlab? For instance, easily producing an article with abstracts in
English, Arabic, Russian, and Chinese?

> After doing all that and making the tests
> compatible etc etc I have a version which runs in both and passes all
> its tests. However, for whatever reason the python 3.3 version runs slower.

Python 3 is slower in some things, like integer arithmetic with small ints.

> 2.7 Ran 223 tests in 66.578s
>
> 3.3 Ran 223 tests in 75.703s
>
> I know some of these tests are fairly variable, but even for simple
> things like paragraph parsing 3.3 seems to be slower. Since both use
> unicode internally it can't be that can it, or is python 2.7's unicode
> faster?

The new unicode implementation in 3.3 is faster for some operations and
slower for others. It is definitely more space efficient, especially
compared to a wide build system. It is definitely less buggy, especially
compared to a narrow build system.

Do your tests use any astral (non-BMP) chars? If so, do they pass on
narrow 2.7 builds (like on Windows)?

> So far the superiority of 3.3 escapes me,

For one thing, indexing and slicing just works on all machines for all
unicode strings. Code for 2.7 and 3.3 either a) does not index or slice,
b) does not work for all text on 2.7 narrow builds, or c) has extra
conditional code only for 2.7.
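
A minimal sketch of the difference (the surrogate values below are just
standard UTF-16):

s = '\U0001F600'  # one astral (non-BMP) character

# Python 3.3+ (PEP 393): indexing and len() just work.
print(len(s))     # 1
print(s[0] == s)  # True

# Python 2.7 narrow build (e.g. the standard Windows build), with the
# same string written as u'\U0001F600': it is stored as a surrogate
# pair, so len(s) == 2 and s[0] == u'\ud83d', a lone surrogate.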

--
Terry Jan Reedy

Antoine Pitrou

Jan 2, 2014, 6:57:56 PM
to pytho...@python.org

Hi,

Robin Becker <robin <at> reportlab.com> writes:
>
> For fairly sensible reasons we changed the internal default to use unicode
> rather than bytes. After doing all that and making the tests compatible
> etc etc
> I have a version which runs in both and passes all its tests. However, for
> whatever reason the python 3.3 version runs slower
>
> 2.7 Ran 223 tests in 66.578s
>
> 3.3 Ran 223 tests in 75.703s

Running a test suite is a completely broken benchmarking methodology.
You should isolate workloads you are interested in and write a benchmark
simulating them.
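
For instance, a minimal sketch with the stdlib timeit module (the
workload here is invented; substitute your own):

import timeit

setup = "text = u'Hello, world! ' * 1000"
stmt = "text.upper(); text.lower(); text.split()"

# repeat() plus min() reduces noise from the OS scheduler and caches.
best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=1000))
print('best of 5: %.4f s per 1000 runs' % best)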

Regards

Antoine.


Steven D'Aprano

Jan 2, 2014, 11:49:31 PM
Robin Becker wrote:

> For fairly sensible reasons we changed the internal default to use unicode
> rather than bytes. After doing all that and making the tests compatible
> etc etc I have a version which runs in both and passes all its tests.
> However, for whatever reason the python 3.3 version runs slower

"For whatever reason" is right, unfortunately there's no real way to tell
from the limited information you give what that might be.

Are you comparing a 2.7 "wide" or "narrow" build? Do your tests use any
so-called "astral characters" (characters in the Supplementary Multilingual
Planes, i.e. characters with ord() > 0xFFFF)?

If I remember correctly, some early alpha(?) versions of Python 3.3
consistently ran Unicode operations a small but measurable amount slower
than 3.2 or 2.7. That especially affected Windows. But I understand that
this was sped up in the release version of 3.3.

There are some operations with Unicode strings in 3.3 which unavoidably are
slower. If you happen to hit a combination of such operations (mostly to do
with creating lots of new strings and then throwing them away without doing
much work) your code may turn out to be a bit slower. But that's a pretty
artificial set of code.

Generally, test code doesn't make good benchmarks. Tests only get run once,
in arbitrary order, it spends a lot of time setting up and tearing down
test instances, there are all sorts of confounding factors. This plays
merry hell with modern hardware optimizations. In addition, it's quite
possible that you're seeing some other slow down (the unittest module?) and
misinterpreting it as related to string handling. But without seeing your
entire code base and all the tests, who can say for sure?


> 2.7 Ran 223 tests in 66.578s
>
> 3.3 Ran 223 tests in 75.703s
>
> I know some of these tests are fairly variable, but even for simple things
> like paragraph parsing 3.3 seems to be slower. Since both use unicode
> internally it can't be that can it, or is python 2.7's unicode faster?

Faster in some circumstances, slower in others. If your application
bottleneck is the availability of RAM for strings, 3.3 will potentially be
faster since it can use as little as 1/4 of the memory for strings. If
your application doesn't use much memory, or if it uses lots of strings
which get created then thrown away, it may come out a little slower.


> So far the superiority of 3.3 escapes me,

Yeah I know, I resisted migrating from 1.5 to 2.x for years. When I finally
migrated to 2.3, at first I couldn't see any benefit either. New style
classes? Super? Properties? Unified ints and longs? Big deal. Especially
since I was still writing 1.5 compatible code and couldn't really take
advantage of the new features.

When I eventually gave up on supporting versions pre-2.3, it was a load off
my shoulders. Now I can't wait to stop supporting 2.4 and 2.5, which will
make things even easier. And the day I can ignore everything below 3.3
will be a truly happy one.


> but I'm tasked with enjoying
> this process so I'm sure there must be some new 'feature' that will help.
> Perhaps 'yield from' or 'raise from None' or .......

No, you have this completely backwards. New features don't help you support
old versions of Python that lack those new features. New features are an
incentive to drop support for old versions.


> In any case I think we will be maintaining python 2.x code for at least
> another 5 years; the version gap is then a real hindrance.

Five years sounds about right.



--
Steven

Terry Reedy

Jan 3, 2014, 4:01:18 AM
to pytho...@python.org
On 1/2/2014 11:49 PM, Steven D'Aprano wrote:
> Robin Becker wrote:
>
>> For fairly sensible reasons we changed the internal default to use unicode
>> rather than bytes. After doing all that and making the tests compatible
>> etc etc I have a version which runs in both and passes all its tests.
>> However, for whatever reason the python 3.3 version runs slower
>
> "For whatever reason" is right, unfortunately there's no real way to tell
> from the limited information you give what that might be.
>
> Are you comparing a 2.7 "wide" or "narrow" build? Do your tests use any
> so-called "astral characters" (characters in the Supplementary Multilingual
> Planes, i.e. characters with ord() > 0xFFFF)?
>
> If I remember correctly, some early alpha(?) versions of Python 3.3
> consistently ran Unicode operations a small but measurable amount slower
> than 3.2 or 2.7. That especially affected Windows. But I understand that
> this was sped up in the release version of 3.3.

There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
should run the latter.

--
Terry Jan Reedy

wxjm...@gmail.com

Jan 3, 2014, 5:10:36 AM
It's time to understand the Character Encoding Models
and the math behind them.
Unicode does not differ from any other coding scheme.

How? With a sheet of paper and a pencil.

jmf

Chris Angelico

Jan 3, 2014, 5:24:43 AM
to pytho...@python.org
One plus one is two, therefore Python is better than Haskell.

Four times five is twelve, and four times six is thirteen, and four
times seven is enough to make Alice think she's Mabel, and London is
the capital of Paris, and the crocodile cheerfully grins. Therefore,
by obvious analogy, Unicode times new-style classes equals a 64-bit
process.

I worked that out with a sheet of paper and a pencil. The pencil was a
little help, but the paper was three sheets in the wind.

ChrisA

Robin Becker

Jan 3, 2014, 5:32:26 AM
to pytho...@python.org
On 02/01/2014 18:25, David Hutto wrote:
> Just because it's 3.3 doesn't matter...the main interest is in
> compatibility. Secondly, you used just one piece of code, which could be a
> fluke, try others, and check the PEP. You need to realize that evebn the
> older versions are benig worked on, and they have to be refined. So if you
> have a problem, use the older and import from the future would be my
> suggestion

Suggesting that I use another piece of code to test python3 against python2 is a
bit silly. I'm sure I can find stuff which runs faster under python3, but
reportlab is the code I'm porting and that is going the wrong way.
--
Robin Becker

Robin Becker

Jan 3, 2014, 6:14:41 AM
to pytho...@python.org
On 02/01/2014 18:37, Terry Reedy wrote:
> On 1/2/2014 12:36 PM, Robin Becker wrote:
>
>> I just spent a large amount of effort porting reportlab to a version
>> which works with both python2.7 and python3.3. I have a large number of
>> functions etc which handle the conversions that differ between the two
>> pythons.
>
> I am imagine that this was not fun.

indeed :)
>
>> For fairly sensible reasons we changed the internal default to use
>> unicode rather than bytes.
>
> Do you mean 'from __future__ import unicode_literals'?

No, previously we had a default of utf8-encoded strings in the lower levels of the
code and we accepted either unicode or utf8 string literals as inputs to text
functions. As part of the port process we made the decision to change from
default utf8 str (bytes) to default unicode.

> Am I correct in thinking that this change increases the capabilities of
> reportlab? For instance, easily producing an article with abstracts in English,
> Arabic, Russian, and Chinese?
>
It's made no real difference to what we are able to produce or accept since utf8
or unicode can encode anything in the input, and what can be produced depends
mainly on fonts.

> > After doing all that and making the tests
...........
>> I know some of these tests are fairly variable, but even for simple
>> things like paragraph parsing 3.3 seems to be slower. Since both use
>> unicode internally it can't be that can it, or is python 2.7's unicode
>> faster?
>
> The new unicode implementation in 3.3 is faster for some operations and slower
> for others. It is definitely more space efficient, especially compared to a wide
> build system. It is definitely less buggy, especially compared to a narrow build
> system.
>
> Do your tests use any astral (non-BMP) chars? If so, do they pass on narrow 2.7
> builds (like on Windows)?

I'm not sure if we have any non-bmp characters in the tests. Simple CJK etc etc
for the most part. I'm fairly certain we don't have any ability to handle
composed glyphs (multi-codepoint) etc etc



....
> For one thing, indexing and slicing just works on all machines for all unicode
> strings. Code for 2.7 and 3.3 either a) does not index or slice, b) does not
> work for all text on 2.7 narrow builds, or c) has extra conditional code only
> for 2.7.
>

probably
--
Robin Becker

Robin Becker

Jan 3, 2014, 6:37:42 AM
to pytho...@python.org
On 02/01/2014 23:57, Antoine Pitrou wrote:
>
..........
>
> Running a test suite is a completely broken benchmarking methodology.
> You should isolate workloads you are interested in and write a benchmark
> simulating them.
>

I'm certain you're right, but individual bits of code like generating our
reference manual also appear to be slower in 3.3.

> Regards
>
> Antoine.
>
>


--
Robin Becker

Robin Becker

Jan 3, 2014, 7:28:48 AM
to pytho...@python.org
On 03/01/2014 09:01, Terry Reedy wrote:
> There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
> should run the latter.

python 3.3.3 is what I use on windows. As for astral / non-bmp etc etc that's
almost irrelevant for the sort of tests we're doing which are mostly simple
english text.
--
Robin Becker

Roy Smith

Jan 3, 2014, 9:57:48 AM
In article <mailman.4850.1388752...@python.org>,
 Robin Becker <ro...@reportlab.com> wrote:

The sad part is, if you're accepting any text from external sources, you
need to be able to deal with astral.

I was doing a project a while ago importing 20-something million records
into a MySQL database. Little did I know that FOUR of those records
contained astral characters (which MySQL, at least the version I was
using, couldn't handle).

My way of dealing with those records was to nuke them. Longer term we
ended up switching to Postgres.

Chris Angelico

Jan 3, 2014, 10:32:14 AM
to pytho...@python.org
On Sat, Jan 4, 2014 at 1:57 AM, Roy Smith <r...@panix.com> wrote:
> I was doing a project a while ago importing 20-something million records
> into a MySQL database. Little did I know that FOUR of those records
> contained astral characters (which MySQL, at least the version I was
> using, couldn't handle).
>
> My way of dealing with those records was to nuke them. Longer term we
> ended up switching to Postgress.

Look! Postgres means you don't lose data!!

Seriously though, that's a much better long-term solution than
destroying data. But MySQL does support the full Unicode range - just
not in its "UTF8" type. You have to specify "UTF8MB4" - that is,
"maximum bytes 4" rather than the default of 3. According to [1], the
UTF8MB4 encoding is stored as UTF-16, and UTF8 is stored as UCS-2. And
according to [2], it's even possible to explicitly choose the
mindblowing behaviour of UCS-2 for a data type that calls itself
"UTF8", so that a vague theoretical subsequent version of MySQL might
be able to make "UTF8" mean UTF-8, and people can choose to use the
other alias.
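
The underlying byte arithmetic is easy to check from Python:

ch = '\U0001F600'          # an astral character
print(ch.encode('utf-8'))  # b'\xf0\x9f\x98\x80' - four bytes, one more
                           # than legacy "utf8" columns will store, hence
                           # the need for utf8mb4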

To my mind, this is a bug with backward-compatibility concerns. That
means it can't be fixed in a point release. Fine. But the behaviour
change is "this used to throw an error, now it works". Surely that can
be fixed in the next release. Or surely a version or two of
deprecating "UTF8" in favour of the two "MB?" types (and never ever
returning "UTF8" from any query), followed by a reintroduction of
"UTF8" as an alias for MB4, and the deprecation of MB3. Or am I
spoiled by the quality of Python (and other) version numbering, where
I can (largely) depend on functionality not changing in point
releases?

ChrisA

[1] http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html
[2] http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb3.html

Ethan Furman

Jan 3, 2014, 11:56:39 AM
to pytho...@python.org
On 01/03/2014 02:24 AM, Chris Angelico wrote:
>
> I worked that out with a sheet of paper and a pencil. The pencil was a
> little help, but the paper was three sheets in the wind.

Beautiful!

--
~Ethan~

Terry Reedy

Jan 3, 2014, 5:00:32 PM
to pytho...@python.org
On 1/3/2014 7:28 AM, Robin Becker wrote:
> On 03/01/2014 09:01, Terry Reedy wrote:
>> There was more speedup in 3.3.2 and possibly even more in 3.3.3, so OP
>> should run the latter.
>
> python 3.3.3 is what I use on windows. As for astral / non-bmp etc etc
> that's almost irrelevant for the sort of tests we're doing which are
> mostly simple english text.

If you do not test the cases where 2.7 is buggy and requires nasty
workarounds, then I can understand why you do not so much appreciate 3.3
;-).

--
Terry Jan Reedy

Mark Lawrence

Jan 3, 2014, 11:04:39 PM
to pytho...@python.org
Are you crazy? Surely everybody prefers fast but incorrect code to
something that is correct but slow? Except that Python
3.3.3 is often faster. And always (to my knowledge) correct. Upper
Class Twit of the Year anybody? :)

Mark Lawrence

Jan 4, 2014, 2:30:55 AM
to pytho...@python.org
On 02/01/2014 17:36, Robin Becker wrote:
> On 31/12/2013 15:41, Roy Smith wrote:
>> I'm using 2.7 in production. I realize that at some point we'll need to
>> upgrade to 3.x. We'll keep putting that off as long as the "effort +
>> dependencies + risk" metric exceeds the "perceived added value" metric.
>>
> We too are using python 2.4 - 2.7 in production. Different clients
> migrate at different speeds.
>
>>
>> To be honest, the "perceived added value" in 3.x is pretty low for us.
>> What we're running now works. Switching to 3.x isn't going to increase
>> our monthly average users, or our retention rate, or decrease our COGS,
>> or increase our revenue. There's no killer features we need. In
>> summary, the decision to migrate will be driven more by risk aversion,
>> when the risk of staying on an obsolete, unsupported platform, exceeds
>> the risk of moving to a new one. Or, there will be some third-party
>> module that we must have which is no longer supported on 2.x.
>>
>
> +1
>
>> If I were starting a new project today, I would probably start it in 3.x.
> +1
>
> I just spent a large amount of effort porting reportlab to a version
> which works with both python2.7 and python3.3. I have a large number of
> functions etc which handle the conversions that differ between the two
> pythons.
>
> For fairly sensible reasons we changed the internal default to use
> unicode rather than bytes. After doing all that and making the tests
> compatible etc etc I have a version which runs in both and passes all
> its tests. However, for whatever reason the python 3.3 version runs slower
>
> 2.7 Ran 223 tests in 66.578s
>
> 3.3 Ran 223 tests in 75.703s
>
> I know some of these tests are fairly variable, but even for simple
> things like paragraph parsing 3.3 seems to be slower. Since both use
> unicode internally it can't be that can it, or is python 2.7's unicode
> faster?
>
> So far the superiority of 3.3 escapes me, but I'm tasked with enjoying
> this process so I'm sure there must be some new 'feature' that will
> help. Perhaps 'yield from' or 'raise from None' or .......
>
> In any case I think we will be maintaining python 2.x code for at least
> another 5 years; the version gap is then a real hindrance.

Of interest
https://mail.python.org/pipermail/python-dev/2012-October/121919.html ?

wxjm...@gmail.com

Jan 4, 2014, 8:52:20 AM
----

To Robin Becker

I know nothing about ReportLab except its existence.
Your story is very interesting. As I pointed out, I know
nothing about the internals of ReportLab, the technical
aspects (the "Python part", the API used for the PDF creation).
I do however have some experience with the unicode TeX engine,
XeTeX, so I understand a little bit of what's
happening behind the scenes.

The very interesting aspect is the way you are holding
unicode (strings). By comparing Python 2 with Python 3.3,
you are comparing utf-8 with the internal "representation"
of Python 3.3 (the flexible string representation).
In one sense, that is more than comparing Py2 with Py3.

It would be much more interesting to compare utf-8/Python
internals in the light of Python 3.2 and Python 3.3. Python
3.2 has decent unicode handling, Python 3.3 has an absurd
(in the mathematical sense) unicode handling. This really
shines with utf-8, where the flexible string representation
does just the opposite of what a correct unicode
implementation does!

On the memory side, it is obvious to see it.

>>> sys.getsizeof('a'*10000 + 'z')
10026
>>> sys.getsizeof('a'*10000 + '€')
20040
>>> sys.getsizeof(('a'*10000 + 'z').encode('utf-8'))
10018
>>> sys.getsizeof(('a'*10000 + '€').encode('utf-8'))
10020

On the performance side, it is much more complex,
but qualitatively, you may expect the same results.


The funny aspect is that by working with utf-8 in that
case, you are forcing Python to work
properly, but one pays on the side of performance.
And if one wishes to save memory, one also has to pay on the
side of performance.

In other words, attempting to do what Python is
not able to do natively is just impossible!


I'm skipping the very interesting composed glyphs subject
(unicode normalization, ...), but I wish to point out that
with the flexible string representation, one reaches
the top level of surrealism. For a tool which is supposed
to handle these very specific unicode tasks...

jmf

Roy Smith

Jan 4, 2014, 8:55:11 AM
In article <mailman.4882.1388808...@python.org>,
Mark Lawrence <bream...@yahoo.co.uk> wrote:

> Surely everybody prefers fast but incorrect code to
> something that is correct but slow?

I realize I'm taking this statement out of context, but yes, sometimes
fast is more important than correct. Sometimes the other way around.

Chris Angelico

Jan 4, 2014, 9:17:40 AM
to pytho...@python.org
More usually, it's sometimes better to be really fast and mostly
correct than really really slow and entirely correct. That's why we
use IEEE floating point instead of Decimal most of the time. Though
I'm glad that Python 3 now deems the default int type to be capable of
representing arbitrary integers (instead of dropping out to a separate
long type as Py2 did), I think it's possibly worth optimizing small
integers to machine words - but mainly, the int type focuses on
correctness above performance, because the cost is low compared to the
benefit. With float, the cost of arbitrary precision is extremely
high, and the benefit much lower.
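
A quick sketch of that trade-off:

print(2 ** 100)        # 1267650600228229401496703205376 - exact
print(type(2 ** 100))  # <class 'int'> - no separate long type in Py3

print(0.1 + 0.2)       # 0.30000000000000004 - IEEE 754 trades exactness
                       # for speed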

With Unicode, the cost of perfect support is normally seen to be a
doubling of internal memory usage (UTF-16 vs UCS-4). Pike and Python
decided that the cost could, instead, be a tiny measure of complexity
and actually *less* memory usage (compared to UTF-16, when lots of
identifiers are ASCII). It's a system that works only when strings are
immutable, but works beautifully there. Fortunately Pike doesn't have
any, and Python has only one, idiot like jmf who completely
misunderstands what's going on and uses microbenchmarks to prove
obscure points... and then uses nonsense to try to prove... uhh...
actually I'm not even sure what, sometimes. I wouldn't dare try to
read his posts except that my mind's already in a rather broken state,
as a combination of programming and Alice in Wonderland.

ChrisA

Ned Batchelder

Jan 4, 2014, 12:51:32 PM
to pytho...@python.org
On 1/4/14 9:17 AM, Chris Angelico wrote:
> On Sun, Jan 5, 2014 at 12:55 AM, Roy Smith <r...@panix.com> wrote:
>> I realize I'm taking this statement out of context, but yes, sometimes
>> fast is more important than correct. Sometimes the other way around.
>
> More usually, it's sometimes better to be really fast and mostly
> correct than really really slow and entirely correct. That's why we
> use IEEE floating point instead of Decimal most of the time. Though
> I'm glad that Python 3 now deems the default int type to be capable of
> representing arbitrary integers (instead of dropping out to a separate
> long type as Py2 did), I think it's possibly worth optimizing small
> integers to machine words - but mainly, the int type focuses on
> correctness above performance, because the cost is low compared to the
> benefit. With float, the cost of arbitrary precision is extremely
> high, and the benefit much lower.
>
> With Unicode, the cost of perfect support is normally seen to be a
> doubling of internal memory usage (UTF-16 vs UCS-4). Pike and Python
> decided that the cost could, instead, be a tiny measure of complexity
> and actually *less* memory usage (compared to UTF-16, when lots of
> identifiers are ASCII). It's a system that works only when strings are
> immutable, but works beautifully there. Fortunately Pike doesn't have
> any, and Python has only one, idiot like jmf who completely
> misunderstands what's going on and uses microbenchmarks to prove
> obscure points... and then uses nonsense to try to prove... uhh...
> actually I'm not even sure what, sometimes. I wouldn't dare try to
> read his posts except that my mind's already in a rather broken state,
> as a combination of programming and Alice in Wonderland.
>
> ChrisA
>

I really wish we could discuss these things without baiting trolls.

--
Ned Batchelder, http://nedbatchelder.com

wxjm...@gmail.com

Jan 4, 2014, 2:10:03 PM
I do not mind being considered an idiot, but
I'm definitely not blind.

And I could add, I have *never* once seen one soul
explain what I'm doing wrong in the gazillion
of examples I gave on this list.

---

Back to ReportLab. Technically I would be really
interested to see what could happen in the light
of my previous post.

jmf

Terry Reedy

Jan 4, 2014, 5:46:49 PM
to pytho...@python.org
On 1/4/2014 2:10 PM, wxjm...@gmail.com wrote:
> Le samedi 4 janvier 2014 15:17:40 UTC+1, Chris Angelico a écrit :

>> any, and Python has only one, idiot like jmf who completely

Chris, I appreciate the many contributions you make to this list, but
that does not exempt you from our standard of conduct.

>> misunderstands what's going on and uses microbenchmarks to prove
>> obscure points... and then uses nonsense to try to prove... uhh...

Troll baiting is a form of trolling. I think you are intelligent enough
to know this. Please stop.

> I do not mind being considered an idiot, but
> I'm definitely not blind.
>
> And I could add, I have *never* once seen one soul
> explain what I'm doing wrong in the gazillion
> of examples I gave on this list.

If this is true, it is because you have ignored and not read my
numerous, relatively polite posts. To repeat very briefly:

1. Cherry picking (presenting the most extreme case as representative).

2. Calling space saving a problem (repeatedly).

3. Ignoring bug fixes.

4. Repetition (of the 'gazillion example' without new content).

Have you ever acknowledged, let alone thanked people for, the fix for the
one bad regression you did find? The FSR is still a work in progress.
Just today, Serhiy pushed a patch speeding up the UTF-32 encoder, after
previously speeding up the UTF-32 decoder.

--
Terry Jan Reedy


Chris Angelico

Jan 4, 2014, 6:28:37 PM
to pytho...@python.org
On Sun, Jan 5, 2014 at 9:46 AM, Terry Reedy <tjr...@udel.edu> wrote:
> On 1/4/2014 2:10 PM, wxjm...@gmail.com wrote:
>>
>> Le samedi 4 janvier 2014 15:17:40 UTC+1, Chris Angelico a écrit :
>
>
>>> any, and Python has only one, idiot like jmf who completely
>
>
> Chris, I appreciate the many contributions you make to this list, but that
> does not exempt you from out standard of conduct.
>
>
>>> misunderstands what's going on and uses microbenchmarks to prove
>>> obscure points... and then uses nonsense to try to prove... uhh...
>
>
> Troll baiting is a form of trolling. I think you are intelligent enough to
> know this. Please stop.

My apologies. I withdraw the aforequoted post. You and Ned are
correct, those comments were inappropriate. Sorry.

ChrisA

Steven D'Aprano

Jan 4, 2014, 9:27:13 PM
I know somebody who was once touring in the States, and ended up travelling
cross-country by road with the roadies rather than flying. She tells me of
the time someone pointed out that they were travelling in the wrong
direction, away from their destination. The roadie driving replied "Who
cares? We're making fantastic time!"

(Ah, the seventies. So many drugs...)

Fast is never more important than correct. It's just that sometimes you
might compromise a little (or a lot) on what counts as correct in exchange
for some speed.

To give an example, say you want to solve the Travelling Salesman Problem,
and find the shortest path through a whole lot of cities A, B, C, ..., Z.
That's a Hard Problem, expensive to solve correctly.

But if you loosen the requirements so that a correct solution no longer has
to be the absolutely shortest path, and instead accept solutions which are
nearly always close to the shortest (but without any guarantee of how
close), then you can make the problem considerably easier to solve.
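
For instance, a minimal sketch of that "good enough" approach, using the
nearest-neighbour heuristic (purely illustrative):

import math

def nearest_neighbour_tour(cities):
    # cities: dict of name -> (x, y); returns a visiting order.
    # Fast and usually reasonable, but with no optimality guarantee.
    remaining = dict(cities)
    name, pos = remaining.popitem()
    tour = [name]
    while remaining:
        # Greedily hop to the closest city not yet visited.
        name, pos = min(remaining.items(),
                        key=lambda item: math.hypot(pos[0] - item[1][0],
                                                    pos[1] - item[1][1]))
        del remaining[name]
        tour.append(name)
    return tour

print(nearest_neighbour_tour({'A': (0, 0), 'B': (1, 5), 'C': (2, 1)}))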

But regardless of how fast your path-finder algorithm might become, you're
unlikely to be satisfied with a solution that travels around in a circle
from A to B a million times then shoots off straight to Z without passing
through any of the other cities.



--
Steven

Chris Angelico

Jan 4, 2014, 9:32:44 PM
to pytho...@python.org
On Sun, Jan 5, 2014 at 1:27 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> But regardless of how fast your path-finder algorithm might become, you're
> unlikely to be satisfied with a solution that travels around in a circle
> from A to B a million times then shoots off straight to Z without passing
> through any of the other cities.

On the flip side, that might be the best salesman your company has
ever known, if those three cities have the most customers!

ChrisA
wondering why nobody cares about the customers in TSP discussions

Steven D'Aprano

Jan 4, 2014, 9:41:20 PM
wxjm...@gmail.com wrote:

> The very interesting aspect is the way you are holding
> unicode (strings). By comparing Python 2 with Python 3.3,
> you are comparing utf-8 with the internal "representation"
> of Python 3.3 (the flexible string representation).

This is incorrect. Python 2 has never used UTF-8 internally for Unicode
strings. In narrow builds, it uses UTF-16, but makes no allowance for
surrogate pairs in strings. In wide builds, it uses UTF-32.

Other implementations, such as Jython or IronPython, may do something else.


--
Steven

MRAB

Jan 4, 2014, 9:41:05 PM
to pytho...@python.org
On 2014-01-05 02:32, Chris Angelico wrote:
> On Sun, Jan 5, 2014 at 1:27 PM, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
>> But regardless of how fast your path-finder algorithm might become, you're
>> unlikely to be satisfied with a solution that travels around in a circle
>> from A to B a million times then shoots off straight to Z without passing
>> through any of the other cities.
>
> On the flip side, that might be the best salesman your company has
> ever known, if those three cities have the most customers!
>
> ChrisA
> wondering why nobody cares about the customers in TSP discussions
>
Or, for that matter, ISP customers who don't live in an urban area. :-)

Chris Angelico

Jan 4, 2014, 9:54:29 PM
to pytho...@python.org
On Sun, Jan 5, 2014 at 1:41 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> wxjm...@gmail.com wrote:
>
>> The very interesting aspect is the way you are holding
>> unicode (strings). By comparing Python 2 with Python 3.3,
>> you are comparing utf-8 with the internal "representation"
>> of Python 3.3 (the flexible string representation).
>
> This is incorrect. Python 2 has never used UTF-8 internally for Unicode
> strings. In narrow builds, it uses UTF-16, but makes no allowance for
> surrogate pairs in strings. In wide builds, it uses UTF-32.

That's for Python's unicode type. What Robin said was that they were
using either a byte string ("str") with UTF-8 data, or a Unicode
string ("unicode") with character data. So jmf was right, except that
it's not specifically to do with Py2 vs Py3.3.
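
To make the Py2 distinction concrete:

>>> b = 'caf\xc3\xa9'       # str: UTF-8 encoded bytes
>>> u = b.decode('utf-8')   # unicode: decoded characters
>>> len(b), len(u)
(5, 4)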

ChrisA

Roy Smith

unread,
Jan 4, 2014, 10:20:40 PM1/4/14
to
I wrote:
> > I realize I'm taking this statement out of context, but yes, sometimes
> > fast is more important than correct.

In article <52c8c301$0$29998$c3e8da3$5496...@news.astraweb.com>,
Steven D'Aprano <steve+comp....@pearwood.info> wrote:
> Fast is never more important than correct.

Sure it is.

Let's imagine you're building a system which sorts packages for
delivery. You sort 1 million packages every night and put them on
trucks going out for final delivery.

Some assumptions:

Every second I can cut from the sort time saves me $0.01.

If I mis-sort a package, it goes out on the wrong truck, doesn't get
discovered until the end of the day, and ends up costing me $5
(including not just the direct cost of redelivering it, but also
factoring in ill will and having to make the occasional refund for not
meeting the promised delivery time).

I've got a new sorting algorithm which is guaranteed to cut 10 seconds
off the sorting time (i.e. $0.10 per package). The problem is, it makes
a mistake 1% of the time.

Let's see:

1 million packages x $0.10 = $100,000 saved per day because I sort them
faster. 10,000 of them will go to the wrong place, and that will cost
me $50,000 per day. By going fast and making mistakes once in a while,
I increase my profit by $50,000 per day.
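
As a quick sanity check in Python:

packages = 1000000
gain = packages * 0.10           # $100,000 saved by the faster sort
loss = packages * 0.01 * 5.00    # $50,000 lost to 1% mis-sorts at $5 each
print(gain - loss)               # 50000.0 extra profit per day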

The numbers above are fabricated, but I'm sure UPS, FexEx, and all the
other package delivery companies are doing these sorts of analyses every
day. I watch the UPS guy come to my house. He gets out of his truck,
walks to my front door, rings the bell, waits approximately 5
microseconds, leaves the package on the porch, and goes back to his
truck. I'm sure UPS has figured out that the amortized cost of the
occasional stolen or lost package is less than the cost for the delivery
guy to wait for me to come to the front door and sign for the delivery.

Looking at another problem domain, let's say you're a contestant on
Jeopardy. If you listen to the entire clue and spend 3 seconds making
sure you know the correct answer before hitting the buzzer, it doesn't
matter if you're right or wrong. Somebody else beat you to the buzzer,
2.5 seconds ago.

Or, let's take an example from sports. I'm standing at home plate
holding a bat. 60 feet away from me, the pitcher is about to throw a
baseball towards me at darn close to 100 MPH (insert words like "bowl"
and "wicket" as geographically appropriate). 400 ms later, the ball is
going to be in the catcher's glove if I don't hit it. If I have an
absolutely perfect algorithm for determining whether it's a ball or a
strike, which takes 500 ms to run, I'm going back to the minor leagues.
If I have a 300 ms algorithm which is right 75% of the time, I'm
heading to the hall of fame.

Rustom Mody

unread,
Jan 4, 2014, 11:42:47 PM1/4/14
to Roy Smith, pytho...@python.org
Neat examples -- thanks.
Only minor quibble: isn't the $5 cost of mis-sorting a gross underestimate?

I am reminded of a passage of Dijkstra in Discipline of Programming --
something to this effect

He laments the fact that hardware engineers were not including
overflow checks in machine ALUs.
He explained as follows:
If a test is moderately balanced (statistically speaking) a programmer
will not mind writing an if statement

If however the test is very skewed -- say the if-branch 99% of the
time, the else 1% -- he will tend to skimp on the test, producing
'buggy' code [EWD would never use the bad b word, of course]

The cost equation for hardware is very different -- once the
investment in the silicon is done with -- fixed cost albeit high --
there is no variable cost to executing that circuitry once or a
zillion times

Moral of Story: Intel should take up FSR
[Ducks and runs for cover]

Roy Smith

unread,
Jan 5, 2014, 12:11:13 AM1/5/14
to
In article <mailman.4929.1388896...@python.org>,
I have no idea. Like I said, the numbers are all fabricated.

I do have a friend who used to work for UPS. He told me lots of UPS
efficiency stories. One of them had to do with mis-routed packages.
IIRC, the process for dealing with a mis-routed package was to NOT waste
any time trying to figure out why it was mis-routed. It was just thrown
back into the input hopper to go through the whole system again. The
sorting software kept track of how many times it had sorted a particular
package. Only after N attempts (where N was something like 3), was it
kicked out of the automated process for human intervention.
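
That policy is easy to sketch (names and the 1% failure rate are
invented for illustration):

import random

MAX_ATTEMPTS = 3   # "something like 3", per the story

def sort_package():
    # Don't diagnose failures; just re-run the automated sort, and
    # only escalate to a human after MAX_ATTEMPTS failed attempts.
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if random.random() > 0.01:   # assumed 1% mis-sort rate
            return 'on the right truck', attempt
    return 'human intervention', MAX_ATTEMPTS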

Steven D'Aprano

unread,
Jan 5, 2014, 1:28:14 AM1/5/14
to
Roy Smith wrote:

> I wrote:
>> > I realize I'm taking this statement out of context, but yes, sometimes
>> > fast is more important than correct.
>
> In article <52c8c301$0$29998$c3e8da3$5496...@news.astraweb.com>,
> Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>> Fast is never more important than correct.
>
> Sure it is.

Sure it isn't. I think you stopped reading my post too early.

None of your examples contradict what I am saying. They all involve exactly
the same sort of compromise regarding "correctness" that I'm talking about,
where you loosen what counts as "correct" for the purpose of getting extra
speed. So, for example:

> Let's imagine you're building a system which sorts packages for
> delivery. You sort 1 million packages every night and put them on
> trucks going out for final delivery.

What's your requirement, i.e. what counts as "correct" for the delivery
algorithm being used? Is it that every parcel is delivered to the specified
delivery address the first time? No it is not. What counts as "correct" for
the delivery algorithm is something on the lines of "No less than 95% of
parcels will be sorted correctly and delivered directly; no more than 5%
may be mis-sorted at most three times" (or some similar requirement).

It may even be that the requirements are looser still, e.g.:

"No more than 1% of parcels will be lost/damaged/stolen/destroyed"

in which case they don't care unless a particular driver loses or destroys
more than 1% of his deliveries. But if it turns out that Fred is dumping
every single one of his parcels straight into the river, the fact that he
can make thirty deliveries in the time it takes other drivers to make one
will not save his job. "But it's much faster to dump the parcels in the
river" does not matter. What matters is that the deliveries are made within
the bounds of allowable time and loss.

Things get interesting when the people setting the requirements and the
people responsible for meeting those requirements aren't able to agree.
Then you have customers who complain that the software is buggy, and
developers who complain that the customer requirements are impossible to
provide. Sometimes they're both right.


> Looking at another problem domain, let's say you're a contestant on
> Jeopardy. If you listen to the entire clue and spend 3 seconds making
> sure you know the correct answer before hitting the buzzer, it doesn't
> matter if you're right or wrong. Somebody else beat you to the buzzer,
> 2.5 seconds ago.

I've heard of Jeopardy, but never seen it. But I know about game shows, and
in this case, what you care about is *winning the game*, not answering the
questions correctly. Answering the questions correctly is only a means to
the end, which is "Win". If the rules allow it, your best strategy might
even be to give wrong answers, every time!

(It's not quite a game show, but the British quiz show QI is almost like
that. The rules, if there are any, encourage *interesting* answers over
correct answers. Occasionally that leads to panelists telling what can best
be described as utter porkies[1].)

If Jeopardy does not penalise wrong answers, the "best" strategy might be to
jump in with an answer as quickly as possible, without caring too much
about whether it is the right answer. But if Jeopardy penalises mistakes,
then the "best" strategy might be to take as much time as you can to answer
the question, and hope for others to make mistakes. That's often the
strategy in Test cricket: play defensively, and wait for the opposition to
make a mistake.


> Or, let's take an example from sports. I'm standing at home plate
> holding a bat. 60 feet away from me, the pitcher is about to throw a
> baseball towards me at darn close to 100 MPH (insert words like "bowl"
> and "wicket" as geographically appropriate). 400 ms later, the ball is
> going to be in the catcher's glove if you don't hit it. If you have an
> absolutely perfect algorithm to determining if it's a ball or a strike,
> which takes 500 ms to run, you're going back to the minor leagues. If
> you have a 300 ms algorithm which is right 75% of the time, you're
> heading to the hall of fame.

And if you catch the ball, stick it in your pocket and race through all the
bases, what's that? It's almost certainly faster than trying to play by the
rules. If speed is all that matters, that's what people would do. But it
isn't -- the "correct" strategy depends on many different factors, one of
which is that you have a de facto time limit on deciding whether to swing
or let the ball through.

Your baseball example is no different from the example I gave before. "Find
the optimal path for the Travelling Salesman Problem in a week's time",
versus "Find a close to optimal path in three minutes" is conceptually the
same problem, with the same solution: an imperfect answer *now* can be
better than a perfect answer *later*.




[1] Porkies, or "pork pies", from Cockney rhyming slang.

--
Steven

Chris Angelico

unread,
Jan 4, 2014, 11:01:22 PM1/4/14
to pytho...@python.org
On Sun, Jan 5, 2014 at 2:20 PM, Roy Smith <r...@panix.com> wrote:
> I've got a new sorting algorithm which is guaranteed to cut 10 seconds
> off the sorting time (i.e. $0.10 per package). The problem is, it makes
> a mistake 1% of the time.

That's a valid line of argument in big business, these days, because
we've been conditioned to accept low quality. But there are places
where quality trumps all, and we're happy to pay for that. Allow me to
expound two examples.

1) Amazon

http://www.amazon.com/exec/obidos/ASIN/1782010165/evertype-20

I bought this book a while ago. It's about the size of a typical
paperback. It arrived in a box too large for it on every dimension,
with absolutely no packaging. I complained. Clearly their algorithm
was: "Most stuff will get there in good enough shape, so people can't
be bothered complaining. And when they do complain, it's cheaper to
ship them another for free than to debate with them on chat." Because
that's what they did. Fortunately I bought the book for myself, not
for a gift, because the *replacement* arrived in another box of the
same size, with ... one little sausage for protection. That saved it
in one dimension out of three, so it arrived only slightly
used-looking instead of very used-looking. And this a brand new book.
When I complained the second time, I was basically told "any
replacement we ship you will be exactly the same". Thanks.

2) Bad Monkey Productions

http://kck.st/1bgG8Pl

The cheapest the book itself will be is $60, and the limited edition
early ones are more (I'm getting the gold level book, $200 for one of
the first 25 books, with special sauce). The people producing this are
absolutely committed to quality, as are the nearly 800 backers. If
this project is delayed slightly in order to ensure that we get
something fully awesome, I don't think there will be complaints. This
promises to be a beautiful book that'll be treasured for generations,
so quality's far FAR more important than the exact delivery date.

I don't think we'll ever see type #2 become universal, for the same
reason that people buy cheap Chinese imports in the supermarket rather
than something that costs five times as much from a specialist. The
expensive one might be better, but why bother? When the cheap one
breaks, you just get another. The expensive one might fail too, so why
take that risk?

But it's always a tradeoff, and there'll always be a few companies
around who offer the more expensive product. (We have a really high
quality cheese slicer. It's still the best I've seen, after something
like 20 years of usage.) Fast or right? It'd have to be really
*really* fast to justify not being right, unless the lack of rightness
is less than measurable (like representing time in nanoseconds -
anything smaller than that is unlikely to be measurable on most
computers).

ChrisA

Devin Jeanpierre

unread,
Jan 5, 2014, 3:00:24 AM1/5/14
to Steven D'Aprano, comp.lang.python
On Sat, Jan 4, 2014 at 6:27 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> Fast is never more important than correct. It's just that sometimes you
> might compromise a little (or a lot) on what counts as correct in exchange
> for some speed.

Is this statement even falsifiable? Can you conceive of a circumstance
where someone has traded correctness for speed, but where one couldn't
describe it that latter way? I can't. I think by definition you can
always describe it that way, you just make "what counts as
correctness" be "what the customer wants given the resources
available". The conventional definition, however, is "what the
customer wants, imagining that you have infinite resources". With just
a little redefinition that seems reasonable, you can be made never to
be wrong!


I avoid making unfalsifiable arguments that aren't explicitly labeled
as such. I try to reword them as, "I prefer to look at it as ..." --
it's less aggressive, which means people are more likely to really
listen to what you have to say. It also doesn't pretend to be an
argument when it isn't.

-- Devin

wxjm...@gmail.com

unread,
Jan 5, 2014, 5:39:52 AM1/5/14
to
Yes, the key point is the preparation of the "unicode
text" for the PDF producer.

It is at this level that the different flavours of Python
may be relevant.

I see four possibilities; I do not know what
the PDF producer API is expecting.

- Py2 with utf-8 byte string (ev. utf-16, utf-32)
- Py2 with its internal unicode
- Py3.2 with its internal unicode
- Py3.3 with its internal unicode

jmf

Johannes Bauer

unread,
Jan 5, 2014, 7:14:36 AM1/5/14
to
On 31.12.2013 10:53, Steven D'Aprano wrote:
> Mark Lawrence wrote:
>
>> http://blog.startifact.com/posts/alex-gaynor-on-python-3.html.
>
> I quote:
>
> "...perhaps a brave group of volunteers will stand up and fork Python 2, and
> take the incremental steps forward. This will have to remain just an idle
> suggestion, as I'm not volunteering myself."
>
> I expect that as excuses for not migrating get fewer, and the deadline for
> Python 2.7 end-of-life starts to loom closer, more and more haters^W
> Concerned People will whine about the lack of version 2.8 and ask for
> *somebody else* to fork Python.
>
> I find it, hmmm, interesting, that so many of these Concerned People who say
> that they're worried about splitting the Python community[1] end up
> suggesting that we *split the community* into those who have moved forward
> to Python 3 and those who won't.

Exactly. I don't know what exactly their problem is. I've pushed the
migration of *large* projects at work to Python3 when support was pretty
early and it really wasn't a huge deal.

Specifically because I love pretty much every single aspect that Python3
introduced. The codec support is so good that I've never seen anything
like it in any other programming language and then there's the tons of
beautiful changes (div/intdiv, functools.lru_cache, print(),
datetime.timedelta.total_seconds(), int.bit_length(), bytes/bytearray).

Regards,
Joe

--
>> Where EXACTLY had you predicted the quake again?
> At least not publicly!
Ah, the latest and to this day most ingenious coup of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org>

Steven D'Aprano

unread,
Jan 5, 2014, 7:28:27 AM1/5/14
to
Devin Jeanpierre wrote:

> On Sat, Jan 4, 2014 at 6:27 PM, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
>> Fast is never more important than correct. It's just that sometimes you
>> might compromise a little (or a lot) on what counts as correct in exchange
>> for some speed.
>
> Is this statement even falsifiable? Can you conceive of a circumstance
> where someone has traded correctness for speed, but where one couldn't
> describe it that latter way? I can't.

Every time some programmer "optimises" a piece of code (or, more often,
*thinks* they have optimised it) which introduces bugs into the software,
that's a case where somebody has traded correctness for speed where my
statement doesn't apply. Sometimes the response to the subsequent bug
report is "will not fix", and a retroactive change in the software
requirements. ("Oh, did we say that indexing a string would return a
character? We meant it would return a character, so long as the string only
includes no Unicode characters in the astral planes.") Sometimes it is to
revert the optimisation or otherwise fix the bug.

I accept that there is sometimes a fine line here. I'm assuming that
software applications have their requirements fully documented, which in
the real world is hardly ever the case. Although, even if the requirements
aren't always written down, often they are implicitly understood. (Although
it gets very interesting when the users' understanding and the developers'
understanding is different.)

Take as an example this "torture test" for a mathematical sum function,
where the built-in sum() gets the wrong answer but math.fsum() gets it
right:

py> from math import fsum
py> values = [1e12, 0.0001, -1e12, 0.0001]*10000
py> fsum(values)
2.0
py> sum(values)
2.4413841796875


Here's another example of the same thing, just to prove it's not a fluke:

py> values = [1e17, 1, 1, -1e17]
py> fsum(values)
2.0
py> sum(values)
0.0


The reason for the different results is that fsum() tries hard to account
for intermediate rounding errors and sum() does not. If you benchmark the
two functions, you'll find that sum() is significantly faster than fsum. So
the question to be asked is, does sum() promise to calculate floating point
sums accurately? If so, then this is a bug, probably introduced by the
desire for speed. But in fact, sum() does not promise to calculate floating
point sums accurately. What it promises to do is to calculate the
equivalent of a + b + c + ... for as many values as given, and that's
exactly what it does. Conveniently, that's faster than fsum(), and usually
accurate enough for most uses.

Is sum() buggy? No, of course not. It does what it promises, it's just that
what it promises to do falls short of "calculate floating point summations
to high accuracy".

Now, here's something which *would* be a bug, if sum() did it:

class MyInt(int):
    def __add__(self, other):
        return MyInt(super(MyInt, self).__add__(other))
    def __radd__(self, other):
        return MyInt(super(MyInt, self).__radd__(other))
    def __repr__(self):
        return "MyInt(%d)" % self


Adding a zero MyInt to an int gives a MyInt:

py> MyInt(0) + 23
MyInt(23)

so sum() should do the same thing. If it didn't, if it optimised away the
actual addition because "adding zero to a number can't change anything", it
would be buggy. But in fact, sum() does the right thing:

py> sum([MyInt(0), 23])
MyInt(23)


> I think by definition you can
> always describe it that way, you just make "what counts as
> correctness" be "what the customer wants given the resources
> available".

Not quite. "Correct" means "does what the customer wants". Or if there is no
customer, it's "does what you say it will do".

How do we tell when software is buggy? We compare what it actually does to
the promised behaviour, or expected behaviour, and if there is a
discrepancy, we call it a bug. We don't compare it to some ideal that
cannot be met. A bug report that math.pi does not have infinite number of
decimal places would be closed as "Will Not Fix".

Likewise, if your customer pays you to solve the Travelling Salesman Problem
exactly, even if it takes a week to calculate, then nothing short of a
program that solves the Travelling Salesman Problem exactly will satisfy
their requirements. It's no good telling the customer that you can
calculate a non-optimal answer twenty times faster if they want the actual
optimal answer.

(Of course, you may try to persuade them that they don't really need the
optimal solution, or that they cannot afford it, or that you cannot deliver
and they need to compromise.)


> The conventional definition, however, is "what the
> customer wants, imagining that you have infinite resources".

I don't think the resources really come into it. At least, certainly not
*infinite* resources. fsum() doesn't require infinite resources to
calculate floating point summations to high accuracy. An even more accurate
(but even slower) version would convert each float into a Fraction, then
add the Fractions.
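
A sketch of that approach, using the same torture values as above:

py> from fractions import Fraction
py> values = [1e17, 1, 1, -1e17]
py> float(sum(Fraction(v) for v in values))
2.0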


> With just
> a little redefinition that seems reasonable, you can be made never to
> be wrong!

I'm never wrong because I'm always right! *wink*

Let's bring this back to the claim made at the beginning. Someone (Mark?)
made a facetious comment about preferring fast code to correct code.
Someone else (I forget who, and am too lazy to look it up -- Roy Smith
perhaps?) suggested that we accept incorrect code if it is fast quite
often. But I maintain that we don't. If we did, we'd explicitly say:

"Sure, I know this program calculates the wrong answer, but gosh look how
fast it is!"

much like an anecdote I gave about the roadie driving in the wrong direction
who stated "Who cares, we're making great time!".

I maintain that people don't as a rule justify incorrect code on the basis
of it being fast. They claim the code isn't incorrect, that any compromises
made are deliberate and not bugs:

- "sum() doesn't promise to calculate floats to high accuracy, it
promises to give the same answer as if you repeatedly added them
with the + operator."

- "We never promised 100% uptime, we promised four nines uptime."

- "Our anti-virus scanner is blindingly fast, while still identifying
at least 99% of all known computer viruses!"

- "The Unix 'locate' command doesn't do a live search of the file
system because that would be too slow, it uses a snapshot of the
state of the file system."


Is locate buggy because it tells you what files existed the last time the
updatedb command ran, instead of what files exist right now? No, of course
not. locate does exactly what it promises to do.


--
Steven

Chris Angelico

unread,
Jan 5, 2014, 7:48:14 AM1/5/14
to pytho...@python.org
Even more strongly: We say colloquially that Google, DuckDuckGo, etc,
etc, are tools for "searching the web". But they're not. They're tools
for *indexing* the World Wide Web, and then searching that index. It's
plausible to actually search your file system (and there are times
when you want that), but completely implausible to search the (F or
otherwise) web. We accept the delayed appearance of a page in the
search results because we want immediate results, no waiting a month
to find anything! So the difference between what's technically
promised and what's colloquially described may be more than just
concealing bugs - it may be the vital difference between uselessness
and usefulness. And yet we like the handwave.

ChrisA

Mark Lawrence

unread,
Jan 5, 2014, 8:10:26 AM1/5/14
to pytho...@python.org
On 31/12/2013 09:53, Steven D'Aprano wrote:
> Mark Lawrence wrote:
>
>> http://blog.startifact.com/posts/alex-gaynor-on-python-3.html.
>
> I quote:
>
> "...perhaps a brave group of volunteers will stand up and fork Python 2, and
> take the incremental steps forward. This will have to remain just an idle
> suggestion, as I'm not volunteering myself."
>
> I expect that as excuses for not migrating get fewer, and the deadline for
> Python 2.7 end-of-life starts to loom closer, more and more haters^W
> Concerned People will whine about the lack of version 2.8 and ask for
> *somebody else* to fork Python.
>

Should the "somebody else" fork Python, in ten (ish) years time the
Concerned People will be complaining that they can't port their code to
Python 4 and will "somebody else" please produce version 2.9.

Stefan Behnel

unread,
Jan 5, 2014, 8:55:46 AM1/5/14
to pytho...@python.org
Johannes Bauer, 05.01.2014 13:14:
> I've pushed the
> migration of *large* projects at work to Python3 when support was pretty
> early and it really wasn't a huge deal.

I think there are two sides to consider. Those who can switch their code
base to Py3 and be happy (as you did, apparently), and those who cannot
make the switch but have to keep supporting Py2 until 'everyone' else has
switched, too. The latter is a bit more work generally and applies mostly
to Python packages on PyPI, i.e. application dependencies.

There are two ways to approach that problem. One is to try convincing
people that "Py3 has failed, let's stop migrating more code before I have
to start migrating mine", and the other is to say "let's finish the
migration and get it done, so that we can finally drop Py2 support in our
new releases and clean up our code again".

As long as we stick in the middle and keep the status quo, we keep the
worst of both worlds. And, IMHO, pushing loudly for a Py2.8 release
provides a very good excuse for others to not finish their part of the
migration, thus prolonging the maintenance burden for those who already did
their share.

Maybe a couple of major projects should start dropping their Py2 support,
just to make their own life easier and to help others in taking their
decision, too.

(And that's me saying that, who maintains two major projects that still
have legacy support for Py2.4 ...)

Stefan


wxjm...@gmail.com

unread,
Jan 5, 2014, 9:23:30 AM1/5/14
to
My examples are ONLY ILLUSTRATING that this FSR
is wrong by design, whether on the side of
memory, performance, linguistics or even
typography.

I will not stop you from wasting your time
adjusting bytes, if the problem is not
on that side.

jmf

Ned Batchelder

unread,
Jan 5, 2014, 10:20:11 AM1/5/14
to pytho...@python.org
On 1/5/14 9:23 AM, wxjm...@gmail.com wrote:
> On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
JMF: this has been pointed out to you time and again: the flexible
string representation is not wrong. To show that it is wrong, you would
have to demonstrate some semantic of Unicode that is violated. You have
never done this. You've picked pathological cases and shown
micro-timing output, and memory usage. The Unicode standard doesn't
promise anything about timing or memory use.

The FSR makes a trade-off of time and space. Everyone but you considers
it a good trade-off. I don't think you are showing real use cases, but
if they are, I'm sorry that your use-case suffers. That doesn't make
the FSR wrong. The most accurate statement is that you don't like the
FSR. That's fine, you're entitled to your opinion.

You say the FSR is wrong linguistically. This can't be true, since an
FSR Unicode string is indistinguishable from an internally-UTF-32
Unicode string, and no, memory use or timings are irrelevant when
discussing the linguistic performance of a Unicode string.

You've also said that the internal representation of the FSR is
incorrect because of encodings somehow. Encodings have nothing to do
with the internal representation of a Unicode string, they are for
interchanging data. You seem to know a lot about Unicode, but when you
make this fundamental mistake, you call all of your expertise into question.

To re-iterate what you are doing wrong:

1) You continue to claim things that are not true, and that you have
never substantiated.

2) You paste code samples without accompanying text that explain what
you are trying to demonstrate.

3) You ignore refutations that disprove your points.

These are all the behaviors of a troll. Please stop.

If you want to discuss the details of Unicode implementations, I'd
welcome an offlist discussion, but only if you will approach it honestly
enough to leave open the possibility that you are wrong. I know I would
be glad to learn details of Unicode that I have missed, but so far you
haven't provided any.

--Ned.

>
> I will not stop you from wasting your time
> adjusting bytes, if the problem is not
> on that side.
>
> jmf
>


Roy Smith

unread,
Jan 5, 2014, 11:10:56 AM1/5/14
to
In article <52c94fec$0$29973$c3e8da3$5496...@news.astraweb.com>,
Steven D'Aprano <steve+comp....@pearwood.info> wrote:

> How do we tell when software is buggy? We compare what it actually does to
> the promised behaviour, or expected behaviour, and if there is a
> discrepancy, we call it a bug. We don't compare it to some ideal that
> cannot be met. A bug report that math.pi does not have infinite number of
> decimal places would be closed as "Will Not Fix".

That's because it is inherently impossible to "fix" that. But lots of
bug reports legitimately get closed with "Will Not Fix" simply because
the added value from fixing it doesn't justify the cost (whether in
terms of development effort, or run-time resource consumption).

Go back to the package sorting example I gave. If the sorting software
mis-reads the address and sends my package to Newark instead of New York
by mistake, that's clearly a bug.

Presumably, it's an error which could be eliminated (or, at least, the
rate of occurrence reduced) by using a more sophisticated OCR algorithm.
But, if those algorithms take longer to run, the overall expected value
of implementing the bug fix software may well be negative.

In the real world, nobody cares if software is buggy. They care that it
provides value.

Roy Smith

unread,
Jan 5, 2014, 11:34:31 AM1/5/14
to
In article <mailman.4930.1388908...@python.org>,
Chris Angelico <ros...@gmail.com> wrote:

> On Sun, Jan 5, 2014 at 2:20 PM, Roy Smith <r...@panix.com> wrote:
> > I've got a new sorting algorithm which is guaranteed to cut 10 seconds
> > off the sorting time (i.e. $0.10 per package). The problem is, it makes
> > a mistake 1% of the time.
>
> That's a valid line of argument in big business, these days, because
> we've been conditioned to accept low quality. But there are places
> where quality trumps all, and we're happy to pay for that. Allow me to
> expound two examples.
>
> 1) Amazon
>
> http://www.amazon.com/exec/obidos/ASIN/1782010165/evertype-20
>
> I bought this book a while ago. It's about the size of a typical
> paperback. It arrived in a box too large for it on every dimension,
> with absolutely no packaging. I complained. Clearly their algorithm
> was: "Most stuff will get there in good enough shape, so people can't
> be bothered complaining. And when they do complain, it's cheaper to
> ship them another for free than to debate with them on chat."

You're missing my point.

Amazon's (short-term) goal is to increase their market share by
undercutting everybody on price. They have implemented a box-packing
algorithm which clearly has a bug in it. You are complaining that they
failed to deliver your purchase in good condition, and apparently don't
care. You're right, they don't. The cost to them to manually correct
this situation exceeds the value. This is one shipment. It doesn't
matter. You are one customer, you don't matter either. Seriously.
This may be annoying to you, but it's good business for Amazon. For
them, fast and cheap is absolutely better than correct.

I'm not saying this is always the case. Clearly, there are companies
which have been very successful at producing a premium product (Apple,
for example). I'm not saying that fast is always better than correct.
I'm just saying that correct is not always better than fast.

Chris Angelico

unread,
Jan 5, 2014, 11:51:20 AM1/5/14
to pytho...@python.org
On Mon, Jan 6, 2014 at 3:34 AM, Roy Smith <r...@panix.com> wrote:
> Amazon's (short-term) goal is to increase their market share by
> undercutting everybody on price. They have implemented a box-packing
> algorithm which clearly has a bug in it. You are complaining that they
> failed to deliver your purchase in good condition, and apparently don't
> care. You're right, they don't. The cost to them to manually correct
> this situation exceeds the value. This is one shipment. It doesn't
> matter.

If it stopped there, it would be mildly annoying ("1% of our shipments
will need to be replaced, that's a 1% cost for free replacements").
The trouble is that they don't care about the replacement either, so
it's really that 100% (or some fairly large proportion) of their
shipments will arrive with some measure of damage, and they're hoping
that their customers' threshold for complaining is often higher than
the damage sustained. Which it probably is, a lot of the time.

> You are one customer, you don't matter either. Seriously.
> This may be annoying to you, but it's good business for Amazon. For
> them, fast and cheap is absolutely better than correct.

But this is the real problem, business-wise. Can you really run a
business by not caring about your customers? (I also think it's pretty
disappointing that a business like Amazon can't just toss in some
bubbles, or packing peanuts (what we call "trucks" for hysterical
raisins), or something. It's not that hard to have a machine just blow
in some sealed air before the box gets closed... surely?) Do they have
that much of a monopoly, or that solid a customer base, that they're
happy to leave *everyone* dissatisfied? We're not talking about 1%
here. From the way the cust svc guy was talking, I get the impression
that they do this with all parcels.

And yet.... I can't disagree with your final conclusion. Empirical
evidence goes against my incredulous declaration that "surely this is
a bad idea" - according to XKCD 1165, they're kicking out nearly a
cubic meter a *SECOND* of packages. That's fairly good evidence that
they're doing something that, whether it be right or wrong, does fit
with the world's economy. Sigh.

ChrisA

Roy Smith

unread,
Jan 5, 2014, 12:09:50 PM1/5/14
to
Chris Angelico <ros...@gmail.com> wrote:

> Can you really run a business by not caring about your customers?

http://snltranscripts.jt.org/76/76aphonecompany.phtml

Terry Reedy

unread,
Jan 5, 2014, 5:14:07 PM1/5/14
to pytho...@python.org
On 1/5/2014 9:23 AM, wxjm...@gmail.com wrote:
> On Saturday, 4 January 2014 23:46:49 UTC+1, Terry Reedy wrote:
>> On 1/4/2014 2:10 PM, wxjm...@gmail.com wrote:
>>> And I could add, I *never* saw once one soul, who is
>>> explaining what I'm doing wrong in the gazillion
>>> of examples I gave on this list.

>> If this is true, it is because you have ignored and not read my
>> numerous, relatively polite posts. To repeat very briefly:
>> 1. Cherry picking (presenting the most extreme case as representative).
>> 2. Calling space saving a problem (repeatedly).
>> 3. Ignoring bug fixes.
...

> My examples are ONLY ILLUSTRATING that this FSR
> is wrong by design, whether on the side of
> memory, performance, linguistics or even
> typography.

Let me expand on 3 of my points. First, performance == time:

Point 3. You correctly identified a time regression in finding a
character in a string. I saw that the slowdown was *not* inherent in the
FSR but had to be a glitch in the code, and reported it on pydev with
the hope that someone would fix it even if it were not too important in
real use cases. Someone did.

Point 1. You incorrectly generalized that extreme case. I reported (a
year ago last September) that the overall stringbench results were about
the same. I also pointed out that there is an equally non-representative
extreme case in the opposite direction, and that it would equally be
wrong of me to use that to claim that FSR is faster. (It turns out that
this FSR speed advantage *is* inherent in the design.)

Memory: Point 2. A *design goal* of FSR was to save memory relative to
UTF-32, which is what you apparently prefer. Your examples show that the FSR
successfully met its design goal. But you call that success, saving
memory, 'wrong'. On what basis?

You *claim* the FSR is 'wrong by design', but your examples only show
that is was temporarily wrong in implementation as far as speed and
correct by design as far as memory goes.

--
Terry Jan Reedy


Terry Reedy

unread,
Jan 5, 2014, 5:48:43 PM1/5/14
to pytho...@python.org
On 1/5/2014 9:23 AM, wxjm...@gmail.com wrote:

> My examples are ONLY ILLUSTRATING that this FSR
> is wrong by design,

Let me answer you a different way. If FSR is 'wrong by design', so are
the alternatives. Hence, the claim is, in itself, useless as a guide to
choosing. The choices:

* Keep the previous complicated system of buggy narrow builds on some
systems and space-wasting wide builds on other systems, with Python code
potentially acting differently on the different builds. I am sure that
you agree that this is a bad design.

* Improve the dual-build system by de-bugging narrow builds. I proposed
to do this (and gave Python code proving the idea) by adding the
complication of an auxiliary array of indexes of astral chars in a
UTF-16 string (a rough sketch follows at the end of this post). I
suspect you would call this design 'wrong' also.

* Use the memory-wasting UTF-32 (wide) build on all systems. I know you
do not consider this 'wrong', but come on. From an information theoretic
and coding viewpoint, it clearly is. The top (4th) byte is *never* used.
The 3rd byte is *almost never* used. The 2nd byte usage ranges from
common to almost never for different users.

Memory waste is also time waste, as moving information-free 0 bytes
takes the same time as moving informative bytes.

Here is the beginning of the rationale for the FSR (from
http://www.python.org/dev/peps/pep-0393/ -- have you ever read it?).

"There are two classes of complaints about the current implementation of
the unicode type: on systems only supporting UTF-16, users complain that
non-BMP characters are not properly supported. On systems using UCS-4
internally (and also sometimes on systems using UCS-2), there is a
complaint that Unicode strings take up too much memory - especially
compared to Python 2.x, where the same code would often use ASCII
strings...".

The memory waste was a reason to stick with 2.7. It could break code
that worked in 2.x. By removing the waste, the FSR makes switching to
Python 3 more feasible for some people. It was a response to real
problems encountered by real people using Python. It fixed both classes
of complaint about the previous system.

* Switch to the time-wasting UTF-8 for text storage, as some have done.
This is different from using UTF-8 for text transmission, which I hope
becomes the norm soon.
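
For the curious, here is a rough sketch of the auxiliary-array idea
from the second bullet above (illustrative only; the class name and
details are invented, not the code originally posted):

from bisect import bisect_left

class UTF16String:
    # UTF-16 code units plus a sorted list of the character positions
    # of astral code points, so indexing stays O(log n), not O(n).
    def __init__(self, text):
        self.units = text.encode('utf-16-le')
        self.astral = [i for i, ch in enumerate(text)
                       if ord(ch) > 0xFFFF]

    def __getitem__(self, i):
        k = bisect_left(self.astral, i)  # astral chars before position i
        off = 2 * (i + k)                # byte offset of code unit i+k
        astral = k < len(self.astral) and self.astral[k] == i
        width = 4 if astral else 2       # surrogate pair or single unit
        return self.units[off:off + width].decode('utf-16-le')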

--
Terry Jan Reedy

Terry Reedy

unread,
Jan 5, 2014, 5:56:07 PM1/5/14
to pytho...@python.org
On 1/5/2014 11:51 AM, Chris Angelico wrote:
> On Mon, Jan 6, 2014 at 3:34 AM, Roy Smith <r...@panix.com> wrote:
>> Amazon's (short-term) goal is to increase their market share by
>> undercutting everybody on price. They have implemented a box-packing
>> algorithm which clearly has a bug in it. You are complaining that they
>> failed to deliver your purchase in good condition, and apparently don't
>> care. You're right, they don't. The cost to them to manually correct
>> this situation exceeds the value. This is one shipment. It doesn't
>> matter.
>
> If it stopped there, it would be mildly annoying ("1% of our shipments
> will need to be replaced, that's a 1% cost for free replacements").
> The trouble is that they don't care about the replacement either, so
> it's really that 100% (or some fairly large proportion) of their
> shipments will arrive with some measure of damage, and they're hoping
> that their customers' threshold for complaining is often higher than
> the damage sustained. Which it probably is, a lot of the time.

My wife has gotten several books from Amazon and partners and we have
never gotten one loose enough in a big enough box to be damaged. Either
the box is tight or has bubble packing. Leaving aside partners, maybe
distribution centers have different rules.

--
Terry Jan Reedy

Chris Angelico

unread,
Jan 5, 2014, 6:59:59 PM1/5/14
to pytho...@python.org
On Mon, Jan 6, 2014 at 9:56 AM, Terry Reedy <tjr...@udel.edu> wrote:
> On 1/5/2014 11:51 AM, Chris Angelico wrote:
>>
>> On Mon, Jan 6, 2014 at 3:34 AM, Roy Smith <r...@panix.com> wrote:
>>>
>>> Amazon's (short-term) goal is to increase their market share by
>>> undercutting everybody on price. They have implemented a box-packing
>>> algorithm which clearly has a bug in it. You are complaining that they
>>> failed to deliver your purchase in good condition, and apparently don't
>>> care. You're right, they don't. The cost to them to manually correct
>>> this situation exceeds the value. This is one shipment. It doesn't
>>> matter.
>>
>>
>> If it stopped there, it would be mildly annoying ("1% of our shipments
>> will need to be replaced, that's a 1% cost for free replacements").
>> The trouble is that they don't care about the replacement either, so
>> it's really that 100% (or some fairly large proportion) of their
>> shipments will arrive with some measure of damage, and they're hoping
>> that their customers' threshold for complaining is often higher than
>> the damage sustained. Which it probably is, a lot of the time.
>
>
> My wife has gotten several books from Amazon and partners and we have never
> gotten one loose enough in a big enough box to be damaged. Either the box is
> tight or has bubble packing. Leaving aside partners, maybe distribution
> centers have different rules.

Or possibly (my personal theory) the CS rep I was talking to just
couldn't be bothered solving the problem. Way way too much work to
make the customer happy, much easier and cheaper to give a 30% refund
and hope that shuts him up.

But they managed to ship two books (the original and the replacement)
with insufficient packaging. Firstly, that requires the square of the
probability of failure; and secondly, if you care even a little bit
about making your customers happy, put a little note on the second
order instructing people to be particularly careful of this one! Get
someone to check it before it's sent out. Make sure it's right this
time. I know that's what we used to do in the family business whenever
anything got mucked up.

(BTW, I had separately confirmed that the problem was with Amazon, and
not - as has happened to me with other shipments - caused by
Australian customs officials opening the box, looking through it, and
then packing it back in without its protection. No, it was shipped
that way.)

Anyway, this is veering so far off topic that we're at no risk of
meeting any Python Alliance ships - as Mal said, we're at the corner
of No and Where. But maybe someone can find an on-topic analogy to put
some tentative link back into this thread...

ChrisA

Steven D'Aprano

unread,
Jan 5, 2014, 7:42:14 PM1/5/14
to
Chris Angelico wrote about Amazon:

> And yet.... I can't disagree with your final conclusion. Empirical
> evidence goes against my incredulous declaration that "surely this is
> a bad idea" - according to XKCD 1165, they're kicking out nearly a
> cubic meter a SECOND of packages.

Yes, but judging by what you described as their packing algorithm that's
probably only a tenth of a cubic metre of *books*, the rest being empty box
for the book to rattle around in and get damaged.

--
Steven

Steven D'Aprano

unread,
Jan 5, 2014, 8:23:15 PM1/5/14
to
One, you're missing my point that to Amazon, "fast and cheap" *is* correct.
They would not agree with you that their box-packing algorithm is buggy, so
long as their customers don't punish them for it. It meets their
requirements: ship parcels as quickly as possible, and push as many of the
costs (damaged books) onto the customer as they can get away with. If they
thought it was buggy, they would be trying to fix it.

Two, nobody is arguing against the concept that different parties have
different concepts of what's correct. To JMF, the flexible string
representation is buggy, because he's detected a trivially small slowdown
in some artificial benchmarks. To everyone else, it is not buggy, because
it does what it sets out to do: save memory while still complying with the
Unicode standard. A small slowdown on certain operations is a cost worth
paying.

Normally, the definition of "correct" that matters is that belonging to the
paying customer, or failing that, the programmer who is giving his labour
away for free. (Extend this out to more stakeholders if you wish, but the
more stakeholders you include, the harder it is to get consensus on what's
correct and what isn't.) From the perspective of Amazon's customers,
presumably so long as the cost of damaged and lost books isn't too high,
they too are willing to accept Amazon's definition of "correct" in order to
get cheap books, or else they would buy from someone else.

(However, to the extent that Amazon has gained monopoly power over the book
market, that reasoning may not apply. Amazon is not *technically* a
monopoly, but they are clearly well on the way to becoming one, at which
point the customer has no effective choice and the market is no longer
free.)

The Amazon example is an interesting example of market failure, in the sense
that the free market provides a *suboptimal solution* to a problem. We'd
all like reasonably-priced books AND reliable delivery, but maybe we can't
have both. Personally, I'm not so sure about that. Maybe Jeff Bezos could
make do with only five solid gold Mercedes instead of ten[1], for the sake
of improved delivery? But apparently not.

But I digress... ultimately, you are trying to argue that there is a single
absolute source of truth for what counts as "correct". I don't believe
there is. We can agree that some things are clearly not correct -- Amazon
takes your money and sets the book on fire, or hires an armed military
escort costing $20 million a day to deliver your book of funny cat
pictures. We might even agree on what we'd all like in a perfect world:
cheap books, reliable delivery, and a pony. But in practice we have to
choose some features over others, and compromise on requirements, and
ultimately we have to make a *pragmatic* choice on what counts as correct
based on the functional requirements, not on a wishlist of things we'd like
with infinite time and money.

Sticking to the Amazon example, what percentage of books damaged in delivery
ceases to be a bug in the packing algorithm and becomes "just one of those
things"? One in ten? One in ten thousand? One in a hundred billion billion?
I do not accept that "book gets damaged in transit" counts as a bug. "More
than x% of books get damaged", that's a bug. "Average cost to ship a book
is more than $y" is a bug. And Amazon gets to decide what the values of x%
and $y are.


> I'm not saying this is always the case. Clearly, there are companies
> which have been very successful at producing a premium product (Apple,
> for example). I'm not saying that fast is always better than correct.
> I'm just saying that correct is not always better than fast.

In the case of Amazon, "correct" in the sense of "books are packed better"
is not better than fast. It's better for the customer, and better for
society as a whole (less redundant shipping and less ecological harm), but
not better for Amazon. Since Amazon gets to decide what's better, their
greedy, short-term outlook wins, at least until such time as customers find
an alternative. Amazon would absolutely not agree with you that packing the
books more securely is "better", if they did, they would do it. They're not
stupid, just focused on short-term gain for themselves at the expense of
everyone else. (Perhaps a specialised, and common, form of stupidity.)

By the way, this whole debate is known as "Worse is better", and bringing it
back to programming languages and operating systems, you can read more
about it here:

http://www.jwz.org/doc/worse-is-better.html



[1] Figuratively speaking.


--
Steven

Chris Angelico

unread,
Jan 5, 2014, 8:54:48 PM1/5/14
to pytho...@python.org
On Mon, Jan 6, 2014 at 12:23 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> (However, to the extent that Amazon has gained monopoly power over the book
> market, that reasoning may not apply. Amazon is not *technically* a
> monopoly, but they are clearly well on the way to becoming one, at which
> point the customer has no effective choice and the market is no longer
> free.)

They don't need a monopoly on the whole book market, just on specific
books - which they did have, in the cited case. I actually asked the
author (translator, really - it's a translation of "Alice in
Wonderland") how he would prefer me to buy, as there are some who sell
on Amazon and somewhere else. There was no alternative to Amazon, ergo
no choice and the market was not free. Like so many things, one choice
("I want to buy Ailice's Anters in Ferlielann") mandates another
("Must buy through Amazon").

I don't know what it cost Amazon to ship me two copies of a book, but
still probably less than they got out of me, so they're still ahead.
Even if they lost money on this particular deal, they're still way
ahead because of all the people who decide it's not worth their time
to spend an hour or so trying to get a replacement. So yep, this
policy is serving Amazon fairly well.

ChrisA

Mark Lawrence

unread,
Jan 6, 2014, 12:53:12 AM1/6/14
to pytho...@python.org
On 06/01/2014 01:54, Chris Angelico wrote:
> On Mon, Jan 6, 2014 at 12:23 PM, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
>> (However, to the extent that Amazon has gained monopoly power over the book
>> market, that reasoning may not apply. Amazon is not *technically* a
>> monopoly, but they are clearly well on the way to becoming one, at which
>> point the customer has no effective choice and the market is no longer
>> free.)
>
> They don't need a monopoly on the whole book market, just on specific
> books - which they did have, in the cited case. I actually asked the
> author (translator, really - it's a translation of "Alice in
> Wonderland") how he would prefer me to buy, as there are some who sell
> on Amazon and somewhere else. There was no alternative to Amazon, ergo
> no choice and the market was not free. Like so many things, one choice
> ("I want to buy Ailice's Anters in Ferlielann") mandates another
> ("Must buy through Amazon").
>
> I don't know what it cost Amazon to ship me two copies of a book, but
> still probably less than they got out of me, so they're still ahead.
> Even if they lost money on this particular deal, they're still way
> ahead because of all the people who decide it's not worth their time
> to spend an hour or so trying to get a replacement. So yep, this
> policy is serving Amazon fairly well.
>
> ChrisA
>

So much for my "You never know, we might even end up with a thread
whereby the discussion is Python, the whole Python and nothing but the
Python." :)

wxjm...@gmail.com

unread,
Jan 7, 2014, 8:34:36 AM1/7/14
to
Point 3: You are right. I'm very happy to agree.

Point 2: This Flexible String Representation does not
"effectuate" any memory optimization. It only succeeds
in doing the opposite of what a correct usage of utf*
does.

Ned : this has already been explained and illustrated.

jmf

Terry Reedy

unread,
Jan 7, 2014, 9:54:18 AM1/7/14
to pytho...@python.org
On 1/7/2014 8:34 AM, wxjm...@gmail.com wrote:
> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:

>> Memory: Point 2. A *design goal* of FSR was to save memory relative to
>> UTF-32, which is what you apparently prefer. Your examples show that the FSR
>> successfully met its design goal. But you call that success, saving
>> memory, 'wrong'. On what basis?

> Point 2: This Flexible String Representation does not
> "effectuate" any memory optimization. It only succeeds
> in doing the opposite of what a correct usage of utf*
> does.

Since the FSR *was* successful in saving memory, and indeed shrank the
Python binary by about a megabyte, I have no idea what you mean.

--
Terry Jan Reedy


Tim Delaney

unread,
Jan 7, 2014, 5:38:51 PM1/7/14
to wxjm...@gmail.com, Python-List
On 8 January 2014 00:34, <wxjm...@gmail.com> wrote:

Point 2: This Flexible String Representation does not
"effectuate" any memory optimization. It only succeeds
in doing the opposite of what a correct usage of utf*
does.

UTF-8 is a variable-width encoding that uses less memory to encode code points with lower numerical values, on a per-character basis e.g. if a code point is <= U+007F it will use a single byte to encode; if <= U+07FF two bytes will be used; and so on, up to a maximum of 4 bytes under the current (RFC 3629) definition, which covers code points up to U+10FFFF (the original UTF-8 design allowed up to 6 bytes for code points >= U+4000000).

FSR is a variable-width memory structure that uses the width of the code point with the highest numerical value in the string e.g. if all code points in the string are <= U+00FF a single byte will be used per character; if all code points are <= U+FFFF two bytes will be used per character; and in all other cases 4 bytes will be used per character.

In terms of memory usage the difference is that UTF-8 varies its width per-character, whereas the FSR varies its width per-string. For any particular string, UTF-8 may well result in using less memory than the FSR, but in other (quite common) cases the FSR will use less memory than UTF-8 e.g. if the string only contains code points <= U+00FF, but some are between U+0080 and U+00FF (inclusive).

In most cases the FSR uses the same or less memory than earlier versions of Python 3 and correctly handles all code points (just like UTF-8). In the cases where the FSR uses more memory than previously, the previous behaviour was incorrect.

No matter which representation is used, there will be a certain amount of overhead (which is the majority of what most of your examples have shown). Here are examples which demonstrate cases where UTF-8 uses less memory, cases where the FSR uses less memory, and cases where they use the same amount of memory (accounting for the minimum amount of overhead required for each).

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>>
>>> fsr = u""
>>> utf8 = fsr.encode("utf-8")
>>> min_fsr_overhead = sys.getsizeof(fsr)
>>> min_utf8_overhead = sys.getsizeof(utf8)
>>> min_fsr_overhead
49
>>> min_utf8_overhead
33
>>>
>>> fsr = u"\u0001" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
1000
>>> sys.getsizeof(utf8) - min_utf8_overhead
1000
>>>
>>> fsr = u"\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
1024
>>> sys.getsizeof(utf8) - min_utf8_overhead
2000
>>>
>>> fsr = u"\u0001\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
2024
>>> sys.getsizeof(utf8) - min_utf8_overhead
3000
>>>
>>> fsr = u"\u0101" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
2025
>>> sys.getsizeof(utf8) - min_utf8_overhead
2000
>>>
>>> fsr = u"\u0101\u0081" * 1000
>>> utf8 = fsr.encode("utf-8")
>>> sys.getsizeof(fsr) - min_fsr_overhead
4025
>>> sys.getsizeof(utf8) - min_utf8_overhead
4000

Indexing a character in UTF-8 is O(N) - you have to traverse the string up to the character being indexed. Indexing a character in the FSR is O(1). In all cases the FSR has better performance characteristics for indexing and slicing than UTF-8.
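
A rough sketch of what indexing a UTF-8 buffer involves (illustrative
only, not CPython's code; assumes Python 3 bytes):

def utf8_index(data, i):
    # Walk the bytes, counting only lead bytes; continuation bytes
    # in UTF-8 always look like 0b10xxxxxx.
    count = -1
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:   # not a continuation byte
            count += 1
            if count == i:
                return pos        # byte offset of code point i
    raise IndexError(i)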

There are tradeoffs with both UTF-8 and the FSR. The Python developers decided the priorities for Unicode handling in Python were:

1. Correctness
  a. all code points must be handled correctly;
  b. it must not be possible to obtain part of a code point (e.g. the first byte only of a multi-byte code point);

2. No change in the Big O characteristics of string operations e.g. indexing must remain O(1);

3. Reduced memory use in most cases.

It is impossible for UTF-8 to meet both criteria 1b and 2 without additional auxiliary data (which uses more memory and increases complexity of the implementation). The FSR meets all 3 criteria.

Tim Delaney 

Terry Reedy

unread,
Jan 7, 2014, 7:02:22 PM1/7/14
to pytho...@python.org
On 1/7/2014 9:54 AM, Terry Reedy wrote:
> On 1/7/2014 8:34 AM, wxjm...@gmail.com wrote:
>> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
>
>>> Memory: Point 2. A *design goal* of FSR was to save memory relative to
>>> UTF-32, which is what you apparently prefer. Your examples show that the FSR
>>> successfully met its design goal. But you call that success, saving
>>> memory, 'wrong'. On what basis?
>
>> Point 2: This Flexible String Representation does not
>> "effectuate" any memory optimization. It only succeeds
>> in doing the opposite of what a correct usage of utf*
>> does.
>
> Since the FSR *was* successful in saving memory, and indeed shrank the
> Python binary by about a megabyte, I have no idea what you mean.

Tim Delaney apparently did, and answered on the basis of his
understanding. Note that I said that the design goal was 'save memory
RELATIVE TO UTF-32', not 'optimize memory'. UTF-8 was not considered an
option. Nor was any form of arithmetic coding
https://en.wikipedia.org/wiki/Arithmetic_coding
to truly 'optimize memory'.

--
Terry Jan Reedy


wxjm...@gmail.com

unread,
Jan 8, 2014, 4:59:48 AM1/8/14
to
The FSR acts more as a coding scheme selector than
as a code point optimizer.

Claiming that it saves memory is some kind of illusion;
a little bit like saying "Py2.7 uses "relatively" less memory than
Py3.2 (UCS-2)".

>>> sys.getsizeof('a' * 10000 + 'z')
10026
>>> sys.getsizeof('a' * 10000 + '€')
20040
>>> sys.getsizeof('a' * 10000 + '\U00010000')
40044
>>> sys.getsizeof('€' * 10000 + '€')
20040
>>> sys.getsizeof('€' * 10000 + '\U00010000')
40044
>>> sys.getsizeof('\U00010000' * 10000 + '\U00010000')
40044

jmf

Terry Reedy

unread,
Jan 8, 2014, 2:26:55 PM1/8/14
to pytho...@python.org
On 1/8/2014 4:59 AM, wxjm...@gmail.com wrote:
[responding to me]
> The FSR acts more as a coding scheme selector

That is what PEP 393 describes and what I and many others have said. The
FSR saves memory by selecting from three choices the most compact coding
scheme for each string.

I ask again, have you read PEP 393? If you are going to critique the
FSR, you should read its basic document.

> than as a code point optimizer.

I do not know what you mean by 'code point optimizer'.

> Claiming that it saves memory is some kind of illusion;

Do you really think that the mathematical fact "10026 < 20040 < 40044"
(from your example below) is some kind of illusion? If so, please take
your claim to a metaphysics list. If not, please stop trolling.

> a little bit like saying "Py2.7 uses "relatively" less memory than
> Py3.2 (UCS-2)".

This is inane as 2.7 and 3.2 both use the same two coding schemes.
Saying '1 < 2' is different from saying '2 < 2'.

On 3.3+
>>>> sys.getsizeof('a' * 10000 + 'z')
> 10026
>>>> sys.getsizeof('a' * 10000 + '€')
> 20040
>>>> sys.getsizeof('a' * 10000 + '\U00010000')
> 40044

3.2- wide (UCS-4) builds use about 40050 bytes for all three unicode
strings. Once again, you have posted examples that show how the FSR saves
memory, thus negating your denial of the saving.

--
Terry Jan Reedy


Mark Lawrence

unread,
Jan 8, 2014, 3:04:06 PM1/8/14
to pytho...@python.org
On 07/01/2014 13:34, wxjm...@gmail.com wrote:
> On Sunday, 5 January 2014 23:14:07 UTC+1, Terry Reedy wrote:
>
> Ned : this has already been explained and illustrated.
>
> jmf
>

This has never been explained and illustrated. Roughly 30 minutes ago
Terry Reedy once again completely shot your argument about memory usage
to pieces. You did not bother to respond to the comments from Tim
Delaney made almost one day ago. Please give up.