
Dictionaries


Sebastien Loisel

Feb 15, 1997

Is there a particular reason why dictionaries were implemented using
hashing tables instead of AVL trees? I've never looked at the code, but
it appears to me that this would've been more scalable and at least as
efficient O() wise. Perhaps in practice this is slower?

Sebastien Loisel

Andrew P. Mullhaupt

Feb 16, 1997

Hashing is normally a lot faster than self-balancing trees unless there
is catastrophic clustering of the keys. With a hash function that is
doing its job, hashing is essentially O(1).
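
(For a rough, present-day illustration of that claim, the snippet below
times dict lookups against binary search in a sorted list, with the sorted
list standing in for a balanced tree since Python has no built-in one.
Only the trend matters; absolute numbers are machine-dependent.)

import bisect
import timeit

n = 100_000
keys = list(range(n))
d = {k: k for k in keys}        # hash table
sorted_keys = sorted(keys)      # stand-in for a balanced search tree

def dict_lookup():
    return d[99_999]            # expected O(1)

def tree_lookup():
    i = bisect.bisect_left(sorted_keys, 99_999)   # O(log n)
    return sorted_keys[i]

print("dict  :", timeit.timeit(dict_lookup, number=100_000))
print("bisect:", timeit.timeit(tree_lookup, number=100_000))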

Later,
Andrew Mullhaupt

Tim Peters

Feb 16, 1997

> [Sebastien Loisel asks]

> Is there a particular reason why dictionaries were implemented
> using hashing tables instead of AVL trees?

Most likely because Guido worked on the implementation of the ABC language,
which did use balanced trees <0.5 wink>.

> ...


> it appears to me that this would've been more scalable

? Both will grow to use all the memory you have, if necessary, but the space
overhead of storing left & right links plus a balancing tag in each node is at
least as great as the space overhead in the hash table, and probably greater.

> and at least as efficient O() wise.

Balanced trees have excellent worst-case behavior, but suffer for a variety of
reasons in expected-case behavior (most damningly that expected search time is
constant for Python's flavor of hash table, and logarithmic for balanced
trees).

> Perhaps in practice this is slower?

In practice that's right, if you believe that for typical applications lookup
speed is overwhelmingly the most important thing. Besides the O()
considerations, two things really hurt trees in practice:

1) The algorithms are complicated, which leads to relatively large constant
overheads even in careful implementations, which makes trees sluggish before
they get large -- and most dicts are pretty small!

2) Because a tree implements a total ordering, it needs to do a full compare
at each node to decide whether to stop, "go left" or "go right". Python's
hash table only needs to resolve equality, and caches an object's hash code so
that most of the time inequality can be established with a single inline int
comparison; a full compare requires at least one external call (due to
Python's OO nature), so that a single full compare is as expensive as a few
dozen inequality tests.
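
(To make the cached-hash point concrete, here is a toy open-addressing
table in present-day Python -- a sketch of the idea only, not CPython's
actual dict code. Each slot stores the key's hash alongside the key, so a
probe can usually reject a slot with one integer comparison and only pays
for a full equality test when the stored hashes match.)

class ToyDict:
    """Toy fixed-size table with linear probing; no resizing, no deletion."""

    def __init__(self, size=8):
        self.slots = [None] * size            # each slot: (hash, key, value)

    def _probe(self, key):
        h = hash(key)
        i = h % len(self.slots)
        while True:
            slot = self.slots[i]
            if slot is None:
                return i, h, False            # empty slot: key is absent
            stored_hash, stored_key, _ = slot
            # Cheap int compare first; the (possibly expensive) __eq__ call
            # only happens when the cached hashes collide.
            if stored_hash == h and stored_key == key:
                return i, h, True
            i = (i + 1) % len(self.slots)     # linear probing

    def __setitem__(self, key, value):
        i, h, _ = self._probe(key)
        self.slots[i] = (h, key, value)

    def __getitem__(self, key):
        i, _, found = self._probe(key)
        if not found:
            raise KeyError(key)
        return self.slots[i][2]

d = ToyDict()
d["spam"] = 1
d["eggs"] = 2
print(d["spam"], d["eggs"])                   # 1 2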

ABC used balanced trees for dicts and lists, and that at least took advantage
of trees' abilities to handle ordinal indexing, insertion-at-an-index, sorting
and catenation efficiently too. Python uses different & simpler
implementations for dicts and lists, which buys much better performance for
the most typical uses at the cost of potentially much worse performance for
oddball and worst-case uses (list catenation is linear-time in Python; dict
lookup *may* be linear-time in rare cases; list sorting *may* be
quadratic-time in rare cases; and-- you guessed it --nobody cares <wink>).

if-quicksort-didn't-exist-guido-would-invent-it-ly y'rs - tim

Tim Peters tim...@msn.com, t...@dragonsys.com
not speaking for Dragon Systems Inc.

Sebastien Loisel

Feb 17, 1997

Tim Peters wrote:
> lookup *may* be linear-time in rare cases; list sorting *may* be
> quadratic-time in rare cases; and-- you guessed it --nobody cares

I surmise that Python uses qsort then, for sorting lists. Does anyone
actually know what the (supposedly constant-factor) difference is between
a good qsort implementation and a good merge sort implementation (or any
of your favorite "optimal" comparison sorts)?

So far, whenever I've had to sort something, I've used radix sort (for
linear time) when I could (e.g., C ints or floats), the C library's qsort
when I needed a comparison sort, or merge sort when I had to code my own
comparison sort (I did that in Lisp - don't ask me to compare it with a
C qsort). Thus, I don't really have a good idea of the speed difference
between those good comparison sorts and qsort in practice.

Sebastien Loisel

Guido van Rossum

Feb 17, 1997

> Is there a particular reason why dictionaries were implemented using
> hashing tables instead of AVL trees? I've never looked at the code, but
> it appears to me that this would've been more scalable and at least as
> efficient O() wise. Perhaps in practice this is slower?

You hit the nail right on the head. ABC, a predecessor of Python,
used some form of balanced trees. The data structures were proven to
be asymptotically optimal for a variety of operations. Unfortunately
the constant factor was large. The implementation was also complex
and hard to get bug-free. Since Python was supposed to be a
one-person project, I chose a light-weight implementation. With
proper tuning, hashing is not that bad, either!

--Guido van Rossum (home page: http://www.python.org/~guido/)

Tim Peters

Feb 18, 1997

> [Sebastien Loisel opines ...]

> I surmise that Python uses qsort then, for sorting lists.

In all released versions of Python, yes, it uses libc's qsort. Suspect the
next version will also use a quicksort, but a faster custom-written one
independent of libc (and thus of various vendors' qsort bugs ...).

> Does anyone actually know what the (supposedly constant-factor)
> difference is between a good qsort implementation and a good merge
> sort implementation (or any of your favorite "optimal" comparison sorts)?

Yes, but it's irrelevant <wink>. A good merge sort can be a bit faster than a
good quicksort on average (and blows qsort away in worst cases), but every
variant of merge sort I know of requires additional memory proportional to the
number of elements in the list (for link fields and/or temp storage). Link
fields don't come for free in Python, because its lists aren't linked; they're
contiguous vectors of pointers, and need to be (by the end) sorted in place.
Allocating extra temp memory for a sort is very unattractive, and that makes
quicksort or heapsort the natural candidates.

Something for the ambitious to ponder: Python must use a comparison sort
internally, and while comparisons are very expensive (at least an external
call for each), swapping elements is very cheap (just need to swap a pair of
pointers). This makes "merge insertion" potentially attractive (a delicate
dovetailing of merging and binary insertion that appears to be close to
optimal for worst-case # of comparisons; see Knuth section 5.3.1). The
challenge is to implement that puppy in a way that doesn't bury its promise
under overhead! I.e., it's darned hard to program this algorithm, & doubly
darned hard to do so efficiently.

non-obscene-ly y'rs - tim

Andrew P. Mullhaupt

Feb 18, 1997

On the topic of sorts for Python, it might be well to bear in mind the
history of IBM's APL implementations. In at least two cases, a delivered
implementation of the sort primitives in APL (upgrade/downgrade) had to
be _re-implemented_ at the behest of customer pressure to satisfy the
following criteria:

1. Sorts should be stable.

2. The sort should not perform very, very, badly when there are lots of
duplicate elements in the list.

Now the Python community is not the APL community, but the APL customer
in question was a large investment bank which needed to do lots of
pretty generic database and rapid application development type stuff.

Later,
Andrew Mullhaupt

Sebastien Loisel

Feb 18, 1997

Andrew P. Mullhaupt wrote:
> 1. Sorts should be stable.
>
> 2. The sort should not perform very, very, badly when there are lots of
> duplicate elements in the list.

Okay, then, considering that and what Tim suggested, wouldn't heap sort be a
sort of choice? (I had suggested merge because, in my folly, I imagined that
lists were implemented as linked lists - what was I thinking?!?) Yes I know
you can randomize qsort, but if heapsort typically runs as fast as qsort
(which remains to be proven), why not use something a tad safer on speed?
I know, maybe you can make the odds of the algorithm running slowly smaller
than one over aleph-naught, but still... :)

Sebastien Loisel

Andrew P. Mullhaupt

Feb 18, 1997

Sebastien Loisel wrote:
>
> Andrew P. Mullhaupt wrote:
> > 1. Sorts should be stable.
> >
> > 2. The sort should not perform very, very, badly when there are lots of
> > duplicate elements in the list.
>
> Okay, then, considering that and what Tim suggested, wouldn't heap sort be a sort of
> choice?

If you have a stable heapsort, it could make sense, but probably not.
Heapsort is normally significantly slower than quicksort or mergesort.

Quicksort can be a bit of a pain when it comes to stability, and
mergesort usually is criticized for the use of extra memory. Of course,
in the current context, this is probably a much less despicable property
since we're talking about copying references.

I'd think a variant of your original suggestion (mergesort) could make a
lot of sense. But I suspect what will be chosen is a variant of
quicksort.

Later,
Andrew Mullhaupt

Tim Peters

Feb 19, 1997

>> [andrew, on sorting requirements in apl]
>> ...

>> 1. Sorts should be stable.
>>
>> 2. The sort should not perform very, very, badly when there are
>> lots of duplicate elements in the list.

> [sebastien]


> Okay, then, considering that and what Tim suggested, wouldn't heap
> sort be a sort of choice?

Heapsort has good worst-case performance, but it's no more stable (Andrew's
#1) than quicksort, and runs substantially slower than a quicksort on average.
Some vendors' libc qsorts do horribly on Andrew's #2, but the custom
quicksort I suspect will be in the next Python release does very well on it
(e.g., in the all-duplicate case, it's linear time; the already-sorted and
already-reverse-sorted cases (other common problem areas for some libc qsorts
out there) are also handled well).
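
(The standard way to get that linear-time behavior on duplicates is a
three-way, "fat pivot" partition that leaves keys equal to the pivot in
the middle so they are never looked at again. The sketch below is a
generic textbook version in present-day Python, not the custom quicksort
Tim is describing.)

def quicksort3(a, lo=0, hi=None):
    """In-place 3-way quicksort; an all-duplicates list partitions once."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:                      # goes to the "less" region
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif pivot < a[i]:                    # goes to the "greater" region
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:                                 # equal to pivot: stays put
            i += 1
    quicksort3(a, lo, lt - 1)                 # recurse on keys < pivot
    quicksort3(a, gt + 1, hi)                 # recurse on keys > pivot

data = [3, 1, 3, 3, 2, 3, 3]
quicksort3(data)
print(data)                                   # [1, 2, 3, 3, 3, 3, 3]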

Sorting is not a "one size fits all" kind of problem (anyone else remember
sort generators?!), and I think it's appropriate for Python's built-in version
to minimize the expected cost, while advertising that the built-in version
isn't suitable for all applications (like real-time). At base, my
applications couldn't care less about the rare worst cases, or stability, so I
would resent paying on every sort for avoiding someone else's problem; and the
reason my personal viewpoint triumphs here is because it champions the status
quo <0.9 wink>.

BTW, the usual trick to ensure stability when using an unstable algorithm is
to stuff an index number into each record and use the index field as a
secondary sort key; in Python, this is usually done by temporarily embedding
each original record in a tuple pair whose second element is the index; then
any sorting algorithm is fooled into stability.
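
(A minimal sketch of that decoration trick in present-day Python, purely
for illustration: the unique index both breaks ties and keeps the
underlying objects from ever being compared against each other.)

def stable_sort_by(items, keyfunc):
    """Stable in-place sort by keyfunc, even if the underlying sort weren't stable."""
    decorated = [(keyfunc(x), i, x) for i, x in enumerate(items)]
    decorated.sort()                  # ties on the key fall back to the index i
    items[:] = [x for _, _, x in decorated]

data = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]
stable_sort_by(data, lambda pair: pair[0])
print(data)   # [('a', 1), ('a', 2), ('b', 2), ('b', 1)] -- ties keep input order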

> (I had suggested merge because, in my folly, I imagined that lists
> were implemented as linked lists - what was I thinking?!?)

Probably just left-over trauma from an unfortunate encounter with an inferior
language in your youth <wink>.

> Yes I know you can randomize qsort,

But the worst case remains quadratic even then; it's not a cure, so
randomization is unlikely to calm anyone with a fear of worst cases. To the
contrary, with randomization the worst cases can't be characterized in
advance, so there's no way to avoid them.

> but if heapsort typically runs as fast as qsort (which remains to be
> proven), why not use something a tad safer on speed?

If the "if" held, I would agree; but the "quick" in "quicksort" got there
because it's inherently ... well, quick!

> I know, maybe you can make the odds of the algorithm running slowly
> smaller than one over aleph-naught, but still... :)

Ah! The version in the next release will run slowly with odds of just one in
aleph-one.

measure-theory-in-practice-ignores-a-few-picky-fine-points<wink>-ly y'rs -
tim

Ka-Ping Yee

Feb 19, 1997

Sebastien Loisel wrote:
> Insofar, whenever I had to sort something, I either used Radix sort (for
> linear time) when I could (eg, C ints or floats), qsort when I needed a
> comparison sort, using the C implementation, or merge sort when I had to
> code my own comparison sort

I'm pretty partial to heapsort, myself -- non-recursive, low storage,
low time complexity, not hard to implement...
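
(For the record, a compact heapsort in present-day Python via the heapq
module: O(n log n) worst case, O(n) extra space in this non-in-place form,
and, as discussed above, not stable.)

import heapq

def heapsort(iterable):
    heap = list(iterable)
    heapq.heapify(heap)                           # O(n) heap construction
    return [heapq.heappop(heap) for _ in range(len(heap))]   # n pops, O(log n) each

print(heapsort([5, 1, 4, 1, 5, 9, 2, 6]))         # [1, 1, 2, 4, 5, 5, 6, 9]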


Ping

Balaji Srinivasa

Feb 19, 1997

Andreas Jung wrote:
>
> lower() and upper() don't handle German umlauts like "ÄÖÜüöä".
> I tried to patch string.py, but with little success. Anyone out here
> with a working solution?
>
> Thanks in advance
> Andreas
Not only those, but how about the "scharfes S" (ß), which when upcased gives
you "SS" (note the size change!). At work we are going through a massive
i18n exercise, so I can help anyone doing the actual work ;-).
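
(Present-day Python is Unicode-aware and does exactly what Balaji
describes: uppercasing the sharp s changes the string's length.)

s = "straße"
print(s.upper())                  # STRASSE
print(len(s), len(s.upper()))     # 6 7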

-B-
--
__________________________________________________________________________
Balaji Srinivasa - Platinvm Tech - POEMS Development -
bal...@platinum.com

Andrew P. Mullhaupt

Feb 19, 1997

Tim Peters wrote:
>
>
> At base, my
> applications couldn't care less about the rare worst cases, or stability, so I
> would resent paying on every sort for avoiding someone else's problem;

And they resent getting a sort which solves someone else's problem but
not _theirs_. And vice versa.

> the reason my personal viewpoint triumphs here is because it champions the status
> quo <0.9 wink>.

Probably true.



> BTW, the usual trick to ensure stability when using an unstable algorithm is
> to stuff an index number into each record and use the index field as a
> secondary sort key; in Python, this is usually done by temporarily embedding
> each original record in a tuple pair whose second element is the index; then
> any sorting algorithm is fooled into stability.

This _does_ work. But it results in lots of nearly-identical but not
identical code, and this, given time, creates more problems than it
solves.

There is a _great_ value in having the 'default' sort be a quite
generally acceptable sort.

> >Yes I know you can randomize qsort,
>
> But the worst case remains quadratic even then;

If a sort is bad for arrays with many repeated values then randomizing
helps not at all.

> measure-theory-in-practice-ignores-a-few-picky-fine-points<wink>-ly y'rs

This can only result from a default measure theory which is not quite
generally acceptable, such as Riemann integration?

Later,
Andrew Mullhaupt

David Combs

Feb 20, 1997

In article <330B11...@ix.netcom.com>,
Andrew P. Mullhaupt <amul...@ix.netcom.com> wrote:
<snip>

>This _does_ work. But it results in lots of nearly-identical but not
>identical code, and this, given time, creates more problems than it
>solves.
>

Please explain. Thanks.

David Combs

Feb 20, 1997

Isn't the amortized splay-tree (priority queue)
a pretty good way to do a stable sort? (Link fields,
yes).

Andrew P. Mullhaupt

Feb 20, 1997

Whenever you have code that can't even come close to reuse, that's
_bad_. But when you have code that you can _almost_ but not quite reuse,
that's worse. There is a tendency, especially in rapid development
environments, (where there is some incentive/pressure to develop
quickly), to solve a problem that one piece of code can _almost_ solve,
but not without incompatible changes, by copying that piece of code and
then applying a minor hack. Then you have two pieces of near-identical
but not identical code.

You are now exposed to some really bad problems, such as looking at one
source and being cued by its familiar look into treating assumptions that
are true in the _other_ source as if they were true here, when they may
not be. This can really hurt.

Another bad problem is when the code is modified slightly by several
people in succession until it is very hard to tell whether all the code
is really 'active' or whether some is vestigial, and so in the rapid
development process the code is left in (for fear of breaking it), which
increases the complexity.

Then there is the ever present problem of revision shift between various
entities (code and documentation to pick one real sore spot).

Now in the case at hand, the suggestion is to add a little piece of code
around the sort function to stabilize it, most likely by adding extra
order information to the keys. Of course, if the keys are _references_
(as is very likely in Python) then there may be several slightly
different twists involved in doing this. The concern is that if you get
the twist _wrong_, the effect of the bug will be downstream (ouch!) and
worse yet, when you come to the scene of the crime, it all looks too
familiar for you to easily see the trouble.

Later,
Andrew Mullhaupt


Andrew P. Mullhaupt

Feb 21, 1997

Tim Peters wrote:
>
> > [andrew continues the fight for a stable sort]
>
> OK, here's a stable Python sort, with the same O() complexity as the built-in
> sort. For any case in which
> list.sort()
> makes sense today, you can write
> stable_sort(list)
> instead and get a stable sort. And for any case in which
> list.sort(some_comparison_function)
> makes sense today, you can write
> stable_sort(list, some_comparison_function)
> and get a stable sort. In all cases the list is sorted "in place", internal
> ref counts are the same before & after, object identities are preserved, you
> don't have to alter your data in any way, ... it's exactly like the built-in
> sort method would work were it stable, except that stable_sort(list, None)
> does not raise an exception while list.sort(None) does.
>
> def stable_sort(list, f=None):
>     tagged_list = map(None, list, range(len(list)) )
>     if f is None:
>         tagged_list.sort()
>     else:
>         tagged_list.sort( lambda x,y,f=f:
>                           f(x[0],y[0]) or cmp(x[1],y[1]) )
>     list[:] = map( lambda pair: pair[0], tagged_list )
>
> Now if the built-in sort were stable, what could be done to get back the nice
> properties of the current built-in? The choice doesn't look neutral to me,
> and I still believe Python makes the right tradeoff for its users.
>
> > ...
> > Unless calling the comparison function is not O(1) in which case something
> > is desperately wrong....
>
> Just noting that comparison of Python dicts is not O(1) (and comparison of
> dicts containing dicts ... can get real interesting <wink>); comparison of
> Python lists & tuples may take time proportional to their lengths (depending
> on the length of their common prefix); and so on.

It might make sense to talk about comparison of Python dicts not being
O(1) but it is probably much less sensible to talk about _calling that
comparison function_ as not being O(1) (Python passes references,
right?).

You are also (perhaps intentionally ?) changing the issue. When we refer
to a sort as being O(n log n), the n refers to the number of items to be
sorted, not to the size or complexity of the individual items. When you
say that comparison of Python dicts is not O(1) without noting the
change in the meaning of the O symbol, one might be forgiven for
thinking that you are claiming that comparing two Python dicts takes
_longer_ if there are n-2 other Python dicts not being compared, when it
is much more likely that you mean that the time it takes to compare two
Python dicts may depend strongly on the contents of the dicts.

For example, one can note that comparing two long integers may not be
possible in O(1) without further information about the distribution of
the long integers. However it is still meaningful to talk about a sort
of long integers which is O(n log n) as compared to one which is O(n^2),
and the usual circumlocution is to say that you can sort the integers in
O(n log n) exchanges, comparisons, or some other operation, even knowing
full well that these operations may take different times for different
instances of operands.

> > and-if-you-think-i'm-obstinate-wait-'til-you-meet-guido<grin>-ly y'rs - tim

Actually, I have found the Python community astonishingly tolerant as
language designers and developers go. Of course, aside from the
Pythonistas, fully half of the language designers I have had dealings
with prefer flow of control by computing the label of the next statement
to execute compared to structured flow of control, another one produced
an implementation of an array language in which determining the shape of
a multidimensional array requires accessing every element in the array,
and, well, you get the idea.

Later,
Andrew Mullhaupt

Tim Peters

Feb 22, 1997

> [stuff i couldn't follow]
> ...
> There is a certain suggestion of circularity in expressing the
> view that since people use the default sort only for what it's good
> for, that there shouldn't be anything else it is good for.

Touche! But there's no circularity in pointing out that nobody has complained
about the lack of stability before, and that still the closest we've come to
it is your prophecy that someday somebody might <wink -- but playing fair is
so dull>.

Seriously, I have no objection to making the built-in sort stable, provided
that (a) it doesn't impose additional costs on the vast majority of sorts
(which don't care!), and (b) I don't have to write the code. Nothing can
remove my objection #b, but #a would go away even if a stable sort were more
expensive (in time and/or memory) if it were optional (say, accessed via a new
stable_sort method, or another optional argument to the current sort method).

and-if-you-think-i'm-obstinate-wait-'til-you-meet-guido<grin>-ly y'rs - tim

Tim Peters tim...@msn.com, t...@dragonsys.com

Tim Peters

Feb 22, 1997

>>> [andrew]

>>> ...
>>> Unless calling the comparison function is not O(1) in which case
>>> something is desperately wrong....

>> [tim]
>> Just noting that comparison of Python dicts is not O(1) ...

> [andrew]


> It might make sense to talk about comparison of Python dicts not
> being O(1) but it is probably much less sensible to talk about _calling
> that comparison function_ as not being O(1)

Ah! I misunderstood the meaning of "calling the comparison function" as
measuring the time until the function returned, not as merely the time to
transfer control to the function (as in "how long does it take to call
sin(1e300)?" "well, if the argument reduction is done correctly, probably
longer than it takes to call sin(0.0)" ...). On third thought, I agree your
intended meaning should have been clear to me from context.

> (Python passes references, right?).

Right, calls in that sense are always O(1).

> You are also (perhaps intentionally ?) changing the issue.

Na, my only issue is that I don't want to write a stable sort in C for Python
<0.9 wink>.

> [good stuff i agree with & can't add to deleted]
> ...


> Actually, I have found the Python community astonishingly tolerant
> as language designers and developers go.

Comes with the territory, Andrew: Python is an interpreted language, named
after a British comedy troupe, uses indentation to denote block structure, and
is rigidly controlled by an autocratic Dutchman: if we weren't astonishingly
tolerant to begin with, we wouldn't have wound up here <grin>.

> Of course, aside from the Pythonistas, fully half of the language designers I
> have had dealings with prefer flow of control by computing the label of the
> next statement to execute compared to structured flow of control, another
> one produced an implementation of an array language in which determining
> the shape of a multidimensional array requires accessing every element in
> the array, and, well, you get the idea.

Oh yes! Far too many language designers fall in love with a clever idea
("let's say everything's an array!", "let's say everything's a string!",
"let's say everything's a list!", "pointers *and* arrays *and* strings? -- na,
let's say they're all the same thing!", "let's say everything's a
closure!",...) and thereby gain a measure of formal simplicity at the cost of
turning the expression of many algorithms into a sewer of obscure tricks. The
saving grace of Python's "let's say everything's an object!" is that raw
objects carry no semantics that need to be wormed around, and so I think it's
no coincidence that of all the languages I've used, Python is the one in which
I least often feel I need to "pull a trick" to say what I mean.

I bet you'd enjoy this: back in the early 80's, there was an implementation
of APL for Cray machines (== vector machines superbly suited to efficient
implementation of many of APL's primitives). It was written in Pascal, and
represented arrays as linked lists.

Yes indeed, most people should be shot -- and retroactively at that.

just-another-member-of-the-tolerant-python-community<wink>-ly y'rs - tim

Andrew P. Mullhaupt

Feb 22, 1997

Tim Peters wrote:
>
> Oh yes! Far too many language designers fall in love with a clever idea
> ("let's say everything's an array!", "let's say everything's a string!",
> "let's say everything's a list!", "pointers *and* arrays *and* strings? -- na,
> let's say they're all the same thing!", "let's say everything's a
> closure!",...) and thereby gain a measure of formal simplicity at the cost of
> turning the expression of many algorithms into a sewer of obscure tricks.

Yes. My sewer runneth over.

> The
> saving grace of Python's "let's say everything's an object!" is that raw
> objects carry no semantics that need to be wormed around,

But Python has another one. "Let's say everything's a reference", and
although I am quite happy with this, it seems that unless you are
already experienced with interpreted languages, this one gets you sooner
or later.

> and so I think it's
> no coincidence that of all the languages I've used, Python is the one in which
> I least often feel I need to "pull a trick" to say what I mean.

I can easily believe that.

Later,
Andrew Mullhaupt

My Python boot is too tight.
I couldn't get it off last night.
A week went by,
and now it's July.
I finally got it off,
and my girlfriend cried:
"You got Stinkfoot!" - Zappa


Andrew P. Mullhaupt

Feb 22, 1997

Tim Peters wrote:
>
> I bet you'd enjoy this: back in the early 80's, there was an implementation
> of APL for Cray machines (== vector machines superbly suited to efficient
> implementation of many of APL's primitives). It was written in Pascal, and
> represented arrays as linked lists.

Cool dude.

But you know, those vector machines were not really so good for APL. The
IBM 3090 vector facility seems like a _fine_ platform for APL right? But
there are some 'conveniences' in APL which can make it quite hard to
vectorize. It was not unusual when I was using a multi-processor IBM
3090 where two of the six processors were vector processors, to have to
_disable_ vector processing if it was available to ensure the same
results as if the job ran on a non-vector processor.

It turns out that there were the usual problems, such as reduction
variables getting different values based on vector length, etc., but
some of the problems were due to obscure fine points of APL. I have
a dim recollection of one where APL's 'cute' comparison tolerance
(despised by all right thinking numerical analysts) was the culprit.

One real problem with parallelizing 'embarrassingly parallel' problems
such as an APL interpreter is, as one must now expect, the _exception
handling_ specifications of the language. You can see this one reflected
in the agony over the interaction between IEEE exceptions and
processors which can support multiple simultaneous floating point
operations (pipelined, superscalar, vector, parallel, you name it...).

By the way. Have people thought about whether Python's 'try' affects the
future possibilities for parallelizing Python?

Later,
Andrew Mullhaupt

Tim Peters

Feb 26, 1997

>> [tim]

>> The saving grace of Python's "let's say everything's an object!" is
>> that raw objects carry no semantics that need to be wormed around,

> [andrew]


> But Python has another one. "Let's say everything's a reference",
> and although I am quite happy with this, it seems that unless you
> are already experienced with interpreted languages, this one gets
> you sooner or later.

This is an odd one, because it doesn't seem to come up for people learning
Python as a first language. Hand them a pencil, and tell them that whether
they *call* it a "number 2 yellow" or a "graphite-based writing instrument",
it's still the same damn pencil, and Python's admirably uniform treatment of
names and objects doesn't often confuse them again. I spent too many years
implementing languages to have any feel left for what's "natural" anymore, but
what I've *seen* is that raw beginners pick up the full subtleties of Python's
semantics in this area much faster than they pick up, say, the full subtleties
of C's or Fortran's. But I've also seen that users experienced in one
approach have real trouble at first adjusting to others (& I certainly wasn't
immune! I first bumped into "everything's a reference" in Icon, & struggled
mightily before realizing how utterly simple it was <0.5 wink>).

Ah, well. The one thing in Python that burns *everyone* eventually is the
scoping rules for nested functions & classes. My usual advice there is "so
don't do that" <0.1 wink>.

nobody's-gonna-get-rich-writing-a-"python-traps-&-pitfalls"-book-ly y'rs -

Christian Tismer

Mar 7, 1997

Hi Python enthusiasts,

After some delay, I finally got to it and started the promised
German-speaking Python mailing list right now. Ken, would you like to give
us a pointer?

I hope some people might be interested in putting their locally related
stuff there, without the need to use the main list.

I append the intro text here (from now on it is in a language called
"German" or so) and can only say

* HAVE FUN * VIEL SPASS *


ciao - chris

###########################################################
#
# Welcome to the German-language Python list.
#
# 07.03.97 - chris
#
###########################################################

"""
This list is intended as another forum for Python fans who may not
always want to write _only_ in English. Some questions are easier to
express this way, and there are surely local topics that interest the
readers of the main list only if they come from a German-speaking
region. Making them write about those in English would be a bit silly.

We also think it can make things easier for newcomers if they only
have to overcome _one_ language barrier.


The list is unmoderated and runs on Petidomo 1.3. I hope to switch to
a pure Python version shortly, and there will be a dedicated machine
on the Internet just for Python, but that is another story...


In the hope that a lively murmur develops here, I remain, with kind
regards,

Christian Tismer
_______________________
"""

note = """
Please save this text somewhere in case you should ever want to
leave the list.

To subscribe, send a mail with the line
subscribe
as its body to <mailto:python-d...@solar.skyport.net>

To unsubscribe, send a mail with the line
unsubscribe
as its body to <mailto:python-d...@solar.skyport.net>

For somewhat more detailed help, send
help
as the body to <mailto:python-d...@solar.skyport.net>
"""

print note # :-)

----------------------------------------------------------------------
Deutschsprachige Python-Liste (German-language Python list) <pyth...@solar.skyport.net>
*** not much longer, and the Python STARSHIP lifts off ***
Got a real operating system? No? Try at least a real language: Python
<http://www.python.org> & <http://www.python.org/ftp/python/pythonwin>
----------------------------------------------------------------------

Reimer Behrends

Mar 16, 1997

Hmm. I'm a bit late in replying to Tim's post, but as it involves a
common misconception (is quicksort really quick?), I thought that it
couldn't hurt to still post a reply.

Tim Peters (tim...@msn.com) wrote:
[...]
: Yes, but it's irrelevant <wink>. A good merge sort can be a bit faster than a
: good quicksort on average (and blows qsort away in worst cases),

Actually, quicksort is probably one of the worst misnomers in the
history of computer science. The name arose from benchmarks operating on
integers, where the fast inner loop hid the fact that quicksort requires
about twice as many comparisons as mergesort in the average case (and
the worst case for mergesort needs about as many comparisons as the best
case for quicksort, IIRC). Given that comparisons in python at least
involve a C function call (and often the execution of some python code),
the fast inner loop of quicksort doesn't help at all.

The fastest quicksort implementation (in terms of number of comparisons)
I know of is described in: Jon L. Bentley, M. Douglas McIlroy,
"Engineering a Sort Function", Software--Practice and Experience,
23(11), 1249-1265, Nov. 1993. And it still uses about 15% more
comparisons than mergesort in the average case and almost twice as many
for already sorted or nearly sorted arrays. (As an aside, note that
you can even reduce the number of comparisons mergesort needs to O(n)
for (nearly) sorted arrays with a simple trick). And
quicksort isn't stable, which often is a very desirable property.
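
(Reimer's comparison counts are easy to check empirically in present-day
Python: wrap the items in a class that counts __lt__ calls and feed the
same shuffled data to a plain top-down mergesort and a plain Hoare-style
quicksort. A rough sketch, not a benchmark of any real library sort.)

import random

comparisons = 0

class Counted:
    """Wrapper that counts every < comparison made on it."""
    __slots__ = ("v",)
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.v < other.v

def mergesort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j]); j += 1
        else:
            out.append(left[i]); i += 1    # ties take the left element: stable
    out.extend(left[i:]); out.extend(right[j:])
    return out

def quicksort(a, lo=0, hi=None):
    """Classic in-place Hoare-style quicksort with middle-element pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot: i += 1
        while pivot < a[j]: j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1; j -= 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)

def count(sorter, data):
    global comparisons
    comparisons = 0
    sorter(data)
    return comparisons

data = [Counted(random.random()) for _ in range(10000)]
print("mergesort comparisons:", count(mergesort, list(data)))
print("quicksort comparisons:", count(quicksort, list(data)))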

: but every
: variant of merge sort I know of requires additional memory proportional to the
: number of elements in the list (for link fields and/or temp storage).

This isn't correct, either. There are quite a few implementations of
merging that require only O(sqrt(n)) additional memory while still
being stable (and if you don't need it to be stable, you can get down
to O(1)). See for instance: S. Dvorak, B. Durian, "Stable Linear Time
Sublinear Space Merging", Computer Journal, 30 (4), 372-375, 1987,
which has a very fast implementation (and it even gives Pascal code, so
implementing it should be very easy). Combine this with standard merging
for blocks of size < 256 or so, and you can still blow away quicksort
while requiring very little memory (like 4K for an array with 1,000,000
elements on a machine with a wordsize of 32 bits).

: Link
: fields don't come for free in Python, because its lists aren't linked; they're
: contiguous vectors of pointers, and need to be (by the end) sorted in place.
: Allocating extra temp memory for a sort is very unattractive, and that makes
: quicksort or heapsort the natural candidates.

More precisely, standard quicksort _does_ need O(log(n)) additional
memory. Of course, it's (1) usually allocated on the stack, (2) a
negligible amount, and (3) can even be done without (there is an O(1)
space quicksort by Knuth at the cost of somewhat slower performance, I
think, but I don't have the reference handy).

On a related matter: Do other people think too that it would be handy if
sort() could be given a number instead of a function, so that l.sort(n)
would be equivalent to l.sort(lambda a, b: cmp(a[n], b[n]))?
Often you just want to sort by a certain key which is some element of a
tuple or list, and not having the additional overhead of function calls
would be nice. In addition, if sort() were stable, this could be used
to efficiently sort by two or more keys: l.sort(secondary_key), then
l.sort(primary_key).
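
(Both wishes ended up built into present-day Python: list.sort() takes a
key= function and is guaranteed stable, so sorting by a field index and
multi-key sorting via successive passes both work directly. A short
illustration with made-up records:)

from operator import itemgetter

records = [("carol", 3), ("alice", 2), ("bob", 3), ("dave", 2)]

# Sort by field 1 -- what l.sort(1) would have meant.
by_count = sorted(records, key=itemgetter(1))

# Multi-key sort using stability: sort by the secondary key first, then by
# the primary key; equal primary keys keep the secondary order.
records.sort(key=itemgetter(0))   # secondary: name
records.sort(key=itemgetter(1))   # primary: number
print(by_count)
print(records)   # [('alice', 2), ('dave', 2), ('bob', 3), ('carol', 3)]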

[...]

Reimer Behrends


Tim Peters

Mar 18, 1997

My offer remains open: if anyone cares to supply working, portable, tested,
faster, non-mallocing and acceptable-to-Guido C for any internal Python
sort-contiguous-array-of-object-pointers-in-place algorithm of their choice, I
won't stop them <wink>. To the contrary, I'll buy 'em dinner ...

holding-his-breath-so-long-he-looks-like-a-blueberry-ly y'rs - tim

Tim Peters tim...@msn.com, t...@dragonsys.com
not speaking for Dragon Systems Inc.


PS: Agree 100% with Reimer that the only thing that "counts" in Python's sort
is the # of comparisons, so keep the bubblesorts to yourselves <snort>.
