
Jan 14, 2002, 7:17:55 PM

What is the idiomatic way in Python of implementing the following

operations on a list of elements:

- return the element at the front of the list, without changing the list

- remove the element from the front of the list

- add a new element to the rear of the list


My first choice would be L[0], del L[0], and L.append(), however it

seems from some brief experiments that the del operation takes time linear

in the length of the list, so this would be very inefficient for long

lists. I also tried replacing the del with L=L[1:] but that wasn't any

better. A third possibility would be to have two variables L and i, and

implement the three operations with L[i], i=i+1, and L.append(), which is

fast but not good in space usage when the total number of operations is

much larger than the length of the list.

Is there a simple Pythonic way of implementing these three operations in

constant (amortized) time per operation and space linear in the length of

the list?

The application this came up for me is a merge sort example for an

algorithms class. So, I don't want to confuse the students with additional

code that has nothing to do with the actual algorithm, otherwise it would

be easy to do linked lists, or to do the third possibility above plus

slicing L down to size when i grows too large, or something like that.
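For concreteness, the three operations and their costs on a plain list look like this (a small sketch; the data values are just illustrative):

```python
# Queue operations on a plain Python list, with their time costs.
q = [10, 20, 30]

front = q[0]   # peek at the front: O(1)
del q[0]       # remove the front: O(n), every later element shifts left
q.append(40)   # add at the rear: amortized O(1)

assert front == 10
assert q == [20, 30, 40]
```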

--

David Eppstein UC Irvine Dept. of Information & Computer Science

epps...@ics.uci.edu http://www.ics.uci.edu/~eppstein/

Jan 14, 2002, 10:00:10 PM

> What is the idiomatic way in Python of implementing the following

> operations on a list of elements:

> - return the element at the front of the list, without changing the list

> - remove the element from the front of the list

> - add a new element to the rear of the list

>

> [...]

> The application this came up for me is a merge sort example for an

> algorithms class. So, I don't want to confuse the students with

> additional code that has nothing to do with the actual algorithm,

Your analysis of the situation is correct, and there's no idiomatic

Python way around it, unfortunately.

The given operations are usually "quick enough", for small lists

(less than, say, 5000 items) since the constant coefficient is

small, compared to the ordinary semi-inefficiency of running

Python code in the first place.

For merge sort, you could probably

L = []

... populate L using L.append() ...

L.reverse()

... drain L using L.pop() ...

This way you're O(N) instead of O(N^2). But it might baffle the

students.
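Spelled out, the populate/reverse/drain pattern above might look like this (my sketch; the data values are illustrative):

```python
# Build the list with append(), reverse it once, then drain with pop(),
# which removes from the *end* of the list in O(1).
L = []
for item in [3, 1, 4, 1, 5]:
    L.append(item)

L.reverse()        # a single O(n) pass

drained = []
while L:
    drained.append(L.pop())   # O(1) per pop, no element shifting

assert drained == [3, 1, 4, 1, 5]   # items come out in FIFO order
```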

## Jason Orendorff http://www.jorendorff.com/

Jan 14, 2002, 9:49:17 PM

[David Eppstein]

> What is the idiomatic way in Python of implementing the following

> operations on a list of elements:

> - return the element at the front of the list, without changing the list

> - remove the element from the front of the list

> - add a new element to the rear of the list

>

> My first choice would be L[0], del L[0], and L.append(),


Yes, that's "the usual". L[-1], L.pop() and L.insert(0, item) (looking at

it "from the other end") is a less usual combo.

> however it seems from some brief experiments that the del operation

> takes time linear in the length of the list,

Yes, all the elements in L[1:] are physically moved left a slot.

> so this would be very inefficient for long lists.

Sure.

> I also tried replacing the del with L=L[1:] but that wasn't any

> better.

That makes a physical copy of the slice.

> A third possibility would be to have two variables L and i, and

> implement the three operations with L[i], i=i+1, and L.append(), which is

> fast but not good in space usage when the total number of operations is

> much larger than the length of the list.

>

> Is there a simple Pythonic way of implementing these three operations in

> constant (amortized) time per operation and space linear in the length of

> the list?

You can wrap your approach with index vrbls into a smarter class.

> The application this came up for me is a merge sort example for an

> algorithms class.

IME mergesort runs fastest if implemented in a more straightforward way,

using a contiguous vector of the same length as the input vector for temp

storage, and swapping the roles of "real vector" and "temp vector" on

succeeding iterations. Then there's no dynamic allocation beyond getting a

temp vector once at the start, and a fixed handful of auxiliary index

variables suffices to keep track of where you are.
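A sketch of that scheme (my own bottom-up rendering of the two-vector idea, not Tim's code):

```python
def mergesort(data):
    # Bottom-up mergesort: one temp vector allocated at the start, and
    # the "real" and "temp" roles swap on each pass over the data.
    n = len(data)
    src = list(data)
    dst = [None] * n              # the single temp vector
    width = 1                     # length of the runs being merged
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j = lo, mid        # a fixed handful of index variables
            for k in range(lo, hi):
                if j >= hi or (i < mid and src[i] <= src[j]):
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
        src, dst = dst, src       # swap the roles of the two vectors
        width *= 2
    return src
```

For example, mergesort([5, 2, 9, 1]) returns [1, 2, 5, 9], with no allocation beyond the one temp vector obtained at the start.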

> The application this came up for me is a merge sort example for an

> algorithms class. So, I don't want to confuse the students with

> additional code that has nothing to do with the actual algorithm,

> otherwise it would be easy to do linked lists, or to do the third

> possibility above plus slicing L down to size when i grows too large, or

> something like that.

It seems that your only objection to your original idea was that it's "very

inefficient for long lists". Why? That is, if you're only trying to get

across the idea of the algorithm, speed doesn't seem relevant ("assume these

primitives take constant time; then here's the interesting result: ...").

If you're actually worried about efficiency implications of details, I'd

expect you'd be keen to teach about those too.

BTW, is it necessary to destroy the input lists? Rather than remove the

element at the front of a list, most people <wink> would just increment an

index vrbl. Then all mutation occurs only via .append() on the output list,

and your efficiency worry goes away.
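That suggestion, sketched as code (my reading of the non-destructive merge; not code posted in this thread):

```python
def merge(a, b):
    # The inputs are never mutated: two index variables walk the fronts,
    # and all mutation happens via append() on the output list.
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])   # at most one of these has anything left
    out.extend(b[j:])
    return out

assert merge([1, 3, 5], [2, 4, 6, 7]) == [1, 2, 3, 4, 5, 6, 7]
```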

saving-an-index-vrbl-isn't-worth-much-thought-ly y'rs - tim

Jan 14, 2002, 11:19:05 PM

In article <mailman.1011063021...@python.org>,

"Tim Peters" <tim...@home.com> wrote:


> It seems that your only objection to your original idea was that it's "very

> inefficient for long lists". Why? That is, if you're only trying to get

> across the idea of the algorithm, speed doesn't seem relevant ("assume these

> primitives take constant time; then here's the interesting result: ...").

> If you're actually worried about efficiency implications of details, I'd

> expect you'd be keen to teach about those too.

It's an algorithms design and analysis class. If I'm going to tell the

students that some algorithm takes O(n log n) time, it would be helpful if

the code I show them actually takes O(n log n) instead of something slower

that one could make efficient only with more detailed knowledge of Python's

implementation details.

I realize implementation speed is not the primary concern of Python

language designers, but for Python programmers I think Python's relatively

slow speeds make it especially important to be careful about asymptotics:

there's less slack for using a bad algorithm and letting today's gigahertz

computers hide your inefficiencies.

> BTW, is it necessary to destroy the input lists? Rather than remove the

> element at the front of a list, most people <wink> would just increment an

> index vrbl. Then all mutation occurs only via .append() on the output list,

> and your efficiency worry goes away.

>

> saving-an-index-vrbl-isn't-worth-much-thought-ly y'rs - tim

List destruction is not necessary. The index variable approach is

probably best in this example, since merge sort doesn't need the full

power of a queue (each list has all its appends before it is scanned). I

was just hoping that Python's powerful list processing primitives would

let me simplify the code by avoiding those extra index variables. And a

queue is such a basic data structure that I thought surely Python already

allowed it to be implemented efficiently with its built-in list

operations.

Actually, the merge step in merge sort is a great example for simple

generators, or would be if not for the difficulty of keeping track of the

first item of each input iterator:

def merge(A, B):
    try:
        Afirst = A.next()
        Aempty = 0
    except StopIteration:
        Aempty = 1
    try:
        Bfirst = B.next()
        Bempty = 0
    except StopIteration:
        Bempty = 1
    while not Aempty and not Bempty:
        if Afirst < Bfirst:
            yield Afirst
            try:
                Afirst = A.next()
            except StopIteration:
                Aempty = 1
        else:
            yield Bfirst
            try:
                Bfirst = B.next()
            except StopIteration:
                Bempty = 1
    if not Aempty:
        yield Afirst
        while 1:
            yield A.next()
    if not Bempty:
        yield Bfirst
        while 1:
            yield B.next()

def mergesort(L):
    if len(L) < 2:
        return iter(L)
    else:
        return merge(mergesort(L[:len(L)//2]),
                     mergesort(L[len(L)//2:]))

This (like the closely related heap sort) has the advantage that if you

start sorting n items, but only end up looking at the first k items of

sorted output, it takes time O(n + k log n), so the time is nonlinear only

when k is large. But this is way too much code for a blackboard, and the

students are confused enough by slices...

Jan 14, 2002, 11:31:27 PM

In article <mailman.1011063861...@python.org>,

"Jason Orendorff" <ja...@jorendorff.com> wrote:


> The given operations are usually "quick enough", for small lists

> (less than, say, 5000 items) since the constant coefficient is

> small, compared to the ordinary semi-inefficiency of running

> Python code in the first place.

I'm not sure what you mean by quick enough, but when I tried draining a

list by repeated del L[0], on lists of 10000 elements it took roughly a

second -- more time than I'd like to spend in most interactive situations.

My philosophy to writing general purpose code like a sorting routine is to

use as efficient algorithms as reasonably possible, because even if you

think you are only going to see 5000-element lists and that 1-second delays

are acceptable, someone later may well try it on a million-element list and

get disgusted when the program takes hours to run. I don't mind using

Python compared to faster languages like C, for programs where fast runtime

is not as critical, if the slowdown is limited to constant factor like 10

or 100. But it wouldn't make me happy if the same "it's quick enough"

attitude led to programs with gratuitously inefficient algorithms that

don't scale.

Jan 15, 2002, 12:07:20 AM

> From: David Eppstein [mailto:epps...@ics.uci.edu]

>

> In article <mailman.1011063861...@python.org>,

> "Jason Orendorff" <ja...@jorendorff.com> wrote:

>

> > The given operations are usually "quick enough", for small lists

> > (less than, say, 5000 items) since the constant coefficient is

> > small, compared to the ordinary semi-inefficiency of running

> > Python code in the first place.

>


> My philosophy to writing general purpose code like a sorting routine is to
> use as efficient algorithms as reasonably possible, because even if you
> think you are only going to see 5000-element lists and that 1-second delays
> are acceptable, someone later may well try it on a million-element list and
> get disgusted when the program takes hours to run. I don't mind using

This very much falls under "knowing the data structures you will be using"

or "knowing which data structures your language uses.", as well as knowing

your audience.

Students are notorious for taking an example and trying to apply it in all

cases.

The approach that should therefore be taken (for teaching an algorithms

class) would be along the lines of ...

"I will use the simplest algorithm possible in Python to demonstrate the

principles of the algorithm. The algorithm itself is O(n^2). For the data

sets you will be using in the examples and assignments, you can consider

this to be the case.

However, it is important to also consider the data structure that the

algorithm is being used on - in particular in this case, whether the data

structure is optimised for random access, access at the head of the

structure, the tail of the structure, or some other type of access. These

factors may well increase the running time of a particular algorithm

many-fold.

For example, you would expect that increasing the size of the dataset from

10 to 20 would increase the time required approximately 4-fold for this

algorithm. However, because of the data structure that is used, the increase

is actually approximately 16-fold. So don't use this exact implementation of

the algorithm on millions of elements (laugh)."

... thus making some of the students aware that you cannot simply say "this

is the best algorithm in all cases".

Tim Delaney

Jan 15, 2002, 1:41:44 AM

How's this (untested; uses nested scopes). It uses extra temp space

the size of the input, but runs in the correct order time.


def merge(a, b):
    # ia and ib live in one-element lists so that the nested next()
    # can rebind them (a nested function can't rebind an outer local).
    ia, ib = [0], [0]  # indices
    def next():
        # peel next element off a or b.
        m = 0  # selector between a and b
        if ia[0] < len(a) and ib[0] < len(b):
            m = a[ia[0]] < b[ib[0]]
        else:
            m = ia[0] >= len(a)
        if m == 0:
            x, ia[0] = a[ia[0]], ia[0] + 1
        else:
            # if both a and b are exhausted, this throws IndexError.
            x, ib[0] = b[ib[0]], ib[0] + 1
        return x
    result = []
    try:
        while 1:
            result.append(next())
    except IndexError:
        return result

Jan 15, 2002, 2:04:59 AM

In article <7xpu4cg...@ruckus.brouhaha.com>,

Paul Rubin <phr-n...@nightsong.com> wrote:


> How's this (untested; uses nested scopes). It uses extra temp space

> the size of the input, but runs in the correct order time.

Sure, that looks like the straightforward way of doing it using index

variables.

Jan 15, 2002, 2:36:06 AM

[Tim]

>> It seems that your only objection to your original idea was

>> that it's "very inefficient for long lists". Why? That is, if

>> you're only trying to get across the idea of the algorithm, speed

>> doesn't seem relevant ("assume these primitives take constant time;

>> then here's the interesting result: ...").

>> If you're actually worried about efficiency implications of details, I'd

>> expect you'd be keen to teach about those too.


[David Eppstein]

> It's an algorithms design and analysis class. If I'm going to tell the

> students that some algorithm takes O(n log n) time, it would be

> helpful if the code I show them actually takes O(n log n) instead of

> something slower that one could make efficient only with more detailed

> knowledge of Python's implementation details.

Well, I suggested two ways to do that, in Python today. If Python *did*

have a list type "efficient at both ends", I don't think you'd be doing your

students a favor by having to add "and, by the way, this implementation is

only efficient in Python -- in any other language it would crawl, due to

implementation details we won't cover here".

BTW, Icon has an "efficient at both ends" list, but indexing isn't

constant-time there (it uses a doubly-linked list of buckets under the

covers -- or did last time I was familiar with its implementation).

> I realize implementation speed is not the primary concern of Python

> language designers,

Nothing is <wink>. But you misread this case: efficiency of *typical* list

operations is so important that we'd be loathe to slow those down in an

effort to cater to unusual operations. That's why Guido tossed ABC's list

implementation (using AVL trees under the covers) in favor of contiguous

vectors to begin with. "Almost all" programs ran much faster under Python

than they did under ABC as a result.

> but for Python programmers I think Python's relatively slow speeds make

> it especially important to be careful about asymptotics: there's less

> slack for using a bad algorithm and letting today's gigahertz

> computers hide your inefficiencies.

We've moved heaven and earth just to make appending an item at a time

efficient in extreme cases across 50 uncooperative platform malloc()

implementations; I expect you underestimate the difficulty of coding fast

portable data structures (not asymptotics, but the infinitely detailed pain

of avoiding pitfalls in real platform libraries and compilers). Note that I

wouldn't be *opposed* to making Python's list an efficient deque in all

cases, but I don't know how to do it without slowing typical uses.

> ...

> List destruction is not necessary. The index variable approach is

> probably best in this example, since merge sort doesn't need the full

> power of a queue (each list has all its appends before it is scanned).

Even better (according to me), your students would learn an algorithm that's

efficient even if written in something other than Python.

> I was just hoping that Python's powerful list processing primitives would

> let me simplify the code by avoiding those extra index variables. And a

> queue is such a basic data structure that I thought surely Python already

> allowed it to be implemented efficiently with its built-in list

> operations.

A plain list works fine for most uses of queues; they have to get pretty big

before memcpy() is a burden. For big queues, one person (Courageous?) did

code up a more-efficient extension type, but there is no "popular" extension

for this; that suggests demand is low.

> Actually, the merge step in merge sort is a great example for simple

> generators, or would be if not for the difficulty of keeping track of the

> first item of each input iterator:

This is a case where unbounded streams are much easier to live with (see the

"merge" function in test_generators.py -- not having to worry about

end-of-stream allows you to eliminate most of the code, and all uses of

StopIteration).

I appreciate the abstract loveliness of this, but it's hard to see under all

the syntactic hair. A lazy functional language (like Haskell) is much more

pleasant for writing algorithms in this style.

Jan 15, 2002, 3:22:46 AM

"David Eppstein" <epps...@ics.uci.edu> wrote in message

news:eppstein-C4AE42...@news.service.uci.edu...

...

> Actually, the merge step in merge sort is a great example for simple

> generators, or would be if not for the difficulty of keeping track of the

> first item of each input iterator:


So, why not encapsulate that "difficulty"? E.g.:

class Peeker:
    def __init__(self, iterator_or_sequence):
        self.iterator = iter(iterator_or_sequence)
        self.head = None
        self.advance()
    def advance(self):
        head = self.head
        try: self.head = self.iterator.next()
        except StopIteration: self.more = 0
        else: self.more = 1
        return head

With an ADT such as this one, operations such as merging get simpler:

def merge(A, B):
    A = Peeker(A)
    B = Peeker(B)
    while A.more and B.more:
        if A.head < B.head: yield A.advance()
        else: yield B.advance()
    while A.more: yield A.advance()
    while B.more: yield B.advance()

Note that the sum total of wrapper class Peeker and this merge version

is less than or equal to the merge version that tries to do everything,

in terms of both lines of code and "conceptual load". Basically, this

is because Peeker "captures" the right "primitive concepts" (attributes

more and head, operation advance) for merge's purposes, and therefore

eliminates duplication present in the "stand-alone version of merge".

One might start with the large, code-duplicating version of merge, and

work towards this one, as a way to motivate ADT's, I guess. But I'm

not sure such a bottom-up-and-refactor approach is didactically more

valid than top-down-and-refine, although it's closer to how one might

work to develop Peeker in real life.

Alex

Jan 15, 2002, 3:46:51 AM

"David Eppstein" <epps...@ics.uci.edu> wrote in message

news:eppstein-27FE15...@news.service.uci.edu...

> What is the idiomatic way in Python of implementing the following

> operations on a list of elements:

> - return the element at the front of the list, without changing the list

> - remove the element from the front of the list

> - add a new element to the rear of the list

>

> My first choice would be L[0], del L[0], and L.append(), however it

> seems from some brief experiments that the del operation takes time linear

> in the length of the list, so this would be very inefficient for long

> lists. I also tried replacing the del with L=L[1:] but that wasn't any

> better. A third possibility would be to have two variables L and i, and

> implement the three operations with L[i], i=i+1, and L.append(), which is

> fast but not good in space usage when the total number of operations is

> much larger than the length of the list.

>

> Is there a simple Pythonic way of implementing these three operations in

> constant (amortized) time per operation and space linear in the length of

> the list?

What about (warning, untested and particularly un-profiled code):

class fifo:
    def __init__(self):
        self.data = []
        self.first = 0
    def head(self):
        return self.data[self.first]
    def pop(self):
        self.first += 1
        if self.first > len(self.data)/2:
            self.data = self.data[self.first:]
            self.first = 0
    def append(self, value):
        self.data.append(value)

I think this should be constant-amortized-time, and never take more

space than K+2*N where N is the number of items in the list at a

given time. It may be possible to find sequences of pop and append

operations that degrade per-operation time beyond amortized constant.

Such anomalies, in similar cases, are generally ameliorated by some

further guard against too much churning in the data structure, e.g.

if self.first>self.Konstant and self.first>len(self.data)/2:

as the guard-clause for the self.data rebinding -- any constant

value self.Konstant will not matter asymptotically in terms of

space taken, of course (it goes into the K of "K+2*N").
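Folded into the class, the guarded version might read (an untested sketch in the same spirit; the threshold value is arbitrary):

```python
class fifo:
    Konstant = 100   # arbitrary constant; only affects the K in "K+2*N"
    def __init__(self):
        self.data = []
        self.first = 0
    def head(self):
        return self.data[self.first]
    def pop(self):
        self.first += 1
        # compact only when the dead prefix is both absolutely long and
        # more than half of the underlying list
        if self.first > self.Konstant and self.first > len(self.data) / 2:
            self.data = self.data[self.first:]
            self.first = 0
    def append(self, value):
        self.data.append(value)
```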

Alex

Jan 15, 2002, 5:04:22 AM

"Alex Martelli" <al...@aleax.it> writes:

> I think this should be constant-amortized-time,

I think logarithmic amortized time, i.e. calling pop N times takes

O(N log N) time (because it has to do a linear-time operation every so often).

Jan 15, 2002, 7:38:49 AM

"Paul Rubin" <phr-n...@nightsong.com> wrote in message

news:7xpu4cg...@ruckus.brouhaha.com...


FWIW, here's my crack at the same thing:

def merge(left, right):
    ans = []
    try:
        while 1:
            ans.append( (left[-1] > right[-1] and left or right).pop() )
    except IndexError:
        ans.reverse()
        return left + right + ans

def msort(ds):
    if len(ds) <= 1: return ds
    mid = len(ds) // 2
    return merge( msort(ds[:mid]), msort(ds[mid:]) )

Raymond Hettinger

Jan 15, 2002, 8:08:47 AM

"Alex Martelli" <al...@aleax.it> wrote in message

news:a20q9p$ej5$1...@serv1.iunet.it...

> class fifo:
>     def __init__(self):
>         self.data = []
>         self.first = 0
>     def head(self):
>         return self.data[self.first]
>     def pop(self):
>         self.first += 1
>         if self.first > len(self.data)/2:
>             self.data = self.data[self.first:]
>             self.first = 0
>     def append(self, value):
>         self.data.append(value)
>
> Alex

The Python Cookbook has an algorithm for O(n) FIFO queues submitted by

Sébastien Keim.

His approach is interesting and brilliant.

See: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/68436

After tightening the code a bit, it looks like this:

class Fifo:
    stored = [None]
    stored[0] = stored
    def append(self, data):  # to inside
        if len(self.stored) == 1:
            return self.stored.append(data)
        self.stored[0] = self.stored = [self.stored[0], data]
    def pop(self):  # from outside
        whole = self.stored[0]
        self.stored[0] = whole[0]
        return whole[1]

Raymond Hettinger

Jan 15, 2002, 9:04:51 AM

"Tim Peters" <tim...@home.com> wrote in message

news:mailman.1011063021...@python.org...

[snipped]

> BTW, is it necessary to destroy the input lists? Rather than remove the

> element at the front of a list, most people <wink> would just increment an

> index vrbl. Then all mutation occurs only via .append() on the output list,

> and your efficiency worry goes away.


I hate to beat this one to death, but it affords a great opportunity to

show off Python 2.2's iterators and sub-classing of built-in types.

class Queue(list):
    def __init__(self, *args):
        list.__init__(self, *args)
        self.g = iter(self)  # the index is implicit in the iterator
    def pop(self):
        return self.g.next()

q = Queue()
q.append(3); q.append(4)
print q.pop()
q.append(5)
print q.pop(),
print q.pop()

r = Queue([6,7,8])
print r.pop()
print r.pop()

Raymond Hettinger

Jan 15, 2002, 9:23:53 AM

In article [...], David Eppstein wrote:

>What is the idiomatic way in Python of implementing the following

>operations on a list of elements:

> - return the element at the front of the list, without changing the list

> - remove the element from the front of the list

> - add a new element to the rear of the list

>

>My first choice would be L[0], del L[0], and L.append(), however it

>seems from some brief experiments that the del operation takes time linear

>in the length of the list, so this would be very inefficient for long

>lists.

[snip]

>Is there a simple Pythonic way of implementing these three operations in

>constant (amortized) time per operation and space linear in the length of

>the list?

The simplest one I can think of is the trusty old linked list...

>The application this came up for me is a merge sort example for an

>algorithms class. So, I don't want to confuse the students with additional

>code that has nothing to do with the actual algorithm, otherwise it would

>be easy to do linked lists,

Hm. Well, it doesn't take *that* much code...

class Node:
    def __init__(self, obj):
        self.value = obj
        self.next = None

class Queue:
    def __init__(self):
        # per-queue sentinel node sitting just before the front
        self.head = self.tail = Node(None)
    def put(self, obj):
        self.tail.next = Node(obj)
        self.tail = self.tail.next
    def peek(self): return self.head.next.value
    def get(self):
        self.head = self.head.next
        return self.head.value

I'm sure you can simplify it even further without much trouble, using

lists instead of node objects etc... Oh, well. If you don't want it

you don't want it.

BTW: The speedup is quite dramatic:

Problem size: 10

List version: 0.000274896621704

Queue version: 0.000545978546143

Problem size: 100

List version: 0.00122201442719

Queue version: 0.00465798377991

Problem size: 1000

List version: 0.0175099372864

Queue version: 0.0498030185699

Problem size: 10000

List version: 0.939693927765

Queue version: 0.538775920868

Problem size: 100000

List version: 91.8680340052

Queue version: 5.83884394169

(Repeated "problem size" times: Add integer, peek integer, remove

integer. The time is from time.time())

Sadly, the standard library object Queue also uses a list in its

implementation, so there is no help there ;)

--

Magnus Lie Hetland The Anygui Project

http://hetland.org http://anygui.org

Jan 15, 2002, 9:21:52 AM

"Paul Rubin" <phr-n...@nightsong.com> wrote in message

news:7xr8oso...@ruckus.brouhaha.com...

The "linear-time operation" in question is copying 1/2 of the current

list's contents, though. Assume a sequence of N appends followed by

N pops: then we copy N/2 + N/4 + N/8 + ... = about N items in all, no?

Thus, O(N) times for N pops -> constant *amortized* time per pop.

I'm not totally confident about what happens for other ways to interleave

the appends and pops, admittedly.
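The series can also be checked empirically by counting copied elements over a full fill-then-drain cycle (my own instrumented sketch of the pop logic; the numbers are illustrative):

```python
# Simulate N appends followed by N pops of the "compact at half" fifo,
# counting how many elements the compaction slices actually copy.
N = 1024
data = list(range(N))       # the N appends, done up front
first = 0                   # index of the logical front
copies = pops = 0
while pops < N:
    first += 1              # one pop() of the front element
    pops += 1
    if first > len(data) // 2:
        copies += len(data) - first   # elements moved by the slice copy
        data = data[first:]
        first = 0

# Total copies come to N/2 + N/4 + ... < N, i.e. O(1) amortized per pop.
assert pops == N
assert copies <= N
```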

Alex

Jan 15, 2002, 11:05:05 AM

In article <7xr8oso...@ruckus.brouhaha.com>,

Paul Rubin <phr-n...@nightsong.com> wrote:


No, Alex was right, it is constant amortized time.

Each linear time operation (reducing the list length)

can be charged against the insertion operations that added the items being

removed from the list.

Jan 15, 2002, 11:23:41 AM

"David Eppstein" <epps...@ics.uci.edu> wrote in message

news:eppstein-B55AA6...@news.service.uci.edu...

> In article <7xr8oso...@ruckus.brouhaha.com>,

> Paul Rubin <phr-n...@nightsong.com> wrote:

>

> > "Alex Martelli" <al...@aleax.it> writes:

> > > I think this should be constant-amortized-time,

> >

> > I think logarithmic amortized time, i.e. calling pop N times takes

> > O(N log N) time (because it has to do a linear-time operation every so often).

>

> No, Alex was right, it is constant amortized time.

> Each linear time operation (reducing the list length)

> can be charged against the insertion operations that added the items being

> removed from the list.

Hmmm, nice way to frame things, and it does come close to convincing

me that it's indeed constant amortized time for any interleaving of

append and pop operations.

Alex

Jan 15, 2002, 11:16:16 AM

David Eppstein wrote:

> In article <mailman.1011063861...@python.org>,

> "Jason Orendorff" <ja...@jorendorff.com> wrote:

>

> > The given operations are usually "quick enough", for small lists

> > (less than, say, 5000 items) since the constant coefficient is

> > small, compared to the ordinary semi-inefficiency of running

> > Python code in the first place.

>

> I'm not sure what you mean by quick enough, but when I tried draining a

> list by repeated del L[0], on lists of 10000 elements it took roughly a

> second -- more time than I'd like to spend in most interactive situations.

>

> My philosophy to writing general purpose code like a sorting

> routine is to use as efficient algorithms as reasonably possible,

> [...deletia...]

All you had to say was, "5000 items won't do it for me. I want

genuine O(N log N) behavior."

In which case you could have read on to the part of my email where

I suggested L.reverse().

Cheers.

Jan 15, 2002, 12:24:12 PM1/15/02

to

In article <mailman.1011111645...@python.org>,

"Jason Orendorff" <ja...@jorendorff.com> wrote:

> > I'm not sure what you mean by quick enough, but when I tried draining a

> > list by repeated del L[0], on lists of 10000 elements it took roughly a

> > second -- more time than I'd like to spend in most interactive situations.

>

> All you had to say was, "5000 items won't do it for me. I want

> genuine O(N log N) behavior."

I didn't say it that way because I think it's important to understand why

good asymptotics are important, and under what circumstances it would be ok

to use slower algorithms, rather than just blindly using the math.

Anyway, thanks for all the solutions posted so far.

The reverse trick is looking cleanest for merge sort, although Alex M.'s

queue implementation looks good (better I think than the more obvious

linked list version) and I also like the ADT-ized generator version.

Jan 15, 2002, 3:37:16 PM1/15/02

to

David Eppstein wrote:

>Actually, the merge step in merge sort is a great example for simple

>generators, or would be if not for the difficulty of keeping track of the

>first item of each input iterator:

Tim Peters wrote:

>I appreciate the abstract loveliness of this, but it's hard to see under

all

>the syntactic hair. A lazy functional language (like Haskell) is much

more

>pleasant for writing algorithms in this style.

Here's an approach using generators that gets rid of some of the

"try:except StopIteration" clutter.

Jim

=======================================

from __future__ import generators

def gen(alist):

result = {'isempty':1,'value':None} # or you could use a list instead

for value in alist:

result['isempty'] = 0

result['value'] = value

yield result

result['isempty'] = 1

while 1:

yield result

def merge(list1,list2):

gen1, gen2 = gen(list1), gen(list2)

next1, next2 = gen1.next(), gen2.next()

resultList = []

while 1:

if next1['isempty'] and next2['isempty']:

break

if (next2['isempty'] or

(not next1['isempty'] and next1['value'] <= next2['value'])):

resultList.append(next1['value'])

next1 = gen1.next()

else:

resultList.append(next2['value'])

next2 = gen2.next()

return resultList

def mergesort(alist):

if len(alist) <= 1:

return alist[:]

return merge(mergesort(alist[:len(alist)//2]),

mergesort(alist[len(alist)//2:]))

Jan 15, 2002, 4:19:50 PM1/15/02

to

In article <mailman.101112710...@python.org>,

James_...@i2.com wrote:

> Here's an approach using generators that gets rid of some of the

> "try:except StopIteration" clutter.

Ok, but your merge routine gets rid of a key advantage of using generators:

it returns a list, rather than yielding the items one at a time, so the

total time is always Theta(n log n), while for the version I posted (and

the cleaner ADT-ized version someone else posted later) the merges all

happen in parallel as items are requested, and the time to pull off the

first k items from a sorted list of n items is O(n + k log n).
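For comparison, that O(n + k log n) first-k bound is exactly what a binary heap gives directly: building the heap is linear and each pop is logarithmic. A sketch using the standard heapq module (which entered the standard library after this thread, in Python 2.3):

```python
import heapq
import random

items = list(range(1000))
random.shuffle(items)

heapq.heapify(items)    # O(n) to build the heap in place
# k pops at O(log n) each yield the k smallest items in order
first_k = [heapq.heappop(items) for _ in range(10)]
assert first_k == list(range(10))
```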

Jan 15, 2002, 4:46:09 PM1/15/02

to

On Tue, 15 Jan 2002 09:46:51 +0100, Alex Martelli <al...@aleax.it> wrote:

>

>What about (warning, untested and particularly un-profiled code):

>

>class fifo:

> def __init__(self):

> self.data = []

> self.first = 0

> def head(self):

> return self.data[self.first]

> def pop(self):

> self.first += 1

> if self.first > len(self.data)/2:

> self.data = self.data[self.first:]

> self.first = 0

> def append(self, value):

> self.data.append(value)

>

>I think this should be constant-amortized-time, and never take more

>space than K+2*N where N is the number of items in the list at a

>given time. It may be possible to find sequences of pop and append

>operations that degrade per-operation time beyond amortized constant.

This can be extended to a fancier scheme, where you can save quite some

append and del if you also keep an attribute self.last and let the fifo wrap

around on the list. It is simple to implement in Python due to the way

negative indices are treated. In the following diagram * represents element

in the queue and . represents dummy element:

...... first ******** last ......

**** last .......first *********

When self.last catches up on self.first, the list size is enlarged, say, by

a factor of 0.2. It is reduced by a factor when the queue falls under a

certain proportion of the list. These factors represent the trade-off

between space and speed. This scheme is more useful when there are lots of

append and pop operations but the queue length remains fairly stable, in which

case the cost of list-resizing operations approaches zero amortized time.

Huaiyu
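The wrap-around scheme described above might be sketched like this (hypothetical code, not from the thread; it doubles the buffer when full rather than growing by a 0.2 factor, and it never shrinks):

```python
class RingFifo:
    """Circular-buffer FIFO: front and rear indices wrap around a
    fixed-size list, so no elements move on append or pop."""

    def __init__(self, capacity=8):
        self.data = [None] * capacity
        self.first = 0      # index of the front element
        self.size = 0       # number of queued elements

    def append(self, value):
        if self.size == len(self.data):     # full: enlarge the buffer
            self._resize(2 * len(self.data))
        self.data[(self.first + self.size) % len(self.data)] = value
        self.size += 1

    def pop(self):
        value = self.data[self.first]
        self.data[self.first] = None        # drop the reference
        self.first = (self.first + 1) % len(self.data)
        self.size -= 1
        return value

    def _resize(self, capacity):
        # Unwrap the queue into a fresh buffer, front back at index 0.
        old = self.data
        self.data = [old[(self.first + i) % len(old)]
                     for i in range(self.size)]
        self.data.extend([None] * (capacity - self.size))
        self.first = 0

q = RingFifo(capacity=2)
for i in range(100):
    q.append(i)
assert [q.pop() for _ in range(100)] == list(range(100))
```

As long as the queue length stays under the buffer size, appends and pops touch a single slot each, with no slicing or element movement at all.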

Jan 15, 2002, 4:58:52 PM1/15/02

to

David Eppstein wrote:

>Ok, but your merge routine gets rid of a key advantage of using

generators:

>it returns a list, rather than yielding the items one at a time, so the

>total time is always Theta(n log n), while for the version I posted (and

>the cleaner ADT-ized version someone else posted later) the merges all

>happen in parallel as items are requested, and the time to pull off the

>first k items from a sorted list of n items is O(n + k log n).

Ok. Here is essentially the same thing but we return an interator instead

of a list (same idea as your version).

BTW, I also like Alex's ADT approach. Mine was just a suggestion if class

defs were to be avoided.

Jim

==========================

from __future__ import generators

def gen(alist):

result = {'isempty':1,'value':None}

for value in alist:

result['isempty'] = 0

result['value'] = value

yield result

result['isempty'] = 1

while 1:

yield result

def merge(list1,list2):

gen1, gen2 = gen(list1), gen(list2)

next1, next2 = gen1.next(), gen2.next()

while 1:

if next1['isempty'] and next2['isempty']:

break

if (next2['isempty'] or

(not next1['isempty'] and next1['value'] <= next2['value'])):

yield next1['value']

next1 = gen1.next()

else:

yield next2['value']

next2 = gen2.next()

def mergesort(alist):

if len(alist) <= 1:

return iter(alist)

return merge(mergesort(alist[:len(alist)//2]),

mergesort(alist[len(alist)//2:]))

Jan 15, 2002, 5:55:33 PM1/15/02

to

[James_...@i2.com]

> Speaking of generator abuse ...

>

> Anything like this:

>

> def mergesort(alist):

> if len(alist) <= 1:

> return iter(alist)

> return merge(mergesort(alist[:len(alist)//2]),

> mergesort(alist[len(alist)//2:]))

>

> is probably going to create way too many generator/iterator instances to

> scale well.


Well, it was clear from the start that there's no interest in a practical

algorithm here, else we'd just use a temp array and a handful of index

variables. "Theoretical efficiency" (O() where a multiplier of 10000000 is

as good as 1) is an interesting area of study, full of papers with schemes

so convoluted the authors cheerfully confess to not even trying to implement

them.

"computer-science"-isn't-about-programming-computers-but-python-is-ly

y'rs - tim

Jan 15, 2002, 5:34:48 PM1/15/02

to

Speaking of generator abuse ...

Anything like this:

def mergesort(alist):

if len(alist) <= 1:

return iter(alist)

return merge(mergesort(alist[:len(alist)//2]),

mergesort(alist[len(alist)//2:]))

is probably going to create way too many generator/iterator instances to

scale well.

Jim

Jan 15, 2002, 7:48:47 PM1/15/02

to

In article <mailman.101113540...@python.org>,

"Tim Peters" <tim...@home.com> wrote:

> > is probably going to create way too many generator/iterator instances to

> > scale well.

>

> Well, it was clear from the start that there's no interest in a practical

> algorithm here, else we'd just use a temp array and a handful of index

> variables.

I think heap sort would be a good choice for practicality, since it sorts

in-place with only a handful of index variables. But except for unusual

circumstances, I would likely just call Python's built-in sort().

Jan 15, 2002, 10:20:27 PM1/15/02

to

In article <eppstein-27FE15...@news.service.uci.edu>,

David Eppstein <epps...@ics.uci.edu> wrote:

>

>What is the idiomatic way in Python of implementing the following

>operations on a list of elements:

> - return the element at the front of the list, without changing the list

> - remove the element from the front of the list

> - add a new element to the rear of the list


Just to be a jerk, why not use a dict with a couple of index variables?

--

--- Aahz <*> (Copyright 2002 by aa...@pobox.com)

Hugs and backrubs -- I break Rule 6 http://www.rahul.net/aahz/

Androgynous poly kinky vanilla queer het Pythonista

"There are times when effort is important and necessary, but this should

not be taken as any kind of moral imperative." --jdecker

Jan 15, 2002, 10:21:09 PM1/15/02

to

In article <a20osj$78u$1...@serv1.iunet.it>, Alex Martelli <al...@aleax.it> wrote:

>

> [...]

Um, what's an ADT?

Jan 15, 2002, 11:36:11 PM1/15/02

to

Aahz Maruch wrote:

>Um, what's an ADT?

"abstract data type"

It's a data structure with associated methods (wasn't called

class nor object 20+ years ago) which can be used unchanged

in many different codes.

ADTS include lists, dicts, binary trees, priority queues, stack,

quad and oct trees. Upon reflection, I seem to associate an

ADT with storage containers.

Here's the FOLDOC definition from

http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?ADT

> (ADT) A type whose internal form is hidden behind a set of access

> functions. Objects of the type are created and inspected only by

> calls to the access functions. This allows the implementation of

> the type to be changed without requiring any changes outside the

> module in which it is defined.

> Abstract data types are central to object-oriented programming

> where every class is an ADT.

But I wouldn't regard, say, the classes in rfy833.py as ADTs.

> A classic example of an ADT is a stack data type for which functions

> might be provided to create an empty stack, to push values onto a

> stack and to pop values from a stack.

Andrew

da...@dalkescientific.com
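The stack example from that FOLDOC entry, as a minimal Python sketch (hypothetical names, not from the thread): the list is the hidden internal form, and callers touch it only through the access functions, so the representation could change without affecting them.

```python
class Stack:
    """A stack ADT: internal form hidden behind access functions."""

    def __init__(self):
        self._items = []        # internal representation, hidden from users

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items

s = Stack()
s.push(1)
s.push(2)
assert s.pop() == 2     # last in, first out
assert s.pop() == 1
assert s.is_empty()
```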

Jan 15, 2002, 11:43:49 PM1/15/02

to

In article <a22vts$5jg$1...@slb4.atl.mindspring.net>,

Andrew Dalke <da...@dalkescientific.com> wrote:

>Aahz Maruch wrote:

>>

>>Um, what's an ADT?

>

>"abstract data type"

>

>It's a data structure with associated methods (wasn't called

>class nor object 20+ years ago) which can be used unchanged

>in many different codes.

>

>ADTS include lists, dicts, binary trees, priority queues, stack,

>quad and oct trees. Upon reflection, I seem to associate an

>ADT with storage containers.


Oh, right. I knew that, I just didn't know that I knew that. ;-)

(More precisely, I didn't remember that particular acronym.)

Jan 16, 2002, 12:32:06 AM1/16/02

to

In article <a22rhr$bn1$1...@panix3.panix.com>, aa...@panix.com (Aahz Maruch)

wrote:

> Just to be a jerk, why not use a dict with a couple of index variables?

Maybe you could unpack that a little? dicts are useful but AFAIK they

aren't the same thing as queues, index variables or no.

Jan 16, 2002, 3:26:33 AM1/16/02

to

"Aahz Maruch" <aa...@panix.com> wrote in message

news:a22rj5$bs6$1...@panix3.panix.com...

> In article <a20osj$78u$1...@serv1.iunet.it>, Alex Martelli <al...@aleax.it>

wrote:

> >

> > [...]

>

> Um, what's an ADT?


Sorry -- I should not have used a three-letter-acronym (TLA) without

expanding it at least once, no matter how well-known I could assume

it to be to the specific poster I was responding to.

ADT = "Abstract Data Type". When teaching algorithms and data

structures, one approach is to "implement" an algorithm in terms

of abstract operations on some underlying data type, each operation

defined in terms of its just-as-abstract semantics and performance

characteristics. The issue of *implementing* an ADT in concrete

terms is then neatly separated from that of *using* it. As far as

I know the original idea came from abstract algebra, where one

does much the same with abstract structures -- one separates the

issue of "what can I prove about a monoid" (the latter being

abstractly defined in terms of its axioms) from the issue of

proving whether a given algebraic structure is indeed a monoid.

Of course, ADT's also mesh well with the concept of separation

between interface and implementation as used in actual software

design. Performance characteristics are rarely specified in

the latter case, but there are important exceptions (for example,

C++'s Standard Library does have O(N) performance specs for

various operations as part of its overall specifications).

Alex

Jan 16, 2002, 3:11:34 AM1/16/02

to

"David Eppstein" <epps...@ics.uci.edu> wrote in message

news:eppstein-3B0408...@news.service.uci.edu...

> In article <a22rhr$bn1$1...@panix3.panix.com>, aa...@panix.com (Aahz Maruch)

> wrote:

>

> > Just to be a jerk, why not use a dict with a couple of index variables?

>

> Maybe you could unpack that a little? dicts are useful but AFAIK they

> aren't the same thing as queues, index variables or no.

Well, they obviously CAN be used to IMPLEMENT queues -- performance

characteristics apart:

class DictFifo:

def __init__(self):

self.data = {}

self.nextin = 0

self.nextout = 0

def append(self, value):

self.data[self.nextin] = value

self.nextin += 1

def pop(self):

value = self.data[self.nextout]

del self.data[self.nextout]

self.nextout += 1

No "being the same" asserted or implied, of course -- the dict and

the two indices are just tools to _implement_ a fifo queue, here.

Alex
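A quick sanity check of the dict-based approach (the same sketch, except that pop here also returns the popped value):

```python
class DictFifo:
    """Dict-based FIFO: sequential keys play the role of numbered tickets."""

    def __init__(self):
        self.data = {}
        self.nextin = 0
        self.nextout = 0

    def append(self, value):
        self.data[self.nextin] = value
        self.nextin += 1

    def pop(self):
        value = self.data[self.nextout]
        del self.data[self.nextout]
        self.nextout += 1
        return value

q = DictFifo()
for name in ["a", "b", "c"]:
    q.append(name)
assert q.pop() == "a"        # first in, first out
q.append("d")
assert [q.pop() for _ in range(3)] == ["b", "c", "d"]
assert q.data == {}          # nothing lingers once drained
```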

Jan 16, 2002, 7:07:29 AM1/16/02

to

Yup. Note that for Python < 2.2, you should use:

def __init__(self):

self.data = {}

self.nextin = 0L

self.nextout = 0L

to make sure that you can use it for more than two billion entries. ;-)

I haven't tested this, but I think that if you expect queues to be

large, this might actually be a fast, clear implementation.

Jan 16, 2002, 11:41:56 AM1/16/02

to

Andrew Dalke wrote:

>

> Aahz Maruch wrote:

> >Um, what's an ADT?

>

> "abstract data type"

>

> It's a data structure with associated methods (wasn't called

> class nor object 20+ years ago) which can be used unchanged

> in many different codes.

>


As Andrew hinted, a big distinction between OO classes

and ADTs is that the latter can't be subclassed.

ADT-based programming is useful in languages which don't

directly support OOP (e.g ANSI C). They don't provide subclassing

or protocol/interface-based programming, but ADTs do

simplify encapsulation and memory management.

My first exposure to ADTs came when I started using

Modula-2. Hence the claim that if you remove memory-management

(i.e. instantiation) from ADT-based programming you get

module-based programming :)

--

...and lots of singletons

Mitch Chapman

Mitch....@bioreason.com

Jan 16, 2002, 2:11:41 PM1/16/02

to

news:a23cjl$dps$1...@serv1.iunet.it...

> class DictFifo:

> def __init__(self):

> self.data = {}

> self.nextin = 0

> self.nextout = 0

> def append(self, value):

> self.data[self.nextin] = value

> self.nextin += 1

> def pop(self):

> value = self.data[self.nextout]

> del self.data[self.nextout]

> self.nextout += 1


This clearly indicates the point of abstract data types separating

implementation from interface. In the real world, there are at least

two ways of implementing FIFO (first-in,first-out) service discipline.

One is to keep people in a physical line (sequence). Another is to

'label' people in their order of entry -- for instance by giving them

or having them take a sequenced ticket -- and call them in order

while letting them physically mill around before getting called.

DictFifo above, while counter-intuitive to me at first, nicely mirrors

the second method, with sequential counts being the numbered

'tickets'.

Terry J. Reedy

Jan 16, 2002, 6:04:21 PM1/16/02

to

Aha! I always thought ADT stood for "Algebraic Data Type" and assumed people

were talking about haskell / ML-ish types, although I never had a very clear

idea of what they were supposed to exactly be (value semantics?). And fp

advocates would say things like "behold the power of adts, blah blah". And

then Limbo goes and has an adt keyword in an imperative context...

Thanks for the description!

Jan 16, 2002, 6:19:47 PM1/16/02

to

> >class DictFifo:

> > def __init__(self):

> > self.data = {}

> > self.nextin = 0

> > self.nextout = 0

> > def append(self, value):

> > self.data[self.nextin] = value

> > self.nextin += 1

> > def pop(self):

> > value = self.data[self.nextout]

> > del self.data[self.nextout]

> > self.nextout += 1

return value

:)

> def __init__(self):

> self.data = {}

> self.nextin = 0L

> self.nextout = 0L

>

> to make sure that you can use it for more than two billion

> entries. ;-)

>

> I haven't tested this, but I think that if you expect queues to be

> large, this might actually be a fast, clear implementation.

I just tested it with the following code. Everything is wrapped in classes

to maintain similar overhead. Note that for DictIntFifo, OverflowError is

caught to allow conversion to long in any version of Python. I've also

included a normal list and array for comparison.

import time

import string

import array

class DictIntFifo:

def __init__(self):

self.data = {}

self.nextin = 0

self.nextout = 0

def append(self, value):

self.data[self.nextin] = value

try:

self.nextin += 1

except OverflowError:

self.nextin += long(1)

def pop(self):

value = self.data[self.nextout]

del self.data[self.nextout]

try:

self.nextout += 1

except OverflowError:

self.nextout += long(1)

return value

class DictLongFifo:

def __init__(self):

self.data = {}

self.nextin = long(0)

self.nextout = long(0)

def append(self, value):

self.data[self.nextin] = value

self.nextin += 1

def pop(self):

value = self.data[self.nextout]

del self.data[self.nextout]

self.nextout += 1

return value

class ListPrependFifo:

""""""

def __init__(self):

""""""

self.data = []

def append(self, value):

self.data.insert(0, value)

def pop(self):

return self.data.pop()

class ListAppendFifo:

""""""

def __init__(self):

""""""

self.data = []

def append(self, value):

self.data.append(value)

def pop(self):

return self.data.pop(0)

def test (fifo, iterations):

""""""

start = time.clock()

for i in xrange(iterations):

fifo.append(i)

j = fifo.pop()

end = time.clock()

try:

name = fifo.__class__.__name__

except:

name = type(fifo)

print "%-8s %-20s %-06f" % (iterations, name, end - start,)

for i in xrange(4):

print

test([], 10**(i + 3))

test(array.array('i'), 10**(i + 3))

test(DictIntFifo(), 10**(i + 3))

test(DictLongFifo(), 10**(i + 3))

test(ListPrependFifo(), 10**(i + 3))

test(ListAppendFifo(), 10**(i + 3))

1000 <type 'list'> 0.003968

1000 <type 'array'> 0.004607

1000 DictIntFifo 0.012154

1000 DictLongFifo 0.015508

1000 ListPrependFifo 0.010499

1000 ListAppendFifo 0.010357

10000 <type 'list'> 0.039399

10000 <type 'array'> 0.046923

10000 DictIntFifo 0.120931

10000 DictLongFifo 0.156747

10000 ListPrependFifo 0.105777

10000 ListAppendFifo 0.105253

100000 <type 'list'> 0.394604

100000 <type 'array'> 0.459258

100000 DictIntFifo 1.214058

100000 DictLongFifo 1.568331

100000 ListPrependFifo 1.060692

100000 ListAppendFifo 1.046310

1000000 <type 'list'> 4.071518

1000000 <type 'array'> 4.640713

1000000 DictIntFifo 12.192316

1000000 DictLongFifo 15.778213

1000000 ListPrependFifo 10.700729

1000000 ListAppendFifo 10.636400

So, it looks like there is a definite speed penalty with the dict versions,

although the majority of that is probably namespace lookup time (more

operations). A C version would probably do quite well.

Nonetheless, it is a good example of separating interface from

implementation.

Tim Delaney

Jan 16, 2002, 8:29:17 PM1/16/02

to

In article <mailman.101122322...@python.org>,

"Delaney, Timothy" <tdel...@avaya.com> wrote:

> So, it looks like there is a definite speed penalty with the dict versions,

> although the majority of that is probably namespace lookup time (more

> operations). A C version would probably do quite well.

It's good to see actual data, but I don't think this is a fair comparison:

your queues always have at most one item in them, so you never see how well

these versions scale to different lengths. Try replacing

for i in xrange(iterations):

fifo.append(i)

j = fifo.pop()

with

for i in xrange(iterations):

fifo.append(i)

for i in xrange(iterations):

j = fifo.pop()

When I did that, I got the following results, showing that among the

Fifo's, the dictionary versions are very competitive with the method of

indexing a list and replacing the list when the index gets too big

(ListIndexFifo, code below the data), while the other list-based Fifos are

not good for long queues. (List itself is just a stack, not a queue, of

course.)

1000 list 0.007062

1000 <type 'array.array'> 0.008669

1000 DictIntFifo 0.021960

1000 DictLongFifo 0.026667

1000 ListPrependFifo 0.024463

1000 ListAppendFifo 0.026615

1000 ListIndexFifo 0.021000

10000 list 0.072249

10000 <type 'array.array'> 0.086427

10000 DictIntFifo 0.222813

10000 DictLongFifo 0.262345

10000 ListPrependFifo 0.839607

10000 ListAppendFifo 1.063594

10000 ListIndexFifo 0.202094

100000 list 0.770646

100000 <type 'array.array'> 27.156242

100000 DictIntFifo 2.345710

100000 DictLongFifo 2.797171

100000 ListPrependFifo 76.949710

100000 ListAppendFifo 91.907281

100000 ListIndexFifo 2.047624

class ListIndexFifo:

""""""

def __init__(self):

""""""

self.data = []

self.index = -1

def append(self, value):

self.data.append(value)

def pop(self):

self.index = self.index + 1

if self.index > len(self.data) / 2:

self.data = self.data[self.index:]

self.index = 0

return self.data[self.index]
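Historical footnote: Python 2.4 later added collections.deque to the standard library, which settles the original three-operation question directly, with constant-time appends and pops at both ends:

```python
from collections import deque

q = deque()
for i in range(5):
    q.append(i)             # add to the rear
assert q[0] == 0            # peek at the front without changing the queue
assert q.popleft() == 0     # remove the element from the front
assert list(q) == [1, 2, 3, 4]
```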

Jan 16, 2002, 8:57:58 PM1/16/02

to

> From: David Eppstein [mailto:epps...@ics.uci.edu]

>

> In article <mailman.101122322...@python.org>,

> "Delaney, Timothy" <tdel...@avaya.com> wrote:

>

> > So, it looks like there is a definite speed penalty with

> the dict versions,

> > although the majority of that is probably namespace lookup

> time (more

> > operations). A C version would probably do quite well.

>

> It's good to see actual data, but I don't think this is a

> fair comparison:

> your queues always have at most one item in them, so you

> never see how well

> these versions scale to different lengths. Try replacing

>


Remind me to hit myself with a rubber chicken ...

Of course I completely invalidated the point of the entire exercise ... I

should have added all, then removed all (which in my mind I was doing). And

my comparison with list/array is incorrect as well ... I needed to change

every pop() to accept a position (and ignore it :) so I could pop(0) on the

list and array.

import time

import string

import array

class DictIntFifo:

def __init__(self):

self.data = {}

self.nextin = 0

self.nextout = 0

def append(self, value):

self.data[self.nextin] = value

try:

self.nextin += 1

except OverflowError:

self.nextin += long(1)

def pop(self, pos=-1):

value = self.data[self.nextout]

del self.data[self.nextout]

try:

self.nextout += 1

except OverflowError:

self.nextout += long(1)

return value

class DictLongFifo:

def __init__(self):

self.data = {}

self.nextin = long(0)

self.nextout = long(0)

def append(self, value):

self.data[self.nextin] = value

self.nextin += 1

def pop(self, pos=-1):

value = self.data[self.nextout]

del self.data[self.nextout]

self.nextout += 1

return value

class ListPrependFifo:

""""""

def __init__(self):

""""""

self.data = []

def append(self, value):

self.data.insert(0, value)

def pop(self, pos=-1):

return self.data.pop()

class ListAppendFifo:

""""""

def __init__(self):

""""""

self.data = []

def append(self, value):

self.data.append(value)

def pop(self, pos=-1):

return self.data.pop(0)

def test (fifo, iterations):

""""""

start = time.clock()

for i in xrange(iterations):

fifo.append(i)

for i in xrange(iterations):

j = fifo.pop(0)

end = time.clock()

try:

name = fifo.__class__.__name__

except:

name = type(fifo)

print "%-8s %-20s %-06f" % (iterations, name, end - start,)

for i in xrange(3):

print

test([], 10**(i + 3))

test(array.array('i'), 10**(i + 3))

test(DictIntFifo(), 10**(i + 3))

test(DictLongFifo(), 10**(i + 3))

test(ListPrependFifo(), 10**(i + 3))

test(ListAppendFifo(), 10**(i + 3))

---------- Run ----------

1000 <type 'list'> 0.007721

1000 <type 'array'> 0.007273

1000 DictIntFifo 0.012430

1000 DictLongFifo 0.017795

1000 ListPrependFifo 0.013849

1000 ListAppendFifo 0.014851

10000 <type 'list'> 0.329070

10000 <type 'array'> 0.234557

10000 DictIntFifo 0.123809

10000 DictLongFifo 0.185284

10000 ListPrependFifo 0.329698

10000 ListAppendFifo 0.398269

100000 <type 'list'> 76.195043

100000 <type 'array'> 56.828019

100000 DictIntFifo 1.269020

100000 DictLongFifo 1.961017

100000 ListPrependFifo 60.693928

100000 ListAppendFifo 77.094517

So once you get to decently large quantities in the queue, the dict versions

make real sense over a basic list (Win2000).

Tim Delaney

Jan 16, 2002, 9:03:38 PM1/16/02

to

[James Althoff, on an "elegant" mergesort]

> is probably going to create way too many generator/iterator

> instances to scale well.

[Tim]

> Well, it was clear from the start that there's no interest in a

> practical algorithm here, else we'd just use a temp array and a

> handful of index variables.

[David Eppstein]

> I think heap sort would be a good choice for practicality, since it sorts

> in-place with only a handful of index variables.

Following is the spelling of mergesort I've had in mind ("a temp array and a

handful of index variables"). It's eminently practical. Note that if the

list doesn't have an exact power-of-2 size, no attempt is made to "balance"

the excess across passes. That keeps the code simple, but doesn't harm

correctness or O() behavior. Better speedups can be obtained via simpler

means than balancing (like picking on the innermost loop, and splitting off

the first (s=1) pass to swap out-of-order pairs in-place directly).

def mergesort(A):

"Sort list, via stable mergesort."

n = len(A)

B = [None] * n

swapped = 0

s = 1

while s < n:

s2 = s+s

# Invariant: viewing A as a sequence of slices of length s (plus

# perhaps an oddball at the end), each slice is sorted. (This is

# trivially so when s=1; proceed by induction.) Now merge adjacent

# pairs of slices into sorted slices twice as large.

for i in range(0, n, s2):

ihi = j = i+s # start of adjacent slice

jhi = min(j+s, n) # final slice may have an oddball size

k = i # start of output slice

while i < ihi and j < jhi:

if A[i] <= A[j]:

B[k] = A[i]

i += 1

else:

B[k] = A[j]

j += 1

k += 1

# Copy tail. At most one of these isn't vacuous.

B[k : k+ihi-i] = A[i : ihi]

B[k : k+jhi-j] = A[j : jhi]

# Swap input and output lists.

A, B = B, A

swapped = not swapped

s = s2

if swapped: # copy back into original input list

B[:] = A

This is easier to get right for most people than a heapsort, and, if you can

afford the memory hit for a temp vector, generally runs faster (esp. for

large arrays on VM machines -- this mergesort's sequential memory access

works much better with typical disks and OS paging policies than heapsort's

leaping around). Coding is straightforward even in assembler. It doesn't

lend itself to finding "just the first K" quicker, though.
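On that last point — for "just the first K", a heap-based approach does better than a full mergesort. A minimal sketch using the heapq module (added to Python's standard library after this thread), which keeps a K-element heap while scanning and so finds the K smallest in roughly O(n log K) without sorting everything:

```python
import heapq
import random

# Hypothetical data for illustration.
data = [random.randrange(10**6) for _ in range(100000)]

# nsmallest does not sort the whole list; it maintains a small heap.
first_k = heapq.nsmallest(10, data)
```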

> But except for unusual circumstances, I would likely just call Python's

> built-in sort().

That's at least sane <wink>. I wrote Python's current list.sort(), and it

runs faster than the quickest mergesort I was able to code in C at the time

(which was a variant of the above, plus playing every low-level speed trick

under the sun). The quickest mergesort was a little quicker than the

quickest quicksort I tried; heapsort wasn't competitive. The sort() Python

ended up with is a nightmarish hybrid of single-thread samplesort and binary

insertion sort. That was the clear winner out of about 20 sort

implementations, and I expect it remains very hard to beat (for Python's

specific ratio of slow comparison to cheap object movement (just pointer

copies)).

life-is-ugly-where-the-theory-collides-with-the-hw<wink>-ly y'rs - tim

Jan 17, 2002, 4:28:36 AM1/17/02

to

> The Python Cookbook has an algorithm for O(n) FIFO queues submitted by

> Sébastien Keim.

> His approach is interesting and brilliant,

>

> See: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/68436


Thanks

I've completed the page with some kind of benchmarking.

The results are quite amazing, but I would be more confident if someone

could confirm them. (In fact I found them quite harsh for built-in

lists.)

Jan 17, 2002, 8:04:02 AM1/17/02

to

"Delaney, Timothy" <tdel...@avaya.com> wrote in message

news:mailman.1011232764...@python.org...

> class DictLongFifo:
>
>     def __init__(self):
>         self.data = {}
>         self.nextin = long(0)
>         self.nextout = long(0)
>
>     def append(self, value):
>         self.data[self.nextin] = value
>         self.nextin += 1
>
>     def pop(self, pos=-1):
>         value = self.data[self.nextout]
>         del self.data[self.nextout]
>         self.nextout += 1
>         return value

I'm not sure why, but subclassing the built-in types gives amazing

performance

benefits. Please re-try the timings with this version of DictLongFifo:

class DictLongFifo(dict):
    def __init__(self):
        self.nextin = 0L
        self.nextout = 0L
    def append(self, value):
        self[self.nextin] = value
        self.nextin += 1
    def pop(self):
        value = self[self.nextout]
        del self[self.nextout]
        self.nextout += 1
        return value
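For reference, a minimal usage sketch of the subclassed version above, showing that items come out in FIFO order. The class body is repeated verbatim so the snippet runs on its own (with the 0L long literals written as plain ints, which modern Pythons require):

```python
# Raymond's dict-subclass FIFO, repeated so this snippet is standalone.
class DictLongFifo(dict):
    def __init__(self):
        self.nextin = 0
        self.nextout = 0
    def append(self, value):
        self[self.nextin] = value
        self.nextin += 1
    def pop(self):
        value = self[self.nextout]
        del self[self.nextout]
        self.nextout += 1
        return value

q = DictLongFifo()
for ch in "abc":
    q.append(ch)
drained = [q.pop() for _ in range(3)]   # FIFO: first in, first out
```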

Raymond Hettinger

Jan 17, 2002, 8:46:36 AM1/17/02

to

> 10000 <type 'list'> 0.329070

> 10000 <type 'array'> 0.234557

> 10000 DictIntFifo 0.123809

> 10000 DictLongFifo 0.185284

> 10000 ListPrependFifo 0.329698

> 10000 ListAppendFifo 0.398269

>

> 100000 <type 'list'> 76.195043

> 100000 <type 'array'> 56.828019

> 100000 DictIntFifo 1.269020

> 100000 DictLongFifo 1.961017

> 100000 ListPrependFifo 60.693928

> 100000 ListAppendFifo 77.094517

>

> So once you get to decently large quantities in the queue, the dict

versions

> make real sense over a basic list (Win2000).

>

> Tim Delaney


Before drawing a conclusion on this one, please add Sébastien Keim's

approach

to the mix:

class NestedListFifo:
    def __init__(self):
        self.first = []
    def append(self, data):
        node = [data, []]
        if self.first == []:
            self.first = node
        else:
            self.last[1] = node
        self.last = node
    def pop(self, n=-1):
        node = self.first
        self.first = node[1]
        return node[0]
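The queue above is really a linked list built out of two-element Python lists: each node is [value, next], append links a fresh node onto the tail, and pop unlinks the head. A standalone sketch (class repeated so it runs as-is):

```python
# Sébastien Keim's nested-list FIFO, repeated verbatim for a standalone demo.
class NestedListFifo:
    def __init__(self):
        self.first = []
    def append(self, data):
        node = [data, []]        # node is [value, link-to-next]
        if self.first == []:
            self.first = node
        else:
            self.last[1] = node  # splice onto the tail node
        self.last = node
    def pop(self, n=-1):
        node = self.first
        self.first = node[1]     # advance the head pointer
        return node[0]

q = NestedListFifo()
for i in range(3):
    q.append(i)
popped = [q.pop() for _ in range(3)]   # FIFO order
```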

Raymond Hettinger

Jan 17, 2002, 9:11:09 AM1/17/02

to

qu...@hork.ugcs.caltech.edu (Quinn Dunkan) writes:

> Aha! I always thought ADT stood for "Algebraic Data Type" and assumed people

> were talking about haskell / ML-ish types, although I never had a very clear

> idea of what they were supposed to exactly be (value semantics?). And fp

> advocates would say things like "behold the power of adts, blah blah".

Your confusion might have come from the fact that some of the more

rigorous software designers out there, especially some of those that

are also fp people, describe their ADTs via algebraic specification.

Then again, it might not have.

-Justin

Jan 17, 2002, 3:28:40 PM1/17/02

to

<Tim Peters>

Following is the spelling of mergesort I've had in mind ("a temp array and

a

handful of index variables"). It's eminently practical.

Thanks for sharing this. I tried it on a list of 100,000 ints in reverse

order. It ran about 4.5x faster than the generator version (the

generator-with-list.append version was a little faster than the

all-generator version).

<Tim Peters>

I wrote Python's current list.sort(), and it

runs faster than the quickest mergesort I was able to code in C at the time

<snip>

I expect it remains very hard to beat (for Python's

specific ratio of slow comparison to cheap object movement (just pointer

copies)).

</Tim Peters>

On the list of 100,000 the builtin tim.list.sort was a mere 800x or so

faster than the generator version.

The bad news -- for us Jython users -- is that on the list of 100,000

Jython.list.sort was about 45x slower than cpython.list.sort. :-(

Jim


Jan 17, 2002, 6:10:43 PM1/17/02

to

> From: Raymond Hettinger [mailto:oth...@javanet.com]

> "Delaney, Timothy" <tdel...@avaya.com> wrote in message

> news:mailman.1011232764...@python.org...

> >

> > 10000 <type 'list'> 0.329070

> > 10000 <type 'array'> 0.234557

> > 10000 DictIntFifo 0.123809

> > 10000 DictLongFifo 0.185284

> > 10000 ListPrependFifo 0.329698

> > 10000 ListAppendFifo 0.398269

> >

> > 100000 <type 'list'> 76.195043

> > 100000 <type 'array'> 56.828019

> > 100000 DictIntFifo 1.269020

> > 100000 DictLongFifo 1.961017

> > 100000 ListPrependFifo 60.693928

> > 100000 ListAppendFifo 77.094517

> >

> > So once you get to decently large quantities in the queue, the dict

> versions

> > make real sense over a basic list (Win2000).

> >

> > Tim Delaney

>

>

> Before drawing a conclusion on this one, please add Sébastien Keim's

> approach

> to the mix:


No question - there's lots of ways to do a FIFO list. I was merely

interested in how well Alex's version would perform compared to the naive

method of just using a list. And then of course I buggered up the test (in

two different ways!) and felt it important to acknowledge this and correct

any misconceptions that the dict method was inefficient which were formed

based on my erroneous test.

Tim Delaney

Jan 19, 2002, 1:49:11 AM1/19/02

to

[Tim]

>> Following is the spelling of mergesort I've had in mind ("a temp array

>> and a handful of index variables"). It's eminently practical.


[James Althoff]

> Thanks for sharing this.

Wait until you get the bill <wink>.

> I tried it on a list of 100,000 ints in reverse order. It ran about

> 4.5x faster than the generator version (the generator-with-list.append

> version was a little faster than the all-generator version).

Note that it was written for clarity rather than speed; commentary in the

original post suggested two effective ways to speed it up; anyone obsessed

with speed will think of others.

[on the builtin list.sort()]

> On the list of 100,000 the builtin tim.list.sort was a mere 800x or so

> faster than the generator version.

I'm afraid that wasn't "a fair" comparison: assuming you were still using a

list of 100,000 ints "in reverse order", list.sort() manages to sort

reverse-sorted inputs in linear time, while mergesort is always (best case

and worst case) N log N. A fairer test would apply random.shuffle() to the

input first. Note that if the "split off the s=1 iteration" suggestion for

speeding the mergesort is taken, it's easy to add a bit of code to the

split-off first iteration to detect already-sorted and

already-reverse-sorted in mergesort too (count how many swaps are done in

the split-off first iteration; a result of either 0 or total-number-of-pairs

is astronomically unlikely for random input, so if either obtains and

already-sorted or already-reverse-sorted are plausible input cases, it's

worth doing (at most) another n/2 compares to finish the job in linear time

when you can).
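The swap-counting idea Tim describes can be sketched in a few lines: run the split-off s=1 pass, swapping out-of-order adjacent pairs in place and counting. Zero swaps suggests already-sorted input; swaps equal to the number of pairs suggests reverse-sorted input (either is astronomically unlikely for random input). A hedged sketch, not Tim's actual code:

```python
# Split-off first (s=1) mergesort pass: swap out-of-order adjacent pairs
# in place, counting swaps to detect (anti)sorted input cheaply.
def first_pass(A):
    swaps = pairs = 0
    for i in range(0, len(A) - 1, 2):
        pairs += 1
        if A[i] > A[i + 1]:
            A[i], A[i + 1] = A[i + 1], A[i]
            swaps += 1
    return swaps, pairs

rev = list(range(10, 0, -1))            # reverse-sorted input
swaps, pairs = first_pass(rev)
looks_reversed = (swaps == pairs)       # every pair needed a swap

fwd = list(range(10))                   # already-sorted input
fwd_swaps, _ = first_pass(fwd)          # no pair needs a swap
```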

> The bad news -- for us Jython users -- is that on the list of 100,000

> Jython.list.sort was about 45x slower than cpython.list.sort. :-(

Last I looked, Java used a quicksort. In the common median-of-3 variant of

quicksort, reverse-sorted input is one of the best cases, but it's still an

N log N case. Again it would be fairer to shuffle the input first. The

thing that hurts quicksort for use in Python is that it does significantly

more comparisons (on average) than the theoretical minimum (on average), and

comparison of Python objects is a long-winded OO adventure <wink>.

Jan 23, 2002, 5:41:57 PM1/23/02

to

David Eppstein <epps...@ics.uci.edu> writes:

> I think heap sort would be a good choice for practicality, since it sorts

> in-place with only a handful of index variables. But except for unusual

> circumstances, I would likely just call Python's built-in sort().


That's good --- I was beginning to think you were delusional.

Even with hacks to make it less likely to hit its worst-case, though,

quicksort's constant factor makes it about twice as fast as mergesort.

How does heapsort's constant factor compare with mergesort's?

Python's built-in sort() uses quicksort (using a linear congruential

generator to pick pivot elements) to partition the array into small

chunks, and then uses an algorithm that looks a lot like a bubble sort

to sort the small chunks. It has special cases for already-sorted

data, all-the-same data, reverse-sorted data, and each of the above

followed by some unsorted data. This is the kind of thing you get

when you aim for practicality.

Jan 23, 2002, 7:22:56 PM1/23/02

to

In article <83it9s4...@panacea.canonical.org>,

Kragen Sitaker <kra...@pobox.com> wrote:


> Even with hacks to make it less likely to hit its worst-case, though,

> quicksort's constant factor makes it about twice as fast as mergesort.

This is either using clever tricks to pick a good pivot, or assuming that

comparisons are fast (probably not a good assumption in Python), right?

Merge sort uses very close to the optimal number of comparisons, single

randomly-chosen pivot quicksort is off by some constant factor.

It also depends on how your data was represented; several years ago when I

needed code (in C++) to sort linked-list data I chose merge sort, since

it's simple, for that representation it does not use any extra memory, and

the lists I was sorting were small enough that I didn't have to worry about

caching effects (as I think someone pointed out earlier, array-based

versions of quick sort and merge sort can be much better than heap sort in

their caching behavior).

> How does heapsort's constant factor compare with mergesort's?

Not as good, but it uses less memory. The problem is, quicksort doesn't

use much more memory than heapsort, and is I think usually much faster, so

heapsort is not so often the best choice.

> Python's built-in sort() uses quicksort (using a linear congruential

> generator to pick pivot elements) to partition the array into small

> chunks, and then uses an algorithm that looks a lot like a bubble sort

> to sort the small chunks. It has special cases for already-sorted

> data, all-the-same data, reverse-sorted data, and each of the above

> followed by some unsorted data. This is the kind of thing you get

> when you aim for practicality.

Yes, theory and practice are not the same thing.

Sometimes (as here) the theoretical algorithms are nice and clean, but

making them practical creates a mess. Other times theoreticians use lots

of complicated tricks to reduce the worst-case complexity while the better

practical solutions are simpler...

Jan 23, 2002, 9:03:43 PM1/23/02

to

I was hoping that someone else would comment on this post and I'm counting on people to correct me where I err.

http://www.nist.gov/dads/HTML/abstractDataType.html

"Abstract Data Type" is a computer science and a math term. For that reason, I believe that the phrase 'ADT-based programming' has no meaning. What would 'module-based programming' mean?

"Object Oriented Programming" is a programming term. Myself being a wannabe computer science geek, I make a distinction between computer science and programming.

The other term that I see misused is "Data Structure".

http://www.nist.gov/dads/HTML/datastructur.html

Examples of data structures are pointers, arrays, records, and files. Examples of abstract data types are stacks, queues, graphs, and FIFOs. ADTs can be implemented in terms of data structures.
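To illustrate that last sentence with a made-up example: a stack ADT — defined purely by its operations (push, pop, peek), not by its representation — implemented in terms of a concrete data structure, a Python list:

```python
# Stack ADT implemented on top of a concrete data structure (a list).
class Stack:
    def __init__(self):
        self._items = []          # the underlying data structure
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()
    def peek(self):
        return self._items[-1]
    def __len__(self):
        return len(self._items)

s = Stack()
s.push(1)
s.push(2)
top = s.peek()        # inspect without removing
popped = s.pop()      # remove the most recently pushed item
```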

John

-----Original Message-----

From: Mitch Chapman [mailto:Mitch....@bioreason.com]

Sent: Wednesday, January 16, 2002 8:42 AM

To: pytho...@python.org

Subject: Re: algorithms and ADTs (was Re: efficient idiomatic queue?)

[quote deleted]

As Andrew hinted, a big distinction between OO classes

and ADTs is that the latter can't be subclassed.

ADT-based programming is useful in languages which don't

directly support OOP (e.g ANSI C). They don't provide subclassing

or protocol/interface-based programming, but ADTs do

simplify encapsulation and memory management.

My first exposure to ADTs came when I started using

Modula-2. Hence the claim that if you remove memory-management

(i.e. instantiation) from ADT-based programming you get

module-based programming :)

--

..and lots of singletons
