[Python-ideas] Heap data type

22 views
Skip to first unread message

Facundo Batista

unread,
Apr 18, 2009, 8:21:45 AM4/18/09
to Python-Ideas
Using the module heapq, I found that it's not easy to use. Or, at
least, that it could be straightforward to use.

My main issue is that for correct usage:

- user should provide an external list, that shouldn't use for
anything else to don't break the invariant

- to alter the heap queue, a bunch of functions must be used, passing
always the external list


I think that changing "external list" for "internal attribute", and
"bunch of functions " for "methods", it will leave the module easier
to use, safer, and more object oriented.

So, I basically coded it. What do you think about including this class
in the heap module?

"""
class Heap(object):
'''Maintains a heapified list, always conserving the invariant.

Heaps are arrays for which heap[k] <= heap[2*k+1] and
heap[k] <= heap[2*k+2] for all k, counting elements from zero.
'''

def __init__(self, iterable=None):
'''Initializes the heap from any iterable.

>>> Heap([1, 2])
Heap([1, 2])
>>> Heap([])
Heap([])
>>> Heap()
Heap([])
>>> Heap((1, 2, 3))
Heap([1, 2, 3])
>>> Heap(x**2 for x in range(3))
Heap([0, 1, 4])
'''
if iterable is None:
self._queue = []
else:
self._queue = list(iterable)
heapq.heapify(self._queue)

def __repr__(self):
return "Heap(%s)" % self._queue

def push(self, item):
'''Push the item to the heap queue.

>>> h = Heap()
>>> h.push(5)
>>> h.push(4)
>>> h.push(3)
>>> h.push(2)
>>> h.push(1)
>>> h.push(0)
>>> h
Heap([0, 2, 1, 5, 3, 4])
'''
heapq.heappush(self._queue, item)

def pop(self):
'''Pop one item from the heap queue.

>>> h = Heap([0, 2, 1, 5, 3, 4])
>>> h.pop()
0
>>> h.pop()
1
>>> h.pop()
2
>>> h.pop()
3
>>> h.pop()
4
>>> h.pop()
5
>>> h
Heap([])
>>> h.pop()
Traceback (most recent call last):
...
IndexError: index out of range
'''
return heapq.heappop(self._queue)

def pushpop(self, item):
'''Push the item and pop one from the heap queue.

Note that this method is more efficient than calling both
push() and pop() separately.

>>> h = Heap()
>>> h.pushpop(7)
7
>>> h.push(5)
>>> h.push(4)
>>> h.push(3)
>>> h.pushpop(7)
3
>>> h.pushpop(7)
4
>>> h
Heap([5, 7, 7])
'''
return heapq.heappushpop(self._queue, item)

def replace(self, item):
'''Pop one item and push the received one into the heap queue

Note that this method is more efficient than calling both
pop() and push() separately.

>>> h = Heap()
>>> h.replace(3)
Traceback (most recent call last):
...
IndexError: index out of range
>>> h.push(7)
>>> h.replace(2)
7
>>> h
Heap([2])
>>> h.push(3)
>>> h
Heap([2, 3])
>>> h.replace(1)
2
>>> h.replace(9)
1
>>> h
Heap([3, 9])
'''
return heapq.heapreplace(self._queue, item)
"""

Regards,

--
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Aahz

unread,
Apr 18, 2009, 8:43:58 AM4/18/09
to python...@python.org
On Sat, Apr 18, 2009, Facundo Batista wrote:
>
> I think that changing "external list" for "internal attribute", and
> "bunch of functions " for "methods", it will leave the module easier
> to use, safer, and more object oriented.

+1 -- I recently did a short presentation on Big O notation and Python
containers, and heapq looked remarkably ugly.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur." --Red Adair

Jacob Holm

unread,
Apr 18, 2009, 9:28:00 AM4/18/09
to Facundo Batista, Python-Ideas
Facundo Batista wrote:
> Using the module heapq, I found that it's not easy to use. Or, at
> least, that it could be straightforward to use.
>
> My main issue is that for correct usage:
>
> - user should provide an external list, that shouldn't use for
> anything else to don't break the invariant
>
> - to alter the heap queue, a bunch of functions must be used, passing
> always the external list
>
>
> I think that changing "external list" for "internal attribute", and
> "bunch of functions " for "methods", it will leave the module easier
> to use, safer, and more object oriented.
>
> So, I basically coded it. What do you think about including this class
> in the heap module?
>

Your "Heap" class implements (part of) an abstract data type usually
known as a priority queue.

The fact that it uses the heapq module internally is an implementation
detail that is not (and should not be IMHO) exposed to the users of the
class. So if this is added, I think it should rather go in the
collections module.

Much better (faster for some operations) implementations of this ADT are
possible, so by not tying the implementation to heapq we would be free
to switch the implementation later if we want.


+1 to adding a collections.priority_queue, initially based on your class.

+0 to adding your class as heapq.Heap


In either case I think you should add at least the following additional
method.

def peek(self):
return self._queue[0]


- Jacob

Stefan Behnel

unread,
Apr 18, 2009, 10:01:25 AM4/18/09
to python...@python.org
Facundo Batista wrote:
> Using the module heapq, I found that it's not easy to use. Or, at
> least, that it could be straightforward to use.
> [...]

> So, I basically coded it. What do you think about including this class
> in the heap module?

I did something similar a couple of years ago (and I doubt that I'm the
only one). You should be able to find it in the bug tracker (no longer open).

Stefan

Daniel Stutzbach

unread,
Apr 18, 2009, 10:49:07 AM4/18/09
to Aahz, python...@python.org
On Sat, Apr 18, 2009 at 7:43 AM, Aahz <aa...@pythoncraft.com> wrote:
+1 -- I recently did a short presentation on Big O notation and Python
containers, and heapq looked remarkably ugly.

Another issue with heapq is that it does not support modifying the priority of an existing element (usually called the "decrease_key" operation in textbooks).  The hard part of incrementing decrease_key is that somehow the object in the heap needs to know its current position, else you have to do an expensive linear search to find the object's current position.

I've implemented a couple of heaps for Python over the years:

1. http://svn.python.org/view/sandbox/trunk/collections/pairing_heap.py?view=markup&pathrev=40887
(checked into the Sandbox... 5 years ago! I think I also have a C reimplementation somewhere)

If you don't care about altering the priority, then the interface is more or less like Facundo described.

To keep track of the positions, insert operations return a wrapper around each object.  The user needs to keep track of the wrapper and pass it to the adjust_key() method if they want to change the priority of the object later. 


2. HeapDict.  http://pypi.python.org/pypi/HeapDict

Looks, act, and quacks like a dict/MutableMapping, except popitem() returns the item with the lowest value instead of a random item.  Also, there's a peekitem() method to examine that item without removing it.  Since the interface is familiar, it's super-easy to use without having to look up the names of the methods whenever you need to use a heap. 

The downside is that you can't add the same element into the heap twice (just like you can't have the same key in a dictionary twice).  Personally I've only inserted the same element more than once when doing toy examples with integers, so for me at least that's no great loss. ;-)
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC

George Sakkis

unread,
Apr 18, 2009, 10:57:09 AM4/18/09
to python-ideas
On Sat, Apr 18, 2009 at 10:01 AM, Stefan Behnel <stef...@behnel.de> wrote:

> Facundo Batista wrote:
>> Using the module heapq, I found that it's not easy to use. Or, at
>> least, that it could be straightforward to use.
>> [...]
>> So, I basically coded it. What do you think about including this class
>> in the heap module?
>
> I did something similar a couple of years ago (and I doubt that I'm the
> only one).

Indeed, you're not: http://code.activestate.com/recipes/440673/

George

Raymond Hettinger

unread,
Apr 18, 2009, 7:40:31 PM4/18/09
to Aahz, python...@python.org
Facundo, I would like to work with you on this.
I've been the primary maintainer for heapq for a while
and had already started working on something like this
in response to repeated requested to support a key= function
(like that for sorted/min/max).


Raymond


>> I think that changing "external list" for "internal attribute", and
>> "bunch of functions " for "methods", it will leave the module easier
>> to use, safer, and more object oriented.
>
> +1 -- I recently did a short presentation on Big O notation and Python
> containers, and heapq looked remarkably ugly.

_______________________________________________

Gerald Britton

unread,
Apr 20, 2009, 9:01:21 AM4/20/09
to Raymond Hettinger, python...@python.org
Raymond, Facundo,

May I suggest that you implement an interface similar to the sort()
method for list objects:
=========================
sort([cmp[, key[, reverse]]])

"cmp specifies a custom comparison function of two arguments (list
items) which should return a negative, zero or positive number
depending on whether the first argument is considered smaller than,
equal to, or larger than the second argument: cmp=lambda x,y:
cmp(x.lower(), y.lower()). The default value is None.

"key specifies a function of one argument that is used to extract a
comparison key from each list element: key=str.lower. The default
value is None.

"reverse is a boolean value. If set to True, then the list elements
are sorted as if each comparison were reversed."
========================

You could add these option arguments to the Heap class constructor,
which would then be used to maintain the heap. The _reverse_ argument
would cover the min/max heap question. The other arguments would
allow the user to use the class for a priority queue of more complex
objects. For example, the Heap could consist of object references
that would be compared using the function supplied in _cmp_ or _key_
arguments.

--
Gerald Britton

Terry Reedy

unread,
Apr 20, 2009, 1:41:54 PM4/20/09
to python...@python.org
Gerald Britton wrote:
> Raymond, Facundo,
>
> May I suggest that you implement an interface similar to the sort()
> method for list objects:
> =========================
> sort([cmp[, key[, reverse]]])
>
> "cmp specifies a custom comparison function of two arguments (list
> items) which should return a negative, zero or positive number
> depending on whether the first argument is considered smaller than,
> equal to, or larger than the second argument: cmp=lambda x,y:
> cmp(x.lower(), y.lower()). The default value is None.

Cmp(), .__cmp__, cmp=xxx, etc, are gone in Py3.

+something for the other two.

Gerald Britton

unread,
Apr 20, 2009, 3:16:58 PM4/20/09
to Terry Reedy, python...@python.org
Is this only targeting 3.x? I think it would be useful in 2.x as well.

--
Gerald Britton

Stefan Behnel

unread,
Apr 21, 2009, 1:13:02 AM4/21/09
to python...@python.org
Gerald Britton wrote:

>
> On Mon, Apr 20, 2009 at 1:41 PM, Terry Reedy wrote:
>> Gerald Britton wrote:
>>> May I suggest that you implement an interface similar to the sort()
>>> method for list objects:
>>> =========================
>>> sort([cmp[, key[, reverse]]])
>>>
>> Cmp(), .__cmp__, cmp=xxx, etc, are gone in Py3.
>>
> Is this only targeting 3.x? I think it would be useful in 2.x as well.

1) I think the usual approach is to implement it for Py3, then backport it.

2) Since we already know that code that uses cmp=xyz is not Py3 compatible,
there is little use in providing this parameter for a new data type, so
that new code ends up being broken in the future.

Stefan

Gerald Britton

unread,
Apr 21, 2009, 9:44:22 AM4/21/09
to Stefan Behnel, python...@python.org
Ok thanks. Any idea what the uptake is on Py3? I know that the
projects I work on are not even talking about moving in that direction
due to the significant migration effort. Also, I hear that
performance is an issue, though that might be improve over the longer
term.

--
Gerald Britton

Josiah Carlson

unread,
Apr 21, 2009, 12:08:53 PM4/21/09
to Daniel Stutzbach, python...@python.org

I've also got a pair heap implementation:
http://mail.python.org/pipermail/python-dev/2006-November/069845.html

There's also a proposed update of sched.py in the bugtracker (as part
of some updates to asyncore to add a scheduler) that offers a version
of decreasekey, as well as "remove" functionality. I describe some of
the trade-offs in implementing a heap with such features (memory use,
O-notation, etc.) in a blog post last summer:
http://chouyu-31.livejournal.com/316112.html . One of the ways we get
around the "key must be unique" limitation is that we don't use a
dictionary to reference the keys; when you put an item into the heap,
you get an opaque reference, which you can then cancel, "reschedule",
etc.

The sched module is heavily biased towards event scheduling, so adding
a collections.priority_queue or whatever is just fine with me.

- Josiah

Benjamin Peterson

unread,
Apr 23, 2009, 5:46:36 PM4/23/09
to python...@python.org
Gerald Britton <gerald.britton@...> writes:

>
> Ok thanks. Any idea what the uptake is on Py3? I know that the
> projects I work on are not even talking about moving in that direction
> due to the significant migration effort. Also, I hear that
> performance is an issue, though that might be improve over the longer
> term.

3.0, of course, doesn't have yet a huge following.

Most of the performance problems will be fixed in 3.1.

Facundo Batista

unread,
Dec 7, 2009, 5:36:34 AM12/7/09
to Raymond Hettinger, Aahz, python...@python.org
On Sat, Apr 18, 2009 at 8:40 PM, Raymond Hettinger <pyt...@rcn.com> wrote:

> Facundo, I would like to work with you on this.
> I've been the primary maintainer for heapq for a while
> and had already started working on something like this
> in response to repeated requested to support a key= function
> (like that for sorted/min/max).

After a not much complicated, but different year (I had a kid!), I'm
bringing this thread back to live.

There were different proposals of different people about what to do
after my initial mail, we can separate them in two sections:

- Move the Heap class to the collections module: I'm just changing the
heapq module to have an OO interface, instead a bunch of functions.
I'm +0 to moving it to "collections", but note that even after the
reordering of the stdlib for Py3, the heapq module remained there.

- Add functionality to the Heap class: I'm +0 to this, but I don't
want to stop this change in function of further functionality... I
propose to have the same functionality, in an OO and less error prone
way. We can add more functionality afterwards.

What do you think?

Raymond, let's work together... but don't know where. The Heap class
is already coded in my first mail, if you want to start from there and
add functionality, I'm +0. If you want me to add tests and push the
inclusion of that class into the module, just tell me. Something else,
I'm all ears, :)

Regards,

--
. Facundo

Reply all
Reply to author
Forward
0 new messages