Heap data type, the revival

34 views
Skip to first unread message

Nick Timkovich

unread,
Oct 15, 2016, 1:41:25 PM10/15/16
to python-ideas
I once again had a use for heaps, and after rewrapping the heapq.heap* methods for the umpteenth time, figured I'd try my hand at freezing off that wart.

Some research turned up an older thread by Facundo Batista (https://mail.python.org/pipermail/python-ideas/2009-April/004173.html), but it seems like interest petered out. I shoved an initial pass at a spec, implementation, and tests (robbed from <cpython>/Lib/test/test_heapq.py mostly) into a repo at https://github.com/nicktimko/heapo My spec is basically:

1. Provide all existing heapq.heap* functions provided by the heapq module as methods with identical semantics
2. Provide limited magic methods to the underlying heap structure
  a. __len__ to see how big it is, also for boolean'ing
  b. __iter__ to allow reading out to something else (doesn't consume elements)
3. Add peek method to show, but not consume, lowest heap value
4. Allow custom comparison/key operation (to be implemented/copy-pasted)

Open Questions
* Should __init__ shallow-copy the list or leave that up to the caller? Less memory if the heap object just co-opts it, but user might accidentally reuse the reference and ruin the heap. If we make our own list then it's easier to just suck in any arbitrary iterable.
* How much should the underlying list be exposed? Is there a use case for __setitem__, __delitem__?
* Should there be a method to alter the priority of elements while preserving the heap invariant? Daniel Stutzbach mentioned dynamically increasing/decreasing priority of some list elements...but I'm inclined to let that be a later addition.
* Add some iterable method to consume the heap in an ordered fashion?

Cheers,
Nick

Nick Timkovich

unread,
Oct 15, 2016, 2:16:30 PM10/15/16
to python...@python.org
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Sven R. Kunze

unread,
Oct 15, 2016, 4:27:36 PM10/15/16
to python...@python.org
On 15.10.2016 20:15, Nick Timkovich wrote:
> I once again had a use for heaps, and after rewrapping the heapq.heap*
> methods for the umpteenth time, figured I'd try my hand at freezing
> off that wart.

We re-discussed this in the beginning of 2016 and xheap
https://pypi.python.org/pypi/xheap was one outcome. ;) In the course of
doing so, some performance improvements were also discovered and some
peculiarities of Python lists are discussed.

See here
https://mail.python.org/pipermail/python-list/2016-January/702568.html
and here
https://mail.python.org/pipermail/python-list/2016-March/704339.html

Cheers,
Sven

Nick Timkovich

unread,
Oct 15, 2016, 5:20:22 PM10/15/16
to Sven R. Kunze, python...@python.org
Features and speed are good, but I'm interested in getting something
with the basic features into the Standard Library so it's just there.
Not having done that before and bit clueless, I'm wanting to learn
that slightly less-technical procedure. What are the steps to make
that happen?

Sven R. Kunze

unread,
Oct 15, 2016, 5:22:50 PM10/15/16
to Nick Timkovich, python...@python.org
On 15.10.2016 23:19, Nick Timkovich wrote:
> Features and speed are good, but I'm interested in getting something
> with the basic features into the Standard Library so it's just there.
> Not having done that before and bit clueless, I'm wanting to learn
> that slightly less-technical procedure. What are the steps to make
> that happen?

As I said, it has been discussed and the consensus so far was: "not
everything needs to be a class if it does not provide substantial
benefit" + "functions are more flexible" + "if it's slower that the
original it won't happen".

Nick Timkovich

unread,
Oct 16, 2016, 1:04:05 PM10/16/16
to Sven R. Kunze, python...@python.org
Functions are great; I'm a big fan of functions. That said, the group
of heapq.heap* functions are literally OOP without using that "class"
word. As far as flexibility, what is the use of the those functions on
non-heap structures?

Devin Jeanpierre

unread,
Oct 16, 2016, 1:42:19 PM10/16/16
to Sven R. Kunze, Python-Ideas
> As I said, it has been discussed and the consensus so far was: "not everything needs to be a class if it does not provide substantial benefit" + "functions are more flexible" + "if it's slower that the original it won't happen".

(These) functions are less flexible here. heapq forbids the use of
anything except lists, for some reason. They would be *better* as list
methods, because then array.array could implement them, and code could
accept some arbitrary mutable sequence and transform it into a heap --
but instead, lists are required.

xheap has a similar problem -- for some reason it subclasses list,
which is bad practice in OOP, for good reason. Aside from making it
impossible to use with array.array, it also e.g. makes it too easy to
violate the heap invariant -- one of the big benefits of using a heap
interface could have been making that impossible whenever your
mutations originate from the heap object).


The main thing I've always wanted from heapq is the ability to specify
a key. This is a lot easier with a class:

x = heapq.Heap(..., key=len)
x.pop()

vs (hypothetical, because heapq doesn't have this feature):

x =...
heapq.heapify(x, key=len)
heapq.heappop(x, key=len)
# Don't ever forget key=len unless you want to waste a few hours debugging.

Classes would be more convenient and less dangerous as soon as you
start adding features like this. +1 to classes.


Replying to OP:

> * Should __init__ shallow-copy the list or leave that up to the
> caller? Less memory if the heap object just co-opts it, but user might
> accidentally reuse the reference and ruin the heap. If we make our own
> list then it's easier to just suck in any arbitrary iterable.

Leave it up to the caller. The caller can just as easily call
list(...) as you can, and might have good reasons to want to mutate
the existing thing. That said, as a safety thing, it might be
reasonable to create a new list by default but provide a special
factory function/class to build an instance that wraps the sequence
without copying. e.g.: heapq.Heap(x) vs heapq.HeapWrapper(x)).

> * How much should the underlying list be exposed? Is there a use case
> for __setitem__, __delitem__?

If you allow the caller to keep hold of the original list, then they
can always mutate it through that reference if they need to. If you
don't allow the caller to keep the original list, but you support the
list interface, then you've lost much of the safety you were trying to
keep by not reusing references.

-- Devin

Sven R. Kunze

unread,
Oct 17, 2016, 2:19:39 PM10/17/16
to Nick Timkovich, python...@python.org
On 16.10.2016 19:02, Nick Timkovich wrote:
> Functions are great; I'm a big fan of functions. That said, the group
> of heapq.heap* functions are literally OOP without using that "class"
> word. As far as flexibility, what is the use of the those functions on
> non-heap structures?

IIRC the statement wasn't about "non-heap structures". It was about, "I
need a heap which does something special and subclassing might not solve
it. So, I run my own implementation using those functions".

On the other hand, I can fully understand the need for general-purpose
oo-heap implementation. That why I put xheap on github/PyPI.


Best,
Reply all
Reply to author
Forward
0 new messages