[Python-Dev] On time complexity of heapq.heapify

1,349 views
Skip to first unread message

Rafael Almeida

unread,
Dec 12, 2016, 8:58:55 AM12/12/16
to pytho...@python.org
Hello,

Current heapify documentation says it takes linear time


However, investigating the code (Python 3.5.2) I saw this:

    def heapify(x):
        """Transform list into a heap, in-place, in O(len(x)) time."""
        n = len(x)
        # Transform bottom-up.  The largest index there's any point to looking at
        # is the largest with a child index in-range, so must have 2*i + 1 < n,
        # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
        # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
        # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
        for i in reversed(range(n//2)):
            _siftup(x, i)

From what I gather, _siftup(heap, pos) does not run in constant time, but rather it runs in time proportional to the height of the subtree with root in ``pos''. Although, according to the in-code comments, it should be faster than creating a heap by calling heappush multiple times, I believe the time complexity remains the same.

Although I had a hard time finding out the exact time complexity for that particular function, I think it is closer to O(log(n!)) than to O(n). I would be very happy to see an explanation as to why the time is O(n) (it does not seem possible to me to create a heap of n numbers in such runtime). However, if no one has a convincing argument, I'd rather omit time complexity information from documentation (given that this analysis is not made for the other functions either).

[]'s
Rafael

Raymond Hettinger

unread,
Dec 12, 2016, 11:14:00 AM12/12/16
to Rafael Almeida, Python-Dev@Python. Org

> On Dec 11, 2016, at 1:38 PM, Rafael Almeida <almei...@gmail.com> wrote:
>
> From what I gather, _siftup(heap, pos) does not run in constant time, but rather it runs in time proportional to the height of the subtree with root in ``pos''. Although, according to the in-code comments, it should be faster than creating a heap by calling heappush multiple times, I believe the time complexity remains the same.
>
> Although I had a hard time finding out the exact time complexity for that particular function, I think it is closer to O(log(n!)) than to O(n).

Hello Rafael,

The heapify() algorithm is well known and well studied. A quick Google search turns up plenty of explanations: https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=complexity%20of%20heapify

algorithm - How can building a heap be O(n) time complexity? - StackOverflow
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=complexity%20of%20heapify

Data Structures Heap, Heap Sort & Priority Queue
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=complexity%20of%20heapify
Sub tree rooted at i is a max heap. Simple bound:
– O(n) calls to MAX-HEAPIFY,
– Each of which takes O(lg n), – Complexity: O(n lg n).
– Thus, the running time of BUILD-MAX-HEAP is O(n).

Here's a more detailed explanation: http://www.cs.umd.edu/~meesh/351/mount/lectures/lect14-heapsort-analysis-part.pdf

If you have other follow-ups, please take this discussion to another list. This list is for the development of the Python core and not for general questions about algorithms or use of the language.

Raymond Hettinger
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Raymond Hettinger

unread,
Dec 12, 2016, 12:44:44 PM12/12/16
to Rafael Almeida, Python-Dev@Python. Org

> On Dec 12, 2016, at 8:12 AM, Raymond Hettinger <raymond....@gmail.com> wrote:
>
> The heapify() algorithm is well known and well studied. A quick Google search turns up plenty of explanations: https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=complexity%20of%20heapify
>
> algorithm - How can building a heap be O(n) time complexity? - StackOverflow
> https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=complexity%20of%20heapify
>
> Data Structures Heap, Heap Sort & Priority Queue
> https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=complexity%20of%20heapify
> Sub tree rooted at i is a max heap. Simple bound:
> – O(n) calls to MAX-HEAPIFY,
> – Each of which takes O(lg n), – Complexity: O(n lg n).
> – Thus, the running time of BUILD-MAX-HEAP is O(n).
>
> Here's a more detailed explanation: http://www.cs.umd.edu/~meesh/351/mount/lectures/lect14-heapsort-analysis-part.pdf
>
> If you have other follow-ups, please take this discussion to another list. This list is for the development of the Python core and not for general questions about algorithms or use of the language.

I forgot to attach the measurements that demonstrate the O(n) complexity:

# Python 3 Code
from heapq import heapify
from random import randrange

cmp_cnt = 0

class Int(int):
def __lt__(self, other):
global cmp_cnt
cmp_cnt += 1
return super.__lt__(self, other)

def trial(n):
global cmp_cnt

data = [Int(randrange(1000000)) for i in range(n)]
cmp_cnt = 0
heapify(data)
return cmp_cnt

for n in [100, 1000, 10000, 100000, 1000000, 10000000]:
print("n = {:<10d} compares = {}".format(n, trial(n)))

Which outputs:

n = 100 compares = 155
n = 1000 compares = 1620
n = 10000 compares = 16446
n = 100000 compares = 164779
n = 1000000 compares = 1649165
n = 10000000 compares = 16493429

Reply all
Reply to author
Forward
0 new messages