Notes on heapq usage

1 view
Skip to first unread message

Matt Good

unread,
Mar 27, 2007, 4:41:51 PM3/27/07
to Trac Development
I just removed a few uses of the heapq module and wanted to post a few
comments about it. See:
http://trac.edgewall.org/changeset/5148

In the timeline heapq was used to sort the events. Based on these
benchmarks using heapq is much slower than the Python sort(),
especially in Python 2.3:
http://mail.python.org/pipermail/python-list/2005-January/304291.html

So "maxrows" would have to be very small compared to the full list
size for heapq to be faster than sort().

The other place was in the version control API where heapq was used to
get the "best" version control connector. For a single item using
max() will always be faster than heapq.

So, I just wanted to mention that you should check whether max() or
sort() will work before using the heapq module.

-- Matt Good

Eli Carter

unread,
Mar 27, 2007, 5:21:20 PM3/27/07
to trac...@googlegroups.com
On Tuesday 27 March 2007, Matt Good wrote:
>
> I just removed a few uses of the heapq module and wanted to post a few
> comments about it. See:
> http://trac.edgewall.org/changeset/5148
>
> In the timeline heapq was used to sort the events. Based on these
> benchmarks using heapq is much slower than the Python sort(),
> especially in Python 2.3:
> http://mail.python.org/pipermail/python-list/2005-January/304291.html
>
> So "maxrows" would have to be very small compared to the full list
> size for heapq to be faster than sort().

Cool. Question: Does this correctly handle the ordering of changesets that
are committed within the same second?

Eli

Matt Good

unread,
Mar 28, 2007, 4:32:25 PM3/28/07
to Trac Development

The timeline API doesn't define the behavior for ordering events with
the same timestamp. Unlike the previous heap-based sorting this now
uses the Python sort which is "stable". The ordering will depend on
the order the VC API yields events, and whether you're using Python
2.3. This is because the "stable" element of the sorted()
implementation in trac.util.compat is reversed from that in Python
2.4.

So, for example in Python 2.4:
>>> sorted(['a', u'a'])
['a', u'a']
>>> sorted(['a', u'a'], reverse=True)
['a', u'a']

With trac.util.compat.sorted:
>>> sorted(['a', u'a'], reverse=True)
[u'a', 'a']

As long as this caveat doesn't affect anything in Trac I'm ok with
leaving this as-is.

However, in regards to timeline sorting if we need to keep events with
the same timestamp in a predictable order the title may be a better
secondary field to sort on instead of the order they're produced by
the TimelineEventProvider.

-- Matt Good

Reply all
Reply to author
Forward
0 new messages