Optimizing iteration on zip objects

50 views
Skip to first unread message

Antony Lee

unread,
Mar 24, 2015, 2:43:04 AM3/24/15
to cython...@googlegroups.com
Hi,
I believe that loops such as

for x, y, z in zip(xs, ys, zs): ...

(where the number of arguments and unpacking variables are directly given in that line) are currently not optimized at all (as in, a Python zip object is created, and tuples obtained and unpacked).  Having looked at the Cython sources, I see an optimization for enumerate (_transform_enumerate_iteration) that (I believe?) rewrites

for x in enumerate(y): ...

into

i = 0
for x' in y:
    i += 1
    ...

although it is not clear to me whether there is actual tuple creation and unpacking going on there.  I would be happy to try to write a similar optimizer for zip if there is interest for it, but would need some help (such as a short walkthrough of the "enumerate" optimizer).

Antony

Stefan Behnel

unread,
Mar 24, 2015, 3:09:07 AM3/24/15
to cython...@googlegroups.com
Patches welcome. It shouldn't be all that hard to get rid of the tuple
packing and unpacking, but then independently optimising the loops isn't
easy. Cython applies several optimisations to different looping patterns,
but they all assume that there is one thing that is being iterated over.
For example, iterating over a list is very fast as it uses direct item
access. If you iterate over two lists in parallel, however, the C code has
to be structured differently. And if you iterate over, say, a list and an
enumerated list, or a list and a dict, then the code will have to look
different again. In fact, dict iteration uses a C while-loop, whereas list
iteration uses a C for-loop.

The transform you found is the right place to look. Basically, it replaces
the existing (AST) code tree nodes that call enumerate() and loop over the
result by other nodes that implement the external increment. There are
three parts in iteration: the ForInStatNode for the actual loop, the
IteratorNode that holds the iterator, and the NextNode that gets the next
item. My guess is that you'd need new nodes for the iterator and next nodes
that handle multiple iterators, and then do a parallel assignment from the
NextNode to the list of targets, which Cython optimises into independent
assignments. But I haven't thought this through very well yet.

Does this help already? Please ask if anything is unclear.

Stefan

Reply all
Reply to author
Forward
0 new messages