Possible enumerate automatic conversion

Daniele Nicolodi

Apr 16, 2013, 9:09:18 AM
to cython...@googlegroups.com
Hello,

would it be desirable to have an enumerate() automatic conversion
similar to the range() conversion when the enumerated iterable is a C
array or a numpy array?

http://docs.cython.org/src/userguide/pyrex_differences.html#automatic-range-conversion

I imagine that detecting the opportunity for the optimization may be a
bit tricky, but it should not be much different than a combination of
what is done for iteration over a C array and for the range() function.

Thank you.

Cheers,
Daniele

Robert Bradshaw

Apr 16, 2013, 11:58:30 PM
to cython...@googlegroups.com
That's certainly feasible, it's just a matter of someone implementing
the transformation.

- Robert

Stefan Behnel

Apr 17, 2013, 7:47:25 AM
to cython...@googlegroups.com, Cython-devel
Daniele Nicolodi, 16.04.2013 15:09:
> would it be desirable to have an enumerate() automatic conversion
> similar to the range() conversion when the enumerated iterable is a C
> array or a numpy array?
>
> http://docs.cython.org/src/userguide/pyrex_differences.html#automatic-range-conversion

enumerate() has been optimised since something like 0.13 or so, maybe longer. It
certainly works for char* and maybe also other C arrays.


> I imagine that detecting the opportunity for the optimization may be a
> bit tricky, but it should not be much different than a combination of
> what is done for iteration over a C array and for the range() function.

Yes, the tricky bit is to get the iterable through the type analysis
without coercion errors. This couldn't easily be done at the time but
became possible with my recent analyse_types() phase changes.

Essentially, for-loop optimisations can now be moved into the type analysis
phase and take the type of the iterable into account without running into
accidental coercion errors.

I'm not sure yet if this is the right way to do it - maybe we still want an
explicit transformation instead.

In any case, the right place to discuss this is the cython-devel mailing
list, not the users mailing list.

Stefan

Daniele Nicolodi

Apr 17, 2013, 8:04:47 AM
to cython...@googlegroups.com
On 17/04/2013 13:47, Stefan Behnel wrote:
> Daniele Nicolodi, 16.04.2013 15:09:
>> would it be desirable to have an enumerate() automatic conversion
>> similar to the range() conversion when the enumerated iterable is a C
>> array or a numpy array?
>>
>> http://docs.cython.org/src/userguide/pyrex_differences.html#automatic-range-conversion
>
> enumerate() has been optimised since something like 0.13 or so, maybe longer. It
> certainly works for char* and maybe also other C arrays.

If it is optimized, this is not written in the documentation. I'm using
enumerate() to iterate over a (small) numpy array and the optimization
is not there.

> In any case, the right place to discuss this is the cython-devel mailing
> list, not the users mailing list.

Agreed. I'll subscribe to that mailing list too.

Thanks. Cheers,
Daniele


Stefan Behnel

Apr 17, 2013, 9:17:14 AM
to cython...@googlegroups.com
Daniele Nicolodi, 17.04.2013 14:04:
> On 17/04/2013 13:47, Stefan Behnel wrote:
>> Daniele Nicolodi, 16.04.2013 15:09:
>>> would it be desirable to have an enumerate() automatic conversion
>>> similar to the range() conversion when the enumerated iterable is a C
>>> array or a numpy array?
>>>
>>> http://docs.cython.org/src/userguide/pyrex_differences.html#automatic-range-conversion
>>
>> enumerate() has been optimised since something like 0.13 or so, maybe longer. It
>> certainly works for char* and maybe also other C arrays.
>
> If it is optimized, this is not written in the documentation.

True. Optimisations are generally not documented because they'd just
unnecessarily bloat the documentation with useless information. Seriously,
what does it add for users to know that Cython has its own way of
implementing a given syntax construct or a given builtin?

range() is an exception here, because we were actively trying to get users
away from the for-from loop syntax that Pyrex introduced.

Here's the code for loop optimisations, BTW:

https://github.com/cython/cython/blob/master/Cython/Compiler/Optimize.py#L61


> I'm using
> enumerate() to iterate over a (small) numpy array and the optimization
> is not there.

It should be. Can you show us your code?

Stefan

Daniele Nicolodi

Apr 17, 2013, 9:32:38 AM
to cython...@googlegroups.com
On 17/04/2013 15:17, Stefan Behnel wrote:
> True. Optimisations are generally not documented because they'd just
> unnecessarily bloat the documentation with useless information. Seriously,
> what does it add for users to know that Cython has its own way of
> implementing a given syntax construct or a given builtin?
>
> range() is an exception here, because we were actively trying to get users
> away from the for-from loop syntax that Pyrex introduced.

The fact is that looking at the generated C code I observed that the
enumerate() was not optimized as range() is. I went to the documentation
looking for an explanation and I found mention of range() only. This
led me to think that enumerate() is not optimized.

I agree that listing all optimizations in the documentation is of little
use, but in the case where builtins are replaced with a different
implementation this makes much more sense because it changes radically
how the users write their code.

>> I'm using
>> enumerate() to iterate over a (small) numpy array and the optimization
>> is not there.
>
> It should be. Can you show us your code?

This is an example:

def test():
    cdef unsigned int k
    cdef double element
    cdef np.ndarray[np.double_t, ndim=1] v1 \
        = np.empty([10, ], np.double)
    cdef np.ndarray[np.double_t, ndim=1] v2 \
        = np.empty([10, ], np.double)

    for i, element in enumerate(v1):
        v2[i] = element * i

but it may very well be that I'm doing something stupid.

Thank you. Cheers,
Daniele

Stefan Behnel

Apr 17, 2013, 9:46:38 AM
to cython...@googlegroups.com
Daniele Nicolodi, 17.04.2013 15:32:
> On 17/04/2013 15:17, Stefan Behnel wrote:
>> True. Optimisations are generally not documented because they'd just
>> unnecessarily bloat the documentation with useless information. Seriously,
>> what does it add for users to know that Cython has its own way of
>> implementing a given syntax construct or a given builtin?
>>
>> range() is an exception here, because we were actively trying to get users
>> away from the for-from loop syntax that Pyrex introduced.
>
> The fact is that looking at the generated C code I observed that the
> enumerate() was not optimized as range() is. I went to the documentation
> looking for an explanation and I found mention of range() only. This
> led me to think that enumerate() is not optimized.
>
> I agree that listing all optimizations in the documentation is of little
> use, but in the case where builtins are replaced with a different
> implementation this makes much more sense because it changes radically
> how the users write their code.

It shouldn't.


>>> I'm using
>>> enumerate() to iterate over a (small) numpy array and the optimization
>>> is not there.
>>
>> It should be. Can you show us your code?
>
> This is an example:
>
> def test():
>     cdef unsigned int k
>     cdef double element
>     cdef np.ndarray[np.double_t, ndim=1] v1 \
>         = np.empty([10, ], np.double)
>     cdef np.ndarray[np.double_t, ndim=1] v2 \
>         = np.empty([10, ], np.double)
>
>     for i, element in enumerate(v1):
>         v2[i] = element * i

Works for me (after adding the missing imports). I don't get a call to
enumerate() in the C code. Instead, it uses a Python variable as counter
and adds 1 to it in each step.

If you want a C variable as counter, type your variable accordingly.
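
As a rough illustration of the rewrite described above, the sketch below spells out in plain Python what replacing enumerate() with a counter amounts to. The function names are made up for this example, and it says nothing about the C code Cython actually emits; it only shows that the two loop forms compute the same thing.

```python
def scale_with_enumerate(values):
    """Multiply each element by its index, using enumerate()."""
    result = []
    for i, element in enumerate(values):
        result.append(element * i)
    return result


def scale_with_counter(values):
    """Same loop with the enumerate() call replaced by a manual counter."""
    result = []
    i = 0
    for element in values:
        result.append(element * i)
        i += 1  # the counter is advanced by one in each step
    return result
```

In Cython, whether that counter ends up as a Python object or a C integer depends on how it is declared, for example with "cdef unsigned int i".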

Stefan

Daniele Nicolodi

Apr 17, 2013, 10:02:07 AM
to cython...@googlegroups.com
On 17/04/2013 15:46, Stefan Behnel wrote:
> Daniele Nicolodi, 17.04.2013 15:32:
>> On 17/04/2013 15:17, Stefan Behnel wrote:
>>> True. Optimisations are generally not documented because they'd just
>>> unnecessarily bloat the documentation with useless information. Seriously,
>>> what does it add for users to know that Cython has its own way of
>>> implementing a given syntax construct or a given builtin?
>>>
>>> range() is an exception here, because we were actively trying to get users
>>> away from the for-from loop syntax that Pyrex introduced.
>>
>> The fact is that looking at the generated C code I observed that the
>> enumerate() was not optimized as range() is. I went to the documentation
>> looking for an explanation and I found mention of range() only. This
>> led me to think that enumerate() is not optimized.
>>
>> I agree that listing all optimizations in the documentation is of little
>> use, but in the case where builtins are replaced with a different
>> implementation this makes much more sense because it changes radically
>> how the users write their code.
>
> It shouldn't.

If it should not, why did you document range()?

>> def test():
>>     cdef unsigned int k
>>     cdef double element
>>     cdef np.ndarray[np.double_t, ndim=1] v1 \
>>         = np.empty([10, ], np.double)
>>     cdef np.ndarray[np.double_t, ndim=1] v2 \
>>         = np.empty([10, ], np.double)
>>
>>     for i, element in enumerate(v1):
>>         v2[i] = element * i
>
> Works for me (after adding the missing imports). I don't get a call to
> enumerate() in the C code. Instead, it uses a Python variable as counter
> and adds 1 to it in each step.

Indeed, there is no call to enumerate() but it does not make use of the
optimizations for accessing the numpy arrays. I would expect the
enumerate() loop to result in code very similar to the one produced by:

for i in range(10):
    element = v1[i]
    v2[i] = element * i

Daniele

Chris Barker - NOAA Federal

Apr 17, 2013, 11:54:22 AM
to cython...@googlegroups.com
On Wed, Apr 17, 2013 at 7:02 AM, Daniele Nicolodi <dan...@grinta.net> wrote:
> Indeed, there is no call to enumerate() but it does not make use of the
> optimizations for accessing the numpy arrays. I would expect the
> enumerate() loop to result in code very similar to the one produced by:
>
> for i in range(10):
>     element = v1[i]
>     v2[i] = element * i

you need to type "i" -- probably in both cases.

-Chris


Daniele Nicolodi

Apr 17, 2013, 11:59:31 AM
to cython...@googlegroups.com
On 17/04/2013 17:54, Chris Barker - NOAA Federal wrote:
> On Wed, Apr 17, 2013 at 7:02 AM, Daniele Nicolodi <dan...@grinta.net> wrote:
>> Indeed, there is no call to enumerate() but it does not make use of the
>> optimizations for accessing the numpy arrays. I would expect the
>> enumerate() loop to result in code very similar to the one produced by:
>>
>> for i in range(10):
>>     element = v1[i]
>>     v2[i] = element * i
>
> you need to type "i" -- probably in both cases.

Oops, sorry, there was a typo in the code I sent to the mailing list. "i"
is correctly typed in my test case. This is the full code:

import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def test():
    cdef unsigned int i
    cdef double element
    cdef np.ndarray[np.double_t, ndim=1] v1 \
        = np.empty([10, ], np.double)
    cdef np.ndarray[np.double_t, ndim=1] v2 \
        = np.empty([10, ], np.double)

    for i, element in enumerate(v1):
        v2[i] = element * i

    for i in range(10):
        element = v1[i]
        v2[i] = element * i

The two loops result in very different C code.
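
For comparison at the Python level only, here is a minimal sketch of the same two loops with array.array from the standard library standing in for the NumPy arrays (an assumption made so that the snippet runs without NumPy or Cython; the function names are also just for illustration). It only checks that both loops fill v2 identically. The difference under discussion is in the generated C code, which this does not show.

```python
from array import array


def fill_by_enumerate(v1):
    """Iterate over v1 directly and let enumerate() supply the index."""
    v2 = array("d", [0.0] * len(v1))
    for i, element in enumerate(v1):
        v2[i] = element * i
    return v2


def fill_by_index(v1):
    """Loop over the indices with range() and fetch each element by index."""
    v2 = array("d", [0.0] * len(v1))
    for i in range(len(v1)):
        element = v1[i]
        v2[i] = element * i
    return v2


v1 = array("d", [float(n) for n in range(10)])
same = fill_by_enumerate(v1) == fill_by_index(v1)
```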

Cheers,
Daniele

Stefan Behnel

Apr 17, 2013, 12:18:01 PM
to cython...@googlegroups.com
Daniele Nicolodi, 17.04.2013 16:02:
> On 17/04/2013 15:46, Stefan Behnel wrote:
>> Daniele Nicolodi, 17.04.2013 15:32:
>>> On 17/04/2013 15:17, Stefan Behnel wrote:
>>>> True. Optimisations are generally not documented because they'd just
>>>> unnecessarily bloat the documentation with useless information. Seriously,
>>>> what does it add for users to know that Cython has its own way of
>>>> implementing a given syntax construct or a given builtin?
>>>>
>>>> range() is an exception here, because we were actively trying to get users
>>>> away from the for-from loop syntax that Pyrex introduced.
>>>
>>> The fact is that looking at the generated C code I observed that the
>>> enumerate() was not optimized as range() is. I went to the documentation
>>> looking for an explanation and I found mention of range() only. This
>>> led me to think that enumerate() is not optimized.
>>>
>>> I agree that listing all optimizations in the documentation is of little
>>> use, but in the case where builtins are replaced with a different
>>> implementation this makes much more sense because it changes radically
>>> how the users write their code.
>>
>> It shouldn't.
>
> If it should not, why did you document range()?

Because people were using for-from instead of the normal Python way of
doing it, which is for-in-range. And they shouldn't. Same thing.

But it's hard to get bad habits out of a) the heads of users and b) the
Internet once they've escaped into the wild.

If you find a better way to express in the docs that for-in-range() is the
right way to do an integer for-loop in Python *and* Cython, we are always
happy about pull requests. Note that the page you are referring to is
specifically there to describe differences between Cython and Pyrex. In
Pyrex, for-from was a necessary evil and people learned to use it. In
Cython, it's deprecated because it's a syntactic wart that's both ugly and
redundant. That's a major difference that is worth documenting, don't you
think?

enumerate() is neither ugly nor redundant nor deprecated and is documented
in the official Python docs. So why should we document it for Cython?


>>> def test():
>>>     cdef unsigned int k
>>>     cdef double element
>>>     cdef np.ndarray[np.double_t, ndim=1] v1 \
>>>         = np.empty([10, ], np.double)
>>>     cdef np.ndarray[np.double_t, ndim=1] v2 \
>>>         = np.empty([10, ], np.double)
>>>
>>>     for i, element in enumerate(v1):
>>>         v2[i] = element * i
>>
>> Works for me (after adding the missing imports). I don't get a call to
>> enumerate() in the C code. Instead, it uses a Python variable as counter
>> and adds 1 to it in each step.
>
> Indeed, there is no call to enumerate() but it does not make use of the
> optimizations for accessing the numpy arrays. I would expect the
> enumerate() loop to result in code very similar to the one produced by:
>
> for i in range(10):
>     element = v1[i]
>     v2[i] = element * i

Ah, that makes it clearer what you meant. You weren't actually talking
about enumerate() at all. What you meant was that when you iterate over a
NumPy array, the loop doesn't use efficient indexing. So the actual problem
is array iteration, not enumerate().

Removing the indirection through enumerate() makes this clearer:

for element in v1:
    pass

Now the question is: how should Cython know that ndarray objects implement
their iteration by indexing into their buffer?
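
One way to see why this is not obvious to a compiler: in Python, iteration and indexing are separate protocols, and a type is free to make them disagree. The class below is a contrived, hypothetical example (not taken from NumPy or Cython) whose __iter__ yields its items in reverse, so a for loop over it cannot be replaced by indexing from 0 to len - 1 without changing the result.

```python
class ReversedIteration:
    """Hypothetical container whose iteration order differs from its index order."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __iter__(self):
        # Iteration is defined independently of __getitem__.
        return iter(reversed(self._items))


data = ReversedIteration([1.0, 2.0, 3.0])
by_iteration = [x for x in data]                    # follows __iter__
by_indexing = [data[i] for i in range(len(data))]   # follows __getitem__
```

Here the two lists come out in opposite orders, so turning iteration into indexing would presumably only be safe for types whose iteration is known to match their indexing.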

Stefan
