Re: [cython-users] array slicing in Cython ?


Robert Bradshaw

unread,
Feb 12, 2013, 4:02:14 PM2/12/13
to cython...@googlegroups.com
On Tue, Feb 12, 2013 at 12:44 PM, JF Gallant <pyro...@gmail.com> wrote:
> Hi everybody !
>
> I'm new to the group and to Cython. I wrote a script that I want to transfer
> to Cython for speed. I did a lot of little tests to see whether it's
> possible for me or not.
> In my Python script I have a kd-tree implementation. I found a way to
> sort an array of objects (structs) by their X, Y, Z coordinates with the
> qsort included with Cython.
> But now, what I need to do to finish my implementation is a lot of
> slicing in my array, but I don't know how.
>
> here is my short code example:
>
> cdef int len = 100
> cdef int *q
> cdef int *r
>
> q = <int *>malloc( len *cython.sizeof(int) )
>
> r = q[50:]
>
> and the error I got:
>
> r = q[50:]
> ^
> ------------------------------------------------------------
>
> hello.pyx:24:9: Slicing is not currently supported for 'int *'.

C pointers don't have an end, only a beginning. You can write r = q +
50 to let r be the same "list" as q starting at the 50th element, but
you'll still have to keep track of the ends manually.
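In plain C terms, Robert's suggestion amounts to carrying a pointer and a length together. A minimal sketch of that idea (the `int_slice` struct and `slice_from` helper are illustrative names for this example, not anything Cython provides):

```c
#include <stddef.h>

/* A "slice" of a C array is just a pointer plus a length that you
   track yourself; the pointer carries no end marker of its own. */
typedef struct {
    int *data;
    size_t len;
} int_slice;

/* Equivalent of q[start:]: offset the pointer, shrink the length. */
int_slice slice_from(int *q, size_t q_len, size_t start) {
    int_slice s = { q + start, q_len - start };
    return s;
}
```

Here `slice_from(q, 100, 50)` yields the same memory as `q + 50`, with the remaining length recorded alongside it.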

> Is there a way to efficiently (for speed) slice an array like Python does,
> without having to use Python objects?
> I read a bit about memoryviews but wasn't able to apply them to my example
> (it raises an error or crashes at compile time).

Could you be more specific? This is probably what you need.

- Robert

Stefan Behnel

unread,
Feb 12, 2013, 4:07:26 PM2/12/13
to cython...@googlegroups.com
JF Gallant, 12.02.2013 21:44:
> Hi everybody !
>
> I'm new to the group and to Cython. I wrote a script that I want to transfer
> to Cython for speed. I did a lot of little tests to see whether it's
> possible for me or not.
> In my Python script I have a kd-tree implementation. I found a way to
> sort an array of objects (structs) by their X, Y, Z coordinates with the
> qsort included with Cython.
> But now, what I need to do to finish my implementation is a lot of
> slicing in my array, but I don't know how.
>
> here is my short code example:
>
> cdef int len = 100
> cdef int *q
> cdef int *r
>
> q = <int *>malloc( len *cython.sizeof(int) )
>
> r = q[50:]
>
> and the error I got:
>
> r = q[50:]
> ^------------------------------------------------------------
>
> hello.pyx:24:9: Slicing is not currently supported for 'int *'.

If I understand correctly (your wording is a bit ambiguous), you are trying
to implement qsort with slicing. That's a bad idea. Slicing always means
that you must create "something" that holds the current slice bounds, so
you'd end up creating some kind of Python object (or at least a
struct) for each iteration. That's costly. If you really want to use qsort,
it's way better to pass the current bound indices around than a sliced array.
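Stefan's point about passing bound indices instead of materializing slices looks like this in a plain C sketch (a recursive maximum over a half-open index range; a hypothetical example, not the poster's kd-tree code):

```c
/* Recurse over sub-ranges by passing (lo, hi) index bounds;
   no slice object is ever created, only two integers change. */
int range_max(const int *a, int lo, int hi) {
    if (hi - lo == 1)
        return a[lo];                       /* single element */
    int mid = lo + (hi - lo) / 2;
    int left  = range_max(a, lo, mid);      /* "slice" a[lo:mid] */
    int right = range_max(a, mid, hi);      /* "slice" a[mid:hi] */
    return left > right ? left : right;
}
```

The same pattern applies to a recursive kd-tree build: each node processes `(lo, hi)` bounds into a shared array rather than allocating a sub-array per level.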

That being said, qsort may or may not be what you want, there are better
sorting algorithms out there, especially Python's timsort. And you
certainly shouldn't be implementing any of them yourself.

In case I misunderstood what you are after, please provide some more details.

Stefan

Stefan Behnel

unread,
Feb 13, 2013, 8:31:02 AM2/13/13
to cython...@googlegroups.com
Hi,

please don't top-post.

JF Gallant, 13.02.2013 14:24:
> On Tuesday, February 12, 2013 4:07:26 PM UTC-5, Stefan Behnel wrote:
>> JF Gallant, 12.02.2013 21:44:
> Hi Stefan,
>
> Not really, I want to implement a kd-tree. For that I need to sort part of
> an array.
> For that I use the qsort already available with Cython's lib (stdlib). It's
> working very well.

Ok, understood.

It's actually the one in libc that you're using, though. We only provide
the declarations.


> But now I need to be able to slice my array before sorting it.
>
> Pseudo code Example:
> ###
> array = [8,5,6,3,9,1]
>
> slice_array = array[3:] # so [3,9,1]
>
> qsort(slice_array, 3, sizeof(int), compare_function) # use the already
> existing qsort from cython
> ###

As I explained above, this is a bad idea. Instead, calculate the start
pointer and length of the subarray yourself and pass them into qsort().

Stefan

Stefan Behnel

unread,
Feb 13, 2013, 8:50:47 AM2/13/13
to Cython-users
JF Gallant, 13.02.2013 14:42:
> So, I'm too much of a newbie to understand what you mean by "calculate the
> start pointer and length of the subarray yourself".
> Do you have a quick and simple example of that?

Sure. Instead of writing

slice_array = array[3:]
qsort(slice_array, 3, sizeof(int), compare_function)

you write

qsort(array + 3, 3, sizeof(int), compare_function)
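For completeness, here is a self-contained C version of that call. `compare_function` is never shown anywhere in the thread, so the ascending int comparator below is an assumed implementation:

```c
#include <stdlib.h>

/* Ascending comparator for int elements (an assumption; the thread's
   compare_function was never shown). */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow of x - y */
}

/* Sort the 3-element subarray starting at index 3, in place,
   mirroring the array[3:] example from the thread. */
void sort_tail(int *array) {
    qsort(array + 3, 3, sizeof(int), compare_ints);
}
```

After `sort_tail` on `{8,5,6,3,9,1}`, only the tail `{3,9,1}` is reordered; the first three elements are untouched.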

Stefan

Sturla Molden

unread,
Feb 13, 2013, 8:55:20 AM2/13/13
to cython...@googlegroups.com
On 13.02.2013 14:24, JF Gallant wrote:
>
> Hi Stefan,
>
> Not really, I want to implement a kd-tree. For that I need to sort part
> of an array.
> For that I use the qsort already available with Cython's lib (stdlib). It's
> working very well.
> But now I need to be able to slice my array before sorting it.

Use the Cython code we have written, and don't waste your time
reinventing the wheel :)

https://github.com/scipy/scipy/blob/master/scipy/spatial/ckdtree.pyx


Sturla

Sturla Molden

unread,
Feb 13, 2013, 9:41:03 AM2/13/13
to cython...@googlegroups.com
On 13.02.2013 15:10, JF Gallant wrote:

> thanks a lot for your script, but I cannot use numpy/scipy for some
> reasons (one of them is that I'm working on an addon for a 3D software
> called Blender, which has an embedded Python version), and I already have
> my own implementation of a kd-tree in Python, I just have to convert it to
> Cython. But your help is very appreciated! :)

Right.

It doesn't depend on SciPy though, and not very much on NumPy. All the
algorithmic code uses plain C pointer arithmetic.


One piece of advice though:

If you need qsort, I believe this means you use the median to split the
cells in the kd-tree. That is a common mistake. Although you will get a
balanced kd-tree, it is often inefficient because the cells may become
"long and thin". Thus, while the tree is balanced, it is still an
inefficient splitting of the (hyper)space.

That is why we used the sliding mid-point rule instead. It does not
guarantee a balanced kd-tree, but it does guarantee a kd-tree that gives
an efficient nearest-neighbor search.

You can see in the function cKDTree.__build how we compute and use the
sliding mid-point to build up the tree.


http://www.cs.umd.edu/~mount/Papers/cgc99-smpack.pdf
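The core of the rule can be sketched in one dimension: split at the midpoint of the cell, and if every point falls on one side, slide the split to the nearest point. This is a simplified illustration of the idea from the paper above, not the actual `cKDTree.__build` code:

```c
/* Sliding mid-point split along one coordinate.
   lo/hi are the current cell bounds; x holds the n point coordinates. */
double sliding_midpoint(const double *x, int n, double lo, double hi) {
    double split = 0.5 * (lo + hi);        /* plain mid-point */
    double min_x = x[0], max_x = x[0];
    for (int i = 1; i < n; i++) {
        if (x[i] < min_x) min_x = x[i];
        if (x[i] > max_x) max_x = x[i];
    }
    if (min_x > split)                      /* all points right of split: */
        split = min_x;                      /* slide right to nearest point */
    else if (max_x < split)                 /* all points left of split: */
        split = max_x;                      /* slide left to nearest point */
    return split;
}
```

Because the split never lands in a region empty of data, the "long and thin" cells of a pure mid-point split are avoided, at the cost of losing the balance guarantee.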



Sturla



Sturla Molden

unread,
Feb 13, 2013, 12:33:17 PM2/13/13
to cython...@googlegroups.com
On 13.02.2013 15:41, Sturla Molden wrote:

> http://www.cs.umd.edu/~mount/Papers/cgc99-smpack.pdf

If you take a look at their Figure 3, you can see what's happening.

If you use qsort and split on the median, you get a kd-tree similar to
the one on the left ('Standard split').

If you use the algorithm in cKDTree.__build, you get a kd-tree similar
to the one on the right ('Sliding-midpoint split').

You get much faster spatial queries from the kd-tree to the right. That
is why we didn't use sorting/medians to build up the kd-trees in SciPy.


Sturla

Jake Vanderplas

unread,
Feb 13, 2013, 1:52:41 PM2/13/13
to cython...@googlegroups.com
On Wed, Feb 13, 2013 at 9:33 AM, Sturla Molden <stu...@molden.no> wrote:
On 13.02.2013 15:41, Sturla Molden wrote:

http://www.cs.umd.edu/~mount/Papers/cgc99-smpack.pdf

If you take a look at their Figure 3, you can see what's happening.

If you use qsort and split on the median, you get a kd-tree similar to the one on the left ('Standard split').

This is getting a bit off the initial topic, but there's something I've always been confused about regarding the sliding midpoint rule in KD tree construction: the sliding midpoint rule seems to assume you're partitioning the entire space, rather than just the pieces of the space which contain points.  In cKDTree, from what I understand, the effective bounding box of each node is shrunk to be the smallest possible box which contains all its data.  This eliminates many of the long & thin partitions, without having to resort to using a sliding midpoint rule.  For example, in Figure 3 of the paper you reference, most of the long and thin partitions will become more balanced once these regions are shrunk to the smallest box containing the set of points within it.

Using this strategy, which I believe is already present in cKDTree, my own benchmarks on query speed don't show any significant improvement associated with the added complexity of using sliding midpoint.  I agree it might be useful for algorithms which require a complete partition of the parameter space, but KNN queries, in general, have no such requirement.
   Jake
 

If you use the algorithm in cKDTree.__build, you get a kd-tree similar to the one on the right ('Sliding-midpoint split').

You get much faster spatial queries from the kd-tree to the right. That is why we didn't use sorting/medians to build up the kd-trees in SciPy.



Sturla

--

--- You received this message because you are subscribed to the Google Groups "cython-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cython-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Sturla Molden

unread,
Feb 13, 2013, 2:44:35 PM2/13/13
to cython...@googlegroups.com
On 13.02.2013 19:52, Jake Vanderplas wrote:
> In cKDTree, from what I understand, the effective
> bounding box of each node is shrunk to be the smallest possible box
> which contains all its data.

No, that would require a rotation of the data. Finding the smallest
possible box is an NP-complete problem.

It just picks a box that contains all the data, and makes it as small as
possible. But it is not "the smallest bounding box"!

Because the data is N-dimensional, there can still be a lot of empty
space in the box. A single outlier in one dimension can generate a lot
of empty space.


Sturla

Jake Vanderplas

unread,
Feb 13, 2013, 4:27:59 PM2/13/13
to cython...@googlegroups.com
On Wed, Feb 13, 2013 at 11:44 AM, Sturla Molden <stu...@molden.no> wrote:
On 13.02.2013 19:52, Jake Vanderplas wrote:
In cKDTree, from what I understand, the effective
bounding box of each node is shrunk to be the smallest possible box
which contains all its data.

No, that would require a rotation of the data. Finding the smallest possible box is an NP-complete problem.

Right - I meant "smallest axis-aligned box".  That's easy to compute, which is why cKDTree uses it.

Assuming the figures in the paper are representative of the algorithm they use, my guess would be that the speedup associated with the sliding midpoint is due to the fact that they partition the entire space, rather than the portion of the space in which the data reside.  This is the cause of the vast majority of their long & narrow regions.
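Computing that "smallest axis-aligned box" is a single linear pass over the data. A sketch (the row-major point layout is an assumption made for this example):

```c
/* Smallest axis-aligned bounding box of n d-dimensional points,
   stored row-major in pts (pts[i*d + j] = coordinate j of point i). */
void bounding_box(const double *pts, int n, int d,
                  double *mins, double *maxs) {
    for (int j = 0; j < d; j++)
        mins[j] = maxs[j] = pts[j];        /* start from the first point */
    for (int i = 1; i < n; i++) {
        for (int j = 0; j < d; j++) {
            double v = pts[i * d + j];
            if (v < mins[j]) mins[j] = v;
            if (v > maxs[j]) maxs[j] = v;
        }
    }
}
```

This is O(n*d), which is why shrinking each node's box during tree construction is cheap compared to the sort itself.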
 

It just picks a box that contains all the data, and makes it as small as possible. But it is not "the smallest bounding box"!

Because the data is N-dimensional, there can still be a lot of empty space in the box. A single outlier in one dimension can generate a lot of empty space.

This is true, but in my own benchmarks this problem seems to be insignificant.  I'm certain it depends heavily on the distribution of data used, however, so there probably exist datasets in which the sliding midpoint leads to faster queries. I don't believe that is generally the case.

   Jake

JF Gallant

unread,
Feb 13, 2013, 8:12:25 PM2/13/13
to cython...@googlegroups.com
Just another question about the main subject of my post ;). (The kd-tree discussion is really interesting but a bit too deep for me now.)

If I have this array:

cdef int array[100]

and I want to do a 50 to 100 slice:

cdef int slice[]
slice = array + 50

But to get a 1 to 50 slice:

slice = array - 50 ?

And how about a 25 to 75 slice?

I'm glad to see I started a good discussion about kd-trees, so I'll let you continue after this question :)

Jake Vanderplas

unread,
Feb 13, 2013, 10:02:09 PM2/13/13
to cython...@googlegroups.com

Hi,
When you use a definition like

cdef int x[100]

it is not a full-featured array like you are used to in numpy.  For the purposes of the user, it carries no information about its length.  So you need to keep track of the length yourself.

If you want the equivalent of x[50:], then what you're asking for is a new array of length 50, which starts at the 50th element of x.  So you can write

cdef int* xslice = x + 50

The `*` after the int says it's a pointer to an integer, and the `+ 50` says you want it to point to the 50th element after the initial position in x.  Again, xslice contains no information about its length: you have to keep track of that yourself.

To get an array starting at the 25th element, you could do

xslice = x + 25

Again, the length of the slice is not contained in `xslice`, it's simply a pointer to a location in memory.  If you'd like it to be length 50, you can treat it that way.  If you'd like it to be length 10 or length 5, that's fine as well.  There's no difference in how you define xslice: it is just a pointer to a memory location where the array begins.

Similarly, if you'd like the equivalent of x[:50], you don't need to change x at all: it's already pointing to the first element of the array you want.  Because x stores no length information, the length you use is up to you.
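One consequence worth spelling out: because such a "slice" is only a pointer into the same memory, writing through it writes into the original array. A small C illustration (the `fill_slice` name is made up for this example):

```c
/* Write `value` into len elements starting at xslice.
   xslice is just a pointer into the caller's array, so it is the
   caller's array that actually changes. */
void fill_slice(int *xslice, int len, int value) {
    for (int i = 0; i < len; i++)
        xslice[i] = value;
}
```

Calling `fill_slice(x + 25, 50, 7)` fills the equivalent of `x[25:75]` in place; `x[24]` and `x[75]` are untouched.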

I hope that's clear,
    Jake

Sturla Molden

unread,
Feb 14, 2013, 7:53:20 AM2/14/13
to cython...@googlegroups.com
On 14.02.2013 02:12, JF Gallant wrote:
> Just another question about the main subject of my post ;). (The kd-tree discussion is really interesting but a bit too deep for me now.)
>
> If I have this array:
>
> cdef int array[100]
>
> and I want to do a 50 to 100 slice:
>
> cdef int slice[]
> slice = array + 50
>
> But to get a 1 to 50 slice:


I think you are confusing "slicing" with "pointer arithmetic".

You can slice with typed memoryviews:

cdef int array[100]
cdef int[::1] slice1 = array[:50] # 0 to 49
cdef int[::1] slice2 = array[50:100] # 50 to 99
cdef int[::1] slice3 = array[50:75] # 50 to 74
cdef int[::1] slice4 = array[25:75] # 25 to 74

http://docs.cython.org/src/userguide/memoryviews.html

You can do pointer arithmetic too:

cdef int array[100]
cdef int *p1 = array
cdef int *p2 = array + 50
cdef int *p3 = array + 50
cdef int *p4 = array + 25

This is similar to C, but not actually "slicing" in a Pythonic sense.

I recommend you use typed memoryviews. You will get a small speed
penalty (e.g. 10%), but it leads to much cleaner numerical code. We did
not use typed memoryviews in cKDTree because of compatibility with
Python 2.4.



Sturla

JF Gallant

unread,
Feb 14, 2013, 8:37:19 AM2/14/13
to cython...@googlegroups.com
Thanks a lot for the explanation. Sorry for the noob questions. I think I need to have a good talk with my brother soon so he can explain the basics to me (he is a programmer at Ubisoft).
Yes, memoryviews seem to be better and cleaner, but I get a crash at compile time every time I try to use them (just by writing the line cdef int[::1] slice1, for example). Is it because I use Python 3.3?

Sturla Molden

unread,
Feb 14, 2013, 9:14:31 AM2/14/13
to cython...@googlegroups.com
On 14.02.2013 14:37, JF Gallant wrote:

> Thanks a lot for the explanation. Sorry for the noob questions. I think I
> need to have a good talk with my brother soon so he can explain the basics
> to me (he is a programmer at Ubisoft).
> Yes, memoryviews seem to be better and cleaner, but I get a crash at
> compile time every time I try to use them (just by writing the line cdef
> int[::1] slice1, for example). Is it because I use Python 3.3?

I made a tiny mistake. We can create a view of a C pointer, but not
slice it. So we have to create a view first, and then slice afterwards.

This works though (Cython 0.18):


cimport cython
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def foobar():

    # allocate a C array on the stack
    cdef int array[100]

    # make a memoryview of the C array
    cdef int[::1] varray = array

    # slice it (0 to 49)
    cdef int[::1] slice = varray[:50]

    # write some data into the slice
    cdef int i
    for i in range(slice.shape[0]):
        slice[i] = i

    # return a NumPy array with a copy of the slice;
    # any other PEP 3118 buffer would work too
    return np.array(slice)


And after compiling:


Python 2.7.3 |EPD 7.3-2 (64-bit)| (default, Apr 12 2012, 15:20:16) [MSC
v.1500 64 bit (AMD64)] on win32
Type "credits", "demo" or "enthought" for more information.
>>> import test
>>> test.foobar()
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
>>>



Sturla

Sturla Molden

unread,
Feb 14, 2013, 9:45:03 AM2/14/13
to cython...@googlegroups.com
On 14.02.2013 15:14, Sturla Molden wrote:

> I made a tiny mistake. We can create a view of a C pointer, but not
> slice it. So we have to create a view first, and then slice afterwards.
>
> This works though (Cython 0.18):

Maybe this one is a bit more informative:

cimport cython
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def foobar1():

    # declare a C array
    cdef int array[100]

    # Make a memoryview of the array.
    # Cython knows its shape from the declaration.
    cdef int[::1] varray = array

    # Print the shape to verify that Cython
    # actually knew the shape.
    print tuple([int(i) for i in <Py_ssize_t[:8]> varray.shape])

    # slice the view
    cdef int[::1] slice0 = varray[:50]

    # Write some data into the slice.
    cdef int i
    for i in range(slice0.shape[0]):
        slice0[i] = i

    # Return a NumPy array to Python
    return np.array(slice0)


@cython.boundscheck(False)
@cython.wraparound(False)
def foobar2():

    # declare a C array
    cdef int array[100]

    # Set a pointer to the array.
    cdef int *p = &array[0]

    # Make a memoryview of the pointer's target.
    # We must inform Cython of its shape.
    cdef int[::1] slice1 = <int[:100:1]> p

    # slice it
    cdef int[::1] slice2 = slice1[:50]

    # Write some data into the slice.
    cdef int i
    for i in range(slice2.shape[0]):
        slice2[i] = i

    # Return a NumPy array to Python.
    return np.array(slice2)



Regards,
Sturla

JF Gallant

unread,
Feb 14, 2013, 9:48:33 AM2/14/13
to cython...@googlegroups.com

It's not working for me.
If I take your code without the NumPy stuff:

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def foobar():

    # allocate a C array on the stack
    cdef int array[100]

    # make a memoryview of the C array
    cdef int[::1] varray = array

    # slice it (0 to 49)
    cdef int[::1] slice = varray[:50]

    # write some data into the slice
    cdef int i
    for i in range(slice.shape[0]):
        slice[i] = i

    return
 

I got this error:

D:\sandbox\cython_test\test_v03b>D:\Python33\python.exe setup.py build_ext --inplace
running build_ext
skipping 'multihello.c' Cython extension (up-to-date)
running build_ext
cythoning hello.pyx to hello.c
building 'hello' extension
D:\MinGW\bin\gcc.exe -mdll -O -Wall -ID:\Python33\include -ID:\Python33\include -c hello.c -o build\temp.win32-3.3\Release\hello.o
writing build\temp.win32-3.3\Release\hello.def
D:\MinGW\bin\gcc.exe -shared -s build\temp.win32-3.3\Release\hello.o build\temp.win32-3.3\Release\hello.def -LD:\Python33\libs -LD:\Python33\PCbuild -lpython33
-lmsvcr100 -o D:\sandbox\cython_test\test_v03b\hello.pyd
build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0x1560): undefined reference to `__sync_fetch_and_add_4'
build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0x5a0e): undefined reference to `__sync_fetch_and_add_4'
build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0xbd63): undefined reference to `__sync_fetch_and_sub_4'
build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0xbe22): undefined reference to `__sync_fetch_and_sub_4'
build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0xcd7d): undefined reference to `__sync_fetch_and_add_4'
build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0xcf64): undefined reference to `__sync_fetch_and_sub_4'
build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0xd010): undefined reference to `__sync_fetch_and_sub_4'
build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0xd069): undefined reference to `__sync_fetch_and_sub_4'
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1


Every time I write a memoryview with "[::1]" I get this kind of error. I use Cython 0.18 with Python 3.3, and I compile with MinGW GCC 4.6.2 on Windows 7 Pro.

Sturla Molden

unread,
Feb 14, 2013, 9:57:54 AM2/14/13
to cython...@googlegroups.com
On 14.02.2013 15:48, JF Gallant wrote:

> build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0xd069): undefined
> reference to `__sync_fetch_and_sub_4'
> collect2: ld returned 1 exit status
> error: command 'gcc' failed with exit status 1
>
> Every time I write a memoryview with "[::1]" I get this kind of error. I use
> Cython 0.18 with Python 3.3, and I compile with MinGW GCC 4.6.2 on Windows 7 Pro

That looks like a bug in Cython to me. I have Python 2.7 so I cannot verify.


Sturla

Sturla Molden

unread,
Feb 14, 2013, 10:02:31 AM2/14/13
to cython...@googlegroups.com
On 14.02.2013 15:45, Sturla Molden wrote:



> @cython.boundscheck(False)
> @cython.wraparound(False)
> def foobar2():
>
> # declare a C array
> cdef int array[100]
>
> # Set a pointer to the array.
> cdef int *p = &array[0]
>
> # Make a memoryview of the pointer's target.
> # We must inform Cython of its shape.
> cdef int[::1] slice1 = <int[:100:1]> p
>
> # slice it
> cdef int[::1] slice2 = slice1[:50]
>
> # Write some data into the slice.
> cdef int i
> for i in range(slice2.shape[0]):
> slice2[i] = i
>
> # Return a NumPy array to Python.
> return np.array(slice2)

By the way...

To avoid NumPy you can e.g. do this instead:

return [int(i) for i in slice2]


Sturla

JF Gallant

unread,
Feb 14, 2013, 10:39:27 AM2/14/13
to cython...@googlegroups.com

So, can I enter it in the bug tracker at this address?
http://trac.cython.org/cython_trac/

Robert Bradshaw

unread,
Feb 14, 2013, 7:27:41 PM2/14/13
to cython...@googlegroups.com
On Thu, Feb 14, 2013 at 7:39 AM, JF Gallant <pyro...@gmail.com> wrote:
>
> On Thursday, February 14, 2013 9:57:54 AM UTC-5, Sturla Molden wrote:
>>
>> On 14.02.2013 15:48, JF Gallant wrote:
>>
>> > build\temp.win32-3.3\Release\hello.o:hello.c:(.text+0xd069): undefined
>> > reference to `__sync_fetch_and_sub_4'
>> > collect2: ld returned 1 exit status
>> > error: command 'gcc' failed with exit status 1
>> >
>> > Every time I write a memoryview with "[::1]" I get this kind of error. I use
>> > Cython 0.18 with Python 3.3, and I compile with MinGW GCC 4.6.2 on Windows 7
>> > Pro
>>
>> That looks like a bug in Cython to me. I have Python 2.7 so I cannot
>> verify.

Yes, please do. Note that if it's a Windows-only bug, most of us will
have a harder time reproducing it (I, for one, don't own a Windows
box).

- Robert

JF Gallant

unread,
Feb 14, 2013, 7:54:38 PM2/14/13
to cython...@googlegroups.com
If I understand correctly, I need to ask you for an account on the bug tracker to submit a bug ticket?

Robert Bradshaw

unread,
Feb 14, 2013, 8:26:52 PM2/14/13
to cython...@googlegroups.com
Yes. Send me an htpasswd entry offlist.

On Thu, Feb 14, 2013 at 4:54 PM, JF Gallant <pyro...@gmail.com> wrote:
> If I understand correctly, I need to ask you for an account on the bug tracker to submit a bug ticket?

JF Gallant

unread,
Feb 15, 2013, 3:08:53 PM2/15/13
to cython...@googlegroups.com
Did you receive it? I sent it to you by email.

Stefan Behnel

unread,
Feb 15, 2013, 3:26:42 PM2/15/13
to cython...@googlegroups.com
Sturla Molden, 14.02.2013 15:57:
Could it be that there's something in the NumPy headers that just
accidentally makes this work? I mean, it's not very common to use these
things completely without having NumPy in the loop in one way or another,
so it might just have slipped through.

Although, we have at least the test with Python's array.array type in the
test suite, so if that doesn't show it...

JF, could you verify that Cython's test suite works correctly on your
system? Here's how to run it:

http://wiki.cython.org/HackerGuide#Thetestsuite

Stefan

mark florisson

unread,
Feb 15, 2013, 7:04:36 PM2/15/13
to cython...@googlegroups.com

Cython uses the __sync_fetch_and_add() GCC intrinsic to do atomic
reference counting of memoryview slices. It looks to me like this does
not work on Windows, so Cython should guard for that.
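A common way to add such a guard is a per-compiler dispatch. The sketch below shows the general shape (the `atomic_incr` name is illustrative; this is not Cython's actual utility code):

```c
/* Atomic increment, dispatched per compiler; falls back to a plain
   increment where no intrinsic is known (callers would need a lock). */
#if defined(_MSC_VER)
  #include <windows.h>
  static long atomic_incr(volatile long *p) {
      return InterlockedIncrement(p);
  }
#elif defined(__GNUC__)
  static long atomic_incr(volatile long *p) {
      return __sync_add_and_fetch(p, 1);
  }
#else
  static long atomic_incr(volatile long *p) {
      return ++(*p);                  /* not atomic: guard with a lock */
  }
#endif
```

The point of the fallback branch is exactly the "guard" mentioned above: on toolchains where the intrinsic is unavailable, the code still compiles and the runtime switches to a locking strategy.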

mark florisson

unread,
Feb 15, 2013, 7:22:22 PM2/15/13
to cython...@googlegroups.com
I pushed a fix to Cython master; if you want, you can try it out.

JF Gallant

unread,
Feb 15, 2013, 8:28:49 PM2/15/13
to cython...@googlegroups.com
Thanks! I see your commit on GitHub!
How do I install it, do I just replace the Cython folder?
I always use the binary installer for Windows to install Cython.

JF Gallant

unread,
Feb 15, 2013, 10:03:56 PM2/15/13
to cython...@googlegroups.com, stef...@behnel.de
Did you want to see the output of run_test.py?
A couple of errors happened.

Stefan Behnel

unread,
Feb 16, 2013, 10:04:06 AM2/16/13
to cython...@googlegroups.com
JF Gallant, 16.02.2013 02:28:
> Thanks! I see your commit on GitHub!
> How do I install it, do I just replace the Cython folder?
> I always use the binary installer for Windows to install Cython.

You can just set your PYTHONPATH environment variable to include the base
directory of the archive, i.e. the directory where the "Cython" package lives.

Could you re-run the test suite with that (in verbose mode) and, if you
still get errors, put the complete output somewhere on the web, e.g. in
gist or pastebin? (and post the URL here)

Stefan

Stefan Behnel

unread,
Feb 16, 2013, 10:11:53 AM2/16/13
to cython...@googlegroups.com
mark florisson, 16.02.2013 01:22:
>> Cython uses the __sync_fetch_and_add() GCC intrinsic to do atomic
>> reference counting of memoryview slices. It looks to me like this does
>> not work on Windows, so Cython should guard for that.
>
> I pushed a fix to cython master, if you want you can try it out.

Hmm, right, the only Windows user who reported back on our last release was
Christoph Gohlke, and he's using MSVC, not MinGW. Apparently, the GCC
intrinsic isn't portable.

http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Atomic-Builtins.html

"""
Not all operations are supported by all target processors. If a particular
operation cannot be implemented on the target processor, a warning will be
generated and a call an external function will be generated. The external
function will carry the same name as the builtin, with an additional suffix
`_n' where n is the size of the data type.
"""

Stefan

mark florisson

unread,
Feb 16, 2013, 10:43:38 AM2/16/13
to cython...@googlegroups.com

Oh, that's neat, we can just supply those functions then and use the
locking approach.

Sturla Molden

unread,
Feb 17, 2013, 10:24:16 PM2/17/13
to cython...@googlegroups.com

Den 16. feb. 2013 kl. 01:04 skrev mark florisson <markflo...@gmail.com>:

>
> Cython uses the __sync_fetch_and_add() GCC intrinsic to do atomic
> reference counting of memoryview slices. It looks to me like this does
> not work on Windows, so Cython should guard for that.
>

It does work on Windows. I get no errors from that (Win64, Python 2.7, GCC 4.7).

Sturla

Sturla Molden

unread,
Feb 17, 2013, 10:30:18 PM2/17/13
to cython...@googlegroups.com, cython...@googlegroups.com

Den 16. feb. 2013 kl. 16:11 skrev Stefan Behnel <stef...@behnel.de>:

> mark florisson, 16.02.2013 01:22:
>>> Cython uses the __sync_fetch_and_add() GCC intrinsic to do atomic
>>> reference counting of memoryview slices. It looks to me like this does
>>> not work on Windows, so Cython should guard for that.
>>
>> I pushed a fix to cython master, if you want you can try it out.
>
> Hmm, right, the only Windows user who reported back on our last release was
> Christoph Gohlke, and he's using MSVC, not MinGW. Apparently, the GCC
> intrinsic isn't portable.
>
> http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Atomic-Builtins.html

Cython should use the Interlocked* API on MSVC and GCC intrinsics on MinGW.

Sturla

Sturla Molden

unread,
Feb 17, 2013, 10:33:38 PM2/17/13
to cython...@googlegroups.com, cython...@googlegroups.com

By the way: GCC intrinsics are never "portable", never assume that.

Sturla



Stefan Behnel

unread,
Feb 18, 2013, 1:59:07 AM2/18/13
to cython...@googlegroups.com
Sturla Molden, 18.02.2013 04:33:
> Den 18. feb. 2013 kl. 04:30 skrev Sturla Molden:
>> Den 16. feb. 2013 kl. 16:11 skrev Stefan Behnel:
>>> mark florisson, 16.02.2013 01:22:
>>>>> Cython uses the __sync_fetch_and_add() GCC intrinsic to do atomic
>>>>> reference counting of memoryview slices. It looks to me like this does
>>>>> not work on Windows, so Cython should guard for that.
>>>>
>>>> I pushed a fix to cython master, if you want you can try it out.
>>>
>>> Hmm, right, the only Windows user who reported back on our last release was
>>> Christoph Gohlke, and he's using MSVC, not MinGW. Apparently, the GCC
>>> intrinsic isn't portable.
>>>
>>> http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Atomic-Builtins.html
>>
>> Cython should use the Interlocked* API on MSVC and GCC intrinsics on MinGW.
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683504(v=vs.85).aspx

It already did that before.


> By the way: GCC intrinsics are never "portable", never assume that.

I meant "portable" as in works in GCC most of the time, but apparently not
in MinGW on Windows. If you say that that works, too, I wonder what the
original problem was, because the symptom was clearly as described in the
GCC docs I quoted above.

Stefan

Sturla Molden

unread,
Feb 18, 2013, 4:05:21 AM2/18/13
to cython...@googlegroups.com, cython...@googlegroups.com
Den 18. feb. 2013 kl. 07:59 skrev Stefan Behnel <stef...@behnel.de>:

>>>>
>>>> http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Atomic-Builtins.html
>>>
>>> Cython should use the Interlocked* API on MSVC and GCC intrinsics on MinGW.
>>
>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms683504(v=vs.85).aspx
>
> It already did that before.
>
>
>> By the way: GCC intrinsics are never "portable", never assume that.
>
> I meant "portable" as in works in GCC most of the time, but apparently not
> in MinGW on Windows. If you say that that works, too, I wonder what the
> original problem was, because the symptom was clearly as described in the
> GCC docs I quoted above.
>
>

But it does work on MinGW on Windows. I've used these intrinsics on MinGW a lot.


Sturla

Sturla Molden

unread,
Feb 18, 2013, 4:10:19 AM2/18/13
to cython...@googlegroups.com
And please don't wreck the performance on MinGW by using the Interlocked API, which isn't intrinsic on the MinGW compiler. GNU extensions and intrinsics work on MinGW.

Sturla

Sturla Molden

unread,
Feb 18, 2013, 4:28:36 AM2/18/13
to cython...@googlegroups.com, cython...@googlegroups.com
Den 18. feb. 2013 kl. 07:59 skrev Stefan Behnel <stef...@behnel.de>:


By the way: GCC intrinsics are never "portable", never assume that.

I meant "portable" as in works in GCC most of the time, but apparently not
in MinGW on Windows. If you say that that works, too, I wonder what the
original problem was, because the symptom was clearly as described in the
GCC docs I quoted above.




"Not all operations are supported by all target processors. If a particular operation cannot be implemented on the target processor, a warning will be generated and a call to an external function will be generated. The external function will carry the same name as the builtin, with an additional suffix `_n' where n is the size of the data type."


I read target PROCESSORs, not target OS...

I suppose it has to do with what opcode GCC can generate. But on any x86 or amd64 processor (the only target platforms supported by MinGW), GCC can surely generate opcode for atomic memory access. That is no different from e.g. compiling for the same processors on Linux.

Sturla









mark florisson

unread,
Feb 18, 2013, 5:16:33 AM2/18/13
to cython...@googlegroups.com

That's what I was afraid of :) We use different intrinsics on MSVC,
Intel ICC and GCC. We align the pointer at runtime to sizeof(long)
(MSVC) or sizeof(int) (others), and relative to that we need to find
the lock. Maybe we need to specify the object layout in C and specify
field alignment, so we can find the lock from the counter pointer in
__sync_fetch_and_sub_n.

https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView_C.c#L23
https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView.pyx#L341

Sturla Molden

unread,
Feb 18, 2013, 6:37:05 AM2/18/13
to cython...@googlegroups.com
On 18.02.2013 11:16, mark florisson wrote:

> That's what I was afraid of :) We use different intrinsics on MSVC,
> Intel ICC and GCC. We align the pointer at runtime to sizeof(long)
> (MSVC) or sizeof(int) (others), and relative to that we need to find
> the lock. Maybe we need to specify the object layout in C and specify
> field alignment, so we can find the lock from the counter pointer in
> __sync_fetch_and_sub_n.
>
> https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView_C.c#L23
> https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView.pyx#L341

As far as I can see, this code ends up not using atomics on MinGW,
Cygwin and DJGPP. The appropriate response would be to ask Gallant to
use a non-broken MinGW. It is not MinGW that is the culprit, he must be
using a badly configured GCC.

I think we should rather implement __sync_fetch_and_sub_n as an inline
function, so it gets called whenever GCC emits this symbol (which might
just as well happen on Linux if GCC is broken!) On Windows this function
should use Interlocked*, and on OSX it should use OSAtomic*. We don't
need to resort to a lock.


Sturla






mark florisson

unread,
Feb 18, 2013, 6:58:19 AM2/18/13
to cython...@googlegroups.com
On 18 February 2013 11:37, Sturla Molden <stu...@molden.no> wrote:
> On 18.02.2013 11:16, mark florisson wrote:
>
>> That's what I was afraid of :) We use different intrinsics on MSVC,
>> Intel ICC and GCC. We align the pointer at runtime to sizeof(long)
>> (MSVC) or sizeof(int) (others), and relative to that we need to find
>> the lock. Maybe we need to specify the object layout in C and specify
>> field alignment, so we can find the lock from the counter pointer in
>> __sync_fetch_and_sub_n.
>>
>>
>> https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView_C.c#L23
>>
>> https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView.pyx#L341
>
>
> As far as I can see, this code ends up not using atomics on MinGW, Cygwin
> and DJGPP. The appropriate response would be to ask Gallant to use a
> non-broken MinGW. It is not MinGW that is the culprit, he must be using a
> badly configured GCC.

Yes, that's due to the "fix" I pushed.

> I think we should rather implement __sync_fetch_and_sub_n as an inline
> function, so it gets called whenever GCC emits this symbol (which might just
> as well happen on Linux if GCC is broken!) On Windows this function should
> use Interlocked*, and on OSX it should use OSAtomic*. We don't need to
> resort to a lock.
>

I agree, but is there anything for Linux and platforms other than the
ones above that we can use?

Sturla Molden

unread,
Feb 18, 2013, 7:48:43 AM2/18/13
to cython...@googlegroups.com
On 18.02.2013 12:58, mark florisson wrote:

>> As far as I can see, this code ends up not using atomics on MinGW, Cygwin
>> and DJGPP. The appropriate response would be to ask Gallant to use a
>> non-broken MinGW. It is not MinGW that is the culprit, he must be using a
>> badly configured GCC.
>
> Yes, that's due to the "fix" I pushed.

Unpush please? :-)

The current code resorts to using a kernel semaphore on MinGW
(see Python's condvar.h), which is terribly slow. A critical section
would be better.

http://hg.python.org/cpython/file/aa77f7eb2bf1/Python/condvar.h
http://hg.python.org/cpython/file/aa77f7eb2bf1/Python/thread_nt.h


>> I think we should rather implement __sync_fetch_and_sub_n as an inline
>> function, so it gets called whenever GCC emits this symbol (which might just
>> as well happen on Linux if GCC is broken!) On Windows this function should
>> use Interlocked*, and on OSX it should use OSAtomic*. We don't need to
>> resort to a lock.
>>
>
> I agree, but is there anything for Linux and other platforms than the
> ones above we can use?

Python only supports three thread APIs: Win32 threads, pthreads and pth
threads. Thus: no reason to support any others in Cython either.

http://hg.python.org/cpython/file/aa77f7eb2bf1/Python/thread_nt.h
http://hg.python.org/cpython/file/aa77f7eb2bf1/Python/thread_pthread.h
http://hg.python.org/cpython/file/aa77f7eb2bf1/Python/thread_pth.h

Instead of PyThread_acquire_lock/PyThread_release_lock:

Win32:

Either:
EnterCriticalSection
LeaveCriticalSection

(Don't use SRW locks, which require Windows 7)

or preferably:

InterlockedIncrement
InterlockedDecrement

They are in Windows.h, also on MinGW (but not compiled to intrinsics).


pthreads:

pthread_mutex_lock
pthread_mutex_unlock


pth threads:

pth_mutex_acquire
pth_mutex_release



Sturla





Sturla Molden

unread,
Feb 18, 2013, 8:21:26 AM2/18/13
to cython...@googlegroups.com
On 18.02.2013 12:58, mark florisson wrote:

>>> https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView_C.c#L23
>>>
>>> https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView.pyx#L341
>>
>>
>> As far as I can see, this code ends up not using atomics on MinGW, Cygwin
>> and DJGPP. The appropriate response would be to ask Gallant to use a
>> non-broken MinGW. It is not MinGW that is the culprit, he must be using a
>> badly configured GCC.
>
> Yes, that's due to the "fix" I pushed.

Adding this to the GCC defines would do much less harm:

#if CYTHON_ATOMICS && __GNUC__ >= 4 && (__GNUC_MINOR__ > 1 || \
    (__GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ >= 2))

/* gcc >= 4.1.2 */
#define __pyx_atomic_incr_aligned(value, lock) __sync_fetch_and_add(value, 1)
#define __pyx_atomic_decr_aligned(value, lock) __sync_fetch_and_sub(value, 1)

#ifdef WIN32
#include <Windows.h>
#define __sync_fetch_and_add_4 InterlockedIncrement(value)
#define __sync_fetch_and_sub_4 InterlockedDecrement(value)
#endif

#ifdef __PYX_DEBUG_ATOMICS
#warning "Using GNU atomics"
#endif



Sturla




Sturla Molden

unread,
Feb 18, 2013, 8:29:00 AM2/18/13
to cython...@googlegroups.com
On 18.02.2013 14:21, Sturla Molden wrote:

Oops, that should be

> #ifdef WIN32
> #include <Windows.h>
> #define __sync_fetch_and_add_4(x) InterlockedIncrement((LONG volatile*)x)
> #define __sync_fetch_and_sub_4(x) InterlockedDecrement((LONG volatile*)x)
> #endif


InterlockedIncrement and InterlockedDecrement will always work on 32-bit
values, never 64-bit. So this will be safe with a 32-bit int.


S.M.

Sturla Molden

unread,
Feb 18, 2013, 8:34:57 AM2/18/13
to cython...@googlegroups.com
On 18.02.2013 14:29, Sturla Molden wrote:
> #ifdef WIN32
> #include <Windows.h>
> #define __sync_fetch_and_add_4(x) InterlockedIncrement((LONG
> volatile*)x)
> #define __sync_fetch_and_sub_4(x) InterlockedDecrement((LONG
> volatile*)x)
> #endif


Or to be completely safe:

#ifdef WIN32
#include <Windows.h>
#define __sync_fetch_and_add_4(x) \
InterlockedIncrement((LONG volatile*)x)

#define __sync_fetch_and_sub_4(x) \
InterlockedDecrement((LONG volatile*)x)

#define __sync_fetch_and_add_8(x) \
InterlockedIncrement64((LONGLONG volatile*)x)

#define __sync_fetch_and_sub_8(x) \
InterlockedDecrement64((LONGLONG volatile*)x)
#endif

This will direct missing GNU intrinsics in bad MinGWs to Windows API
atomics, or do nothing if MinGW can use GNU intrinsics.



Sturla


Sturla Molden

unread,
Feb 18, 2013, 9:09:54 AM2/18/13
to cython...@googlegroups.com
On 18.02.2013 14:34, Sturla Molden wrote:

> This will direct missing GNU intrinsics in bad MinGWs to Windows API
> atomics, or do nothing if MinGW can use GNU intrinsics.

Actually I think we should use InterlockedAdd to make the macros not
mess with v:


#ifdef WIN32
#include <Windows.h>
#define __sync_fetch_and_add_4((x),(v)) \
(int)InterlockedAdd((LONG volatile*)(x),(LONG)(v))

#define __sync_fetch_and_sub_4((x),(v)) \
(int)InterlockedAdd((LONG volatile*)(x),(LONG)(-(v)))


/* uncomment if (x % 8 == 0) */
/*
#define __sync_fetch_and_add_8((x),(v)) \
(int)InterlockedAdd64((LONGLONG volatile*)(x),(LONGLONG)(v))

#define __sync_fetch_and_sub_8((x),(v)) \
(int)InterlockedAdd64((LONGLONG volatile*)(x),(LONGLONG)(-(v)))
*/

#endif




Sturla

Stefan Behnel

unread,
Feb 18, 2013, 9:17:59 AM2/18/13
to cython...@googlegroups.com
Hi Sturla,

Sturla Molden, 18.02.2013 15:09:
> On 18.02.2013 14:34, Sturla Molden wrote:
>
>> This will direct missing GNU intrinsics in bad MinGWs to Windows API
>> atomics, or do nothing if MinGW can use GNU intrinsics.
>
> Actually I think we should use InterlockedAdd to make the macros not mess
> with v:

Why don't you set up a pull request with your changes? That would allow us
to discuss them directly on github.

Stefan

Sturla Molden

unread,
Feb 18, 2013, 9:41:38 AM2/18/13
to cython...@googlegroups.com
On 18.02.2013 15:17, Stefan Behnel wrote:

> Why don't you set up a pull request with your changes?

Ok, will do.

I just don't want memoryviews to use Python locks on MinGW because most
of us have well-behaved GCCs on Windows.

I am starting to think these macros might not work, if GCC emits the
symbol __sync_fetch_and_add_4 after the preprocessor step. Which I think
will happen because the preprocessor emits the symbol that is a GNU
intrinsic. In which case we would need to use two functions instead.
Then we get something like:


#ifdef WIN32
#include <Windows.h>

__inline__ __pyx_atomic_int_type
__sync_fetch_and_add_4(__pyx_atomic_int_type *x,
__pyx_atomic_int_type v)
__attribute__((always_inline))
{
return (__pyx_atomic_int_type)
InterlockedAdd((LONG volatile*)x,(LONG)v);
}

__inline__ __pyx_atomic_int_type
__sync_fetch_and_sub_4(__pyx_atomic_int_type *x,
__pyx_atomic_int_type v)
__attribute__((always_inline))
{
return (__pyx_atomic_int_type)
InterlockedAdd((LONG volatile*)x,(LONG)-v);
}

#endif


Sturla

Sturla Molden

unread,
Feb 18, 2013, 11:20:45 AM2/18/13
to cython...@googlegroups.com
On 18.02.2013 15:17, Stefan Behnel wrote:
>
> Why don't you set up a pull request with your changes? That would allow us
> to discuss them directly on github.

https://github.com/cython/cython/pull/185


Sturla

JF Gallant

unread,
Feb 18, 2013, 3:09:30 PM2/18/13
to cython...@googlegroups.com

If I understand correctly, you are already working on the problem of memoryviews with MinGW and Python 3.3?
I didn't find time to add it to the bug tracker, and some information is missing because I wasn't able to keep all the logs from run_test.

Sturla Molden

unread,
Feb 19, 2013, 6:31:27 AM2/19/13
to cython...@googlegroups.com
The error you got has nothing to do with Python 3.3, Cython, or even with MinGW.

It is your particular MinGW compiler that refuses to use atomic builtins. It is probably a bug in your MinGW version, or it might be the result of a bad build configuration for GCC. I'd recommend you get a different MinGW compiler. I use gcc 4.7.1 from TDM-GCC. It works as it should.

The current fix from Mark breaks the performance of memoryview slicing on MinGW. So I posted a PR that undoes it and fixes your problem in a less evil manner. Since you have a broken MinGW, it would be nice if you could test the fix I have made. It is at


I'd also love to get my hands on your bad MinGW compiler, for testing purposes, if you would care to share the binaries with me (e.g. on Dropbox). I'd like to investigate what build flags you used, because this problem might happen on other platforms too, e.g. Mac and Linux. And of course it is the perfect lab tool for testing the fallback code we need when atomic builtins aren't used.


Sturla







JF Gallant

unread,
Feb 19, 2013, 10:07:57 AM2/19/13
to cython...@googlegroups.com

Okay, that's weird because I have the same problem on two computers:
- ThinkStation S20 with i7 processor, 24 GB of RAM, Windows 7 64-bit Professional, fresh install of Python 3.3 last week, fresh install of Cython 1.8 (last week), fresh install of MinGW 4.7.2 with the get-installer, where I clicked "download latest catalogue"
- Acer AspireOne, 1 GB of RAM, Windows XP 32-bit Home Edition, fresh install of Python 3.3 this weekend, fresh install of Cython 1.8 (this weekend), fresh install of MinGW 4.7.2 with the get-installer, where I clicked "download latest catalogue"

All my MinGW installs were done with the get-installer, so it's downloaded from the server.

Sturla Molden

unread,
Feb 19, 2013, 5:08:41 PM2/19/13
to cython...@googlegroups.com

Den 19. feb. 2013 kl. 16:07 skrev JF Gallant <pyro...@gmail.com>:


Okay, that's weird because I have the same problem on two computers:
- ThinkStation S20 with i7 processor, 24 GB of RAM, Windows 7 64-bit Professional, fresh install of Python 3.3 last week, fresh install of Cython 1.8 (last week), fresh install of MinGW 4.7.2 with the get-installer, where I clicked "download latest catalogue"
- Acer AspireOne, 1 GB of RAM, Windows XP 32-bit Home Edition, fresh install of Python 3.3 this weekend, fresh install of Cython 1.8 (this weekend), fresh install of MinGW 4.7.2 with the get-installer, where I clicked "download latest catalogue"

All my MinGW installs were done with the get-installer, so it's downloaded from the server.




That is very strange. I currently use TDM-GCC, not the "official" MinGW distro.  Which one do you have? 

However, never mind why this happens. We know what it is, and we have two fixes for it. I believe mine is the better one, as it will still use atomics. Please try this patched version of Cython:


(I have posted a PR to Cython master for this fix.)


Sturla





JF Gallant

unread,
Feb 19, 2013, 5:23:11 PM2/19/13
to cython...@googlegroups.com

I used the link provided on MinGW.org.
Here is the link to the binary: http://sourceforge.net/projects/mingw/files/Installer/mingw-get-inst/mingw-get-inst-20120426/

And like I said, I checked "use the latest catalogue" or something like that in the first option when you start the installer.

Can you provide a short .pyx example to run the test (without NumPy; I can't use it and don't have it installed), just to be sure we test the same thing and the problem doesn't come from me.

thanks a lot !

Sturla Molden

unread,
Feb 19, 2013, 7:30:52 PM2/19/13
to cython...@googlegroups.com, cython...@googlegroups.com

Den 19. feb. 2013 kl. 23:23 skrev JF Gallant <pyro...@gmail.com>:

>
> I use the link provided on MinWG.org.
> here is the link of the binary : http://sourceforge.net/projects/mingw/files/Installer/mingw-get-inst/mingw-get-inst-20120426/
>
>

And now I know why this happens :-)

MinGW from that source is configured to emit opcode for i386 by default. But GCC requires opcode for i486 (or later) for the atomic builtins. You need to use -march=i486 or something better:
-march=i686, -march=corei7 or -march=native.

The same error should happen on Linux too if we ask GCC to generate opcode for i386.

Sturla




JF Gallant

unread,
Feb 19, 2013, 7:38:13 PM2/19/13
to cython...@googlegroups.com
Good news!

But... where do I need to change this? Some setting to change?
Just explain it to me step by step and I can try it on my system :)

Thanks a lot for your time!

Sturla Molden

unread,
Feb 19, 2013, 8:14:58 PM2/19/13
to cython...@googlegroups.com
Something like this?

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

module = Extension("example",  
    sources=["example.pyx"],
    extra_compile_flags=['-march=i686']
)

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [module]
)








JF Gallant

unread,
Feb 19, 2013, 8:40:51 PM2/19/13
to cython...@googlegroups.com
Yes, something like this ;)

I suppose if I try it on my AspireOne notebook, I replace i686 with i486? I'm on the bus back home now, not near my i7.

JF Gallant

unread,
Feb 19, 2013, 9:04:04 PM2/19/13
to cython...@googlegroups.com
The compile doesn't give errors anymore!
Memoryviews seem to be working now.
I need to test it more when I have time.

JF Gallant

unread,
Feb 19, 2013, 9:20:56 PM2/19/13
to cython...@googlegroups.com
It's working!
No more headache about slicing arrays!
But how much slower are memoryviews in an average use case? I read 10% somewhere but I don't know if that was about memoryviews.

Sturla Molden

unread,
Feb 20, 2013, 4:45:37 AM2/20/13
to cython...@googlegroups.com

Den 20. feb. 2013 kl. 02:40 skrev JF Gallant <pyro...@gmail.com>:

> Yes, something like this ;)
>
> I suppose if I try it on my AspireOne notebook, I replace i686 with i486? I'm on the bus back home now, not near my i7.
>

I think you can use i686 there as well; it's the lowest common denominator of x86 processors today. But with

-march=native

gcc will compile for the CPU you actually have.

Sturla








Sturla Molden

unread,
Feb 20, 2013, 5:14:38 AM2/20/13
to cython...@googlegroups.com
10% is about what I have seen when I have cared to measure. I am writing a benchmark of Cython's typed memoryviews against C and Fortran 95 for various scientific programming tasks (based on SciMark). But my time to spend on it is limited.

It also depends on what you mean by "average use case", of course. It will affect hardcore number crunching (which is what I have measured) and I/O-intensive applications very differently. If >90% of the time is spent waiting for I/O (e.g. receiving data from an external web server), even pure Python will not be more than 10% slower than C.

Sturla




Sturla Molden

unread,
Feb 20, 2013, 5:36:10 AM2/20/13
to cython...@googlegroups.com, markflo...@gmail.com

Den 18. feb. 2013 kl. 12:58 skrev mark florisson <markflo...@gmail.com>:
>
> Yes, that's due to the "fix" I pushed.

We now know it was an i386 limitation and not a MinGW-specific problem. The error will happen on any GCC if the target architecture is i386.

Thus, I think the fallback code should be used if __i386__ is defined when compiling with GCC.

Those who compile for i386 probably don't care so much about fast memoryviews anyway ;-)

Sturla






JF Gallant

unread,
Feb 20, 2013, 5:57:46 AM2/20/13
to cython...@googlegroups.com
It's weird that in 2013 i386 is the default setting in MinGW. I want to use memoryviews in my "basic" kd-tree implementation to slice the arrays. I have the choice to use memoryviews or to try to slice without them. If I have time I can try doing both and see the difference in speed.

Sturla Molden

unread,
Feb 20, 2013, 6:04:28 AM2/20/13
to cython...@googlegroups.com
Den 20. feb. 2013 kl. 11:57 skrev JF Gallant <pyro...@gmail.com>:

> It's weird that in 2013 i386 is the default setting in MinGW. I want to use memoryviews in my "basic" kd-tree implementation to slice the arrays. I have the choice to use memoryviews or to try to slice without them. If I have time I can try doing both and see the difference in speed.
>

I know, it is a CPU from the 1980s... Even i686 (aka "Pentium Pro") is close to 20 years old now.

It is safe to assume i686 when compiling for x86.

Sturla



Sturla Molden

unread,
Feb 20, 2013, 6:09:18 AM2/20/13
to cython...@googlegroups.com, cython...@googlegroups.com
Den 20. feb. 2013 kl. 11:57 skrev JF Gallant <pyro...@gmail.com>:

> It's weird that in 2013 i386 is the default setting in MinGW. I want to use memoryviews in my "basic" kd-tree implementation to slice the arrays. I have the choice to use memoryviews or to try to slice without them. If I have time I can try doing both and see the difference in speed.
>
>

You probably also want the compiler flags -O3 and -ffast-math. Just add them to the extra_compile_flags list.


Sturla





JF Gallant

unread,
Feb 20, 2013, 6:35:34 AM2/20/13
to cython...@googlegroups.com
OK. What does this do exactly? Optimize the code?
Oh, and I forgot: "..._flags" doesn't work, it's probably just a typo. "..._args" works. For anybody else reading the post.

JF Gallant

unread,
Feb 21, 2013, 3:18:37 PM2/21/13
to cython...@googlegroups.com


On Wednesday, February 20, 2013 6:35:34 AM UTC-5, JF Gallant wrote:
OK. What does this do exactly? Optimize the code?
Oh, and I forgot: "..._flags" doesn't work, it's probably just a typo. "..._args" works. For anybody else reading the post.


oops, qsort doesn't support the memoryview type :(

Sturla Molden

unread,
Feb 22, 2013, 7:05:48 AM2/22/13
to cython...@googlegroups.com
On 21.02.2013 21:18, JF Gallant wrote:

> oops , qsort don't support memoryview type :(

Just ensure the memoryview is C-contiguous, and pass a pointer to the
first element.

cdef double somearray[100]

cdef double[::1] array_view = somearray

cdef double[::1] array_slice = array_view[50:76]

cdef double *slice_pointer = &array_slice[0]


One other thing, though:

Since you only need partial sorting, you will be better off using
quickselect (average O(n)) than quicksort (average O(n log n)). You
don't need all the elements sorted, you just need them split into two
groups (smaller or bigger than the median).

Then you can e.g. use something like this. The median function will
leave the elements partially sorted.


cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef int select(double[::1] a, Py_ssize_t k) except -1:
    # Hoare-style quickselect: partially sorts a so that a[k] holds
    # the k-th smallest element.
    cdef Py_ssize_t i, j, l, n, r
    cdef double x, tmp
    n = a.shape[0]
    l = 0
    r = n - 1
    with nogil:
        while l < r:
            x = a[k]
            i = l
            j = r
            while 1:
                while a[i] < x: i += 1
                while x < a[j]: j -= 1
                if i <= j:
                    tmp = a[i]
                    a[i] = a[j]
                    a[j] = tmp
                    i += 1
                    j -= 1
                if i > j: break
            if j < k: l = i
            if k < i: r = j
    return 0


@cython.boundscheck(False)
@cython.wraparound(False)
cdef double maxvalue(double[::1] x):
    cdef Py_ssize_t i
    cdef double tmp
    tmp = x[0]
    for i in range(1, x.shape[0]):
        if x[i] > tmp:
            tmp = x[i]
    return tmp


def median(double[::1] x):
    cdef Py_ssize_t n = x.shape[0]
    cdef Py_ssize_t k
    cdef double tmp

    if n > 3:
        k = n >> 1
        select(x, k)
        if n & 1:
            return x[k]
        else:
            # Even length: x[k] is the upper middle after select();
            # the lower middle is the maximum of the left partition.
            return 0.5*(x[k] + maxvalue(x[:k]))

    elif n == 0:
        raise ValueError('Length of x must be at least 1')

    elif n == 1:
        return x[0]

    elif n == 2:
        if x[0] > x[1]:
            tmp = x[0]
            x[0] = x[1]
            x[1] = tmp
        return 0.5*(x[0] + x[1])

    else:  # n == 3
        if x[0] > x[1]:
            tmp = x[0]
            x[0] = x[1]
            x[1] = tmp
        if x[1] > x[2]:
            tmp = x[1]
            x[1] = x[2]
            x[2] = tmp
        if x[0] > x[1]:
            tmp = x[0]
            x[0] = x[1]
            x[1] = tmp
        return x[1]



Sturla




Sturla Molden

unread,
Feb 22, 2013, 7:11:01 AM2/22/13
to cython...@googlegroups.com
On 22.02.2013 13:05, Sturla Molden wrote:

> Then you e.g. can use something like this? The median function will
> leave the elements partially sorted.

It looks like the indentation in the median function was messed up when
posting :(





Sturla