Cython: passing a Python data structure as a void pointer


Jernej Azarija

Dec 26, 2014, 8:06:14 AM
to sage-...@googlegroups.com
Dear sage-devel,

I have a question related to void pointers in Cython. I am wrapping a C++ library that exposes a function accepting a function pointer; each time the library processes something, it calls the passed function. The callback is declared as

void hook(void *user_param, unsigned int n, const unsigned int *aut);

Here n and aut are the data computed by the library, while user_param is anything the user wishes to pass to the callback.

Now, in my scenario it would be really nice if user_param could point to a tuple containing a dict and a list. Hence my question is:

Is there a way in Cython to obtain the address of a Python data structure, and later on to cast this address back to the corresponding Python type?

The alternative I am currently using is to ignore user_param completely and keep some global data structures to store the results passed to hook. I am not sure whether this is less ugly than passing a void pointer and casting.

Best,

Jernej

Nathann Cohen

Dec 26, 2014, 8:19:43 AM
to sage-...@googlegroups.com
Yoooooooooo Jernej !

Not sure I understood what you wanted, but in graphs/base/static_sparse_graph.pyx there is a Cython attribute which points to a Python list.

That's because we have to store the labels of each edge, and because that label can be any python object.

If you look for instances of "edge_labels" in that file, you will see how it is defined and how it is cast into a list whenever needed.

Nathann

P.S.: And then you can grep Sage's code for things like "cpython.Py_XDECREF". That will probably be the sign of something similar, though it may be less readable ;-)

Jernej Azarija

Dec 26, 2014, 8:54:32 AM
to Sage devel, Nathann Cohen
Hey Nathann,

yes yes, that's precisely what I was looking for:

cdef void *user_param = <PyObject *> data
......
data = <object> <PyObject *> user_param
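For readers following along: the same address round trip can be imitated in plain CPython with the ctypes module. This is only an illustration (it relies on the CPython implementation detail that id() returns the object's address, and the object must be kept alive while the address is in use); in Cython proper one uses the <PyObject *> casts above. The payload tuple here is hypothetical.

```python
import ctypes

# Hypothetical payload: the (dict, list) pair to smuggle through user_param.
data = ({"orbit": []}, [1, 2, 3])

# CPython detail: id() is the object's address, i.e. the value that would
# travel through the void *user_param argument.
addr = id(data)

# ... later, inside the callback, recover the object from the raw address.
recovered = ctypes.cast(addr, ctypes.py_object).value
assert recovered is data  # same object, not a copy
```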

Thanks for the help

Jernej

--
You received this message because you are subscribed to a topic in the Google Groups "sage-devel" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sage-devel/xMSPed6C0ds/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sage-devel+...@googlegroups.com.
To post to this group, send email to sage-...@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

Nathann Cohen

Dec 26, 2014, 9:09:30 AM
to Jernej Azarija, Sage devel
> yes yes, that's precisely what I was looking for:
>
> cdef void *user_param = <PyObject *> data
> ......
> data = <object> <PyObject *> user_param
>
> Thanks for the help

Don't forget the decref/incref stuff ! That's for Python's garbage
collector. Otherwise it will not know that the object is being used
and you will be in trouble ! [1]

Good luck,

Nathann

[1] And I am afraid that you will have to google that if you want to
know more, because that's all I know about it :-P

Vincent Delecroix

Dec 26, 2014, 10:54:41 AM
to sage-...@googlegroups.com
Hello,

Note that there is the googlegroups cython-users which might be a
better place for cython related questions.

Best
Vincent


Volker Braun

Dec 26, 2014, 12:04:08 PM
to sage-...@googlegroups.com, azi.s...@gmail.com
You shouldn't tinker with the Python reference-counting system if it can be avoided. Just make sure that the Python object you want to pass to your function is kept alive for as long as you are going to use the C pointer to it, e.g. store it in an attribute of the Python object that is calling the C function.

Simon King

Dec 27, 2014, 4:12:34 AM
to sage-...@googlegroups.com
Hi!

On 2014-12-26, Nathann Cohen <nathan...@gmail.com> wrote:
> Don't forget the decref/incref stuff ! That's for Python's garbage
> collector. Otherwise it will not know that the object is being used
> and you will be in trouble !

Another hint: If P is a pointer to PyObject, then
<object>P
does already increase the reference count. So, no need to call Py(X)INCREF
in that case.
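Simon's point can be checked from plain Python with sys.getrefcount: materializing an object from its raw address yields a new owned reference. Here the Cython <object> cast is imitated with ctypes (a CPython-only demonstration, not the Cython code itself):

```python
import ctypes
import sys

lst = [1, 2, 3]
before = sys.getrefcount(lst)

# Turning the raw address back into an object creates a new reference,
# just like Cython's <object>P cast increments the refcount of P.
obj = ctypes.cast(id(lst), ctypes.py_object).value

after = sys.getrefcount(lst)
assert obj is lst
assert after == before + 1  # one extra reference, now held by `obj`
```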

Cheers,
Simon

Nathann Cohen

Dec 27, 2014, 4:20:06 AM
to Sage devel
Hello !

> Another hint: If P is a pointer to PyObject, then
> <object>P
> does already increase the reference count. So, no need to call Py(X)INCREF
> in that case.

Oh, good to know. But then, would "print (<list> my_list_pointer)[0]"
do an incref followed by a decref ?

Thanks,

Nathann

Simon King

Dec 27, 2014, 4:41:22 AM
to sage-...@googlegroups.com
Hi Nathann,

On 2014-12-27, Nathann Cohen <nathan...@gmail.com> wrote:
> Oh, good to know. But then, would "print (<list> my_list_pointer)[0]"
> do an incref followed by a decref ?

If I understand correctly, as soon as you do "O=<object>your_pointer",
Python will take care of the object that has just popped up. Hence, when
you do "del O" or when you leave the function inside which O is a local
variable, then the reference count will be decreased. But I am not sure
if (<list>my_list_pointer)[0] will work, it could be that the cython
compiler complains about accessing an item from a temporary variable. In
the situation above, it would probably be more reasonable to do
something like
<object>(PyList_GET_ITEM(my_list_pointer, 0))

You can find loads of examples by "grep GETITEM -r src/sage|grep .pyx:".

Best regards,
Simon

Volker Braun

Dec 27, 2014, 7:35:33 AM
to sage-...@googlegroups.com
On Saturday, December 27, 2014 10:41:22 AM UTC+1, Simon King wrote:
> [...] But I am not sure
> if (<list>my_list_pointer)[0] will work, it could be that the cython
> compiler complains about accessing an item from a temporary variable.

There is no temporary involved, <list>my_list_pointer is just reinterpreting (casting) the existing C pointer to a Python object as a Python list (and adjusting the refcount of the original object, of course). If you had declared "cdef list my_list_pointer" instead of PyObject* then you wouldn't even have to do that.

 
> [...] it would probably be more reasonable to do
> something like
>   <object>(PyList_GET_ITEM(my_list_pointer, 0))

IMHO you should strive to never use the direct CPython API in Cython programs. Almost all uses in Sage are just vestiges of older times when Cython wasn't able to do something. But Cython is much more capable nowadays; use it to write clear and expressive programs.


Jeroen Demeyer

Dec 28, 2014, 12:30:22 PM
to sage-...@googlegroups.com
On 2014-12-27 13:35, Volker Braun wrote:
> IMHO you should strive to never use direct CPython API in Cython
> programs. Almost all uses in Sage are just vestiges of older times where
> Cython wasn't able to do something. But Cython is much more capable
> nowadays, use it to write clear and expressive programs.
+1

The idea of Cython is really that you shouldn't need to know the C API
to write good Cython code. If Cython knows the types of the objects, it
will generate efficient code.

Simon King

Dec 28, 2014, 1:44:51 PM
to sage-...@googlegroups.com
Hi!

On 2014-12-28, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
> The idea of Cython is really that you shouldn't need to know the C API
> to write good Cython code. If Cython knows the types of the objects, it
> will generate efficient code.

Then I wonder what to do at #17435.

Each term of an element of a path algebra can be encoded in a C struct.
The terms form a list (but I don't need direct access to the n-th item,
all operations either just look at the head of the list or go through all
items of the list). So, there are (at least) two possibilities:
1. Wrap the C struct into a cdef class, and put the items (i.e.,
instances of the class) into a cython list (or tuple).
2. Do not wrap the C struct, but implement the list as a linked list.

Am I right that the second approach would perform faster than the first,
since one saves the time to create instances of the cdef class, and
would also save memory? Or would you still recommend to take the first
approach?

At #17435, I took the second approach. One complication: Each term of a
path algebra element involves a coefficient, which is a Python object.
In the C struct, it appears as a pointer to PyObject, and I need to take
care of the reference count.

Anyway, prior to #17435, path algebra elements were implemented in Python
based on CombinatorialFreeModuleElement, the terms being stored in a
dictionary. The second approach above gives, of course, much faster
arithmetics than the Python implementation. But the first approach would
be faster than the Python implementation, too.

Would you say that taking the second approach was a bad idea, and that I
should change everything to the first? (That would involve a lot of
work, resulting in easier-to-read code, but I would only do it if the
performance were competitive!)

Best regards,
Simon


Nils Bruin

Dec 28, 2014, 2:19:01 PM
to sage-...@googlegroups.com
On Sunday, December 28, 2014 10:44:51 AM UTC-8, Simon King wrote:
> Hi!
>
> On 2014-12-28, Jeroen Demeyer <jdem...@cage.ugent.be> wrote:
>> The idea of Cython is really that you shouldn't need to know the C API
>> to write good Cython code. If Cython knows the types of the objects, it
>> will generate efficient code.
>
> Then I wonder what to do at #17435.
>
> Each term of an element of a path algebra can be encoded in a C struct.
> The terms form a list (but I don't need direct access to the n-th item,
> all operations either just look at the head of the list or go through all
> items of the list). So, there are (at least) two possibilities:
> 1. Wrap the C struct into a cdef class, and put the items (i.e.,
> instances of the class) into a cython list (or tuple).
> 2. Do not wrap the C struct, but implement the list as a linked list.
>
> Am I right that the second approach would perform faster than the first,
> since one saves the time to create instances of the cdef class, and
> would also save memory?

That's not immediately clear. An array of pointers is a little more compact than a linked list, and allocating an array at once should be a little more efficient than allocating multiple nodes of a linked list. The big advantage of a linked list is that insertion/deletion is fast. If you don't do that then I think linked lists are not particularly efficient.

Instantiating a cdef class might have some more overhead compared to simply mallocing a struct, but that might not be huge. Especially if you immediately extend PyList via "cdef class elem(list):" (or perhaps PyTuple) then I'd expect the overhead to be not so big (since python needs tuple/list instantiation to be lightning fast).

> Would you say that taking the second approach was a bad idea, and I
> should change everything to the first (which would involve a lot of
> work, resulting in easier-to-read code, but I would only do it if the
> performance would be competitive!).

I think it's worth an experiment; it would be a significant benefit if you can avoid dealing with memory management.


Simon King

Dec 28, 2014, 2:44:58 PM
to sage-...@googlegroups.com
Hi Nils,

On 2014-12-28, Nils Bruin <nbr...@sfu.ca> wrote:
> That's not immediately clear. An array of pointers is a little more compact
> than a linked list, and allocating an array at once should be a little more
> efficient than allocating multiple nodes of a linked list.

An array of fixed size is no option, because...

> The big
> advantage of a linked list is that insertion/deletion is fast. If you don't
> do that then I think linked lists are not particularly efficient.

... I frequently insert into (and delete items from) the list.

> Instantiating a cdef class might have some more overhead compared to simply
> mallocing a struct, but that might not be huge. Especially if you
> immediately extend PyList via "cdef class elem(list):" (or perhaps PyTuple)
> then I'd expect the overhead to be not so big (since python needs
> tuple/list instantiation to be lightning fast).

I was talking about instantiating the *items* of the list, not the list
itself.

Best regards,
Simon

Volker Braun

Dec 28, 2014, 3:49:26 PM
to sage-...@googlegroups.com
When starting from scratch I would go with the first option. Cleaner code, easier to write and debug, and if the coefficient is already a Python object then the overhead is already bounded by a factor of 2.

If that is not fast enough then I would implement the polynomials as C++ templates and specialize to PyObject coefficients as needed. I would use std::map<exponent, coefficient> (that is, a binary tree) instead of std::list<std::pair<exponent, coefficient>> as underlying data structure though. Even if you don't want/need associative containers, why linked lists? Do you ever insert in the middle? Presumably polynomials are immutable => arrays or std::vector 

Though given that your implementation already exists and (presumably) works and is well-debugged I wouldn't change it now. Finish it, and if you run into any performance bottleneck you can explore other options.

 

Simon King

Dec 28, 2014, 8:27:49 PM
to sage-...@googlegroups.com
Hi Volker,

On 2014-12-28, Volker Braun <vbrau...@gmail.com> wrote:
> When starting from scratch I would go with the first option. Cleaner code,
> easier to write and debug, and if the coefficient is already a Python
> object then the overhead is already bounded by a factor of 2.

In fact, a year or two ago, I had experimental code where I did follow
the approach of using Python objects (of course cythoned). It taught me much
(about theory), but it wasn't really competitive.

> I would use
> std::map<exponent, coefficient> (that is, a binary tree) instead of
> std::list<std::pair<exponent, coefficient>> as underlying data structure
> though. Even if you don't want/need associative containers, why linked
> lists?

Because this is about computing Gröbner bases. So, there is a monomial
ordering by which the terms of a polynomial are linearly ordered. That's
at #17435. In the yet unpublished branch implementing Gröbner basis
computations, I am using the geobucket data structure, which is a bit
more complicated than just a linear list, but is better for the typical
operations occurring when computing Gröbner bases.

> Do you ever insert in the middle?

Yes, as follows:

> Presumably polynomials are immutable => arrays or std::vector

Polynomials get reduced by generators of ideals, so, they aren't
immutable, as the lead term will be replaced by other terms, that are
inserted in the middle.

> Though given that your implementation already exists and (presumably) works
> and is well-debugged I wouldn't change it now. Finish it, and if you run
> into any performance bottleneck you can explore other options.

Yep, that was my plan. First, I had the above-mentioned experimental
code, based on the experiments I came up with a paper describing the
theory, and now I try to get a competitive implementation.

Best regards,
Simon


Volker Braun

Dec 29, 2014, 8:46:14 AM
to sage-...@googlegroups.com
On Monday, December 29, 2014 2:27:49 AM UTC+1, Simon King wrote:
> In fact, a year or two ago, I had experimental code where I did follow
> the approach of using Python objects (of course cythoned). It taught me much
> (about theory), but it wasn't really competitive.

Competitive relative to what? If you are just doubling the number of Python objects then how is that not just a factor of 2 in the worst case? 

> Because this is about computing Gröbner bases.

Hence geobuckets where each bucket is typically implemented as a memory-contiguous array, no?
 
> Polynomials get reduced by generators of ideals, so, they aren't
> immutable, as the lead term will be replaced by other terms, that are
> inserted in the middle.

The standard lore is that you need memory locality, isn't it? Hence arrays instead of linked lists. Of course there is a certain tradeoff between saved RAM and additional cache misses.

Simon King

Dec 29, 2014, 11:08:23 AM
to sage-...@googlegroups.com
Hi Volker,

first of all, to the OP, sorry that the original topic of the thread has changed.

On 2014-12-29, Volker Braun <vbrau...@gmail.com> wrote:
> On Monday, December 29, 2014 2:27:49 AM UTC+1, Simon King wrote:
>>
>> In fact, a year or two ago, I had experimental code where I did follow
>> the aproach to use Python objects (of course cythoned). It tought me much
>> (about theory), but it wasn't really competitive.
>>
>
> Competitive relative to what?

1.
Task: Computation of Gröbner bases for modules over path algebras.
Competitor: Letterplace (distributed with Singular), which deals with homogeneous
two-sided ideals in free algebras.
Aim: Be faster. Include the non-homogeneous and the one-sided case (of course,
finite Gröbner bases will not always exist). Work over path algebras, which are
more general than free algebras. Avoid a memory leak that appears in Letterplace
(or its Sage wrapper, not sure where the problem lies).

2.
Task: Computation of minimal generating sets for modules over basic algebras,
with an application to the computation of minimal projective resolutions for
basic algebras.
Competitor: David Green's C programs for the computation of minimal projective
resolutions for modular group algebras of prime power groups (distributed in
my optional group cohomology spkg).
Aim: Be faster. Work over basic algebras, which are more general than
modular group algebras of prime power groups. Compute Loewy layers for
modules over basic algebras, which is more general than computing a minimal
generating set.

3.
Task: Compute Ext algebras of basic algebras.
Competitor: Robert Michael Pawloski's programs.
Aim: Be faster and deal with bigger examples than what he describes in
his PhD thesis.

4.
"Holy grail": Compute the modular cohomology rings of Mathieu group M_{24}
(that's too big for the current group cohomology spkg).

> If you are just doubling the number of Python
> objects then how is that not just a factor of 2 in the worst case?

What do you mean by doubling the number of Python objects? Do you mean:
"The coefficients are Python objects anyway, so, what's the problem if
the terms become Python objects, too?"

> Hence geobuckets where each bucket is typically implemented as
> memory-contiguous array, no?

My reference is "The Geobucket Data Structure for Polynomials" of Thomas
Yan. His buckets are decreasingly sorted lists of terms. He doesn't
mention arrays. Do you have a different reference in mind? Or do you
think I misunderstood what he means (I thought "sorted list" would be
implemented as a linked list, not as an array)?

Best regards,
Simon


Volker Braun

Dec 29, 2014, 12:14:36 PM
to sage-...@googlegroups.com
On Monday, December 29, 2014 5:08:23 PM UTC+1, Simon King wrote:
> What do you mean by doubling the number of Python objects? Do you mean:
> "The coefficients are Python objects anyway, so, what's the problem if
> the terms become Python objects, too?"
 
Yes, that is what I meant.

> My reference is "The Geobucket Data Structure for Polynomials" of Thomas
> Yan. His buckets are decreasingly sorted lists of terms. He doesn't
> mention arrays. Do you have a different reference in mind? Or do you
> think I misunderstood what he means (I thought "sorted list" would be
> implemented as a linked list, not as an array)?

By "list polynomials", he means arrays. Footnote 6 on page 3.

Simon King

Dec 29, 2014, 1:47:18 PM
to sage-...@googlegroups.com
Hi Volker,

On 2014-12-29, Volker Braun <vbrau...@gmail.com> wrote:
>> My reference is "The Geobucket Data Structure for Polynomials" of Thomas
>> Yan. His buckets are decreasingly sorted lists of terms. He doesn't
>> mention arrays. Do you have a different reference in mind? Or do you
>> think I misunderstood what he means (I thought "sorted list" would be
>> implemented as a linked list, not as an array)?
>>
>
> By "list polynomials", he means arrays. Footnote 6 on page 3.

OMG! Actually I missed that footnote.

Hmmm. That probably means I should rewrite my current F5 code, which
isn't on trac yet. I suppose I should first read more on typed memoryviews,
right? To me, it looks like the right Cython tool to create arrays of
objects (assuming that I will also turn my C struct terms into objects).

About #17435 (Cythonise path algebra elements): Would you think that one
should use arrays there? In contrast to the geobuckets, there is no a-priori
bound on the number of terms for the elements of a path algebra. On the
other hand, if one says that elements are immutable, then *each* element
has a fixed number of terms and can thus be an array.

Best regards,
Simon


Volker Braun

Dec 29, 2014, 3:43:55 PM
to sage-...@googlegroups.com
On Monday, December 29, 2014 7:47:18 PM UTC+1, Simon King wrote:
> [...] I suppose I should first read more on typed memoryviews,
> right? To me, it looks like the right Cython tool to create arrays of
> objects (assuming that I will also turn my C struct terms into objects).

I haven't used them, but to me the memoryviews are mainly about accessing existing arrays. It doesn't mean that you have to create the array in a particular way.
 
> About #17435 (Cythonise path algebra elements): Would you think that one
> should use arrays there? In contrast to the geobuckets, there is no a-priori
> bound on the number of terms for the elements of a path algebra.

As long as you only insert at the end (append) I don't see why you wouldn't use an array. You can append to std::vector in amortized constant time. 

Dima Pasechnik

Dec 30, 2014, 7:05:13 AM
to sage-...@googlegroups.com
On 2014-12-29, Volker Braun <vbrau...@gmail.com> wrote:
> As long as you only insert at the end (append) I don't see why you wouldn't
> use an array. You can append to std::vector in amortized constant time.
No, the whole point of geobuckets is to make addition fast, and
only moderately more memory-hungry, by using
a lazy data structure. Plain arrays (list polynomials in the terminology
of the geobucket paper) will slow down addition (and thus GB reductions)
too much, no matter whether you are commutative or not.

Whether geobuckets are good in the noncommutative case is another
question, of course.

Dima

Simon King

Dec 30, 2014, 9:23:11 AM
to sage-...@googlegroups.com
Hi Dima,

On 2014-12-30, Dima Pasechnik <dim...@gmail.com> wrote:
>> As long as you only insert at the end (append) I don't see why you wouldn't
>> use an array. You can append to std::vector in amortized constant time.
>
> No, the whole point of geobuckets is to make addition fast, and
> only moderately more memory-hungry, by using
> a lazy data structure. Plain arrays (list polynomials in the terminology
> of the geobucket paper) will slow addition (and thus GB reductions)
> too much, no matter you are commutative or not.

Yes, that was my original impression from the paper. That's why it
surprised me when Volker pointed me to the footnote in Yan's paper.

Of course, Volker is not suggesting to drop the geobucket data structure;
he was merely suggesting that the individual buckets in a geobucket
could be implemented as arrays (since each is of bounded length) rather
than as linked lists. But my impression is that addition (basically
merging of two sorted lists into one) is rather complicated when sorted
lists are implemented as arrays.
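For concreteness, here is a rough sketch of the geobucket idea with each bucket stored as a plain Python list (i.e. an array), terms being hypothetical (monomial, coefficient) pairs sorted by decreasing monomial. This is a toy model of Yan's structure, not the #17435 code:

```python
class Geobucket:
    """Toy geobucket: bucket i holds a decreasingly sorted term list
    of at most 2**(i+1) terms; terms are (monomial, coefficient) pairs."""

    def __init__(self):
        self.buckets = []

    @staticmethod
    def _merge(p, q):
        # Merge step of merge sort, adding coefficients of equal monomials
        # and dropping terms that cancel.
        out, i, j = [], 0, 0
        while i < len(p) and j < len(q):
            if p[i][0] > q[j][0]:
                out.append(p[i]); i += 1
            elif p[i][0] < q[j][0]:
                out.append(q[j]); j += 1
            else:
                c = p[i][1] + q[j][1]
                if c:
                    out.append((p[i][0], c))
                i += 1; j += 1
        return out + p[i:] + q[j:]

    def add(self, terms):
        # Cascade into larger buckets while the merged result overflows.
        i = 0
        while True:
            if i == len(self.buckets):
                self.buckets.append(terms)
                return
            terms = self._merge(self.buckets[i], terms)
            self.buckets[i] = []
            if len(terms) <= 2 ** (i + 1):
                self.buckets[i] = terms
                return
            i += 1

    def canonical(self):
        # Collapse all buckets into one sorted term list.
        total = []
        for b in self.buckets:
            total = self._merge(total, b)
        return total


g = Geobucket()
g.add([(3, 1)])
g.add([(2, 1)])
g.add([(3, 2)])
assert g.canonical() == [(3, 3), (2, 1)]
```

The point of the structure is that each addition only touches small buckets most of the time, instead of re-merging the whole polynomial.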

However, I haven't tested it. And IF addition is fast with arrays, then
it would make sense to use it. In order to test it, I would need to know
how to get std::vector work with Python objects. I know how to make it
work with PyObject*, but then we'd need to take care of reference
counts---which we'd rather avoid.

> Whether geobuckets are good in the noncommutative case is another
> question, of course.

Why should that be different? The important operations for standard
basis computations are the same: Extraction of the lead term of a
polynomial, multiplication of a polynomial by a term, and addition of
polynomials.

Thank you and Volker for all your comments!
Best regards,
Simon



Volker Braun

Dec 30, 2014, 12:44:57 PM
to sage-...@googlegroups.com
On Tuesday, December 30, 2014 3:23:11 PM UTC+1, Simon King wrote:
> [...] But my impression is that addition (basically
> merging of two sorted lists into one) is rather complicated when sorted
> lists are implemented as arrays.

It's just merge sort, right?
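Indeed, merging two decreasingly sorted term arrays is the merge step of merge sort, plus a coefficient addition when monomials coincide. A minimal sketch, with terms as hypothetical (monomial, coefficient) pairs:

```python
def merge_sorted_terms(p, q):
    """Merge two term lists sorted by decreasing monomial, adding
    coefficients of equal monomials and dropping cancellations."""
    out, i, j = [], 0, 0
    while i < len(p) and j < len(q):
        if p[i][0] > q[j][0]:
            out.append(p[i]); i += 1
        elif p[i][0] < q[j][0]:
            out.append(q[j]); j += 1
        else:
            c = p[i][1] + q[j][1]
            if c:                      # keep the term only if it survives
                out.append((p[i][0], c))
            i += 1; j += 1
    return out + p[i:] + q[j:]        # append whichever tail is left


# (x^5 + 2x^3) + (x^4 - 2x^3) = x^5 + x^4
assert merge_sorted_terms([(5, 1), (3, 2)], [(4, 1), (3, -2)]) == [(5, 1), (4, 1)]
```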
 
> [...] get std::vector work with Python objects. I know how to make it
> work with PyObject*, but then we'd need to take care of reference
> counts---which we'd rather avoid.

I don't think you can avoid it. Increase the ref count whenever you create a new coefficient, and decref when the polynomial is destroyed.

In C++, just wrap the PyObject* in a class that does it automatically (RAII).

The Python objects of course are not local in memory, they'll be randomly on the heap. If there are only a few (small finite fields) then that might be ok (fit all in the CPU cache), but for large coefficient rings this sounds bad. Hence my suggestion of writing it in C++ templates and then later also add instantiations for coefficients where we already have native C++ implementations. Though perhaps it would be best to first get your Cython implementation going so there is something to compare to...

Simon King

Dec 30, 2014, 3:01:37 PM
to sage-...@googlegroups.com
Hi Volker,

On 2014-12-30, Volker Braun <vbrau...@gmail.com> wrote:
> In C++ just wrap the PyObject* in a class that does it automatically (RAII)

"grep" has told me that src/sage/modular/arithgroup/farey.cpp uses
PyObject*, so, I'll try to learn from it.

> The Python objects of course are not local in memory, they'll be randomly
> on the heap. If there are only a few (small finite fields) then that might
> be ok (fit all in the CPU cache), but for large coefficient rings this
> sounds bad. Hence my suggestion of writing it in C++ templates and then
> later also add instantiations for coefficients where we already have native
> C++ implementations.

Your previous suggestion was to turn terms into objects, and thus when I was
talking about std::vector[PyObject*], the PyObject* were the terms, not
the coefficients.

So, your new suggestion is to use C++ already for the terms? Well, I'll
give it a try.

Can you give me a pointer how I can use in C++ the "ctypedef struct biseq_s"
that is defined in sage/data_structures/bounded_integer_sequences.pxd?

Or does your suggestion also involve rewriting bounded integer sequences
in C++?

> Though perhaps it would be best to first get your
> Cython implementation going so there is something to compare to...

I have an implementation of path algebra elements on trac (not geobuckets,
of course), and basic arithmetic seems to be faster than what we got from
wrapping Letterplace, and seems to have no memory leak. And on my laptop
I also have code for standard basis computations with geobuckets, but
that's in the middle of debugging.

Best regards,
Simon

Volker Braun

Dec 30, 2014, 4:27:06 PM
to sage-...@googlegroups.com
IMHO you need C/C++ if memory locality is important for you because Python/Cython does not give you control over memory location; PyObject memory is always managed by Python and will be randomly on the heap. Whereas in C++ you can allocate objects in a contiguous array, or even by "placement new". The only alternative is to write procedural Cython code that translates into a C program, manually allocate an array of cdef structs etc.


On Tuesday, December 30, 2014 9:01:37 PM UTC+1, Simon King wrote:
> So, your new suggestion is to use C++ already for the terms? Well, I'll
> give it a try.

In any case thats an optimization that you could also do later when you turn your linked lists into arrays...

> Can you give me a pointer how I can use in C++ the "ctypedef struct biseq_s"
> that is defined in sage/data_structures/bounded_integer_sequences.pxd?

You just have to make it "ctypedef public struct biseq_s":

http://docs.cython.org/src/userguide/external_C_code.html#public-declarations

Dima Pasechnik

Dec 30, 2014, 5:14:43 PM
to sage-...@googlegroups.com
On 2014-12-30, Volker Braun <vbrau...@gmail.com> wrote:
> IMHO you need C/C++ if memory locality is important for you because
> Python/Cython does not give you control over memory location; PyObject
> memory is always managed by Python and will be randomly on the heap.
> Whereas in C++ you can allocate objects in a contiguous array, or even by
> "placement new". The only alternative is to write procedural Cython code
> that translates into a C program, manually allocate an array of cdef
> structs etc.

Isn't it true that numpy arrays integrate well with cython?
At least if all you need is builtin Python object types (e.g. int or
float), or arrays of pointers to Python objects, they may well be used.
http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#built-in-scalar-types

This would mean much less C/C++ (which is IMHO a good thing :))

Dima


Simon King

Dec 30, 2014, 5:36:41 PM
to sage-...@googlegroups.com
Hi Dima,

On 2014-12-30, Dima Pasechnik <dim...@gmail.com> wrote:
> Isn't it true that numpy arrays integrate well with cython?
> At least if all you need is builtin Python object types (e.g. int or
> float), or arrays of pointers to Python objects, they may well be used.
> http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#built-in-scalar-types
>
> This would mean much less C/C++ (which is IMHO a good thing :))

The coefficients may well be accessible to builtin Python types.
However, the monomials (i.e., paths in a digraph) aren't builtin.

Sure, one could work with pointers to Python objects. But wouldn't that
mean to lose data locality?

Best regards,
Simon

Simon King

Dec 30, 2014, 6:15:08 PM
to sage-...@googlegroups.com
Hi Volker,

On 2014-12-30, Volker Braun <vbrau...@gmail.com> wrote:
> Can you give me a pointer how I can use in C++ the "ctypedef struct
>> biseq_s"
>> that is defined in sage/data_structures/bounded_integer_sequences.pxd?
>>
>
> You just have to make it "ctypedef public struct biseq_s":
>
> http://docs.cython.org/src/userguide/external_C_code.html#public-declarations

Thank you for the link.

Indeed, a header file is created in
src/build/cythonized/sage/data_structures/bounded_integer_sequences.h

This is not totally as expected, because the above link says: "If the
Cython module resides within a package, then the name of the .h file
consists of the full dotted name of the module, e.g. a module called
foo.spam would have a header file called foo.spam.h"

So, why is the name not sage.data_structures.bounded_integer_sequences.h?

I wonder how to include the header. It did not end up in local/include/.
Are special instructions needed in src/module_list.py? Either resulting
in the header being put into local/include/, or resulting in the C++
code finding biseq_s and the associated functions?

Best regards,
Simon


Volker Braun

Dec 30, 2014, 6:32:15 PM
to sage-...@googlegroups.com
On Wednesday, December 31, 2014 12:15:08 AM UTC+1, Simon King wrote:
> So, why is the name not sage.data_structures.bounded_integer_sequences.h?

Don't know, might be a bug.
 
> I wonder how to include the header. It did not end up in local/include/.

Yes, I don't think we have that yet. All generated headers should be installed in a suitable subdirectory of SAGE_LOCAL/include 

Dima Pasechnik

Dec 31, 2014, 3:25:33 PM
to sage-...@googlegroups.com


On Tuesday, 30 December 2014 22:36:41 UTC, Simon King wrote:
> Hi Dima,
>
> On 2014-12-30, Dima Pasechnik <dim...@gmail.com> wrote:
>> Isn't it true that numpy arrays integrate well with cython?
>> At least if all you need is builtin Python object types (e.g. int or
>> float), or arrays of pointers to Python objects, they may well be used.
>> http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#built-in-scalar-types
>>
>> This would mean much less C/C++ (which is IMHO a good thing :))
>
> The coefficients may well be accessible to builtin Python types.
> However, the monomials (i.e., paths in a digraph) aren't builtin.

perhaps it's faster, development time-wise, to implement paths in a digraph as arrays of integers,
and pack them into numpy arrays, than to be bothered with std::vector.


 

Simon King

Dec 31, 2014, 8:13:19 PM
to sage-...@googlegroups.com
Hi Dima,

On 2014-12-31, Dima Pasechnik <dim...@gmail.com> wrote:
> perhaps it's faster, development time-wise, to implement paths in a
> digraph as arrays of integers,
> and pack them into numpy arrays, than to be bothered with std::vector.

By #16435, paths in digraphs are implemented by sequences of bounded
integers, and these sequences *are* implemented as (MPIR) integers (that
already is in the current beta). The current suggested implementation of
path algebra elements at #17435 allows general coefficients. For my
application, I need coefficients in finite not necessarily prime fields.

Anyway, that's why I asked about the possibility to work with C headers
for bounded integer sequences. It would make it easier to work in C++
rather than Cython.

Can numpy work with finite non-prime fields? Can pairs (coefficient,
MPIR integer) be put into numpy arrays? I guess it is more straightforward
to use a C struct for terms similar to what is defined at
#17435, and have std::vectors of these structs. *IF* linked lists aren't
more suitable for representing polynomials.

Best regards,
Simon


Simon King

Dec 31, 2014, 8:20:08 PM
to sage-...@googlegroups.com
Hi Dima,

On 2014-12-31, Dima Pasechnik <dim...@gmail.com> wrote:
> perhaps it's faster, development time-wise, to implement paths in a
> digraph as arrays of integers,

Or did I misunderstand? Are you saying we should implement bounded integer
sequences (#15820, part of the latest beta) as arrays rather than as
bitsets?

Best regards,
Simon
