How to? Shuffling a vector of integers in Cython without requiring the GIL


Adam Li

May 13, 2022, 6:53:56 PM
to cython...@googlegroups.com
Hi,

I was wondering if it is possible to use C++ shuffling functions in Cython to shuffle a vector of integers. I have two related questions:
  1. How do I do the shuffling?
  2. How can I pass in a random seed that comes from Cython/Python? (i.e. "random_shuffle(start, end, random_state)")
For example, given the following:

from libcpp.algorithm cimport random_shuffle
...
# construct an array to sample from mTry x n_features set of indices
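cdef SIZE_t i
indices_to_sample = vector[SIZE_t](N * M)
# the vector is already sized to N * M, so assign by index
# (push_back would append N * M extra elements after the zeros)
for i in range(indices_to_sample.size()):
    indices_to_sample[i] = i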

# attempt at random shuffling
random_shuffle(indices_to_sample.begin(), indices_to_sample.end())

I get the following error:

  Error compiling Cython file:
  ------------------------------------------------------------
  ...

          cdef int i, feat_i, proj_i, rand_vec_index
          cdef DTYPE_t weight

          # shuffle indices to sample
          random_shuffle(self.indices_to_sample.begin(), self.indices_to_sample.end())
                                                                                  ^
  ------------------------------------------------------------

  sklearn/tree/_oblique_splitter.pyx:225:81: Converting to Python object not allowed without gil

So this does not seem to work. I am also unsure how to pass in a random seed that a user defines at the NumPy/Python level.

Thanks!

--
Best Regards,

Adam Li (he/him), PhD in Biomedical Engineering 
Postdoctoral Researcher at Columbia University
Causal AI Lab

da-woods

May 14, 2022, 2:59:03 AM
to cython...@googlegroups.com
random_shuffle doesn't look to be part of our libcpp.algorithm.pxd file. You might have to write your own wrapper. I suspect this is where your error comes from: Cython doesn't know what random_shuffle is, so it assumes it's a Python function. (Look at the first error message; that probably tells you about it.)

`size_t` is usually lower-case. Not sure if the upper-case equivalent exists.

The seed functions are usually part of `random_state`. I think they just take integers or strings so can be easily passed from Python.

What you're doing is broadly right though.
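
For reference, std::random_shuffle was deprecated in C++14 and removed in C++17; std::shuffle with an explicit random engine is its replacement, and the engine can be constructed from an integer seed. A minimal C++ sketch of that idea, assuming a std::mt19937 engine and a 32-bit seed (the helper names make_indices and shuffle_with_seed are illustrative, not part of Cython or the standard library):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Build the index vector 0, 1, ..., n-1.
std::vector<std::size_t> make_indices(std::size_t n) {
    std::vector<std::size_t> indices(n);
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    return indices;
}

// Shuffle in place with a Mersenne Twister engine seeded by `seed`.
// `seed` is a plain integer, so a caller could pass it down from Python.
void shuffle_with_seed(std::vector<std::size_t>& indices, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::shuffle(indices.begin(), indices.end(), rng);
}
```

Because the seed is an ordinary integer, it can be drawn on the Python side and handed to the C++ code; two calls with the same seed produce the same permutation on a given standard library implementation.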

Stefan Behnel

May 18, 2022, 1:09:00 AM
to cython...@googlegroups.com
Adam Li wrote on 13.05.22 at 22:14:
> from libcpp.algorithm cimport random_shuffle
> ...
>
> # construct an array to sample from mTry x n_features set of indices
> cdef SIZE_t i
> indices_to_sample = vector[SIZE_t](N * M)

Is this inside a function or in module-level code? Note that there is no
type inference outside of functions, so in that case you need to declare the
"indices_to_sample" variable explicitly to make sure it gets a C++ type
(and is not visible from Python code outside) rather than a Python object type
(with normal Python visibility, which is the main reason for this difference).

Stefan

Adam Li

Apr 16, 2023, 12:54:40 AM
to cython-users
Hi,

A short follow-up on this topic, because I'm comparing different approaches now. Say I want to initialize a vector and use an efficient C++ standard library function to shuffle its elements. How would I go about wrapping it?


Doesn't seem to specify how to wrap C++ functions in the standard library that are not vendored by Cython, so I'm not exactly sure how to implement this approach.
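
One way to approach the wrapping question is to keep the template-heavy code on the C++ side and expose only plain, non-templated signatures, which tend to be straightforward to declare in a `cdef extern from "..." nogil:` block. The sketch below assumes a hypothetical header (say, index_shuffler.h) and a hypothetical class name IndexShuffler; neither comes from Cython or scikit-learn.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: owns the random engine so that its state
// persists across calls. Nothing here touches Python objects.
class IndexShuffler {
public:
    // Seed the engine once, e.g. with an integer passed down from Python.
    explicit IndexShuffler(std::uint32_t seed) : rng_(seed) {}

    // Shuffle `indices` in place, advancing the engine state.
    void shuffle(std::vector<std::size_t>& indices) {
        std::shuffle(indices.begin(), indices.end(), rng_);
    }

private:
    std::mt19937 rng_;
};
```

Since the engine lives inside the object, successive calls continue the same random stream instead of restarting from the seed, and no Python objects are involved at any point.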
