Thinking about it more, that's probably not quite right. I think the copy happens at the end of the `with nogil, parallel` block, when it tries to get one of the allocated pointers out of that block.
In that case, you probably just need to move the contents of that block into a function:
```
with nogil, parallel():
    allocate_scratch_and_loop(in_arr, out_view, ...)
```
You should still be able to put the `prange` inside that function, so there should be no need to chunk it manually. All you're doing is moving the scratch allocation so that it's tightly scoped inside a function.
--
---
You received this message because you are subscribed to the Google Groups "cython-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cython-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/cython-users/3847BD18-0FEA-40F4-A6F5-03DE2D04B753%40d-woods.co.uk.
I think I've given you bad advice.
It looks like nesting a `prange` inside a `parallel` block only works if both are actually in the same function. The other detail is that `threadid()` doesn't seem to be an "absolute" thread ID, just a "relative" ID within the local parallel block. So here's what is happening:
* With the "scoped" version and `use_threads_if=False`, the `parallel` block doesn't activate. The `prange` block in the separate function is independent, so it does activate.
* With the "scoped" version and `use_threads_if=True`, the `parallel` block spins up many threads. The `prange` block detects that the program is already running in parallel, so it effectively becomes a plain `range` loop that gets executed once per thread.
* With the "unscoped" version, the `parallel` and `prange` blocks are linked at compile time, so `use_threads_if` controls both blocks at once.
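The "once per thread" behaviour can be reproduced in plain C++/OpenMP. This is a sketch that assumes (without checking Cython's generated code) that a standalone `prange` opens its own parallel region, roughly like `#pragma omp parallel for`, whereas a `prange` lexically inside `with parallel()` becomes a bare `#pragma omp for` bound to the enclosing team. The function and struct names are hypothetical. For contrast, the second loop function shows that plain OpenMP does allow an "orphaned" `omp for` in a separate function to share work with the caller's team.

```cpp
#include <atomic>
#include <cassert>

// Assumed shape of a standalone prange: it opens its own parallel region.
// Called from inside another parallel region, this is a nested region, so
// every outer thread executes the full iteration space.
void loop_with_own_region(std::atomic<long>& count, long n) {
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        count.fetch_add(1, std::memory_order_relaxed);
    }
}

// An orphaned worksharing loop: no parallel directive of its own, so it binds
// to the caller's team and the iterations are split between its threads.
void loop_orphaned(std::atomic<long>& count, long n) {
    #pragma omp for
    for (long i = 0; i < n; ++i) {
        count.fetch_add(1, std::memory_order_relaxed);
    }
}

struct NestedCounts {
    int outer_threads;  // size of the outer team
    long own_region;    // total iterations executed by loop_with_own_region
    long orphaned;      // total iterations executed by loop_orphaned
};

NestedCounts run_nested_demo(long n) {
    std::atomic<int> threads{0};
    std::atomic<long> own{0};
    std::atomic<long> orph{0};
    #pragma omp parallel
    {
        threads.fetch_add(1, std::memory_order_relaxed);
        loop_with_own_region(own, n);  // each outer thread runs all n iterations
        loop_orphaned(orph, n);        // n iterations shared across the team
    }
    return {threads.load(), own.load(), orph.load()};
}
```

Built with OpenMP enabled, `run_nested_demo(n).own_region` should equal `n` times the number of outer threads, while `orphaned` stays at `n`; without OpenMP both are simply `n`.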
I don't think this is something we can realistically fix in Cython, because it's largely dictated by OpenMP; at best we might be able to detect it and warn about it.
Going back to your original problem, I believe a variation of "pointer to unique_ptr" does work:
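```
# Assumes a C++ build with OpenMP enabled
from cython.operator cimport dereference
from cython.parallel cimport parallel, prange
from libcpp.memory cimport unique_ptr, make_unique
from libcpp.utility cimport move

# in_arr, out_view, in_size and x come from the surrounding function
cdef unique_ptr[double]* x_smart
with nogil, parallel():
    # Each thread allocates its own unique_ptr; only the raw pointer is private
    x_smart = new unique_ptr[double](move(make_unique[double](x)))
    try:
        for i in prange(in_size):
            out_view[i] = in_arr[i] + dereference(dereference(x_smart))
    finally:
        del x_smart
```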
So `x_smart` is just a standard C pointer that gets declared as `firstprivate` here; unlike a `unique_ptr`, a raw pointer can be copied freely.
It's possible that we could improve Cython here: this variable could just be `private`, because it isn't used after the parallel block, and that would probably fix the issue. I'm not sure whether there's a good reason why we don't try to detect this.
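The difference between the two data-sharing clauses can be sketched in plain C++/OpenMP. This assumes, without checking Cython's actual output, that the Cython snippet lowers to roughly the first function; `add_via_pointer` and `add_via_private` are hypothetical names. In OpenMP, `private` only default-constructs each thread's copy, while `firstprivate` copy-constructs it, which `std::unique_ptr` forbids. Note that the second variant move-assigns into an OpenMP private variable, which is the pattern involved in the MSVC issue reported later in this thread.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Variant 1 (mirrors the Cython snippet): a raw pointer to a unique_ptr.
// A raw pointer is trivially copyable, so firstprivate is fine.
std::vector<double> add_via_pointer(const std::vector<double>& in, double x) {
    std::vector<double> out(in.size());
    const long n = static_cast<long>(in.size());
    std::unique_ptr<double>* x_smart = nullptr;
    #pragma omp parallel firstprivate(x_smart)
    {
        // Each thread allocates and later frees its own scratch object.
        x_smart = new std::unique_ptr<double>(std::make_unique<double>(x));
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            out[i] = in[i] + **x_smart;
        }
        delete x_smart;  // destroys the unique_ptr, which frees the double
    }
    return out;
}

// Variant 2 (the suggested improvement): the unique_ptr itself is private.
// private() only needs a default constructor; firstprivate() would need the
// copy constructor that std::unique_ptr deletes, so it would not compile.
std::vector<double> add_via_private(const std::vector<double>& in, double x) {
    std::vector<double> out(in.size());
    const long n = static_cast<long>(in.size());
    std::unique_ptr<double> scratch;
    #pragma omp parallel private(scratch)
    {
        scratch = std::make_unique<double>(x);  // move-assign into this thread's copy
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            out[i] = in[i] + *scratch;
        }
    }  // with OpenMP enabled, each thread's private copy is destroyed here
    return out;
}
```

Both functions should give the same result, e.g. `{1.0, 2.0, 3.5}` with `x = 0.5` becomes `{1.5, 2.5, 4.0}`.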
On Windows, `<long long>dereference(x_smart_ptr).get()` reports 0, so the pointer initialization is somehow failing.
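Factoring the initialization out into a helper function:
```
cdef unique_ptr[double]* make_smart_ptr(double x) nogil:
    # Build the unique_ptr in an ordinary local, then move it to the heap
    cdef unique_ptr[double] tmp = make_unique[double](x)
    return new unique_ptr[double](move(tmp))
...
# inside the `with nogil, parallel():` block
x_smart_ptr = make_smart_ptr(x)
```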
seems to work, though I have no immediate idea why. On Linux, the original code seems to work.
I was able to reproduce the problem in a pretty short bit of C++ code, so I've reported it to Microsoft at https://developercommunity.visualstudio.com/t/Move-assignment-of-openmp-private-variab/11051965. From experience, they will probably fix it, but not quickly.
A few (untested) suggestions for how to work around it with Cython:
* Define the C macro `CYTHON_USE_CPP_STD_MOVE` to 0. At least in my simple example, copying rather than moving doesn't trigger the problem, and initializing `y0` involves moving from a temporary variable. However, this will make some things perform worse.
* Compile with `/std:c++17`; I don't see the problem with that flag set.
* Use the clang-cl compiler instead of MSVC. Unfortunately this won't work with setuptools, which I think means it won't work with IPython either, but I believe it will work with Meson as a build system. I'm not really able to provide useful support for this, but clang-cl is intended to be an almost drop-in replacement for MSVC.
Hopefully some of that is helpful, but I'm afraid I don't have a definitive solution for getting this to work reliably, so it may depend on how much time you want to spend on it...