Hi Henning,
You probably want to use the allocateRemote function to get an array on the GPU.
Array is defined in Sugar.hs as:
```haskell
data Array sh e where
  Array :: (Shape sh, Elt e)
        => EltRepr sh               -- extent of dimensions = shape
        -> ArrayData (EltRepr e)    -- array payload
        -> Array sh e
```
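For anyone unfamiliar with this pattern: because the (Shape sh, Elt e) constraints sit on the constructor, pattern matching on Array brings those class dictionaries back into scope, so consumers of an Array don't need to repeat the constraints. The toy model below illustrates this. Shape, its instances, and the list payload are simplified, hypothetical stand-ins, not Accelerate's actual classes, EltRepr, or ArrayData.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE FlexibleInstances #-}

-- Toy stand-in for a shape class (hypothetical, not Accelerate's Shape).
class Shape sh where
  shapeSize :: sh -> Int

instance Shape Int where
  shapeSize n = n

instance Shape (Int, Int) where
  shapeSize (rows, cols) = rows * cols

-- As in Sugar.hs, the constructor packs the class dictionaries together
-- with the extent and the payload (here just a list).
data Array sh e where
  Array :: (Shape sh, Show e) => sh -> [e] -> Array sh e

-- No Shape constraint in the signature: matching on the GADT constructor
-- brings the packed Shape dictionary into scope.
arraySize :: Array sh e -> Int
arraySize (Array sh _) = shapeSize sh

-- Likewise, Show e is recovered from the constructor.
showPayload :: Array sh e -> String
showPayload (Array _ xs) = show xs

-- Smart constructor that checks the payload length against the extent.
fromList :: (Shape sh, Show e) => sh -> [e] -> Maybe (Array sh e)
fromList sh xs
  | length xs == shapeSize sh = Just (Array sh xs)
  | otherwise                 = Nothing
```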
Some relevant examples of how to use it might be in accelerate-fft or accelerate-blas. If you point me to your code, I could give you a more specific example.
Hope that helps!
-Trevor
--
You received this message because you are subscribed to the Google Groups "Accelerate" group.
To unsubscribe from this group and stop receiving emails from it, send an email to accelerate-hask...@googlegroups.com.
Visit this group at https://groups.google.com/group/accelerate-haskell.
For more options, visit https://groups.google.com/d/optout.
Hi Henning,
CUDA operations are executed within a given context, and code and data are specific to that context. Two contexts might exist on the same device or on separate devices, but in both cases they are entirely distinct; addresses are not transferable between them.
Exporting inDefaultContext was not a good idea; I can't remember when or why that was done. I think we have always had the run*With functions, which allow you to supply the context in which to run, so assuming that you are running on the default context is not valid.
The context that you are actually running on is part of the state of the LLVM PTX state monad, within which all operations are executed. The code I linked above just queries the current context (L47) and then uses it as a key into a map structure that serves as a cache (L74).
For your particular use case, you'll probably need to key not only on the execution context but also on the size and type the cuFFT plan was created for.
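A minimal sketch of such a cache, assuming hypothetical ContextId, FFTType, PlanKey and Plan types in place of the real context from the PTX state and a real cuFFT plan handle (newFakePlanner is a stand-in that merely counts how many plans get created). The point is the composite key: two contexts asking for the same size and type still get separate plans.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins: a context identifier (real code would use the
-- context taken from the PTX state) and the kind of transform.
newtype ContextId = ContextId Int deriving (Eq, Ord, Show)

data FFTType = C2C | R2C | C2R deriving (Eq, Ord, Show)

-- Key on the context AND on what the plan was created for.
data PlanKey = PlanKey
  { keyContext :: ContextId
  , keyExtent  :: (Int, Int)
  , keyType    :: FFTType
  } deriving (Eq, Ord, Show)

-- Hypothetical plan handle: an id plus the key it was created for.
data Plan = Plan { planId :: Int, planKey :: PlanKey } deriving Show

type PlanCache = MVar (Map.Map PlanKey Plan)

newPlanCache :: IO PlanCache
newPlanCache = newMVar Map.empty

-- Return the cached plan for this key, creating and storing one on a miss.
-- Creation happens while the MVar is held, so concurrent callers never
-- create two plans for the same key.
cachedPlan :: PlanCache -> (PlanKey -> IO Plan) -> PlanKey -> IO Plan
cachedPlan cache create key =
  modifyMVar cache $ \m ->
    case Map.lookup key m of
      Just p  -> pure (m, p)
      Nothing -> do
        p <- create key
        pure (Map.insert key p m, p)

-- Stand-in for real plan creation (e.g. a cuFFT call): hands out fresh ids
-- and exposes how many plans have actually been created.
newFakePlanner :: IO (PlanKey -> IO Plan, IO Int)
newFakePlanner = do
  counter <- newIORef (0 :: Int)
  let create key = do
        n <- atomicModifyIORef' counter (\c -> (c + 1, c + 1))
        pure (Plan n key)
  pure (create, readIORef counter)
```

Note that the MVar means every lookup takes a lock; that is the synchronization cost of sharing one cache between all callers.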
I hope that helps explain it?
-Trev
Hi Henning,
> If I want to run a cuFFT-based program simply on the best available device, as ‘run’ does, how do I get the corresponding context?
If you want to determine the best available device yourself (outside of Accelerate), there are functions in the CUDA FFI bindings you can use. “Best” is pretty easy to estimate from the device properties, but “available” is trickier. Once you have figured out which device you want, you can create a context for it to pass to the run*With functions (see createTargetForDevice or createTargetFromContext).
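To illustrate the split between “best” and “available”, here is a sketch over a hypothetical DeviceInfo record. The field names, the prohibited flag, and the free-memory threshold are assumptions; real code would populate them from the CUDA bindings' device queries. Ranking by compute capability and then multiprocessor count is one plausible heuristic, not necessarily the one Accelerate uses.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical summary of a device (field names are assumptions).
data DeviceInfo = DeviceInfo
  { devOrdinal      :: Int
  , devComputeMajor :: Int
  , devComputeMinor :: Int
  , devSMCount      :: Int   -- number of multiprocessors
  , devFreeMemMB    :: Int
  , devProhibited   :: Bool  -- e.g. the compute mode forbids new contexts
  } deriving Show

-- "Available": contexts may be created and enough memory is free.
-- A crude approximation; this is the hard part in practice.
isAvailable :: Int -> DeviceInfo -> Bool
isAvailable minFreeMB d = not (devProhibited d) && devFreeMemMB d >= minFreeMB

-- "Best": higher compute capability first, then more multiprocessors.
rankDevices :: Int -> [DeviceInfo] -> [DeviceInfo]
rankDevices minFreeMB =
  sortBy (comparing score) . filter (isAvailable minFreeMB)
  where
    score d = Down (devComputeMajor d, devComputeMinor d, devSMCount d)

bestDevice :: Int -> [DeviceInfo] -> Maybe DeviceInfo
bestDevice minFreeMB ds = case rankDevices minFreeMB ds of
  (d : _) -> Just d
  []      -> Nothing
```

Because sortBy is stable, ties keep the order in which the devices were listed, so passing them in driver order preserves that order among equals.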
If you just want the context that Accelerate decided was best (at that particular time), you can get it the way I showed previously.
> Would it hurt to export defaultContext/defaultTarget?
I think it would.
There is no reason to expect that run will always use the same context. For example, once we encounter an error (even a minor one), generally the only way to recover is to destroy the context and start again. This just seems to be a limitation of the CUDA API: once a call fails, all subsequent calls in that context fail. So exporting a default context either locks us into an unfortunate position or isn't constant, and thus isn't useful to you anyway.
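A toy model of why a default context cannot be a constant. Context, Manager and withCurrent are hypothetical, not Accelerate internals: a failed action marks the current context as dead and installs a fresh one, so any context captured beforehand goes stale.

```haskell
import Control.Exception (ErrorCall (..), SomeException, throwIO, try)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef, writeIORef)

-- Toy model (hypothetical). A context is an id plus an "alive" flag standing
-- in for a real CUDA context, which is unusable once a call in it has failed.
data Context = Context { ctxId :: Int, ctxAlive :: IORef Bool }

-- Holds whichever context is current, plus a supply of fresh ids.
data Manager = Manager { current :: IORef Context, nextId :: IORef Int }

newContext :: IORef Int -> IO Context
newContext counter = do
  i <- atomicModifyIORef' counter (\n -> (n + 1, n))
  alive <- newIORef True
  pure (Context i alive)

newManager :: IO Manager
newManager = do
  counter <- newIORef 0
  ctx <- newContext counter
  ref <- newIORef ctx
  pure (Manager ref counter)

-- Run an action in the current context. On failure the context is marked
-- dead ("destroyed"), a fresh one becomes current, and the error is rethrown.
withCurrent :: Manager -> (Context -> IO a) -> IO a
withCurrent mgr act = do
  ctx <- readIORef (current mgr)
  r <- try (act ctx)
  case r of
    Right x -> pure x
    Left err -> do
      writeIORef (ctxAlive ctx) False
      fresh <- newContext (nextId mgr)
      writeIORef (current mgr) fresh
      throwIO (err :: SomeException)
```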
Perhaps what you want for your library is to provide your own PTX target and tell your users, “you must use the run*With functions with this particular context, because all the FFT state is tied to it”… I don't know, maybe that design works better for you?
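If you go with that design, the type system can help enforce “this FFT state belongs to that target”, in the same way runST keeps STRefs from escaping. In this hypothetical sketch, Target stands in for a PTX target and Plan for the cuFFT state; none of these names come from accelerate-fft, and the transform is an identity placeholder.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Hypothetical: 's' is a phantom tag naming the target a value belongs to.
newtype Target s = Target Int      -- the Int stands in for a context id

newtype Plan s = Plan (Int, Int)   -- extent the plan was created for

-- The only way to obtain a Target. The rank-2 type gives each call a fresh,
-- unnameable 's', so plans can neither escape nor be mixed across targets.
withFFTTarget :: Int -> (forall s. Target s -> IO a) -> IO a
withFFTTarget ctx k = k (Target ctx)

createPlan :: Target s -> (Int, Int) -> IO (Plan s)
createPlan _ extent = pure (Plan extent)

-- Target and plan must share 's'; a plan from another target is a type error.
execPlan :: Target s -> Plan s -> [Double] -> IO [Double]
execPlan _ (Plan (r, c)) xs
  | length xs == r * c = pure xs   -- identity placeholder for the transform
  | otherwise          = ioError (userError "input size does not match plan")
```

A plan created inside one withFFTTarget cannot be returned from it, nor passed to execPlan together with a different target: both are rejected by the type checker because the phantom parameters differ.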
> Btw, I am not convinced by the global context cache solution. This way, all BLAS functions have to synchronize access to that global cache, where in principle no synchronization is necessary. Sure, the accesses are short and infrequent, but in principle it doesn't feel right.
I am open to suggestions.
Cheers,
-Trev
Hey Henning,
> Then, how about exporting a function that searches for a good default context for me?
Just selecting device 0 is usually fine. I believe the CUDA driver already orders the devices for you. However… [continues after break]
> So far, I use explicit handles for cuFFT. This is natural for cuFFT because the handles are bound to particular data sizes. I will try to stick to this scheme. The problem I see is: how can I ensure that plan creation and transformation are performed in the same context?
…If you are not doing automatic plan management, then I don’t think you should be doing automatic context management either.
I think you just want your plan creation function to be explicitly tied to a given target; i.e.:
createPlanForTarget :: PTX -> {- FFT parameters -} -> IO Plan
And then the user supplies that Plan together with its associated PTX target when they call run*With. This seems to be closer to what you want and, as you mention, avoids any overheads associated with the automatic management methods I use.
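A minimal sketch of that design with a runtime guard, which is one way to check that plan creation and transformation happen on the same target. Target, FFTParams, Plan, createPlanForTarget and runFFTWith are hypothetical stand-ins for PTX, the FFT parameters, and a run*With call; the transform itself is an identity placeholder.

```haskell
import Control.Exception (Exception, throwIO, try)

-- Hypothetical stand-ins: a target identified by a unique number, and the
-- "{- FFT parameters -}" placeholder from the signature above.
newtype Target = Target { targetId :: Int } deriving (Eq, Show)

data FFTParams = FFTParams { fftRows :: Int, fftCols :: Int } deriving (Eq, Show)

-- A plan remembers the target it was created for, as well as its parameters.
data Plan = Plan { planTarget :: Target, planParams :: FFTParams } deriving Show

data PlanError = WrongTarget Target Target  -- plan's target, supplied target
  deriving Show

instance Exception PlanError

-- Same shape as createPlanForTarget :: PTX -> {- FFT parameters -} -> IO Plan.
createPlanForTarget :: Target -> FFTParams -> IO Plan
createPlanForTarget tgt params = pure (Plan tgt params)  -- real code would create the cuFFT plan here

-- Before doing any work, check that the caller passed the plan's own target.
runFFTWith :: Target -> Plan -> [Double] -> IO [Double]
runFFTWith tgt plan xs
  | planTarget plan /= tgt = throwIO (WrongTarget (planTarget plan) tgt)
  | otherwise              = pure xs  -- identity placeholder for the transform
```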
All the best,
-Trev