I would like to use the NVIDIA CUDA (CUBLAS and CUFFT) libraries from within MATLAB using the loadlibrary command. I am running MATLAB R2010a on 64-bit Linux Ubuntu 9.10, gcc 4.4, with CUDA Toolkit 3.0. Here is what I get when I try to run the loadlibrary command:
>> loadlibrary('libcublas', '/usr/local/cuda/include/cublas.h');
??? Error using ==> loadlibrary at 368
Failed to preprocess the input file.
Output from preprocessor is:In file included from /usr/local/cuda/include/vector_types.h:45,
from /usr/local/cuda/include/cuComplex.h:44,
from /usr/local/cuda/include/cublas.h:94:
/usr/local/cuda/include/host_defines.h:41:2: error: #error --- !!! UNSUPPORTED COMPILER !!! ---
Apparently, the following check is failing:
#if !defined(__GNUC__) && !defined(_WIN32)
But, I would think that __GNUC__ should be defined. Has anyone had success using loadlibrary with CUBLAS or CUFFT? I can write a MEX-file and link it with these libraries. However, I would like to use loadlibrary as it would allow me to experiment with the libraries from MATLAB's interpreted environment. Any information would be appreciated.
Thanks in advance,
--Chris
Post your results if this works for you. At one point this define was needed by the header parsing code, but it may no longer be necessary.
I suggest making a backup copy of the file, or using Save As to save it as myloadlibrary.m in the same directory. If you need to move the file, you will need to create a private subdirectory under the new location and copy the file prototypes.pl from toolbox/matlab/general/private into it.
Phil
"Christopher " <cam...@remove.this.alum.mit.edu> wrote in message news:hptl9f$8cu$1...@fred.mathworks.com...
Thanks, that fixed the loadlibrary problem. I'm getting a couple of warnings from loadlibrary from functions that try to return a complex type, but I'm not sure that I'll even need those functions. I haven't had a chance to try calling the library functions yet, but I'll add to this post if I have any issues.
Thanks again,
--Chris
"Philip Borghesani" <philip_b...@mathworks.spam> wrote in message <hq2sfm$945$1...@fred.mathworks.com>...
inHost = (1 : 4); % Test vector
outHost = zeros(size(inHost)); % Allocate space for final result
devPtr = libpointer('voidPtrPtr');
assert(~calllib('libcublas', 'cublasAlloc',length(inHost), 8, devPtr));
assert(~calllib('libcublas', 'cublasSetVector', length(inHost), 8, ...
inHost, 1, devPtr, 1));
assert(~calllib('libcublas', 'cublasGetVector', length(inHost), 8, ...
devPtr, 1, outHost, 1));
assert(~calllib('libcublas', 'cublasFree', devPtr));
disp(outHost);
Here are the relevant function signatures:
[uint32, voidPtrPtr] cublasAlloc(int32, int32, voidPtrPtr)
[uint32, voidPtr, voidPtr] cublasSetVector(int32, int32, voidPtr, int32, voidPtr, int32)
[uint32, voidPtr, voidPtr] cublasGetVector(int32, int32, voidPtr, int32, voidPtr, int32)
[uint32, voidPtr] cublasFree(voidPtr)
Hopefully, even if one doesn't have access to the CUBLAS library it's easy enough to understand how it's supposed to work. cublasAlloc allocates a memory buffer in the device's memory space, pointed to by devPtr. cublasGetVector and cublasSetVector copy data back and forth (also specifying the number of elements, element size, and strides for source and destination buffers). All functions return uint32 status that should be 0, and the voidPtr return arguments were added by the MATLAB wrapper.
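For readers without the library at hand, the count and stride arguments can be pictured with a host-only analogy. The sketch below is plain MATLAB and does not call CUBLAS. The names n, incx, incy, x, and y are illustrative assumptions, and the code only mimics which elements a strided copy of n elements would touch.

```matlab
% Host-only analogy of a strided vector copy (no CUBLAS involved).
n    = 4;            % number of elements to copy
incx = 2;            % stride between consecutive source elements
incy = 1;            % stride between consecutive destination elements
x = 10 : 10 : 80;    % illustrative source buffer
y = zeros(1, n);     % illustrative destination buffer

% Copy n elements, stepping through x by incx and through y by incy.
srcIdx = 1 : incx : 1 + (n - 1) * incx;
dstIdx = 1 : incy : 1 + (n - 1) * incy;
y(dstIdx) = x(srcIdx);

disp(y);
```

With both strides set to 1, as in the calls above, the copy simply covers the first n elements of each buffer.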
The code runs without crashing and all return statuses are 0, but unfortunately the result outHost is still all zeros. I've consulted the documentation (http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_external/f42650.html) but I'm having difficulty mapping those examples to my case. Part of my confusion seems to be that I have a pointer to a memory location in the device's memory space.
Any "pointers" would be appreciated,
--Chris
inHost = single(1 : 10); % Test data
devPtr = libpointer('voidPtr');
% Allocate space for final result
outHostPtr = libpointer('voidPtr', zeros(size(inHost), class(inHost)));
assert(~calllib('libcublas', 'cublasAlloc', length(inHost), 4, devPtr));
assert(~calllib('libcublas', 'cublasSetVector', length(inHost), 4, ...
inHost, 1, devPtr, 1));
calllib('libcublas', 'cublasSscal', (length(inHost)), 2.0, devPtr, 1);
assert(~calllib('libcublas', 'cublasGetError'))
assert(~calllib('libcublas', 'cublasGetVector', length(inHost), 4, ...
devPtr, 1, outHostPtr, 1));
assert(~calllib('libcublas', 'cublasFree', devPtr));
disp(outHostPtr.Value);
Note that I changed it to single precision because that's what my CUDA card supports, and I added a call to cublasSscal to multiply the vector elements by two. I also changed devPtr from libpointer('voidPtrPtr') to libpointer('voidPtr'), because that seems to be the approach for allocation functions. Finally, I made an explicit libpointer "outHostPtr" to hold the output results, and realized that the result had to be accessed using the Value property.
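Since the element-size argument has to change along with the data class (8 for double, 4 for single), one option is to derive it from the data rather than hard-code it. This is only a sketch in plain MATLAB. The names oneElem, elemSize, and nBytes are assumptions introduced here, not part of the code above.

```matlab
% Derive the per-element byte size from the data class instead of hard-coding it.
inHost   = single(1 : 10);
oneElem  = zeros(1, 1, class(inHost));          % one element of the same class
elemSize = numel(typecast(oneElem, 'uint8'));   % 4 for single, 8 for double
nBytes   = elemSize * numel(inHost);            % total bytes in the host buffer
fprintf('%d bytes per element, %d bytes total\n', elemSize, nBytes);
```

The resulting elemSize could then stand in for the literal 4 in the cublasAlloc, cublasSetVector, and cublasGetVector calls.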
I'm still not 100% confident in my usage, so if anyone sees potential problems or can think of simplifications or improvements, I'd be interested in hearing about them.
--Chris
I'm adding to my own posts because I've run into a new issue. I'm having trouble understanding pointer addition with libpointers, when the pointers point to memory that is in the device space. Here is a modified example:
----------
inHost = single(1 : 10); % Test data
devPtr = libpointer('voidPtr');
% Allocate space for final result
outHostPtr = libpointer('voidPtr', zeros(size(inHost), class(inHost)));
assert(~calllib('libcublas', 'cublasAlloc', length(inHost), 4, devPtr));
assert(~calllib('libcublas', 'cublasSetVector', length(inHost), 4, ...
inHost, 1, devPtr, 1));
% calllib('libcublas', 'cublasSscal', (length(inHost)), 2.0, devPtr, 1);
% Need to call setdatatype() here? What input parameters?
calllib('libcublas', 'cublasSscal', length(inHost) - 1, 2.0, devPtr + 1, 1);
assert(~calllib('libcublas', 'cublasGetError'))
assert(~calllib('libcublas', 'cublasGetVector', length(inHost), 4, ...
devPtr, 1, outHostPtr, 1));
assert(~calllib('libcublas', 'cublasFree', devPtr));
disp(outHostPtr.Value);
---------
My intent was to double all of the elements in my vector except the first one. However, I got the following output result:
"2 4 6 8 10 12 14 16 18 10"
that is, every element except the last one got doubled. So even though I passed "devPtr+1" to the function, it looks like it still started with devPtr. Any ideas on how to fix this example?
--Chris
It's me again, adding to my own post in case what I have learned is useful to others out there...
I fixed my problem by creating a prototype file and editing it, changing every parameter type that is a pointer to device memory to 'uint64' (with the exception of cublasAlloc, where I used 'uint64Ptr'). I need to be careful to keep track of the element size when doing pointer addition, and I also need to cast my uint64 to double in order to do the addition, but at least it works. This seems a bit hackish, so if anyone has a better solution I'd like to hear it.
--Chris
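The offset arithmetic described in the last post might look roughly like the sketch below. It is plain MATLAB with no CUBLAS calls. The address value and the names devAddr, elemSize, skip, and devAddrPlus are made up for illustration, and the edited prototype file itself is not shown in the thread.

```matlab
% Sketch of byte-offset arithmetic on a device address held as uint64.
% The address below is made up for illustration; a real one would come
% from cublasAlloc through the edited prototype file.
devAddr  = uint64(1048576);   % hypothetical device base address
elemSize = 4;                 % bytes per single-precision element
skip     = 1;                 % number of elements to skip

% Do the addition in double, then cast back to uint64.
% Doubles represent integers exactly only up to 2^53.
assert(double(devAddr) < 2^53);
devAddrPlus = uint64(double(devAddr) + skip * elemSize);
disp(devAddrPlus);
```

The offset is in bytes, so skipping one single-precision element advances the address by 4, not by 1.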