Using loadlibrary with NVIDIA CUDA (CUBLAS and CUFFT) libraries

Christopher

Apr 11, 2010, 7:17:03 PM

Hi,

I would like to use the NVIDIA CUDA (CUBLAS and CUFFT) libraries from within MATLAB using the loadlibrary command. I am running MATLAB R2010a on 64-bit Linux Ubuntu 9.10, gcc 4.4, with CUDA Toolkit 3.0. Here is what I get when I try to run the loadlibrary command:

>> loadlibrary('libcublas', '/usr/local/cuda/include/cublas.h');
??? Error using ==> loadlibrary at 368
Failed to preprocess the input file.
Output from preprocessor is:In file included from /usr/local/cuda/include/vector_types.h:45,
from /usr/local/cuda/include/cuComplex.h:44,
from /usr/local/cuda/include/cublas.h:94:
/usr/local/cuda/include/host_defines.h:41:2: error: #error --- !!! UNSUPPORTED COMPILER !!! ---

Apparently, the following check is failing:

#if !defined(__GNUC__) && !defined(_WIN32)

However, I would expect __GNUC__ to be defined, since the compiler is gcc. Has anyone had success using loadlibrary with CUBLAS or CUFFT? I could write a MEX-file and link it against these libraries, but I would prefer loadlibrary because it lets me experiment with the libraries from MATLAB's interpreted environment. Any information would be appreciated.

Thanks in advance,
--Chris

Philip Borghesani

Apr 13, 2010, 6:50:30 PM

Christopher
Edit loadlibrary.m and modify the code block (near line 280) that looks like this to remove -U __GNUC__:
case 'GLNXA64'
cc='gcc -U __GNUC__ -m64';
thunk_build='%s %s %s "%s" -o "%s" -Wl,-E -shared -fPIC';

Post your results if this works for you. At one point this flag was needed by the header parsing code, but it may no longer be necessary.

I suggest making a backup copy of the file, or using Save As to save it as myloadlibrary.m in the same directory. If you need to move the file, you will need to create a private subdirectory under the new location and copy the file prototypes.pl from toolbox/matlab/general/private into it.

Phil
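
The copy step described above can be scripted. The following is a minimal sketch, assuming an R2010a-era installation in which loadlibrary.m ships as an editable M-file with prototypes.pl in its private folder. The folder name loadlib_copy and the file name myloadlibrary.m are illustrative choices. The script only copies files; removing -U __GNUC__ from the GLNXA64 case still has to be done by hand in the copy.

```matlab
% Locate the shipped loadlibrary.m and the private helper it depends on.
srcFile = which('loadlibrary');
srcDir  = fileparts(srcFile);
plFile  = fullfile(srcDir, 'private', 'prototypes.pl');

% Hypothetical destination folder; any writable location would do.
workDir = fullfile(pwd, 'loadlib_copy');
privDir = fullfile(workDir, 'private');
if ~exist(privDir, 'dir')
    mkdir(privDir);   % creates workDir as well if it is missing
end

% Copy under a new name so the original file stays untouched.
dstFile = fullfile(workDir, 'myloadlibrary.m');
copyfile(srcFile, dstFile);
fileattrib(dstFile, '+w');   % installed files are often read-only

% prototypes.pl may not exist in other MATLAB releases, so check first.
if exist(plFile, 'file') == 2
    copyfile(plFile, privDir);
else
    warning('prototypes.pl not found at %s', plFile);
end

addpath(workDir);     % make myloadlibrary callable from the prompt
```

After editing the copy, it would be called as myloadlibrary(...) with the same arguments as loadlibrary.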

Christopher

Apr 15, 2010, 9:36:04 AM

Phil,

Thanks, that fixed the loadlibrary problem. I'm getting a couple of warnings from loadlibrary from functions that try to return a complex type, but I'm not sure that I'll even need those functions. I haven't had a chance to try calling the library functions yet, but I'll add to this post if I have any issues.

Thanks again,
--Chris

Christopher

Apr 24, 2010, 7:37:05 PM

OK, now I'm running into a different problem, mostly due to my misunderstanding of how libpointers work. I'm trying to create a memory buffer on the GPU device, then write some data into that buffer and read the results back. Here is the main section of my code:

inHost = (1 : 4); % Test vector
outHost = zeros(size(inHost)); % Allocate space for final result
devPtr = libpointer('voidPtrPtr');
assert(~calllib('libcublas', 'cublasAlloc',length(inHost), 8, devPtr));
assert(~calllib('libcublas', 'cublasSetVector', length(inHost), 8, ...
inHost, 1, devPtr, 1));
assert(~calllib('libcublas', 'cublasGetVector', length(inHost), 8, ...
devPtr, 1, outHost, 1));
assert(~calllib('libcublas', 'cublasFree', devPtr));
disp(outHost);

Here are the relevant function signatures:
[uint32, voidPtrPtr] cublasAlloc(int32, int32, voidPtrPtr)
[uint32, voidPtr, voidPtr] cublasSetVector(int32, int32, voidPtr, int32, voidPtr, int32)
[uint32, voidPtr, voidPtr] cublasGetVector(int32, int32, voidPtr, int32, voidPtr, int32)
[uint32, voidPtr] cublasFree(voidPtr)

Hopefully, even if one doesn't have access to the CUBLAS library it's easy enough to understand how it's supposed to work. cublasAlloc allocates a memory buffer in the device's memory space, pointed to by devPtr. cublasGetVector and cublasSetVector copy data back and forth (also specifying the number of elements, element size, and strides for source and destination buffers). All functions return uint32 status that should be 0, and the voidPtr return arguments were added by the MATLAB wrapper.

The code runs without crashing and all return statuses are 0, but unfortunately the result outHost is still all zeros. I've consulted the documentation (http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_external/f42650.html) but I'm having difficulty mapping those examples to my case. Part of my confusion seems to be that I have a pointer to a memory location in the device's memory space.

Any "pointers" would be appreciated,
--Chris
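
The signatures listed above are what MATLAB reports for a loaded library. A minimal sketch of how to display them, assuming the library was loaded under the name libcublas as in the loadlibrary call at the start of the thread:

```matlab
libname = 'libcublas';   % assumed: the name used in the earlier loadlibrary call

if libisloaded(libname)
    % Print every function with its MATLAB-side argument and return types.
    libfunctions(libname, '-full');
else
    fprintf('%s is not loaded; run loadlibrary first.\n', libname);
end
```

The extra voidPtr entries on the left-hand side of each signature are the additional outputs that MATLAB adds for pointer arguments.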

Christopher

Apr 25, 2010, 1:53:07 PM

After studying the documentation on libpointers some more, I was able to fix my code so that it works:

inHost = single(1 : 10); % Test data
devPtr = libpointer('voidPtr');

% Allocate space for final result
outHostPtr = libpointer('voidPtr', zeros(size(inHost), class(inHost)));

assert(~calllib('libcublas', 'cublasAlloc', length(inHost), 4, devPtr));
assert(~calllib('libcublas', 'cublasSetVector', length(inHost), 4, ...
    inHost, 1, devPtr, 1));
calllib('libcublas', 'cublasSscal', length(inHost), 2.0, devPtr, 1);
assert(~calllib('libcublas', 'cublasGetError'));
assert(~calllib('libcublas', 'cublasGetVector', length(inHost), 4, ...
    devPtr, 1, outHostPtr, 1));
assert(~calllib('libcublas', 'cublasFree', devPtr));

disp(outHostPtr.Value);

Note that I changed it to single-precision because that's what my CUDA card supports, and I added a call to cublasSscal to multiply the vector elements by two. I also changed devPtr from libpointer('voidPtrPtr') to libpointer('voidPtr'), because that seems to be the approach for allocation functions. Finally, I made an explicit libpointer "outHostPtr" to hold the output results, and realized that the result had to be accessed using the Value property.

I'm still not 100% confident in my usage, so if anyone sees potential problems or can think of simplifications or improvements, I'd be interested in hearing about them.

--Chris
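
One reason the first attempt printed zeros is that a plain MATLAB array passed to a pointer argument is not updated in place, whereas a libpointer object keeps its own buffer that is read back through Value. The following sketch shows the libpointer side of this using host memory only (no CUDA calls); the variable names are illustrative.

```matlab
buf = zeros(1, 4, 'single');

% The libpointer is initialized with a copy of buf.
p = libpointer('singlePtr', buf);

% Changing the original array afterwards does not reach the pointer's buffer.
buf(1) = 99;

disp(p.DataType);   % the type the pointer was created with
disp(p.Value);      % contents of the pointer's own buffer, not of buf
```

This is why the working version reads the result from outHostPtr.Value rather than from the array that was used to initialize the pointer.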

Christopher

Apr 28, 2010, 9:47:05 AM

Hello everyone,

I'm adding to my own posts because I've run into a new issue. I'm having trouble understanding pointer addition with libpointers, when the pointers point to memory that is in the device space. Here is a modified example:
----------

inHost = single(1 : 10); % Test data
devPtr = libpointer('voidPtr');
% Allocate space for final result
outHostPtr = libpointer('voidPtr', zeros(size(inHost), class(inHost)));
assert(~calllib('libcublas', 'cublasAlloc', length(inHost), 4, devPtr));
assert(~calllib('libcublas', 'cublasSetVector', length(inHost), 4, ...
inHost, 1, devPtr, 1));

% calllib('libcublas', 'cublasSscal', (length(inHost)), 2.0, devPtr, 1);
% Need to call setdatatype() here? What input parameters?
calllib('libcublas', 'cublasSscal', length(inHost) - 1, 2.0, devPtr + 1, 1);

assert(~calllib('libcublas', 'cublasGetError'))
assert(~calllib('libcublas', 'cublasGetVector', length(inHost), 4, ...
devPtr, 1, outHostPtr, 1));
assert(~calllib('libcublas', 'cublasFree', devPtr));
disp(outHostPtr.Value);

---------
My intent was to double all of the elements in my vector except the first one. However, I got the following output result:
"2 4 6 8 10 12 14 16 18 10"
that is, every element except the last one got doubled. So even though I passed "devPtr+1" to the function, it looks like it still started with devPtr. Any ideas on how to fix this example?

--Chris

Christopher

May 2, 2010, 10:07:03 AM

Hi,

It's me again, adding to my own post in case what I have learned is useful to others out there...

I found that one way to fix my problem was to create a prototype file and edit it, changing every parameter type that is a pointer to device memory to 'uint64' (with the exception of cublasAlloc, where I used 'uint64Ptr'). I need to keep track of the element size when doing pointer addition, and I also need to cast my uint64 values to double in order to add them, but at least it works. This seems a bit hackish, so if anyone has a better solution I'd like to hear it.

--Chris
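
The element-size bookkeeping described above can be sketched in plain MATLAB. This assumes the device pointer is already available as a uint64 scalar (for example via an edited prototype file, which loadlibrary can generate with its 'mfilename' option). The address value below is made up for illustration. The detour through double follows the post; it is exact only for addresses below 2^53.

```matlab
devAddr  = uint64(hex2dec('200300000'));  % made-up device address, illustration only
elemSize = 4;                             % bytes per single-precision element
skip     = 1;                             % number of elements to skip

% Byte offset = elements skipped * element size; add in double, then cast back.
offsetAddr = uint64(double(devAddr) + skip * elemSize);

fprintf('base   = 0x%s\n', dec2hex(devAddr));
fprintf('offset = 0x%s\n', dec2hex(offsetAddr));
```

The offset address would then be passed wherever the edited prototype declares a device pointer as 'uint64'.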

Randy Miles

Jan 30, 2013, 4:03:08 PM

Christopher,

Thank you for the tips you posted in this thread. I was able to accomplish what you were trying to do by declaring the devPtr as a singlePtr rather than voidPtr:

libname = 'libcublas'; % assumed: the name the library was loaded under
inHost = single(1 : 10); % Test data

outHostPtr = libpointer('voidPtr', zeros(size(inHost), class(inHost)));

devPtr = libpointer('singlePtr');
calllib(libname, 'cublasAlloc', length(inHost), 4, devPtr)
calllib(libname, 'cublasSetVector', length(inHost), 4, ...
    inHost, 1, devPtr, 1)
calllib(libname, 'cublasSscal', length(inHost) - 1, 2.0, devPtr + 1, 1)
calllib(libname, 'cublasGetError')
calllib(libname, 'cublasGetVector', length(inHost), 4, devPtr, 1, outHostPtr, 1)
calllib(libname, 'cublasFree', devPtr)
outHostPtr.Value

Producing:
ans =

1 4 6 8 10 12 14 16 18 20

I don't know if the outHostPtr should be a 'singlePtr' as well... but I thought I'd add my two cents here in case it helps anyone.

Randy
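
The difference between the voidPtr and singlePtr results is consistent with pointer offsets being counted in elements of the pointer's declared type. A small host-memory sketch of that behaviour (no CUDA involved; names are illustrative), following the pattern of the pointer-offset example in the MATLAB libpointer documentation:

```matlab
hostData = single(1:10);
p = libpointer('singlePtr', hostData);

% Adding 1 to a typed pointer advances it by one element (4 bytes for single).
q = p + 1;

disp(p.Value);   % all ten elements
disp(q.Value);   % expected to start at the second element
```

A voidPtr carries no element size, which may explain why devPtr + 1 had no visible effect in the earlier attempt.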

Francisco Ramírez

Feb 18, 2013, 11:39:07 AM

Hi Christopher,

I saw that you can integrate cuBLAS into MATLAB via a MEX-function. I'm trying to do this, but I haven't been able to. Could you help me with this? Could you give me some examples and instructions?

Gratefully,

Francisco.