Compiling CUDA C/C++ mex code under linux

Oliver Woodford

Sep 29, 2009, 4:53:01 AM

Hi all

There are several methods available on the File Exchange for compiling CUDA C/C++ code into mex files under Windows, but none that I've come across work for Linux. However, I've found a nice, easy way to do it, which I'll share with you, though I must confess I haven't tested it extensively.

The idea is to use Nvidia's nvcc compiler to convert CUDA C/C++ code into standard C++ code, then use mex after that. The first stage looks something like:

system(sprintf('nvcc -I"%s/extern/include" --cuda "mexfun.cu" --output-file "mexfun.cpp"', matlabroot));

Then the second stage is roughly:

mex -I/opt/cuda/include -L/opt/cuda/lib -lcudart mexfun.cpp

Obviously you need to set the various paths and file/function names to suit your needs.
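
As a rough sketch only, the two stages could be wrapped in a small shell script like the one below. MATLAB_ROOT, CUDA_ROOT and the function names are assumptions made for this illustration, and the default paths are placeholders that you would replace with your own.

```shell
#!/bin/sh
# Two-stage CUDA MEX build, as a sketch. MATLAB_ROOT and CUDA_ROOT are
# assumed variable names; the defaults are placeholders for your own paths.
MATLAB_ROOT="${MATLAB_ROOT:-/usr/local/MATLAB}"
CUDA_ROOT="${CUDA_ROOT:-/opt/cuda}"

# Stage 1: print the nvcc command that translates NAME.cu into NAME.cpp.
stage1_cmd() {
    printf 'nvcc -I"%s/extern/include" --cuda "%s.cu" --output-file "%s.cpp"\n' \
        "$MATLAB_ROOT" "$1" "$1"
}

# Stage 2: print the mex command that compiles NAME.cpp and links cudart.
stage2_cmd() {
    printf '"%s/bin/mex" -I"%s/include" -L"%s/lib" -lcudart "%s.cpp"\n' \
        "$MATLAB_ROOT" "$CUDA_ROOT" "$CUDA_ROOT" "$1"
}

# Run both stages, skipping the mex step if nvcc fails.
build_mex() {
    eval "$(stage1_cmd "$1")" && eval "$(stage2_cmd "$1")"
}

# Build only when a base name is given on the command line.
if [ "$#" -gt 0 ]; then
    build_mex "$1"
fi
```

Saved under a name of your choosing (say build_mex.sh), it would be run as "sh build_mex.sh mexfun". Printing the commands from separate functions makes it easy to inspect exactly what will be run before running it.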

HTH,
Oliver

PS Does anyone think this approach could reduce the efficiency of the resulting machine code? I do wonder if it doesn't limit the level of optimization that can be applied.

Thomas Clark

Dec 7, 2009, 7:11:04 AM

Oliver!

Five stars, thanks very much. I was SO close to this solution, but with things still going wrong - your commands below just sorted me right out :)

Re. performance, I don't think it'll have a detrimental effect. The nvcc compiler will use its standard options for the device code, and gcc (or whatever C/C++ compiler you use) will use the options that mex passes it to compile the part of the code which runs on the host.

If you are concerned about host-code performance, you can always adjust the mexopts.sh file (in matlabroot/bin/) or the call to mex to pass optimization flags (-O3 etc.) to the compiler.
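
A minimal sketch of the second option, overriding the host optimization flags on the mex command line rather than editing mexopts.sh. It assumes that your mexopts.sh uses the CXXOPTIMFLAGS variable for optimized C++ builds (worth checking in your own copy); MEX, CUDA_ROOT, OPT and the function name are names made up for this example.

```shell
#!/bin/sh
# Sketch: override the host optimization flags on the mex command line
# instead of editing mexopts.sh. MEX, CUDA_ROOT and OPT are assumed names.
MEX="${MEX:-mex}"
CUDA_ROOT="${CUDA_ROOT:-/opt/cuda}"
OPT="${OPT:--O3}"

# Print the mex command for NAME.cpp with the optimization flags overridden.
# CXXOPTIMFLAGS is assumed to be the variable your mexopts.sh uses for
# optimized C++ builds; check your own mexopts.sh before relying on it.
mex_opt_cmd() {
    printf '%s CXXOPTIMFLAGS="%s -DNDEBUG" -I"%s/include" -L"%s/lib" -lcudart "%s.cpp"\n' \
        "$MEX" "$OPT" "$CUDA_ROOT" "$CUDA_ROOT" "$1"
}

# Run the command only when a base name is given on the command line.
if [ "$#" -gt 0 ]; then
    eval "$(mex_opt_cmd "$1")"
fi
```

This only affects the host part of the code; the device code is still compiled with whatever options were given to nvcc in the first stage.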

Cheers again!

Tom Clark

"Oliver Woodford" <o.j.woo...@cantab.net> wrote in message <h9shtd$q9g$1...@fred.mathworks.com>...

Jorian

Mar 7, 2010, 5:46:05 AM

"Thomas Clark" <t.c...@remove.spamcantab.net> wrote in message <hfirco$96e$1...@fred.mathworks.com>...

Thanks!

Best regards, Jorian Seokaner!

http://www.mathworks.com/matlabcentral/newsreader/author/126323

Andrei

Nov 18, 2011, 6:21:10 AM

hi all,

I'm trying to use your method to call CUDA code as a function in MATLAB. I managed to build the mex file, but I get the following error when I call it:

Mex file entry point is missing. Please check the (case-sensitive)
spelling of mexFunction (for C MEX-files), or the (case-insensitive)
spelling of MEXFUNCTION (for FORTRAN MEX-files).
Invalid MEX-file

Can you please help me?

Thank you

Oliver Woodford

Nov 18, 2011, 8:13:08 AM

The error is quite helpful. Do you have a mexFunction function in your mex file?

Andrei

Nov 21, 2011, 4:53:13 AM

"Oliver Woodford" wrote in message <ja5ll3$s5c$1...@newscl01ah.mathworks.com>...

I have a CUDA file, and using your commands I generated the mex file.

Do I have to write a mex file? I thought that the commands above generate the mex file.

Maybe you can post some example files.

Thanks

Oliver Woodford

Nov 22, 2011, 11:19:08 AM

"Andrei" wrote:
>
> I have a CUDA file, using your commands I generated the mex file.
>
> I have to write a mex file? I thought that the commands above generates the mex file.

Andrei,

The commands I give compile a .cu file into a .cpp file. They do not create a mexFunction host function (which is the gateway function between MATLAB and any C/C++/CUDA code) in the resulting .cpp file. You either need to have a mexFunction function in the .cu file or in another .c/.cpp file which you compile along with the converted .cu file in the last step.
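
For the second option (gateway in its own file), the build might look roughly like the sketch below. The file names gateway.cpp and kernels.cu, the variables and the function name are hypothetical. Because the kernel launch syntax is only understood by nvcc, the gateway would call an ordinary host function defined in the .cu file, which in turn launches the kernel.

```shell
#!/bin/sh
# Sketch: gateway in its own file. gateway.cpp (containing mexFunction) and
# kernels.cu (containing the CUDA code) are hypothetical file names.
# Assumes nvcc and mex are both on the PATH.
MATLAB_ROOT="${MATLAB_ROOT:-/usr/local/MATLAB}"   # assumed install location
CUDA_ROOT="${CUDA_ROOT:-/opt/cuda}"

# Print the commands: translate the .cu file, then hand both .cpp files to mex.
# mex names the MEX-file after the first source file listed.
split_build_cmds() {
    gateway="$1"; kernels="$2"
    printf 'nvcc -I"%s/extern/include" --cuda "%s.cu" --output-file "%s.cpp"\n' \
        "$MATLAB_ROOT" "$kernels" "$kernels"
    printf 'mex -I"%s/include" -L"%s/lib" -lcudart "%s.cpp" "%s.cpp"\n' \
        "$CUDA_ROOT" "$CUDA_ROOT" "$gateway" "$kernels"
}

# Run the commands only when both base names are given; stop on first failure.
if [ "$#" -eq 2 ]; then
    split_build_cmds "$1" "$2" | sh -e
fi
```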

I have included a simple example of a .cu file containing such a function below, which generates an array of 256 random numbers and squares them on the GPU, then checks these against the result computed on the CPU.

This topic is a little outside the scope of this thread, so if you need further help then I suggest you look at MATLAB's documentation on mex files, and/or post a new question to this newsgroup or on MATLAB Answers.

HTH,
Oliver

===================================================

#include <cstdlib>  // rand, RAND_MAX
#include "mex.h"

// CUDA kernel which squares each element of the buffer in place
__global__ void sq(float *d_buffer)
{
    d_buffer[threadIdx.x] *= d_buffer[threadIdx.x];
}

// Gateway function which interfaces with MATLAB
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 0)
        mexErrMsgTxt("No inputs expected.");
    if (nlhs != 0)
        mexErrMsgTxt("No outputs expected.");

    // Generate the data
    float h_buffer[256];
    for (int a = 0; a < 256; ++a)
        h_buffer[a] = float(rand()) / RAND_MAX;

    // Copy to the GPU
    float *d_buffer;
    if (cudaMalloc((void **)&d_buffer, 256 * sizeof(*d_buffer)) != cudaSuccess)
        mexErrMsgTxt("cudaMalloc failed.");
    cudaMemcpy(d_buffer, h_buffer, 256 * sizeof(*d_buffer), cudaMemcpyHostToDevice);

    // Call the CUDA kernel (one block of 256 threads, one thread per element)
    sq<<<1, 256>>>(d_buffer);

    // Copy from the GPU, then release the device buffer
    float h_buffer2[256];
    cudaMemcpy(h_buffer2, d_buffer, 256 * sizeof(*d_buffer), cudaMemcpyDeviceToHost);
    cudaFree(d_buffer);

    // Check the GPU result against the same calculation on the CPU
    for (int a = 0; a < 256; ++a) {
        if (h_buffer2[a] != h_buffer[a] * h_buffer[a])
            mexErrMsgTxt("Error in calculation!");
    }
    mexPrintf("Test passed!\n");
}

Andrei

Nov 23, 2011, 4:11:09 AM

"Oliver Woodford" wrote in message <jagi1s$1j8$1...@newscl01ah.mathworks.com>...

Thank you very much.

Angelo

May 2, 2012, 10:25:14 AM

Dear Oliver,
I have been using your two-step approach to compile "CUDA-accelerated" mex files under Linux for a long time with no problems.
I now need to do the same thing under Windows 7, but I have been unsuccessful.

As you suggested, under linux I use

NVIDIA_SDK='/home/angelo/NVIDIA_GPU_Computing_SDK/C/common/inc'
MATLAB_INCLUDE='/Matlab2008A/extern/include'
MATLAB_MEX='/Matlab2008A/bin/mex'
CUDA_INCLUDE='/usr/local/cuda/include'
CUDA_LIB64='/usr/local/cuda/lib64'

nvcc -O2 -m64 -use_fast_math --cuda -arch=sm_20 Filename.cu -I$NVIDIA_SDK -I$MATLAB_INCLUDE --output-file Filename.cpp

$MATLAB_MEX Filename.cpp tictoc.c -I$CUDA_INCLUDE -L$CUDA_LIB64 -L/lib -lcudart -lcufft -lrt -lpthread -lstdc++ -o Filename

Under Windows I'm using, for the first step,

setenv('MATLAB_INCLUDE','"C:\Program Files\MATLAB\R2008b\extern\include"');
setenv('NVIDIA_SDK_INCLUDE','"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.1\C\common\inc"');

system((strcat('nvcc -O2 -m64 -use_fast_math --cuda -arch=sm_20 Filename.cu -I',getenv('MATLAB_INCLUDE'),' -I',getenv('NVIDIA_SDK_INCLUDE'),' --output-file Filename.cpp')))

Unfortunately, I receive 83 errors in this case. It seems to me that nvcc does not properly include the header files (neither the CUDA nor the MEX includes). Some of the errors are, for example,

error: attribute "__global__" does not apply here
error: identifier "cufftDoubleComplex" is undefined
error: identifier "mxArray" is undefined

I'm using Microsoft Visual Studio 10.0 and I also installed the Windows SDK 7.1.

From a different post, it seemed to me that you are using the same approach for Windows as well. Would you be so kind as to help me?

Thanks.
Angelo

"Oliver Woodford" wrote in message <h9shtd$q9g$1...@fred.mathworks.com>...