Connect ArrayFire with MATLAB


Csáti Zoltán

Aug 3, 2015, 2:22:24 PM
to ArrayFire Users
Hi,

Is there a general way to connect ArrayFire with MATLAB? I took a look at https://groups.google.com/forum/#!topic/arrayfire-users/pfiw67fbEOc, but how can I use any of ArrayFire's backends with MATLAB? The cited link uses CUDA-MATLAB commands to copy the content of the array object to an mxGPUArray object. Could you please explain in more detail how to send and receive arrays between ArrayFire (using an array object) and MATLAB (using an mxGPUArray object)?
Do you recommend building the MEX file within Visual Studio, which would eliminate the need to compile within MATLAB?

A MATLAB API would be very useful, since the method of saving and loading from disk is very slow for large data sets.

Thanks

Eric

Aug 4, 2015, 2:09:37 PM
to ArrayFire Users
I've had great luck connecting ArrayFire with Matlab.  I haven't looked through the link provided, but here's a brief description of what I do.  Keep in mind I am NOT a real C++ programmer, which is part of why I like ArrayFire.  So the process below may or may not be optimal.

I start with the helloworld Visual Studio 2013 solution and rename things.  I change the output type to Static Library (.lib).  I write code that is not necessarily designed for use with Matlab so that it can potentially be incorporated into other C++ efforts.  I just compile the C++ files to .obj files and leave the linking to later.  (You can compile by right-clicking a .cpp file and selecting Compile or hitting Ctrl+F7).

One other thing I have to do but do not understand: In the C/C++|Code Generation property page, I change the Runtime Library to "Multi-threaded DLL (/MD)".  Otherwise it seems to conflict with the runtime library Matlab uses.

I then write the MEX wrapper.  All the MEX file does is process the inputs, check their usage and types, get array sizes, etc.  This is all done on the host.  I do not use the mxGPUArray object.  I then also use the MEX file to create array objects.  I use code that looks like:

float *img_in = (float*)mxGetData(prhs[0]);

to get a pointer on the host and then something like

int numRowsImg = (int)mxGetM(prhs[0]);
int numColsImg = (int)mxGetN(prhs[0]);

array img(numRowsImg, numColsImg, img_in, afHost);//Copy input array to device

to copy this to the device.  Appropriate data are then passed to the functions defined in the VS 2013 project.  Output data are then copied to the host in a manner similar to the following:

array output = myfunc(img);//Assuming myfunc is defined in a .obj file
plhs[0] = mxCreateNumericMatrix(numRowsImg, numColsImg, mxSINGLE_CLASS, mxREAL);
float *out = (float*)mxGetData(plhs[0]);
output.host((void*)out);//Copy data from the device to the host


I compile this using the mex command in Matlab.  Copy the .obj and .h files from the Visual Studio 2013 project to the same directory as your MEX wrapper C++ file (the header should be included in your MEX wrapper .cpp file).  You can then use something like:

mex -g myfunc_mex.cpp ...
         myfunc.obj ...        
        -I"C:\Program Files\ArrayFire\v3\include" ...
        -L"C:\Program Files\ArrayFire\v3\lib" ...
        -lafcuda ...
        -largeArrayDims

You can use the -g flag as shown to debug in Visual Studio if you like.  You just need to make sure and use the CUDA_Debug configuration when compiling in VS.  Then use Visual Studio to attach to Matlab (Debug|Attach to Process) prior to calling the MEX file (but after compiling it with the mex -g command).  You can put breakpoints in the C++ code in Visual Studio and step through code normally (ignore the warning that VS 2013 provides about not being able to find symbols for the breakpoints).  Or you can omit the -g flag and use the CUDA_Release configuration in VS 2013.

Note that you have to do a bit more work if you work with complex arrays.  ArrayFire stores data in the form real0, imag0, real1, imag1, etc. at the same memory address.  Matlab uses two entirely separate arrays (not necessarily contiguous in memory) for the real and imaginary parts of a complex array.
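
A minimal sketch of that repacking step, written as plain C++ so it does not depend on the MEX headers. The function names interleave_complex and split_complex are made up for illustration. In a MEX wrapper the two input pointers would presumably come from mxGetData and mxGetImagData, and the interleaved buffer is assumed (not verified here) to be layout-compatible with ArrayFire's complex type.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Pack separate real and imaginary buffers (the split layout described
// above) into one interleaved buffer: real0, imag0, real1, imag1, ...
std::vector<std::complex<float>> interleave_complex(const float* re,
                                                    const float* im,
                                                    std::size_t n) {
    assert(re != nullptr);
    std::vector<std::complex<float>> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // A null imaginary pointer is treated as an all-zero imaginary part.
        out[i] = std::complex<float>(re[i], im ? im[i] : 0.0f);
    }
    return out;
}

// Unpack an interleaved buffer into two separate output buffers, each of
// which must have room for at least z.size() floats.
void split_complex(const std::vector<std::complex<float>>& z,
                   float* re, float* im) {
    assert(re != nullptr && im != nullptr);
    for (std::size_t i = 0; i < z.size(); ++i) {
        re[i] = z[i].real();
        im[i] = z[i].imag();
    }
}
```

The extra copy costs one pass over the data in each direction, which is usually small next to the host-to-device transfer, but that is worth measuring for your own array sizes.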

One feature I wish ArrayFire had is the ability to write arrays to a Matlab MAT file.  It's not hard to write your own function to do this, though.  In this case you need to create a MATFile object, an mxArray object, and a pointer on the host to the data.  You can then use matPutVariable to write the data to a MAT file.  I find this useful during debugging as you can then easily compare ArrayFire results to Matlab code results, inspect array sizes and values, plot arrays, etc.

Lastly, I would encourage you to keep trying.  Without too much effort I've translated some Matlab functions to ArrayFire and they're generally 8-50X faster with results that agree to within numerical precision of floats.

Good luck,
Eric

Csáti Zoltán

Aug 4, 2015, 3:05:19 PM
to ArrayFire Users
Dear Eric,

Thank you for your detailed answer! I think I understood your workflow. In the line array output = myfunc(img);, myfunc is the name of the .obj file that is compiled by modifying the helloworld file, and it contains the algorithm itself, right? What should I do after giving the mex command as you wrote? You wrote "Then use Visual Studio to attach to Matlab (Debug|Attach to Process) prior to calling the MEX file (but after compiling it with the mex -g command).  You can put breakpoints in the C++ code in Visual Studio and step through code normally (ignore the warning that VS 2013 provides about not being able to find symbols for the breakpoints).", but I don't understand it. Does this method work with the CPU and OpenCL backends, or just with CUDA? Which MATLAB version do you use? Does it work with VS2013 out of the box, or are other settings needed so that MATLAB recognises the VS2013 compiler? Could you please attach example functions so that I can try them out? A quick video would also help a lot, if you have time.

Thanks,
Zoli

Eric

Aug 4, 2015, 4:44:08 PM
to ArrayFire Users
Zoli,

myfunc is the name of the function in the .obj file.  There can be multiple functions within a single .obj file (so long as they're defined in the same .cpp file and the same .h file).  But you're right, myfunc is in the Visual Studio 2013 project.  I usually delete the helloworld.cpp and just start my own from scratch.  But the myfunc.cpp file would be something like:

#include <arrayfire.h>
#include "myfunc.h"
using namespace af;

array myfunc(array input) {
  array output = 2 * input;
  return output;
}

for a simple function that just returns twice the input.  Within the header file myfunc.h I would then have something like

#ifndef __MYFUNC_H__
#define __MYFUNC_H__

#include "arrayfire.h"
extern af::array myfunc(af::array input);

#endif //__MYFUNC_H__

If you compile this code, then in the .\CUDA\temp\Debug (or \Release) directory you will find myfunc.obj.  You need to copy this file and myfunc.h to the same directory as your MEX wrapper.

The MEX file myfunc_mex.cpp (you can name it whatever you like, this is my convention), would look something like:

#include "myfunc.h"
#include "mex.h"
#include "arrayfire.h"

using namespace af;

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  //Check usage
  if (nrhs!=1)
        mexErrMsgTxt("Invalid number of input arguments.  There must be exactly one.");

  if (nlhs!=1)
        mexErrMsgTxt("Invalid number of output arguments.  There must be exactly one.");

  if (!mxIsSingle(prhs[0]))
        mexErrMsgTxt("Input image must be of class single but is not.");

  //Get input
  float *img_in = (float*)mxGetData(prhs[0]);
  int numRowsImg = (int)mxGetM(prhs[0]);
  int numColsImg = (int)mxGetN(prhs[1]);

  //Create output
  plhs[0] = mxCreateNumericMatrix(numRowsImg, numColsImg, mxSINGLE_CLASS, mxREAL);    
  float *out = (float*)mxGetData(plhs[0]);

  //Create array object
  array img(numRowsImg, numColsImg, img_in, afHost);

  //Use myfunc
  array output = myfunc(img);

  //Copy result to host
  output.host((void*)out);
}


In regards to attaching to Visual Studio 2013:  If you compile myfunc.obj with the CUDA_Debug configuration and myfunc_mex with the mex -g option, you can then debug in Visual Studio.  You go through the process of creating the MEX file (myfunc_mex.mexw64 if you're in 64-bit Windows, for example).  Then, before you actually try to use the mex function from within Matlab, create a breakpoint in Visual Studio 2013 within myfunc.cpp.  Then go to Debug|Attach to Process and select the Matlab process (Matlab must be running).  It will generate debug symbols.  Then you can run something like

A = single(rand(128));
B = myfunc_mex(A);

from within Matlab.  When you get to the breakpoint in myfunc(), Visual Studio will break in myfunc.cpp.  You can then step through, inspect variables on the host, etc. as you normally might.

For the other questions:
* I have struggled with the CPU backend.  I think I did get it to work but it's not multi-threaded and was slower than Matlab.  I have not tried the OpenCL backend at all.

* I am using Matlab R2015a and it works fine with VS 2013.  I think I went through the mex -setup process and it found the compiler just fine.  I didn't do anything fancy in Matlab to set up the compiler and in particular I did not modify the .bat file it creates.

* Hopefully the code I posted above is sufficient.  It may have bugs as I'm coding into the dialog box for Google Groups, but it should be close.  My employer would frown upon me posting my actual code or a video, I'm afraid.

One more thing comes to mind:
Since I don't go through the build process, the nvvm64_30_0.dll file does not get copied.  This should be in %CUDA_PATH%\nvvm\bin.  You can either add this directory to the path or manually copy this file to the same directory as the compiled MEX file.  Likewise with afcuda.dll from %AF_PATH%\lib.  If the MEX file can't find one of these it will generate an error message along the lines of "specified module could not be found" or something like that.  I've added these directories to my Path environment variable.  If you distribute your MEX file to other users, you will need to pass these DLLs along with it.

Best regards,
Eric

Csáti Zoltán

Aug 7, 2015, 7:35:59 AM
to ArrayFire Users
Dear Eric,

I did everything as you described. I managed to create the object file and wrote the MEX .cpp file. With the commands you gave, I created the .mexw64 file by invoking the mex command within MATLAB. However, when I called it, it crashed MATLAB. What can be the problem? I have 64-bit MATLAB R2015a and VS 2013. I attached the VS solution as a zip containing the code that generates the object file (I built it for CPU Release, but the CUDA build also fails in MATLAB) and helloworld_mex.cpp, which is the MEX gateway for MATLAB. Could you please check whether it runs for you? I have been struggling with it for hours, but I can't figure out what the error could be.

Thank you kindly,
Zoli
ArrayFireMATLAB.zip
helloworld_mex.cpp

Eric

Aug 7, 2015, 11:25:25 AM
to ArrayFire Users
Zoli,

You made the same mistake I frequently do.  In helloworld_mex.cpp you have


int numColsImg = (int)mxGetN(prhs[1]);

However, there is only one input argument to helloworld_mex and hence prhs[1] does not exist.  This should be


int numColsImg = (int)mxGetN(prhs[0]);

I kept some notes as I worked on your code that show what I did.  I compiled your code with the CUDA_Debug configuration in Visual Studio.  These steps go through my process of figuring out where Matlab is crashing.  If I had found that it was the myfunc() call that was causing the trouble then I would have used the Visual Studio 2013 debugger to step through it.  However, in this case myfunc() was correct and it was the MEX code that was causing the problem.
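
One way to turn this kind of out-of-range argument access into a readable error instead of a crash is to route every access through a small bounds-checked helper. The following is only a sketch in plain C++; checked_arg is a hypothetical name, not part of the MEX API. Inside a real mexFunction you would pass prhs and nrhs, and you would want to catch the exception and report it (for example with mexErrMsgTxt) rather than let it escape into Matlab.

```cpp
#include <stdexcept>
#include <string>

// Return args[index] only if index lies in [0, count); otherwise throw
// instead of reading past the end of the argument array.
template <typename T>
T checked_arg(T const* args, int count, int index) {
    if (index < 0 || index >= count) {
        throw std::out_of_range("argument index " + std::to_string(index) +
                                " requested, but only " +
                                std::to_string(count) +
                                " argument(s) were passed");
    }
    return args[index];
}
```

With this in place, a call such as checked_arg(prhs, nrhs, 1) in a one-input MEX file would fail with a message naming the bad index.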

Best regards,
Eric

My Notes:
1.  In helloworld.cpp, using namespace af; appears twice.  Delete one instance.
2.  When I compile helloworld.cpp with the CUDA_Debug configuration and then build the MEX file, I get the following error:

>> mex -g helloworld_mex.cpp ...
       helloworld.obj ...
       -I"C:\Program Files\ArrayFire\v3\include" ...
       -L"C:\Program Files\ArrayFire\v3\lib" ...
       -lafcuda ...
       -largeArrayDims
Building with 'Microsoft Visual C++ 2013 Professional'.
Error using mex
helloworld.obj : error LNK2038: mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '2' doesn't match value '0' in helloworld_mex.obj
helloworld.obj : error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MDd_DynamicDebug' doesn't match value 'MD_DynamicRelease' in helloworld_mex.obj
   Creating library helloworld_mex.lib and object helloworld_mex.exp
LINK : warning LNK4098: defaultlib 'MSVCRTD' conflicts with use of other libs; use /NODEFAULTLIB:library
helloworld_mex.mexw64 : fatal error LNK1319: 2 mismatches detected


Change runtime library for the CUDA_Debug configuration to Multi-threaded DLL (/MD) in the C/C++|Code Generation property page then recompile.

3.  Run and see that indeed Matlab crashes.
4.  Put a bunch of mexPrintf() statements in helloworld_mex.cpp to see WHERE it crashes.  Is it even getting to the myfunc() call?
5.  I find that Matlab is crashing somewhere in here:


    //Get input
    float *img_in = (float*)mxGetData(prhs[0]);
    int numRowsImg = (int)mxGetM(prhs[0]);
    int numColsImg = (int)mxGetN(prhs[1]);

   
6.  Aha!  I make this mistake all the time.  helloworld_mex accepts only one input argument and so prhs[1] does not exist.
This should read


    int numColsImg = (int)mxGetN(prhs[0]);

7. Recompile the MEX file and test with:
>> A = single(randn(4));
>> B = helloworld_mex(A)
Got here 0.
Got here 1.
Got here 2.
Got here 3.
Got here 4.
Got here 5.

B =

   -0.2483    1.3430    0.9778    0.5877
    2.9794   -2.4150    2.0694   -1.5746
    2.8181    1.4345    1.4538    1.7768
    2.8344    3.2605   -0.6069   -2.2941

>> B./A


ans =

     2     2     2     2
     2     2     2     2
     2     2     2     2
     2     2     2     2

So we can indeed see that B is twice A as expected.  The "Got here n" statements are generated with mexPrintf() calls in helloworld_mex.cpp and should be removed now that the code is working properly.
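
The "Got here n" technique from the notes can be wrapped in a small helper so the markers also carry their source location. This is a sketch in plain C++ using std::printf. The names checkpoint, CHECKPOINT and g_checkpoint_count are invented for illustration, and in a MEX file you would swap std::printf for mexPrintf.

```cpp
#include <cstdio>

// Number of checkpoints passed so far in this process.
static int g_checkpoint_count = 0;

// Print a numbered marker with its source location and flush immediately,
// so the last line shown is the last checkpoint reached before a crash.
// Returns the index of the checkpoint that was just printed.
inline int checkpoint(const char* file, int line) {
    std::printf("Got here %d (%s:%d).\n", g_checkpoint_count, file, line);
    std::fflush(stdout);
    return g_checkpoint_count++;
}

// Convenience macro that fills in the current file and line.
#define CHECKPOINT() checkpoint(__FILE__, __LINE__)
```

Placing CHECKPOINT() between the main steps of the wrapper narrows a crash down to the statements between the last printed marker and the next one.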

Csáti Zoltán

Aug 8, 2015, 8:43:44 AM
to ArrayFire Users
Dear Eric,

I had just copied the line int numColsImg = (int)mxGetN(prhs[1]); you wrote into my program. It was my mistake; you had warned me that there could be typos in your code. Thank you, particularly for your notes!
I tried out everything (the debugging, the CPU, OpenCL and CUDA backends) and modified the myfunc.cpp file to incorporate QR decomposition, inversion, matrix multiplication, etc. I am planning to collect the information from our discussion and create a GitHub submission, which might be useful for others (if you don't mind). I am going to cite you.
My remarks and questions:
  1. What if a DLL project had been chosen instead of a static library (as far as the generated MEX file is concerned)?
  2. I tried the OpenCL backend and it works properly.
  3. If I use the mex command for the CUDA version with the -lafopencl flag, or for the OpenCL version with the -lafcuda flag, the mexw64 file is created properly and there is no runtime error either. Shouldn't the wrong flag cause an error?
  4. For the CUDA backend, I experienced quite a big slow-down for the QR decomposition. According to the timings, it is not ArrayFire-MATLAB MEX overhead; it is the execution of myfunc itself that is slow. The CPU and OpenCL versions were not that slow.
  5. The double precision performance is lower than the single precision one (as expected; however, see my last comment on https://groups.google.com/forum/#!topic/arrayfire-users/hWHUxm0B5b8 for the contrary).
  6. About precision: I performed a QR decomposition of a 1000x1000 random matrix with ArrayFire and with MATLAB. The difference between the two right triangular matrices in the maximum norm is 0.0014 in single precision (quite far from pow(2,-23)) and 2.7516e-12 in double precision (good, compared to pow(2,-52)).
  7. You wrote that the CPU version is slower for numerical computations than MATLAB. That is because the CPU backend is not yet multithreaded (see the issues on GitHub). However, the OpenCL backend can give you multithreaded performance when the host is chosen as the device.
  8. I tried the contribution from http://www.mathworks.com/matlabcentral/fileexchange/44408-matlab-mex-support-for-visual-studio-2013--and-mbuild so that MATLAB R2011a recognises VS 2013. It recognised the compiler, but when running the mex command it couldn't find cl.exe. It is weird.
  9. The nvvm64_30_0.dll did not have to be copied on my machine, perhaps because the Nvidia folder is on the Path. So it will only be needed when distributing the MEX file.
  10. How can I free up the GPU memory? The variables remain allocated after the MEX file has run.
My configuration: CPU: Core i5 2500K @4500MHz, GPU: GeForce GTX 560, Memory: 8GB DDR3
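
For the precision comparison in item 6, the following sketch shows one way to compute a maximum-norm difference and express it in units of machine epsilon. It is plain C++ and is not the code used for the numbers above; max_abs_diff and error_in_epsilons are illustrative names. Note that std::numeric_limits<float>::epsilon() equals pow(2,-23) and the double version equals pow(2,-52).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Largest absolute element-wise difference between two buffers of equal
// size (the maximum norm of a - b).
template <typename T>
T max_abs_diff(const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());
    T worst = T(0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::abs(a[i] - b[i]));
    }
    return worst;
}

// Express an absolute error as a multiple of machine epsilon times a
// typical magnitude of the data (scale), so that errors in float and
// double results can be compared on the same footing.
template <typename T>
T error_in_epsilons(T abs_error, T scale) {
    assert(scale > T(0));
    return abs_error / (std::numeric_limits<T>::epsilon() * scale);
}
```

Choosing the scale is a judgment call. Using the largest magnitude in the reference result is one reasonable option, since an absolute error is expected to grow with the size of the entries and with the matrix dimension.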

Csáti Zoltán

Aug 8, 2015, 8:52:05 AM
to ArrayFire Users
One more fact: even if the CPU backend is used, either -lafopencl or -lafcuda is necessary for linking. I don't know why.

Eric

Aug 8, 2015, 10:36:09 AM
to ArrayFire Users
Zoli,

I'm glad you got it working.  I will point out in my previous post I did have the right usage of mxGetN(prhs[0]).  I actually checked this before writing my post as I figured I screwed it up, but somewhat amazingly I got it right.  That's why I thought it was interesting you made that mistake because I know I've made it several times, too.  Also, the GitHub submission is a great idea.  I think others will find it useful.

In regards to your questions, I'm not sure I can really answer any of them.  I'm looking forward to the day when ArrayFire's CPU backend is multithreaded, which I believe I read they are working on.  I'm actually not sure if the algorithms I've been working on are faster than Matlab because of the GPU or because the code is compiled.  Alternatively I'd also be interested to see if the Mathworks ever makes code generated from the Matlab Coder multi-threaded.  My Coder-generated MEX files were also slower than native Matlab.

I don't think my process would change if I changed the project type to DLL since I never even create the .lib file.  Likewise I would never create the DLL.  I suppose you could create the DLL and then reference that somehow when you compile the MEX file.  It seems like that gives you one more dependency to pass along if you distribute the code.  It also seems like linking in the code rather than referencing a DLL would seem more efficient, but this is a total guess and I could be wrong.

I find your item 10 interesting.  I don't believe there's any way to free up the GPU memory with ArrayFire due to the nature of its memory manager.  I haven't run into any memory leaks or problems with this, though.  I've run Matlab loops with these ArrayFire/MEX files that have run continuously for hours at a time processing total amounts of data that far exceed the 4 GB on my video board.  It might be worth a separate post to this group.

Best regards,
Eric

Csáti Zoltán

Aug 12, 2015, 11:44:31 AM
to ArrayFire Users
Eric,

I created a GitHub submission on this topic: https://github.com/CsatiZoltan/ArrayFire-MATLAB. The guide is written but the solution files are not yet uploaded (perhaps tomorrow). If you have suggestions for anything else to include, please let me know.

Zoli

Eric

Aug 12, 2015, 2:42:35 PM
to ArrayFire Users
Zoli,

This looks great.  I'm sure a lot of people will find it useful.

-Eric

Eric

Aug 17, 2015, 11:36:44 AM
to ArrayFire Users
Zoli,

I downloaded your GitHub submission and looked at the CUDA performance.  As you reported, it seems really slow.  I compiled your myfunc.cpp as-is in Visual Studio and then compiled the MEX file in Matlab.  My results are below (for a Quadro 4000 in TCC mode, using a 5500x5500 single array).

>> in = randn(5500,'single');
>> R = myfunc_mex(in);
Error checking done in 8.55294e-07 seconds.

Variable creation done in 1.21924 seconds.

Algorithm done in 185.467 seconds.

Copying to the host done in 0.082745 seconds.

>> R = myfunc_mex(in);
Error checking done in 8.55294e-07 seconds.

Variable creation done in 0.0448555 seconds.

Algorithm done in 178.69 seconds.

Copying to the host done in 0.0616393 seconds.

>>

Interestingly, when Matlab was running it was churning away at 4%, which is one of my 24 cores.  Why Matlab was doing anything escapes me.  I also profiled this with the Visual Profiler and see that the GPU works initially and finishes very quickly, but then waits nearly two minutes before doing a copy HtoD of 121 MB and then a copy of 121 MB DtoH at the very end.  What it's doing for the intervening two minutes is a mystery.  It seems like for some reason Matlab is doing something.

I'm going to try tweaking the helloworld.cpp to run this analysis to get Matlab out of the loop.  Maybe that will provide some insight.
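
For anyone who wants to reproduce per-stage timings like the ones printed above, here is a minimal sketch using std::chrono. It is not the timing code from the GitHub submission; time_seconds and report_stage are illustrative names. One assumption to keep in mind: GPU libraries often queue work asynchronously, so a stage that launches device work should finish with a synchronization call (af::sync() in ArrayFire) inside the timed callable, otherwise the measured time may only cover the launch.

```cpp
#include <chrono>
#include <cstdio>
#include <utility>

// Run a callable once and return the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Time one named stage and print it in the style "<name> done in <t> seconds."
template <typename F>
double report_stage(const char* name, F&& work) {
    const double seconds = time_seconds(std::forward<F>(work));
    std::printf("%s done in %g seconds.\n", name, seconds);
    return seconds;
}
```

Wrapping error checking, variable creation, the algorithm and the copy back to the host in separate report_stage calls gives the same kind of breakdown as shown above.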

-Eric


Eric

Aug 17, 2015, 11:37:46 AM
to ArrayFire Users
I should add:  My experience is that to get the Visual Profiler to work with Matlab you need to use the matlab.exe in MATLAB_PATH\bin\win64.  If you run the matlab.exe in MATLAB_PATH\bin it doesn't work.

-Eric

Eric

Aug 17, 2015, 12:16:26 PM
to ArrayFire Users
GPUZ reports the same thing as the Visual Profiler.  The GPU Load spikes up quickly and then drops to zero for the bulk of the computation.  I started a new topic on QR performance at https://groups.google.com/forum/#!topic/arrayfire-users/qzjAL5IyVXo.

-Eric

Pavan Yalamanchili

Aug 17, 2015, 2:35:49 PM
to Eric, ArrayFire Users
Hi Eric,

I think you replied to the wrong chain with the last email.

--
Pavan


Csáti Zoltán

Aug 18, 2015, 6:04:44 AM
to ArrayFire Users
Dear Eric,

I saw from the linked post that it is a CUDA bug. Hope that they will fix it soon. Fortunately, there is little overhead using mex compared to native ArrayFire. Why is the CPU backend much faster than the GPU, even for single data type? Do you have a very fast CPU?

Zoli

Eric

Aug 18, 2015, 10:33:58 AM
to ArrayFire Users
My development machine has relatively new CPUs and relatively old GPUs.  The Quadro 4000 is about 5 years old and is pretty dated.  But my system has two Xeon E5-2620 v3 @ 2.4 GHz processors.  These each have 6 cores and 12 threads and were released just last fall.  I'm guessing the 15 MB of Smart Cache on each processor allows them to perform very efficiently in some cases.  Hyper-threading is probably helping considerably as well.

-Eric

Pavan Yalamanchili

Aug 18, 2015, 10:38:27 AM
to Csáti Zoltán, ArrayFire Users
Hi Csati,

QR consists of two parts: first the QR decomposition is computed inside a single matrix, and then that matrix is split into two matrices.

When benchmarking the first part, the CUDA version was faster than the CPU version by a fair margin (i7 CPU vs 680 GPU).

Once nvidia fixes their performance issues we will see the overall improvements.

--
Pavan

Eric

Oct 9, 2015, 4:09:30 PM
to ArrayFire Users, kova...@gmail.com
This discussion is a bit old, but I thought I'd add to it with what I've recently learned.  If you follow the instructions in Matlab's documentation (http://www.mathworks.com/help/matlab/matlab_external/compiling-mex-files-with-the-microsoft-visual-c-ide.html) you can work entirely in VS 2013.  Below are some notes I put together as I set this up.  The end result is a VS 2013 project file in which you can compile directly to a .mexw64 file without having to jump through some of the hoops I described earlier.

When setting Configuration Properties, make sure the Configuration being set is “All Configurations”.  Also feel free to delete configurations you will not use.


  1.  Copy helloworld_vs2013.sln, helloworld_vs2013.vcxproj, and helloworld.cpp to a new directory.
  2.  Open helloworld_vs2013.sln.
  3.  Rename the helloworld project entry in Solution Explorer to deconvLR_AF_VS.
  4.  Right-click this project and go to Configuration Properties|General|Configuration Type and change it from Application (.exe) to Dynamic library (.dll).  Click Apply.
  5.  Remove the helloworld entry in the Solution Explorer since we’re not using helloworld.cpp.
  6.  Add deconvLR_mex.cpp and deconvLR_mex.h to the project.
  7.  Under Configuration Properties|C/C++|General, add $(MATLAB_PATH)\extern\include to the Additional Include Directories.
  8.  Right-click the deconvLR_AF_VS project and click Add New Item.  Select Visual C++|Code|Module-Definition File (.def).  Name the file deconvLR_AF_VS.def.
  9.  Make the contents of this DEF file:
          LIBRARY DECONVLR_AF_VS
          EXPORTS mexFunction
  10.  Under C/C++|Preprocessor, add MATLAB_MEX_FILE as a preprocessor definition.
  11.  Under Linker|General, add $(MATLAB_PATH)\extern\lib\win64\microsoft to the Additional Library Directories.
  12.  Under Linker|Input, make sure the Module Definition File (deconvLR_AF_VS.def) is present.
  13.  Under Linker|Input, add libmx.lib, libmex.lib, and libmat.lib to Additional Dependencies.
  14.  Under Configuration Properties|General, change Target Extension from .dll to .mexw64.  Make sure this is done for “All Configurations”.
  15.  When you build the solution, you will see that a CUDA\deconvLR_AF_VS.mexw64 file now exists.  The build process also copies nvvm64_30_0.dll (the CUDA DLL file) into this directory.  This file needs to either be on the Windows path or in the same directory as the .mexw64 file.  Otherwise an error saying that the specified module cannot be found will be returned when trying to use the MEX file in Matlab.

Debugging:

Note that you can use clear mex in Matlab to free up the MEX file.


  1.  Change the active configuration to CUDA_Debug.
  2.  Build the solution.  Note that _debug is appended to the root of the filename.  You need to make sure your Matlab code calls this .mexw64, not the other.
  3.  Add breakpoints in VS2013 where you want.
  4.  Go to Debug|Attach to Process to attach to Matlab.
  5.  Call the function under test and you should stop at the breakpoint.

Csáti Zoltán

Oct 10, 2015, 3:04:21 PM
to ArrayFire Users, kova...@gmail.com
Dear Eric,

Thanks for the new, improved method. I will try it. The main concern is that I will have a Matlab licence (R2013b) for Linux very soon. Do you have experience with Matlab-ArrayFire under Linux?

Thanks

Eric

Oct 12, 2015, 12:06:21 PM
to ArrayFire Users, kova...@gmail.com
Unfortunately I have no experience with either Matlab or ArrayFire under Linux.

Best regards,
Eric

Eric

Nov 10, 2015, 5:41:56 PM
to ArrayFire Users
For people who may be following this chain:

I'm finding great utility in creating simple MEX file wrappers around ArrayFire functions.  Even with the memory overhead associated with going to/from the GPU I'm finding significant speed improvements in operations like 2D convolutions, 2D Fourier transforms, and image translations.  Of course a better solution is to start from scratch in ArrayFire, but I've made meaningful improvements (e.g., 10X or more) in function calls by simple function replacements for ArrayFire codes.  Your mileage may vary of course and it's worth checking whether the operation in question results in a net gain considering the time necessary to transfer data to/from the GPU device.
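
The net-gain check mentioned above can be written down as a small calculation. This is only a sketch with made-up names (OffloadEstimate, estimate_offload). The inputs are timings you measure yourself for one call: the original CPU time, the upload to the device, the device computation, and the download of the result.

```cpp
#include <cassert>

// Summary of whether moving one operation to the GPU pays off.
struct OffloadEstimate {
    double gpu_total_seconds;  // upload + device compute + download
    double speedup;            // CPU time divided by the GPU total
    bool worthwhile;           // true when the GPU path is faster overall
};

// Compare a measured CPU time against the full cost of the GPU path,
// including the transfers in both directions.
inline OffloadEstimate estimate_offload(double cpu_seconds,
                                        double upload_seconds,
                                        double gpu_compute_seconds,
                                        double download_seconds) {
    const double total =
        upload_seconds + gpu_compute_seconds + download_seconds;
    assert(total > 0.0);
    return {total, cpu_seconds / total, total < cpu_seconds};
}
```

If several wrapped functions are called back to back, the transfers between them dominate quickly, which is one reason a chain of operations is better kept on the device inside a single MEX call.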

-Eric