Hello!
As requested by Scott, I am posting my issue here. I tried to use the library for my needs but encountered a problem, which I narrowed down to a reproducible test case (attached).
The problem is thus:
When running the following code:
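// 3x3 cross-shaped structuring element
const float pMaskCross[] = { 0.0, 1.0, 0.0,
                             1.0, 1.0, 1.0,
                             0.0, 1.0, 0.0 };
array maskCross = array(3, 3, pMaskCross);
// Input volume (matrixDimY x matrixDimX x matrixDimZ), copied from the host buffer pImg
array img = array(matrixDimY, matrixDimX, matrixDimZ, pImg);
// 2D erosion of the volume with the cross mask
array eroded = af::erode(img, maskCross);
// Copy the result back into the host buffer pOutput
eroded.host(pOutput);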
The output differs between the CPU build and the CUDA build.
In the CPU build, the erosion is applied with the mask slice by slice along the last axis (see the sample input and output later in this post). This is the desired output, and it matches what imerode returns in MATLAB.
In the CUDA build, the output is identical to the input, i.e. the erosion function does nothing.
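To make the expected behaviour concrete, here is a minimal plain C++ sketch of slice-by-slice erosion that does not use ArrayFire at all. It is not the library's implementation. The function name erodeSlicesReference is hypothetical, the data is assumed to be column-major (as in MATLAB), and neighbours that fall outside a slice are assumed to be ignored, which is consistent with the CPU output shown later in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference: erode each 2D slice of a column-major
// rows x cols x slices volume independently with the given mask.
// Assumption: neighbours outside the slice are ignored (not treated as 0).
// The mask is applied without reflection, which makes no difference for a
// symmetric mask such as the 3x3 cross.
std::vector<float> erodeSlicesReference(const std::vector<float>& in,
                                        int rows, int cols, int slices,
                                        const std::vector<float>& mask,
                                        int maskRows, int maskCols)
{
    assert(in.size() == static_cast<std::size_t>(rows) * cols * slices);
    assert(mask.size() == static_cast<std::size_t>(maskRows) * maskCols);

    std::vector<float> out(in.size());
    const int halfR = maskRows / 2;
    const int halfC = maskCols / 2;
    const std::size_t sliceSize = static_cast<std::size_t>(rows) * cols;

    for (int s = 0; s < slices; ++s) {
        for (int c = 0; c < cols; ++c) {
            for (int r = 0; r < rows; ++r) {
                // Minimum over all in-bounds neighbours selected by the mask.
                float minVal = std::numeric_limits<float>::infinity();
                for (int mc = 0; mc < maskCols; ++mc) {
                    for (int mr = 0; mr < maskRows; ++mr) {
                        if (mask[mr + mc * maskRows] == 0.0f) continue;
                        const int rr = r + mr - halfR;
                        const int cc = c + mc - halfC;
                        if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                        minVal = std::min(minVal, in[s * sliceSize + cc * rows + rr]);
                    }
                }
                out[s * sliceSize + c * rows + r] = minVal;
            }
        }
    }
    return out;
}
```

Running this on the sample matrix A from the example below (stored column-major) with the 3x3 cross mask gives the same B as the CPU build, so it can serve as a baseline when checking the buffer returned by eroded.host(pOutput) on each backend.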
I wrapped the above code in a MEX file so it can be run from MATLAB. It also returns the af::info() output as the second return value (so I could send it to you).
(One more note: the use of floating-point numbers is intentional.)
Attached:
Code.rar - Sln+vcxproj+cpp of the MEX file, adapted from the example code. You will probably need to change the include/lib directories to point to where MATLAB is installed on your PC.
BinsAndPDB.rar - compilation with PDB of mex files in CPU (debug and release) and CUDA (debug only) so you can just run them from matlab and see results.
Example input/outputs:
Running the following two lines in MATLAB, once with the CPU build of the MEX file and once with the CUDA build:
A = single(cat(3, [1,1,1; 1,1,1; 1,1,1], [0,1,0;1,0,1;0,1,0], [0,0,1; 0,1,1; 0,0,1]))
[B,s] = ILungMorph_debug(A)
gives the following output on my PC. Note the differences in the matrix B between the two builds. In both cases, s is the af::info() output you requested.
CPU BUILD:
A(:,:,1) =
1 1 1
1 1 1
1 1 1
A(:,:,2) =
0 1 0
1 0 1
0 1 0
A(:,:,3) =
0 0 1
0 1 1
0 0 1
B(:,:,1) =
1 1 1
1 1 1
1 1 1
B(:,:,2) =
0 0 0
0 0 0
0 0 0
B(:,:,3) =
0 0 0
0 0 1
0 0 0
s =
ArrayFire v3.3.1 (CPU, 64-bit Windows, build f53efc3)
[0] Intel: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 16243 MB, Max threads(8)
CUDA BUILD:
A(:,:,1) =
1 1 1
1 1 1
1 1 1
A(:,:,2) =
0 1 0
1 0 1
0 1 0
A(:,:,3) =
0 0 1
0 1 1
0 0 1
B(:,:,1) =
1 1 1
1 1 1
1 1 1
B(:,:,2) =
0 1 0
1 0 1
0 1 0
B(:,:,3) =
0 0 1
0 1 1
0 0 1
s =
ArrayFire v3.3.1 (CUDA, 64-bit Windows, build f53efc3)
Platform: CUDA Toolkit 7.5, Driver: CUDA Driver Version: 7050
[0] GeForce GTX 960M, 4096 MB, CUDA Compute 5.0
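To pin down exactly where the two builds disagree, a small helper like the one below can compare the two host buffers element by element. This is only a sketch: differingIndices is a hypothetical name, and it assumes both outputs have already been copied into equally sized std::vector<float> buffers in the same (column-major) order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the linear indices at which two equally
// sized float buffers hold different values.
std::vector<std::size_t> differingIndices(const std::vector<float>& a,
                                          const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::vector<std::size_t> diff;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) diff.push_back(i);
    }
    return diff;
}
```

For the two B matrices above, the first slice is identical in both builds, and all the differences fall in the second and third slices.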
With many thanks, Dor