cudamatrix compilation

Vassil Panayotov

Dec 30, 2016, 8:54:04 AM

to kaldi...@googlegroups.com

Hi everyone,

This is likely a non-issue, but I figured I'd ask just in case.
I was compiling Kaldi today and noticed it prints out stuff like:

#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda/bin
#$ _THERE_=/usr/local/cuda/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/usr/local/cuda/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda/bin/../lib:
#$ PATH=/usr/local/cuda/bin/../open64/bin:/usr/local/cuda/bin/../nvvm/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
#$ INCLUDES="-I/usr/local/cuda/bin/../targets/x86_64-linux/include"
#$ LIBRARIES= "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -D__CUDA_ARCH__=530 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ -fPIC -I"/usr/local/cuda/include" -I"../" "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -D"__CUDACC_VER__=70517" -D"__CUDACC_VER_BUILD__=17" -D"__CUDACC_VER_MINOR__=5" -D"__CUDACC_VER_MAJOR__=7" -D"HAVE_CUDA" -D"KALDI_DOUBLEPRECISION=0" -include "cuda_runtime.h" -m64 -g -gdwarf-2 "cu-kernels.cu" > "/tmp/tmpxft_0000276a_00000000-19_cu-kernels.compute_53.cpp1.ii"

I was wondering about the "-DCUDA_DOUBLE_MATH_FUNCTIONS" bit. Does it mean the compiled kernels will use double-precision (i.e. slower) arithmetic? It's not immediately obvious to me where this comes from, but I suspect it's added by NVIDIA's compilers, not by Kaldi's build system.

In short, my question is whether this flag can cause a performance penalty and, if so, how to get rid of it.

Thanks,
Vassil

Daniel Galvez

Dec 31, 2016, 10:42:03 AM

to kaldi-help

I decided to look into this.

First of all, the symbol CUDA_DOUBLE_MATH_FUNCTIONS is not documented in the CUDA libraries as far as I can see. Ugh.

I noticed that it is defined only when gcc is called with -E (you can check with `make -j 8 &> make.log; grep "CUDA_DOUBLE_MATH_FUNCTIONS" make.log`). That is, it is defined only during preprocessing, so I suspected that it is used in an #ifdef guard at some point.
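A minimal sketch of that check, for anyone who wants to repeat it: the helper below counts how many log lines pass the macro on a preprocess-only (`-E`) command line versus any other command line. The function name `macro_usage_summary` is made up for illustration, and matching `-E` as a space-delimited flag is an assumption about how the commands are echoed into the log.

```shell
# Hypothetical helper: count build-log lines that pass -D<MACRO>,
# split by whether the command is a preprocess-only (-E) invocation.
# Usage: macro_usage_summary LOGFILE MACRO
macro_usage_summary() {
    log=$1
    macro=$2
    # Lines that pass the macro on a command line that also contains -E.
    with_e=$(grep -F -e "-D$macro" "$log" | grep -c -e ' -E ')
    # Lines that pass the macro without -E (i.e. not preprocess-only).
    without_e=$(grep -F -e "-D$macro" "$log" | grep -vc -e ' -E ')
    echo "preprocess-only: $with_e, other: $without_e"
}
```

Run as `macro_usage_summary make.log CUDA_DOUBLE_MATH_FUNCTIONS`; if the second count is zero, the macro never reaches anything but the preprocessing step.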

So I looked at the two directories on the include path: `/usr/local/cuda/bin/../targets/x86_64-linux/include` and `/usr/local/cuda/include`. The former doesn't seem to be an actual path, at least on the CLSP machines, and when I do `grep -Ir "CUDA_DOUBLE_MATH_FUNCTIONS" /usr/local/cuda/include`, nothing comes up. So my guess is that the define is indeed harmless.
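The same search can be wrapped so that it covers every include directory in one go and reports the ones that don't exist. This is only a sketch; `find_macro_in_includes` is a hypothetical name, and it relies on the same `grep -I` and `-r` options used above.

```shell
# Hypothetical helper: search each include directory for a macro name,
# skipping directories that do not exist on this machine.
# Usage: find_macro_in_includes MACRO DIR...
find_macro_in_includes() {
    macro=$1
    shift
    hits=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "skip (not a directory): $dir"
            continue
        fi
        # -r: recurse, -I: ignore binary files, -l: print matching file names only.
        files=$(grep -rIl -F -e "$macro" "$dir")
        if [ -n "$files" ]; then
            echo "$files"
            hits=$((hits + 1))
        fi
    done
    [ "$hits" -gt 0 ] || echo "no header references $macro"
}
```

For the two paths in this thread that would be `find_macro_in_includes CUDA_DOUBLE_MATH_FUNCTIONS /usr/local/cuda/bin/../targets/x86_64-linux/include /usr/local/cuda/include`.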

I tested this using CUDA 7.5.18.

Vassil Panayotov

Jan 1, 2017, 5:29:14 AM

to kaldi...@googlegroups.com

Thanks for taking the time to look into this!

Yes, this doesn't seem to be documented, and even Google search returns only a handful of results for "CUDA_FLOAT_MATH_FUNCTIONS". One example (http://upstream.rosalinux.ru/diffs/cuda/3.0_to_3.1/diff.html):

static __forceinline__ float fdividef(float a, float b)
{
#if defined(__USE_FAST_MATH__) && !defined(__CUDA_PREC_DIV)
  return __fdividef(a, b);
#else /* __USE_FAST_MATH__ && !__CUDA_PREC_DIV */
  return a / b;
#endif /* __USE_FAST_MATH__ && !__CUDA_PREC_DIV */
}

#if defined(CUDA_FLOAT_MATH_FUNCTIONS)

static __forceinline__ double fdivide(double a, double b)
{
  return (double)fdividef((float)a, (float)b);
}

#endif /* CUDA_FLOAT_MATH_FUNCTIONS */

#if defined(CUDA_DOUBLE_MATH_FUNCTIONS)

static __forceinline__ double fdivide(double a, double b)
{
  return a / b;
}

#endif /* CUDA_DOUBLE_MATH_FUNCTIONS */

I think this flag is just a remnant from a previous iteration of the toolkit (e.g. 3.x in the above diff) and, as you say, currently has no effect.
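To illustrate why a leftover -D is harmless when nothing tests it, here is a minimal sketch that runs a tiny guarded snippet through the C preprocessor with different flags. The function name `guard_demo` and the placeholder tokens `double_branch` and `no_branch` are made up for illustration, and it assumes a C compiler is reachable as `cc` (or via `$CC`).

```shell
# Hypothetical demo: preprocess a tiny guarded snippet with an optional -D flag.
# Usage: guard_demo [FLAG]   e.g. guard_demo -DCUDA_DOUBLE_MATH_FUNCTIONS
guard_demo() {
    # The snippet stands in for a header; only one branch survives preprocessing.
    printf '%s\n' \
        '#if defined(CUDA_DOUBLE_MATH_FUNCTIONS)' \
        'double_branch' \
        '#else' \
        'no_branch' \
        '#endif' |
    # -E: preprocess only, -P: omit line markers, -x c: treat stdin as C.
    # $1 is left unquoted on purpose so that an empty flag adds no argument.
    ${CC:-cc} -E -P -x c $1 -
}
```

The flag selects the `double_branch` token only because the snippet contains a matching `#if defined(...)`; a macro that no header checks, e.g. `guard_demo -DUNRELATED_MACRO`, leaves the output unchanged.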

Vassil
