About memory usage of GpuContiguous and cached memory allocation

Wong Hang

Jan 23, 2019, 4:24:23 AM
to theano-dev

I have two questions here:


(1) When I profile the memory usage of my Theano program, I find that most of the memory is consumed by GpuContiguous:

     183169800B  [(4785, 4785)] v GpuContiguous(GpuContiguous.0)
     183169800B  [(4785, 4785)] v GpuContiguous(GpuElemwise{Composite{((i0 * exp((i1 * i2))) + (i3 * i4))}}[(0, 2)]<gpuarray>.0)
     183169800B  [(4785, 4785)] i GpuCholesky{lower=True, inplace=True}(GpuContiguous.0)
     183169800B  [(4785, 4785)] v GpuContiguous(GpuCholesky{lower=True, inplace=True}.0)

It looks like Theano caches the output memory of each operator, and each GpuContiguous keeps its own copy of my input. Am I correct?

Does Theano's optimizer NOT remove an extra GpuContiguous even when it can prove that the input is contiguous?

If I am sure that the inputs to my own Op are C-contiguous, can I skip gpu_contiguous in make_node():

        def make_node(self, X1, X2):
            ctx = infer_context_name(X1, X2)
            X1 = as_gpuarray_variable(X1, ctx)
            X2 = as_gpuarray_variable(X2, ctx)
            # X1 = gpu_contiguous(X1)
            # X2 = gpu_contiguous(X2)

and then simply add a check in c_code(...):

if (!GpuArray_IS_C_CONTIGUOUS(&%(X1)s->ga)) {
  PyErr_Format(PyExc_RuntimeError, "X1 must be C contiguous");
  %(fail)s;
}
if (!GpuArray_IS_C_CONTIGUOUS(&%(X2)s->ga)) {
  PyErr_Format(PyExc_RuntimeError, "X2 must be C contiguous");
  %(fail)s;
}


(2) Is there any way to free all the memory cached in a computational graph?

My code compiles the same combination of Ops repeatedly, but with different dimensions, say 4000x4000, 3000x3000, 1234x1234.
It looks (I am not sure) like whenever I compile and run the computational graph for a different dimension, the buffers for 4000x4000, 3000x3000, 1234x1234 (created by many GPU ops) all stay cached and ultimately consume all of my memory.
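One thing that may help here (a sketch, not something I have verified on this exact setup): Theano's compiled Function objects expose a free() method that drops the intermediate storage held inside the function, and deleting the function plus forcing garbage collection should release the rest. The shapes below are just the examples from above.

```python
import gc

import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
f = theano.function([x], T.dot(x, x.T))

for n in (4000, 3000, 1234):
    f(np.ones((n, n), dtype=theano.config.floatX))
    f.free()  # drop the intermediate buffers cached inside the function

del f         # drop the compiled function itself ...
gc.collect()  # ... and make sure its buffers are actually released
```

Whether the GPU allocator then returns the memory to the driver depends on the gpuarray allocator settings, so this only bounds the per-function caching.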

Frédéric Bastien

Jan 23, 2019, 9:12:15 AM
to thean...@googlegroups.com
You missed one important column: the one that prints just "c", "i", or "v".

The c means real memory was allocated.
The i means the op is tagged inplace, so there is a very high probability that the output is just a pointer to the input, whose value we may have changed.
The v means view: the output is a pointer to the full input or a subset of it, without any modification.

GpuContiguous does nothing most of the time: if the input is already contiguous, it just returns a pointer to the input.
But it is needed, because the following ops only support inputs that are contiguous (most of the time these are cuDNN ops). Doing that conversion inside the op itself is possible (and some ops do it), but it has other inconveniences.
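This view-versus-copy behaviour can be illustrated with NumPy's analogous np.ascontiguousarray (an analogy only, not Theano's actual GPU implementation): an already C-contiguous input comes back unchanged, and only a non-contiguous input triggers a real copy.

```python
import numpy as np

a = np.ones((4, 4))          # C-contiguous from the start
b = np.ascontiguousarray(a)  # no copy: the very same array comes back
print(b is a)                # True

t = a.T                      # transpose is a non-contiguous view
c = np.ascontiguousarray(t)  # here a real, newly allocated copy is made
print(c is t)                # False
print(c.flags['C_CONTIGUOUS'])  # True
```

In the same way, a GpuContiguous whose input is already contiguous shows up in the profiler with a "v" and costs no extra memory.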

So in reality, they probably do not take any memory at all.
Analyzing memory usage is hard. Sadly, I wasn't able to make the memory profiler's output as nice as I wanted, but at least the information is there.




Wong Hang

Jan 23, 2019, 9:46:38 AM
to thean...@googlegroups.com
Thanks, your explanation is very clear.

Frédéric Bastien <frederic...@gmail.com> wrote on Wed, Jan 23, 2019 at 10:12 PM: