Reusing memory between calls when developing a new GPU Op

Wong Hang

Jan 22, 2019, 9:35:09 PM

to theano-dev

Hi,

In the document "Extending Theano with a C op", http://deeplearning.net/software/theano/extending/extending_theano_c.html

the example checks whether the output storage exists and, if it does and has the same shape, reuses it:

        // Validate that the output storage exists and has the same
        // dimension as x.
        if (NULL == %(z)s ||
            PyArray_DIMS(%(x)s)[0] != PyArray_DIMS(%(z)s)[0])
        {
            /* Reference received to invalid output variable.
            Decrease received reference's ref count and allocate new
            output variable */
            Py_XDECREF(%(z)s);
            %(z)s = (PyArrayObject*)PyArray_EMPTY(1,
                                                PyArray_DIMS(%(x)s),
                                                PyArray_TYPE(%(x)s),
                                                0);

            if (!%(z)s) {
                %(fail)s;
            }
        }
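The check above compares only the first dimension, which is enough for the tutorial's 1-D op. For arrays of any rank, the same idea is to reuse the output only when it exists and every dimension matches, and otherwise release it and allocate a fresh one. Below is a minimal, self-contained C sketch of that pattern. The `Array` struct, `prep_output`, `array_free`, and `MAX_NDIM` are hypothetical stand-ins for the NumPy (or pygpu) array and its allocation calls, not part of Theano's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NDIM 8 /* hypothetical rank limit for this sketch */

/* Hypothetical stand-in for an output array: shape plus data buffer. */
typedef struct {
    int ndim;
    size_t dims[MAX_NDIM];
    double *data;
} Array;

/* Number of elements implied by a shape (1 for a 0-d array). */
static size_t num_elements(int ndim, const size_t *dims) {
    size_t n = 1;
    for (int i = 0; i < ndim; i++)
        n *= dims[i];
    return n;
}

/* 1 if out exists and has exactly the requested rank and dimensions. */
static int shape_matches(const Array *out, int ndim, const size_t *dims) {
    if (out == NULL || out->ndim != ndim)
        return 0;
    for (int i = 0; i < ndim; i++)
        if (out->dims[i] != dims[i])
            return 0;
    return 1;
}

static void array_free(Array *a) {
    if (a != NULL) {
        free(a->data);
        free(a);
    }
}

/* Reuse *out if its shape matches, otherwise free it and allocate a new
   array. Returns 0 on success, -1 on allocation failure (where a C op
   would jump to %(fail)s). */
static int prep_output(Array **out, int ndim, const size_t *dims) {
    assert(ndim >= 0 && ndim <= MAX_NDIM);
    if (shape_matches(*out, ndim, dims))
        return 0;                 /* same shape: keep the old buffer */
    array_free(*out);             /* plays the role of Py_XDECREF */
    *out = NULL;
    Array *a = malloc(sizeof *a);
    if (a == NULL)
        return -1;
    a->ndim = ndim;
    if (ndim > 0)
        memcpy(a->dims, dims, (size_t)ndim * sizeof *dims);
    /* Allocate at least one element to avoid malloc(0), which may return NULL. */
    size_t n = num_elements(ndim, dims);
    a->data = malloc((n > 0 ? n : 1) * sizeof *a->data);
    if (a->data == NULL) {
        free(a);
        return -1;
    }
    *out = a;
    return 0;
}
```

Calling `prep_output` twice with the same shape returns the same buffer; a different shape frees it and allocates a new one. A real op would also compare the dtype (for example via `PyArray_TYPE`) before deciding to reuse.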

Also, the document "Creating a new Op: Python implementation", http://deeplearning.net/software/theano/extending/extending_theano.html

says the following:

output_storage. A function Mode may allow output_storage elements to persist between evaluations, or it may reset output_storage cells to hold a value of None. It can also pre-allocate some memory for the op to use. This feature can allow perform to reuse memory between calls

However, in the document "Extending Theano with a GPU Op", http://deeplearning.net/software/theano/extending/extending_theano_gpu.html

the example simply calls:

Py_CLEAR(%(z)s);

without checking whether a pygpu array buffer already exists. Does the trick of reusing memory between calls still apply when I develop a GPU Op?

Thanks.

Frédéric Bastien

Jan 23, 2019, 9:06:18 AM

to thean...@googlegroups.com

Hi,

Reuse is still possible on the GPU. We just wanted to keep the example simple to understand.

But it matters less on the GPU, because we cache GPU mallocs: each allocation we do is cheap, since it is not a real device allocation. So unless you find that this is a bottleneck, I would suggest you do as the example does.
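To illustrate why cached allocations are cheap, here is a minimal sketch of a caching allocator in plain C. It shows only the general idea, not the allocator Theano actually uses: `malloc` stands in for an expensive device allocation such as `cudaMalloc`, and the names `CachingAllocator`, `cached_malloc`, `cached_free`, and `cache_release_all` are hypothetical. Freed blocks are kept on a list and handed back on the next request of the same size, so only the first request of each size reaches the real allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* A released block remembered for reuse instead of being freed. */
typedef struct Block {
    size_t size;
    void *ptr;
    struct Block *next;
} Block;

typedef struct {
    Block *free_list;   /* cached blocks waiting to be reused */
    size_t real_allocs; /* how often the "expensive" allocator ran */
} CachingAllocator;

/* Stand-in for an expensive device allocation such as cudaMalloc. */
static void *device_malloc(CachingAllocator *ca, size_t size) {
    ca->real_allocs++;
    return malloc(size);
}

/* Hand back a cached block of the same size if one exists. */
static void *cached_malloc(CachingAllocator *ca, size_t size) {
    assert(size > 0); /* zero-size requests are not handled in this sketch */
    for (Block **link = &ca->free_list; *link != NULL; link = &(*link)->next) {
        Block *b = *link;
        if (b->size == size) {
            void *p = b->ptr;
            *link = b->next; /* unlink the node from the cache */
            free(b);
            return p;
        }
    }
    return device_malloc(ca, size);
}

/* "Free" by caching the block for the next request of the same size. */
static void cached_free(CachingAllocator *ca, void *ptr, size_t size) {
    if (ptr == NULL)
        return;
    Block *b = malloc(sizeof *b);
    if (b == NULL) { /* cannot track it: release it for real */
        free(ptr);
        return;
    }
    b->size = size;
    b->ptr = ptr;
    b->next = ca->free_list;
    ca->free_list = b;
}

/* Really release every cached block (e.g. at shutdown). */
static void cache_release_all(CachingAllocator *ca) {
    while (ca->free_list != NULL) {
        Block *b = ca->free_list;
        ca->free_list = b->next;
        free(b->ptr);
        free(b);
    }
}
```

With this scheme, an op that frees its output and allocates a same-sized one on the next call gets the cached block back without a real allocation. Production caching allocators usually also round sizes into buckets or split larger blocks so near-miss sizes can be served from the cache; exact-size matching keeps the sketch short.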

Just for your information, did you see this post from more than a year ago?

https://groups.google.com/forum/#!msg/theano-announce/PiH4p7NqZ60/34GnHsnwBgAJ;context-place=forum/theano-users

Frédéric


Wong Hang

Jan 23, 2019, 9:46:29 AM

to thean...@googlegroups.com

Thank you very much. I got it now.

Yes, I know that Theano's development has stopped.

Other frameworks are too focused on neural networks; Theano gives me the most freedom to extend it easily and to work with other Python libraries (e.g. pycuda).

HIPS/autograd is good, but it does not support GPUs.

As long as it is open source I can still use it.

On Wed, Jan 23, 2019 at 10:06 PM, Frédéric Bastien <frederic...@gmail.com> wrote:
