Reuse memory between calls when developing a new GPU op


Wong Hang

Jan 22, 2019, 9:35:09 PM
to theano-dev
Hi,

In the document "Extending Theano with a C op", http://deeplearning.net/software/theano/extending/extending_theano_c.html

the example checks whether the output storage already exists and, if it has the same shape as the input, reuses it:

        // Validate that the output storage exists and has the same
        // dimension as x.
        if (NULL == %(z)s ||
            PyArray_DIMS(%(x)s)[0] != PyArray_DIMS(%(z)s)[0])
        {
            /* Reference received to invalid output variable.
            Decrease received reference's ref count and allocate new
            output variable */
            Py_XDECREF(%(z)s);
            %(z)s = (PyArrayObject*)PyArray_EMPTY(1,
                                                PyArray_DIMS(%(x)s),
                                                PyArray_TYPE(%(x)s),
                                                0);

            if (!%(z)s) {
                %(fail)s;
            }
        }

Also in the document "Creating a new Op: Python implementation", http://deeplearning.net/software/theano/extending/extending_theano.html

It says the following:

output_storage. A function Mode may allow output_storage elements to persist between evaluations, or it may reset output_storage cells to hold a value of None. It can also pre-allocate some memory for the op to use. This feature can allow perform to reuse memory between calls
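
To make the quoted behaviour concrete, here is a minimal sketch of a perform-style function that reuses its output buffer. It assumes only NumPy; the function name `perform`, the `z = 2 * x` computation, and the hand-built `storage` list are illustrative stand-ins and are not taken from the Theano docs. In a real Op, `output_storage` is supplied by Theano.

```python
import numpy as np

def perform(inputs, output_storage):
    """Toy perform(): computes z = 2 * x, reusing z's buffer when possible."""
    (x,) = inputs
    z = output_storage[0][0]
    # Reuse the previous buffer only if it exists and matches shape and dtype.
    if z is None or z.shape != x.shape or z.dtype != x.dtype:
        z = np.empty_like(x)
        output_storage[0][0] = z
    # Write the result in place so no new array is allocated.
    np.multiply(x, 2, out=z)

# Hypothetical stand-in for the storage cells Theano would pass in.
storage = [[None]]
perform([np.arange(3.0)], storage)
first = storage[0][0]

perform([np.ones(3)], storage)        # same shape: buffer is reused
reused = storage[0][0] is first

perform([np.ones(5)], storage)        # different shape: buffer is replaced
reallocated = storage[0][0] is not first
```

The check mirrors the C example above: allocate only when the cell is empty or the existing buffer does not fit.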

However, in the document "Extending Theano with a GPU Op", http://deeplearning.net/software/theano/extending/extending_theano_gpu.html

The example simply calls:
Py_CLEAR(%(z)s);
without checking whether a pygpu array buffer already exists. Does the trick of reusing memory between calls still apply when I develop a GPU Op?

Thanks.




Frédéric Bastien

Jan 23, 2019, 9:06:18 AM
to thean...@googlegroups.com
Hi,

Reuse still exists on the GPU. We just wanted to make the example simpler to understand.

But it is less important on the GPU, because we cache GPU mallocs. Each allocation is therefore cheaper, since most of them are not real allocations. So unless you think this is a bottleneck, I would suggest doing as the example does.
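
As a rough illustration of why cached allocation is cheap, here is a toy free-list allocator. It is not the actual GPU allocator used by Theano; the class `CachingAllocator` and its methods are hypothetical, and `bytearray` merely stands in for a real device malloc.

```python
from collections import defaultdict

class CachingAllocator:
    """Toy allocator: released buffers are kept and handed out again."""

    def __init__(self):
        self._free = defaultdict(list)   # size -> list of released buffers
        self.real_allocations = 0        # counts the expensive allocations

    def alloc(self, size):
        # Serve from the cache when a buffer of this size was released earlier.
        if self._free[size]:
            return self._free[size].pop()
        self.real_allocations += 1
        return bytearray(size)           # stand-in for a real device malloc

    def free(self, buf):
        # Keep the buffer for later reuse instead of returning it to the system.
        self._free[len(buf)].append(buf)

pool = CachingAllocator()
for _ in range(3):
    buf = pool.alloc(1024)
    pool.free(buf)
```

After the loop, `pool.real_allocations` is 1: only the first request pays the real cost, which is why clearing and reallocating the output on every call costs little under such a scheme.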

Just for your information, did you see this post, which is more than a year old?

Frédéric


Wong Hang

Jan 23, 2019, 9:46:29 AM
to thean...@googlegroups.com
Thank you very much. I got it now.

Yes, I know that Theano's development has stopped.
Other frameworks are too focused on neural networks, and Theano gives me the most freedom to extend it easily and to work with other Python libraries (e.g. pycuda).
HIPS/autograd is good, but it does not support GPUs.

As long as it is open source I can still use it.

