where can I find Dilated Convolution in 3d?


w.f.wi...@gmail.com

unread,
Sep 11, 2016, 6:24:59 AM9/11/16
to lasagne-users
Hi there,

I need to use 3D convolution with dilated filters, but there is only a 2D version in Lasagne as far as I know. Where could I find a 3D implementation?

Thanks,

Jan Schlüter

unread,
Sep 12, 2016, 6:52:30 AM9/12/16
to lasagne-users
I need to use 3D convolution with dilated filters, but there is only a 2D version in Lasagne as far as I know. Where could I find a 3D implementation?

You'll have to implement it first. Just copy the DilatedConv2DLayer and adapt it. Instead of AbstractConv2d_gradWeights, you'll need to use theano.sandbox.cuda.dnn.GpuDnnConv3dGradW. It shouldn't be too difficult, but let us know if you need any help with it. You can put the code directly in your own Python script.

Best, Jan

w.f.wi...@gmail.com

unread,
Sep 13, 2016, 11:42:10 PM9/13/16
to lasagne-users
Hi Jan,

So far, this new 3D dilated convolution layer is a work in progress because it returns

TypeError: __init__() got an unexpected keyword argument 'subsample'

The error comes from 

op = theano.sandbox.cuda.dnn.GpuDnnConv3dGradW(
            imshp=imshp, kshp=kshp,
            subsample=self.dilation, border_mode='valid',
            filter_flip=False)
        output_size = self.output_shape[2:]

What is the right way of using theano.sandbox.cuda.dnn.GpuDnnConv3dGradW to get a dilated convolution?


Thanks

class DilatedConv3DLayer(BaseConvLayer):
    """
    lasagne.layers.DilatedConv3DLayer(incoming, num_filters, filter_size,
    dilation=(1, 1, 1), pad=0, untie_biases=False,
    W=lasagne.init.GlorotUniform(), b=lasagne.init.Constant(0.),
    nonlinearity=lasagne.nonlinearities.rectify, flip_filters=False, **kwargs)
    3D dilated convolution layer
    Performs a 3D convolution with dilated filters, then optionally adds a bias
    and applies an elementwise nonlinearity.
    Parameters
    ----------
    incoming : a :class:`Layer` instance or a tuple
        The layer feeding into this layer, or the expected input shape. The
        output of this layer should be a 5D tensor, with shape
        ``(batch_size, num_input_channels, input_depth, input_rows,
        input_columns)``.
    num_filters : int
        The number of learnable convolutional filters this layer has.
    filter_size : int or iterable of int
        An integer or a 3-element tuple specifying the size of the filters.
    dilation : int or iterable of int
        An integer or a 3-element tuple specifying the dilation factor of the
        filters. A factor of :math:`x` corresponds to :math:`x - 1` zeros
        inserted between adjacent filter elements.
    pad : int, iterable of int, or 'valid' (default: 0)
        The amount of implicit zero padding of the input.
        This implementation does not support padding, the argument is provided
        for compatibility to other convolutional layers only.
    untie_biases : bool (default: False)
        If ``False``, the layer will have a bias parameter for each channel,
        which is shared across all positions in this channel. As a result, the
        `b` attribute will be a vector (1D).
        If ``True``, the layer will have separate bias parameters for each
        position in each channel. As a result, the `b` attribute will be a
        4D tensor.
    W : Theano shared variable, expression, numpy array or callable
        Initial value, expression or initializer for the weights.
        These should be a 5D tensor with shape
        ``(num_input_channels, num_filters, filter_rows, filter_columns, filter_depth)``.
        Note that the first two dimensions are swapped compared to a
        non-dilated convolution.
        See :func:`lasagne.utils.create_param` for more information.
    b : Theano shared variable, expression, numpy array, callable or ``None``
        Initial value, expression or initializer for the biases. If set to
        ``None``, the layer will have no biases. Otherwise, biases should be
        a 1D array with shape ``(num_filters,)`` if `untie_biases` is set to
        ``False``. If it is set to ``True``, its shape should be
        ``(num_filters, output_depth, output_rows, output_columns)`` instead.
        See :func:`lasagne.utils.create_param` for more information.
    nonlinearity : callable or None
        The nonlinearity that is applied to the layer activations. If None
        is provided, the layer will be linear.
    flip_filters : bool (default: False)
        Whether to flip the filters before sliding them over the input,
        performing a convolution, or not to flip them and perform a
        correlation (this is the default).
        This implementation does not support flipped filters, the argument is
        provided for compatibility to other convolutional layers only.
    **kwargs
        Any additional keyword arguments are passed to the `Layer` superclass.
    Attributes
    ----------
    W : Theano shared variable or expression
        Variable or expression representing the filter weights.
    b : Theano shared variable or expression
        Variable or expression representing the biases.
    Notes
    -----
    The dilated convolution is implemented as the backward pass of a
    convolution wrt. weights, passing the filters as the output gradient.
    It can be thought of as dilating the filters (by adding ``dilation - 1``
    zeros between adjacent filter elements) and cross-correlating them with the
    input. See [1]_ for more background.
    References
    ----------
    .. [1] Fisher Yu, Vladlen Koltun (2016),
           Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016.
    """
    def __init__(self, incoming, num_filters, filter_size, dilation=(1, 1, 1),
                 pad=0, untie_biases=False,
                 W=init.GlorotUniform(), b=init.Constant(0.),
                 nonlinearity=nonlinearities.rectify, flip_filters=False,
                 **kwargs):
        

        self.dilation = as_tuple(dilation, 3, int)
        super(DilatedConv3DLayer, self).__init__(
                incoming, num_filters, filter_size, 1, pad,
                untie_biases, W, b, nonlinearity, flip_filters, n=3, **kwargs)
        # remove self.stride:
        del self.stride
        # require valid convolution
        if self.pad != (0, 0, 0):
            raise NotImplementedError(
                    "DilatedConv3DLayer requires pad=0 / (0,0,0) / 'valid', but "
                    "got %r. For a padded dilated convolution, add a PadLayer."
                    % (pad,))
        # require unflipped filters
        if self.flip_filters:
            raise NotImplementedError(
                    "DilatedConv3DLayer requires flip_filters=False.")

    def get_W_shape(self):
        num_input_channels = self.input_shape[1]
        # first two sizes are swapped compared to a forward convolution
        return (num_input_channels, self.num_filters) + self.filter_size

    def get_output_shape_for(self, input_shape):
        ''' any change needed for the output shape ? '''
        batchsize = input_shape[0]
        return ((batchsize, self.num_filters) +
                tuple(conv_output_length(input, (filter-1) * dilate + 1, 1, 0)
                      for input, filter, dilate
                      in zip(input_shape[2:], self.filter_size,
                             self.dilation)))

    def convolve(self, input, **kwargs):
        # we perform a convolution backward pass wrt weights,
        # passing kernels as output gradient
        imshp = self.input_shape
        kshp = self.output_shape
        # and swapping channels and batchsize
        imshp = (imshp[1], imshp[0]) + imshp[2:]
        kshp = (kshp[1], kshp[0]) + kshp[2:]
        
        
        
        
        op = theano.sandbox.cuda.dnn.GpuDnnConv3dGradW(
            imshp=imshp, kshp=kshp,
            subsample=self.dilation, border_mode='valid',
            filter_flip=False)
        output_size = self.output_shape[2:]
        if any(s is None for s in output_size):
            output_size = self.get_output_shape_for(input.shape)[2:]
        conved = op(input.transpose(1, 0, 2, 3, 4), self.W, output_size)
        return conved.transpose(1, 0, 2, 3, 4)

Jan Schlüter

unread,
Sep 14, 2016, 4:54:07 AM9/14/16
to lasagne-users
What is the right way of using theano.sandbox.cuda.dnn.GpuDnnConv3dGradW to get a dilated convolution?

Ah yes, you need an extra step. As per the documentation (http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html#convolution-ops), GpuDnnConv3dGradW expects a convolution descriptor (GpuDnnConvDesc) -- this is common to the cuDNN convolution Ops. You can use dnn_gradweights() (https://github.com/Theano/theano/blob/7f8b43c/theano/sandbox/cuda/dnn.py#L1260-L1283) as a template. Actually, you can directly copy the code and just use GpuDnnConv3dGradW in the last line. It would be easier if Theano just used the same Ops for 2D and 3D, but it doesn't (yet).

Best, Jan

w.f.wi...@gmail.com

unread,
Sep 14, 2016, 8:30:18 PM9/14/16
to lasagne-users
Hi Jan,

The template of dnn_gradweights() leads to

def dnn_gradweight3D(img, topgrad,
                   kerns_shp,
                   border_mode='valid', subsample=(1, 1, 1),
                   conv_mode='conv'):
    """
    GPU convolution gradient with respect to weight using cuDNN from NVIDIA.

    The memory layout to use is 'bc01', that is 'batch', 'channel',
    'first dim', 'second dim' in that order.

    FIXME parameters doc

    :warning: The cuDNN library only works with GPUs that have a compute
      capability of 3.0 or higher.  This means that older GPUs will not
      work with this Op.
    """

    img = gpu_contiguous(img)
    topgrad = gpu_contiguous(topgrad)
    kerns_shp = theano.tensor.as_tensor_variable(kerns_shp)
    desc = GpuDnnConvDesc(border_mode=border_mode, subsample=subsample,
                          conv_mode=conv_mode)(img.shape, kerns_shp)
    out = gpu_alloc_empty(*kerns_shp)
    return GpuDnnConv3dGradW()(img, topgrad, out,
                               desc)

-----------------------------------------------------------------------------------------------------------------------

def convolve(self, input, **kwargs):
        # we perform a convolution backward pass wrt weights,
        # passing kernels as output gradient
        imshp = self.input_shape
        kshp = self.output_shape
        # and swapping channels and batchsize
        imshp = (imshp[1], imshp[0]) + imshp[2:]
        kshp = (kshp[1], kshp[0]) + kshp[2:]
        
        
        op = dnn_gradweight3D(
            imshp=imshp, kshp=kshp,
            subsample=self.dilation, border_mode='valid',
            filter_flip=False)
        output_size = self.output_shape[2:]
        if any(s is None for s in output_size):
            output_size = self.get_output_shape_for(input.shape)[2:]
        conved = op(input.transpose(1, 0, 2, 3, 4), self.W, output_size)
        return conved.transpose(1, 0, 2, 3, 4)

 -----------------------------------------------------------------------------------------------------------------
Obviously, there are a few missing parts.
1) What exactly is topgrad?
2) Where should filter_flip go?

Thanks,   
        

Jan Schlüter

unread,
Sep 16, 2016, 8:15:50 AM9/16/16
to lasagne-users
Obviously, there are a few missing parts.
1) What exactly is topgrad?

The same as the "output gradient" mentioned in the first comment of "def convolve(self, input, **kwargs)". The implementation uses a convolution backward pass wrt. weights, which is normally used for computing the gradient wrt. the weights given the gradient wrt. output ("output gradient", "topgrad") and the input of a standard, non-dilated convolutional layer. This makes it a little more difficult to read, but it's the only option for a 3D dilated convolution in Theano so far.
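To make the trick easier to picture, here is a plain-Python 1-D sketch (the helper names `dilate` and `correlate_valid` are illustrative, not Theano API) of what "dilating the filters and cross-correlating them with the input" means:

```python
# 1-D sketch of dilated convolution: insert (factor - 1) zeros between
# adjacent kernel elements, then run a plain valid cross-correlation.
def dilate(kernel, factor):
    out = []
    for i, k in enumerate(kernel):
        out.append(k)
        if i < len(kernel) - 1:
            out.extend([0] * (factor - 1))
    return out

def correlate_valid(signal, kernel):
    # valid cross-correlation: output length = len(signal) - len(kernel) + 1
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]

print(dilate([1, 0, -1], 2))                   # [1, 0, 0, 0, -1]
print(correlate_valid([1, 2, 3, 4, 5, 6, 7],
                      dilate([1, 0, -1], 2)))  # [-4, -4, -4]
```

The cuDNN backward-pass-wrt-weights Op computes exactly this kind of valid cross-correlation, which is why the implementation can reuse it by swapping the roles of the tensors.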
 
2) Where should filter_flip go?

filter_flip=True corresponds to conv_mode='conv' in cuDNN, and filter_flip=False corresponds to conv_mode='corr'.

Note that dnn_gradweight3D does not return the Op, but already the application of the Op. So your call should rather be something like "conved = dnn_gradweight3D(input.transpose(...), self.W, output_size)".

Best, Jan

w.f.wi...@gmail.com

unread,
Sep 16, 2016, 7:12:07 PM9/16/16
to lasagne-users
Hi Jan,

Python returns an error while checking the type of the kernels. I am not sure why the kernels must be 1D. I expect the kernels/weights to be 5D (previous layer, current layer, x, y, z positions) for a 3D dilated convolution.

\theano\sandbox\cuda\dnn.py", line 159, in make_node
    raise TypeError('kern must be 1D shape tensor')

TypeError: kern must be 1D shape tensor

---------------------------------------------------------------------------------------------
def dnn_gradweight3D(img, topgrad, imshp, kshp,
            subsample, border_mode='valid',
            filter_flip=False):
            
    """
    GPU convolution gradient with respect to weight using cuDNN from NVIDIA.

    The memory layout to use is 'bc01', that is 'batch', 'channel',
    'first dim', 'second dim' in that order.

    :warning: The cuDNN library only works with GPUs that have a compute
      capability of 3.0 or higher.  This means that older GPUs will not
      work with this Op.
    """
    if filter_flip:
        conv_mode = 'conv'
    else:
        conv_mode = 'cross'

    img = gpu_contiguous(img)
    topgrad = gpu_contiguous(topgrad)
    kerns_shp = theano.tensor.as_tensor_variable(kshp)
    desc = GpuDnnConvDesc(border_mode=border_mode, subsample=subsample,
                          conv_mode=conv_mode)(img.shape, kerns_shp)
    out = gpu_alloc_empty(*kerns_shp)
    return GpuDnnConv3dGradW()(img, topgrad, out,
                                      desc)
----------------------------------------------------------------------------------------------------------------
    def convolve(self, input, **kwargs):
        # we perform a convolution backward pass wrt weights,
        # passing kernels as output gradient
        imshp = self.input_shape
        kshp = self.output_shape
        # and swapping channels and batchsize
        imshp = (imshp[1], imshp[0]) + imshp[2:]
        kshp = (kshp[1], kshp[0]) + kshp[2:]
        
        output_size = self.output_shape[2:]
        if any(s is None for s in output_size):
            output_size = self.get_output_shape_for(input.shape)[2:]
        
#        img, topgrad, imshp, kshp,
#            subsample, border_mode='valid',
#            filter_flip=False
        
        conved = dnn_gradweight3D(input.transpose(1, 0, 2, 3, 4), self.W, imshp, kshp,
                                  self.dilation)
        return conved.transpose(1, 0, 2, 3, 4)
---------------------------------------------------------------------

Many thanks,

Best,

Wilson

Jan Schlüter

unread,
Sep 19, 2016, 5:30:23 AM9/19/16
to lasagne-users
Python returns an error while checking the type of the kernels. I am not sure why the kernels must be 1D. I expect the kernels/weights to be 5D (previous layer, current layer, x, y, z positions) for a 3D dilated convolution.

\theano\sandbox\cuda\dnn.py", line 159, in make_node
    raise TypeError('kern must be 1D shape tensor')

TypeError: kern must be 1D shape tensor

The kernels must be 5D, but it complains about the kernel *shape* tensor (kshp) -- this should be a vector. At first glance, I can't see why it isn't a vector in your case, though. Can you print kshp in dnn_gradweight3D, and kerns_shp.ndim as well?

Best, Jan

w.f.wi...@gmail.com

unread,
Sep 21, 2016, 5:09:58 AM9/21/16
to lasagne-users
Hi Jan,

Good news: I am pleased to let you know that the bug has been fixed.
The network can now convolve/cross-correlate its input with either a dilated or a normal convolution.

To test the validity of the code, I compared the output of the dilated convolution layer with a dilation factor of 1 against that of a normal convolution layer with a stride of 1.
As expected, the two agree.

Bad news: the dilated convolution gives a smaller output as the dilation factor rises (1, 2, 4, ...).
What should the output really look like for a dilated convolution beyond a factor of 1?
Based on my own reading, the output should retain its resolution while the receptive field grows exponentially as the layers get deeper.
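
For reference, the receptive-field growth mentioned above can be worked out in a few lines of plain Python (a sketch with illustrative names, not part of the layer code):

```python
def receptive_field(kernel_sizes, dilations):
    # each layer extends the receptive field by (kernel - 1) * dilation
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# three 3x3x3 layers with dilation factors doubling per layer,
# as in Yu & Koltun (2016): the receptive field grows exponentially
print(receptive_field([3, 3, 3], [1, 2, 4]))  # 15
```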



Thank you very much for your tips and suggestions.

Wilson

Jan Schlüter

unread,
Sep 21, 2016, 5:53:59 AM9/21/16
to lasagne-users
Bad news: the dilated convolution gives a smaller output as the dilation factor rises (1, 2, 4, ...).

This is expected. As it is implemented now (using the backward pass wrt. weights of a convolution), it will always perform a valid convolution, that is, a convolution without any zero-padding of the input. The output size of a valid convolution is (input size - kernel size + 1). Now if you increase the dilation factor, you basically increase the kernel size by inserting zeros in between its elements. For a dilation factor of 1, a 3x3 kernel is 3x3. With a factor of 2, it's 5x5, so the output will be 2 pixels smaller. With a factor of n, it's 2*(n-1)+3, so the output will be 2*(n-1) smaller.
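
The arithmetic in the paragraph above can be checked with a small plain-Python sketch (illustrative function name):

```python
def valid_output_size(input_size, kernel_size, dilation):
    # effective kernel size after inserting (dilation - 1) zeros
    # between adjacent kernel elements
    effective = (kernel_size - 1) * dilation + 1
    # valid convolution: output = input - kernel + 1
    return input_size - effective + 1

# a 17-wide input with a 3-wide kernel shrinks faster as dilation grows
for d in (1, 2, 4):
    print(d, valid_output_size(17, 3, d))  # 1 15 / 2 13 / 4 9
```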

With this implementation, you can only avoid this by adding a PadLayer before the DilatedConv3DLayer. You can also integrate a pad() call directly into the get_output_for() method, based on self.pad. In this case, you will need a case distinction for self.pad to figure out the padding amount based on the dilated kernel size -- a bit similar to this early version of convolution in Lasagne, but taking the dilation into account: https://github.com/Lasagne/Lasagne/blob/ff7a27aa1e739ad7431e9819d95b76f95d14c791/lasagne/layers/conv.py#L255-L268
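
A minimal sketch of that padding computation, assuming 'same' output and an odd effective kernel size (the helper name is hypothetical, not Lasagne API):

```python
def same_pad_amount(kernel_size, dilation):
    # pad so that a valid convolution with the dilated kernel
    # preserves the input size (odd effective kernel assumed)
    effective = (kernel_size - 1) * dilation + 1
    return (effective - 1) // 2

print(same_pad_amount(3, 1))  # 1
print(same_pad_amount(3, 2))  # 2
```

With this pad amount on each side, the output size works out to input + 2*pad - effective + 1 = input, i.e. the resolution is retained.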

Full support for dilated 3D convolution with padding would require replicating https://github.com/Theano/Theano/pull/4587 for the 3D version. It's still unclear if this would be slower or faster than the backward pass trick, nobody has timed it yet.

Best, Jan

w.f.wi...@gmail.com

unread,
Sep 24, 2016, 12:49:53 AM9/24/16
to lasagne-users
Big thanks for the explanation. Unfortunately, I probably do not have time to implement such a refinement at this stage.

For those who may be interested in 3D dilated convolution, I have included the code below.
-------------------------------
in conv.py
-----------------------------

    # filter_flip=True corresponds to conv_mode='conv' in cuDNN,
    # and filter_flip=False corresponds to conv_mode='corr'.

    def convolve(self, input, **kwargs):
        # we perform a convolution backward pass wrt weights,
        # passing kernels as output gradient
        print ('calling convolve by Dilated 3D layer')
        imshp = self.input_shape
        kshp = self.output_shape
        print ('type(kshp) = {}'.format(type(kshp)))
        # only works with int64
        kshp_64 = np.asarray(kshp)
        kshp_64 = kshp_64.astype(np.int64)
        # swapping
        channels = kshp[1]
        batchsize = kshp[0]
        
        kshp_64[0] = channels
        kshp_64[1] = batchsize
        print ('shape of kshp_64 = {}'.format(kshp_64))#kshp = (2, 1, 15, 15, 15)
        print ('shape of kshp = {}'.format(kshp))
        # and swapping channels and batchsize
        imshp = (imshp[1], imshp[0]) + imshp[2:]
        #kshp = (kshp[1], kshp[0]) + kshp[2:]
        print ('should be 1D tensor, shape of kshp = {}'.format(kshp))
       
        output_size = self.output_shape[2:]
        if any(s is None for s in output_size):
            output_size = self.get_output_shape_for(input.shape)[2:]

        conved = dnn_gradweight3D(input.transpose(1, 0, 2, 3, 4), self.W, imshp, kshp_64,
                                  self.dilation)
        return conved.transpose(1, 0, 2, 3, 4)
--------------------------------------------------------------------------------------------------------------------------------------------
in cuda dnn.py
--------------------------------------------------------------------------
def dnn_gradweight3D(img, topgrad, imshp, kshp,
            subsample, border_mode='valid',
            filter_flip=False):
    """
    GPU convolution gradient with respect to weight using cuDNN from NVIDIA.

    The memory layout to use is 'bc01', that is 'batch', 'channel',
    'first dim', 'second dim' in that order.

    :warning: The cuDNN library only works with GPUs that have a compute
      capability of 3.0 or higher.  This means that older GPUs will not
      work with this Op.
    """
    print ('now inside dnn_gradweight3D')
    
    if filter_flip:
        conv_mode = 'conv'
    else:
        conv_mode = 'cross'

    img = gpu_contiguous(img)
    topgrad = gpu_contiguous(topgrad)
    #Many tensor Ops run their arguments through this function as pre-processing.
    #It passes through TensorVariable instances,
    #and tries to wrap other objects into TensorConstant.
    kerns_shp = theano.tensor.as_tensor_variable(kshp)
    
    print ('kshp = {}'.format(kshp))
    print ('type = {}'.format(type(kshp)))
    print ('kerns_shp (1D shape tensor ?) = {}'.format(kerns_shp))
    print (' kerns_shp.ndim = {}'.format(kerns_shp.ndim))
    print (' kern_shape.type.dtype (int64?)= {}'.format(kerns_shp.type.dtype))
    desc = GpuDnnConvDesc(border_mode=border_mode, subsample=subsample,
                          conv_mode=conv_mode)(img.shape, kerns_shp)
    out = gpu_alloc_empty(*kerns_shp)
    return GpuDnnConv3dGradW()(img, topgrad, out,
                                      desc)

-----------------------------------------------------------------------------
test code
--------------------------------------------------------------------------
    minibatch_size = 2
    tree_side = 17
    num_filters = 1
    input_data = np.random.rand(minibatch_size, 1, tree_side,tree_side,tree_side)
    # GPU takes floatX/ 32
    input_data = input_data.astype(theano.config.floatX)
    net = {}
    
    net['input'] = L.layers.InputLayer((minibatch_size,
                                            1,
                                            tree_side,tree_side,tree_side),
                                           input_var=input_data)
 
        
    net['normal'] = lasagne.layers.dnn.Conv3DDNNLayer(net['input'], num_filters,
                                                             filter_size=(3,3,3),
                                                        W=L.init.GlorotUniform(),
                                                        b=L.init.Constant(0.),
                                                        nonlinearity=rectify,
                                                            pad='valid')
    # convole input
    convol_output = L.layers.get_output(net['normal']).eval()
    # get filter and bias
    fil = lasagne.layers.get_all_params(net['normal'])
    # feed the same filter and bias
    net['dilated'] = L.layers.DilatedConv3DLayer(net['input'], num_filters,
                                                             filter_size=(3,3,3),
                                                            dilation=(1, 1, 1),
                                                        W=fil[0].get_value(),
                                                        b=fil[1].get_value(),
                                                        nonlinearity=rectify,
                                                            pad=0)
    
    dilated_output = L.layers.get_output(net['dilated']).eval()
    
    # expect the outputs to be equal
    # because a dilated convolution with factor 1 = a normal convolution
    print (np.allclose(dilated_output, convol_output, atol=1e-07))
-----------------------------------------------------------------------------------------------

Jan Schlüter

unread,
Sep 26, 2016, 2:07:43 PM9/26/16
to lasagne-users
For those who may be interested in 3D dilated convolution, I have included the code below.

Thank you for posting it! Just in case it's not obvious, you can also put all this code in a file you control yourself and can import. Then you don't need to modify the Theano and Lasagne code, and can easily upgrade to more recent versions of Theano and Lasagne.

Best, Jan

happyic...@gmail.com

unread,
Jun 2, 2017, 10:26:08 AM6/2/17
to lasagne-users
Thanks for sharing your implementation. But why not use T.nnet.abstract_conv.AbstractConv3d_gradWeights to calculate the op in the convolve function of DilatedConv3DLayer?

Jan Schlüter

unread,
Jun 2, 2017, 2:01:30 PM6/2/17
to lasagne-users, happyic...@gmail.com
But why not use T.nnet.abstract_conv.AbstractConv3d_gradWeights to calculate the op in the convolve function of DilatedConv3DLayer?

AbstractConv3d was not available yet when we discussed this; it was added a few weeks later (https://github.com/Theano/Theano/pull/4862). It makes sense to use it now, yes.

Best, Jan

happyic...@gmail.com

unread,
Jun 3, 2017, 9:46:54 AM6/3/17
to lasagne-users, happyic...@gmail.com
Thanks a lot, and I have already used it. But I just add a PadLayer and a SliceLayer outside the implementation of DilatedConv3DLayer to keep the output size the same as the input size. Do you have any better suggestions?

Best,
Jason

Jan Schlüter

unread,
Jun 6, 2017, 5:07:53 AM6/6/17
to lasagne-users, happyic...@gmail.com
But I just add a PadLayer and a SliceLayer outside the implementation of DilatedConv3DLayer to keep the output size the same as the input size. Do you have any better suggestions?

Yes, you can use the abstract conv3d forward pass; it should also support dilation. This was added to Theano much later than the backward-pass dilated convolution in Lasagne. I haven't had time to update the DilatedConv2DLayer yet, help is welcome (https://github.com/Lasagne/Lasagne/issues/716).

Best, Jan