ValueError: matrices are not aligned


Rijuban Rangslang

Oct 9, 2015, 10:26:07 AM
to theano-users
Hi,

I am trying to model a sparse RBM, so I tried some dropout. The code seems to build the model but throws an error that I cannot figure out. The error is as follows:

 z[0] = numpy.asarray(numpy.dot(x, y))
ValueError: matrices are not aligned
Apply node that caused the error: dot(Elemwise{mul,no_inplace}.0, W)
Toposort index: 45
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(60, 2000), (180, 2000)]
Inputs strides: [(16000, 8), (16000, 8)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{add,no_inplace}(dot.0, DimShuffle{x,0}.0)]]

Backtrace when the node is created:
  File "GRBM_momemtum_sparsity.py", line 257, in propup
    pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias

The error is in this particular line. My input matrix has 180 columns, so my input size is 180 and my output size is 61. My batch size is 60, and my hidden layers are [2000, 2000].
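
For reference, the shapes in the traceback fail in plain NumPy too, since T.dot(vis, self.W) needs the number of columns of vis to equal the number of rows of W (the shapes below are just copied from the error message to illustrate):

import numpy

vis = numpy.zeros((60, 2000))   # shape reported for the first input
W = numpy.zeros((180, 2000))    # shape reported for W
numpy.dot(vis, W)               # raises ValueError: 2000 columns vs. 180 rows
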
The part of the code where the error might occur is:

 
    def __init__(self, numpy_rng, layer_sizes, dropout_rates, activations, use_bias, theano_rng=None):
        #rectified_linear_activation = lambda x: T.maximum(0.0, x)
        self.x = T.matrix('x')
        # Set up all the hidden layers
        weight_matrix_sizes = zip(layer_sizes, layer_sizes[1:])
        self.layers = []
        #self.n_layers = len(hidden_layer_sizes)
        self.dropout_layers = []
        self.rbm_layers = []
        next_layer_input = self.x
        #first_layer = True
        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))


        # dropout the input
        next_dropout_layer_input = _dropout_from_layer(numpy_rng, next_layer_input, p=dropout_rates[0])
        layer_counter = 0   
        
        for n_ins, n_out in weight_matrix_sizes[:-1]:
            # Reuse the parameters from the dropout layer here, in a different
            # path through the graph.
            next_layer = HiddenLayer(numpy_rng=numpy_rng,
                    input=next_layer_input,
                    activation=activations[layer_counter],
                    # scale the weight matrix W with (1-p)
                    n_ins=n_ins, n_out=n_out,
                    use_bias=use_bias)
            self.layers.append(next_layer)
            next_layer_input = next_layer.output
            next_dropout_layer = DropoutHiddenLayer(numpy_rng=numpy_rng,
                    input=next_dropout_layer_input,
                    activation=activations[layer_counter],
                    n_ins=n_ins, n_out=n_out,use_bias=use_bias,
                    W=next_layer.W*(1 - dropout_rates[-1]),
                    b=next_layer.b,
                    dropout_rate=dropout_rates[layer_counter + 1])
            self.dropout_layers.append(next_dropout_layer)
            next_dropout_layer_input = next_dropout_layer.output
       
            #first_layer = False
            layer_counter += 1
            # Construct an RBM that shares weights with this layer
            rbm_layer = RBM(numpy_rng=numpy_rng,
                            theano_rng=theano_rng,
                            input=next_dropout_layer_input,
                            n_visible=n_ins,
                            n_hidden=n_out,
                            W=next_layer.W,
                            hbias=next_layer.b)
            self.rbm_layers.append(rbm_layer)
        # Set up the output layer
        #n_ins, n_out = weight_matrix_sizes[-1]
        dropout_output_layer = LogisticRegression(
                input=next_dropout_layer_input,
                n_ins=n_ins, n_out=n_out)
        self.dropout_layers.append(dropout_output_layer)

        # Again, reuse parameters in the dropout output.
        output_layer = LogisticRegression(
            input=next_layer_input,
            # scale the weight matrix W with (1-p)
            W=dropout_output_layer.W * (1 - dropout_rates[-1]),
            b=dropout_output_layer.b,
            n_ins=n_ins, n_out=n_out)
        self.layers.append(output_layer)

Kindly help.

Rijuban Rangslang

Oct 9, 2015, 10:35:41 AM
to theano-users
OK, so I have managed to stop it throwing that error, but I am back to square one again. The error it now shows is TypeError: ('update target must be a SharedVariable', Elemwise{mul,no_inplace}.0). I think I have all the weights defined as shared variables.

Kindly help.

Daniel Renshaw

Oct 9, 2015, 11:39:57 AM
to theano...@googlegroups.com
The 'update target must be a SharedVariable' error suggests you're doing something like this:

w = theano.shared(...)
w = 2 * w
# do something with w
updates = [(w, w - 0.1 * grad)]

i.e. the w 'target' is in fact a symbolic expression that multiplies the weights by 2. Instead, you need to do something like this:

w = theano.shared(...)
w_alt = 2 * w
# do something with w
updates = [(w, w_alt - 0.1 * grad)]

where now the target is the original shared variable itself.
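
A complete, minimal version of that pattern for reference (x, the cost, and the learning rate here are just stand-ins for the sketch):

import numpy
import theano
import theano.tensor as T

x = T.matrix('x')
w = theano.shared(numpy.zeros((3, 2), dtype=theano.config.floatX), name='w')

# Use a scaled *expression* of w inside the graph, but keep w itself as the parameter.
w_alt = 2 * w
cost = T.sum(T.dot(x, w_alt) ** 2)  # stand-in cost
grad = T.grad(cost, w)              # gradient with respect to the shared variable

# The update target is the shared variable, not the expression built from it.
updates = [(w, w - 0.1 * grad)]
train = theano.function([x], cost, updates=updates)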

Daniel



Rijuban Rangslang

Oct 10, 2015, 8:06:30 AM
to theano-users
Hi Daniel,

When computing the gradient, should it be taken with respect to w_alt or w?
My RBM object creation looks like this:

            next_dropout_layer = DropoutHiddenLayer(numpy_rng=numpy_rng,
                    input=next_dropout_layer_input,
                    activation=activations[layer_counter],
                    n_ins=n_ins, n_out=n_out, use_bias=use_bias,
                    W=next_layer.W*(1 - dropout_rates[layer_counter]),
                    b=next_layer.b*(1 - dropout_rates[layer_counter]),
                    dropout_rate=dropout_rates[layer_counter + 1])
            self.dropout_layers.append(next_dropout_layer)
            next_dropout_layer_input = next_dropout_layer.output
            rbm_layer = RBM(numpy_rng=numpy_rng,
                            theano_rng=theano_rng,
                            input=next_dropout_layer_input,
                            n_visible=n_ins,
                            n_hidden=n_out,
                            W=next_layer.W*(1 - dropout_rates[layer_counter]),
                            hbias=next_layer.b*(1 - dropout_rates[layer_counter]))
            self.rbm_layers.append(rbm_layer)
where the input to the RBM is masked according to the dropout rate. I have also tried to mask the weights and biases. In the RBM class there are definitions for W, vbias, and hbias:
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        if numpy_rng is None:
            # create a number generator
            numpy_rng = numpy.random.RandomState(1234)

        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        if W is None:
            # W is initialized with `initial_W`, which is uniformly
            # sampled from -4*sqrt(6./(n_visible+n_hidden)) to
            # 4*sqrt(6./(n_hidden+n_visible)). The output of uniform is
            # converted using asarray to dtype theano.config.floatX so
            # that the code is runnable on GPU.
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            # theano shared variables for weights and biases
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if hbias is None:
            # create shared variable for hidden units bias
            hbias = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='hbias',
                borrow=True
            )

        if vbias is None:
            # create shared variable for visible units bias
            vbias = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                name='vbias',
                borrow=True
            )

        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input
        if not input:
            self.input = T.matrix('input')
        self.W = W
        self.vbias = vbias
        self.hbias = hbias
        self.theano_rng = theano_rng
        # **** WARNING: It is not a good idea to put things in this list
        # other than shared variables created in this function.
        self.params = [self.W, self.hbias, self.vbias]
And the update section is something like this:

        cost = T.mean(self.free_energy(self.input)) - T.mean(
            self.free_energy(chain_end))
        # We must not compute the gradient through the gibbs sampling
        gparams = T.grad(cost, self.params, consider_constant=[chain_end])
        # end-snippet-3 start-snippet-4
        # constructs the update dictionary
        for gparam, param, params_alt in zip(gparams, self.params, self.params_alt):
            # make sure that the learning rate is of the right dtype
            updates[param] = param - gparam * T.cast(
                learning_rate,
                dtype=theano.config.floatX
            )
Kindly help.

Daniel Renshaw

Oct 13, 2015, 7:02:01 AM
to theano...@googlegroups.com
The gradients you're probably interested in are with respect to the parameters, so that would be `w` in my code sample, not `w_alt`.
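
Mapping that onto the dropout case, a rough sketch (the names and sizes below are only illustrative, not your actual classes) would keep the raw shared variables in the parameter list and apply the (1 - p) scaling only inside the graph:

import numpy
import theano
import theano.tensor as T

p = 0.5  # dropout rate, illustrative
v = T.matrix('v')
W = theano.shared(numpy.zeros((180, 2000), dtype=theano.config.floatX), name='W')
hbias = theano.shared(numpy.zeros(2000, dtype=theano.config.floatX), name='hbias')

# The scaling lives in the graph; the parameters stay plain shared variables.
pre_sigmoid = T.dot(v, W * (1 - p)) + hbias
cost = T.mean(T.nnet.sigmoid(pre_sigmoid))  # stand-in cost, not the RBM free energy
params = [W, hbias]
gparams = T.grad(cost, params)

# Update targets are the shared variables themselves.
updates = [(param, param - 0.1 * gparam) for param, gparam in zip(params, gparams)]
train = theano.function([v], cost, updates=updates)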

Rijuban Rangslang

Oct 13, 2015, 10:03:16 AM
to theano-users
Hi Daniel,

Actually, I have removed the scaling associated with W, as in W=next_layer.W*(1 - dropout_rates[-1]). This seems to have solved that problem, since W now remains a shared variable. However, when I do this it produces the error I got initially, i.e.:
 z[0] = numpy.asarray(numpy.dot(x, y))
ValueError: matrices are not aligned
Apply node that caused the error: dot(Elemwise{mul,no_inplace}.0, W)
Toposort index: 45
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(60, 2000), (180, 2000)]
Inputs strides: [(16000, 8), (16000, 8)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{add,no_inplace}(dot.0, DimShuffle{x,0}.0)]]

Backtrace when the node is created:
  File "GRBM_momemtum_sparsity.py", line 257, in propup
    pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias

My batch size is still 60 and the layer sizes are [180, 2000, 2000, 61]. I am not sure how the above error is generated, because the input should be a (60, 180) matrix, but the traceback shows it as (60, 2000).

Kindly help.

Daniel Renshaw

Oct 13, 2015, 10:07:55 AM
to theano...@googlegroups.com
I can't spare the time to do a full code review, especially since this is not executable code.

I suggest you take a look at the facilities Theano provides for debugging (in concert with those provided by Python itself). In particular, the "test values" mechanism is very useful for tracking down shape mismatch problems.
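
For example, a minimal sketch of the test value mechanism (the shapes below are just taken from your error message):

import numpy
import theano
import theano.tensor as T

theano.config.compute_test_value = 'raise'  # fail at graph-construction time

vis = T.matrix('vis')
vis.tag.test_value = numpy.random.rand(60, 180).astype(theano.config.floatX)

W = theano.shared(numpy.random.rand(180, 2000).astype(theano.config.floatX), name='W')
hbias = theano.shared(numpy.zeros(2000, dtype=theano.config.floatX), name='hbias')

# If vis had the wrong shape, this line would raise immediately, with a normal
# Python traceback pointing at it instead of a deferred runtime error.
pre_sigmoid_activation = T.dot(vis, W) + hbias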

Daniel

Frédéric Bastien

Oct 13, 2015, 7:02:41 PM
to theano-users


On 13 Oct 2015 10:03, "Rijuban Rangslang" <rijuban...@gmail.com> wrote:
>
> Hi Daniel,
>
> Actually, I have removed the scaling associated with W, as in W=next_layer.W*(1 - dropout_rates[-1]). This seems to have solved that problem, since W now remains a shared variable. However, when I do this it produces the error I got initially, i.e.:
>  z[0] = numpy.asarray(numpy.dot(x, y))
> ValueError: matrices are not aligned
> Apply node that caused the error: dot(Elemwise{mul,no_inplace}.0, W)
> Toposort index: 45
> Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
> Inputs shapes: [(60, 2000), (180, 2000)]
> Inputs strides: [(16000, 8), (16000, 8)]
> Inputs values: ['not shown', 'not shown']
> Outputs clients: [[Elemwise{add,no_inplace}(dot.0, DimShuffle{x,0}.0)]]
>
> Backtrace when the node is created:
>   File "GRBM_momemtum_sparsity.py", line 257, in propup
>     pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias

This is the line that created the problem. You cut the message so I can't tell if the problem is with the add or the dot. Maybe the shapes in the error message will help you.
