Neural network for regression, adapted almost directly from the tutorial documentation, raises a ValueError due to input dimension mismatch

Daniel Seita

Jan 20, 2016, 20:44:46

to theano-users

Hello everyone. I am trying to use a neural network for regression. In other words, my data maps R^n to R. I would like to use a simple fully connected network with one hidden layer and one output node (since the problem is regression). I am building off of the first two tutorials (Logistic Regression and Multilayer Perceptron) from this page. Unfortunately, I'm running into a ValueError, and because Theano builds a symbolic graph, I can't just insert print statements in the code to inspect the data. I feel there is something really basic I'm missing about Theano.

Here is my code. I'm sorry that it's a bit long (about 250 lines) but most of it is directly from the two theano tutorials I linked to earlier. I put in some artificial data at the bottom so you can copy the code and run it without needing anything else (well, other than theano).

The LinearRegression class is the same as the LogisticRegression class, except I changed the method of computing errors, following this CrossValidated post, so I am using

T.mean((self.pred_y_given_x - y) ** 2).

The HiddenLayer class should be exactly the same as it was in the MLP documentation.

The MLP class is also similar, except that it uses the LinearRegression layer instead of the LogisticRegression layer.

For simplicity, I am not using any testing set. I am only using a training set, and a validation set. When I prepared the training, I set the 'y' symbolic variable to be a 'vector', NOT an 'ivector' as was the case with LogisticRegression. I also did not cast the output to be int32 in the nested 'shared_dataset' function.

import os
import numpy as np
import theano
import theano.tensor as T
import timeit


class LinearRegression(object):
    """ 
    The Linear Regression layer for the final output of the MLP. It's similar to LogisticRegression,
    but we will only have one output layer.  We also do not need the 'errors' method because this is
    regression, not classification.
    """

    def __init__(self, input, n_in, n_out):
        """ 
        :input: A symbolic variable that describes the input of the architecture (one mini-batch).
        :n_in: The number of input units, the dimension of the data space.
        :n_out: The number of output units, the dimension of the labels (here it's one).
        """
        self.W = theano.shared(value = np.zeros( (n_in, n_out), dtype=theano.config.floatX ),
                               name = 'W',
                               borrow = True)
        self.b = theano.shared(value = np.zeros( (n_out,), dtype=theano.config.floatX ),
                               name = 'b',
                               borrow = True)
        self.pred_y_given_x = T.dot(input, self.W) + self.b
        self.params = [self.W, self.b]
        self.input = input

    def squared_errors(self, y):
        """ Returns the mean of squared errors of the linear regression on this data. """
        return T.mean((self.pred_y_given_x - y) ** 2)


class HiddenLayer(object):
    """
    Hidden Layer class for a Multi-Layer Perceptron. This is exactly the same as the reference
    code from the documentation, except for T.sigmoid instead of T.tanh.
    """

    def __init__(self, rng, input, n_in, n_out, W=None, b=None, activation=T.tanh):
        """
        :rng: A random number generator for initializing weights.
        :input: A symbolic tensor of shape (n_examples, n_in).
        :n_in: Dimensionality of input.
        :n_out: Number of hidden units.
        :activation: Non-linearity to be applied in the hidden layer.
        """

        # W is initialized with W_values, according to the "Xavier method".
        if W is None:
            W_values = np.asarray(
                rng.uniform(
                    low = -np.sqrt(6. / (n_in + n_out)),
                    high = np.sqrt(6. / (n_in + n_out)),
                    size = (n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            if activation == T.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = np.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b
        lin_output = T.dot(input, self.W) + self.b
        self.output = lin_output if activation is None else activation(lin_output)

        # Miscellaneous stuff
        self.params = [self.W, self.b]
        self.input = input


class MLP(object):
    """ Multi-Layer Perceptron class. It consists of a HiddenLayer and a LinearRegression layer. """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """
        :rng: A random number generator for initializing weights.
        :input: Symbolic variable that describes the architecture of one mini-batch.
        :n_in: Dimension of each data point, i.e., the total number of features.
        :n_hidden: The number of hidden units.
        :n_out: The dimension of the space labels lie; here it's a scalar due to regression.
        """

        # One hidden layer with sigmoid activations, connected to the final LinearRegression layer
        self.hiddenLayer = HiddenLayer(rng = rng,
                                       input = input,
                                       n_in = n_in,
                                       n_out = n_hidden,
                                       activation = T.tanh)

        # The linear regression layer takes the output of the hidden layer as its input
        self.linRegressionLayer = LinearRegression(input = self.hiddenLayer.output,
                                                   n_in = n_hidden,
                                                   n_out = n_out)

        # Two norms, along with sum of squares loss function (output of LinearRegression layer)
        self.L1 = abs(self.hiddenLayer.W).sum() + abs(self.linRegressionLayer.W).sum()
        self.L2_sqr = (self.hiddenLayer.W ** 2).sum() + (self.linRegressionLayer.W ** 2).sum()
        self.squared_errors = self.linRegressionLayer.squared_errors

        # Miscellaneous
        self.params = self.hiddenLayer.params + self.linRegressionLayer.params
        self.input = input


def convert_data_theano(dataset):
    """ 
    Copying this from documentation online, including some of the nested 'shared_dataset' function,
    but I'm also returning the number of features, since it's easiest to detect that here.
    """
    train_set, valid_set = dataset[0], dataset[1]
    assert (train_set[0].shape)[1] == (valid_set[0].shape)[1], \
        "Number of features for train,val do not match: {} and {}.".format(train_set[0].shape[1],valid_set[0].shape[1])
    num_features = (train_set[0].shape)[1]

    def shared_dataset(data_xy, borrow=True):
        """ 
        Function that loads the dataset into shared variables. It is DIFFERENT from the online
        documentation since we can keep shared_y as floats; we won't be needing them as indices.
        """
        data_x, data_y = data_xy
        shared_x = theano.shared(np.asarray(data_x, dtype=theano.config.floatX), borrow=borrow)
        shared_y = theano.shared(np.asarray(data_y, dtype=theano.config.floatX), borrow=borrow)
        return shared_x, shared_y

    train_set_x, train_set_y = shared_dataset(train_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    rval = [(train_set_x,train_set_y), (valid_set_x,valid_set_y)]
    return rval,num_features


def do_mlp(dataset, learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=1000, batch_size=30, n_hidden=500):
    """
    This will run the code for a fully-connected neural network with one hidden layer. The only
    thing we need is the data; all other values can be set at defaults.

    :dataset: A tuple (t,v) where t is itself a tuple of (train_data,train_values) and similarly for
        v, except it stands for the validation set.
    """

    # Get data into shared, correct Theano format, and compute number of mini-batches.
    datasets, num_features = convert_data_theano(dataset) 
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'
    
    # Symbolic variables for the data. 'index' is the index to a mini-batch.
    index = T.lscalar()
    x = T.matrix('x')
    y = T.vector('y') # This is NOT ivector because we have continuous outputs!

    # Construct my own MLP class based on similar code, but using a single continuous output, so n_out = 1.
    rng = np.random.RandomState(1234)
    classifier = MLP(rng = rng,
                     input = x,
                     n_in = num_features,
                     n_hidden = n_hidden,
                     n_out = 1)

    # The cost function, symbolically, is DIFFERENT from their (logistic regression) example.
    cost = classifier.squared_errors(y) + L1_reg * classifier.L1 + L2_reg * classifier.L2_sqr

    # Compiling a Theano function that computes the error by the model on a minibatch. Since we
    # don't have simple classification, just return the classifier.squared_errors().
    validate_model = theano.function(inputs = [index],
                                     outputs = classifier.squared_errors(y),
                                     givens = { 
                                        x: valid_set_x[index * batch_size:(index + 1) * batch_size],
                                        y: valid_set_y[index * batch_size:(index + 1) * batch_size]
                                     })

    # Compute the gradient of the cost w.r.t. parameters; code matches the documentation.
    gparams = [T.grad(cost, param) for param in classifier.params]

    # How to update the model parameters as list of (variable, update_expression) pairs.
    updates = [ (param, param - learning_rate * gparam) for param, gparam in zip(classifier.params, gparams)]

    # Compiling a Theano function `train_model` that returns the cost AND updates parameters.
    train_model = theano.function(inputs = [index],
                                  outputs = cost,
                                  updates = updates,
                                  givens = {
                                      x: train_set_x[index * batch_size:(index + 1) * batch_size],
                                      y: train_set_y[index * batch_size:(index + 1) * batch_size]
                                  })

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'

    # Early stopping parameters (might have to tweak)
    patience = 1000
    patience_increase = 2
    improvement_threshold = 0.995
    validation_frequency = min(n_train_batches, patience/2)

    # Other variables of interest
    best_validation_loss = np.inf
    best_iter = 0
    test_score = 0.
    start_time = timeit.default_timer()
    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch += 1

        for minibatch_index in xrange(n_train_batches):

            # Training.
            minibatch_avg_cost = train_model(minibatch_index)
            iter = (epoch - 1) * n_train_batches + minibatch_index

            # Evaluate on validation set
            if (iter + 1) % validation_frequency == 0:
                validation_losses = [validate_model(i) for i in xrange(n_valid_batches)]
                this_validation_loss = np.mean(validation_losses)
                print "epoch {}, minibatch {}/{}, validation MSE {:.5f}".format(epoch,
                                minibatch_index + 1, n_train_batches, this_validation_loss)

                # If best valid so far, improve patience and update the 'best' variables.
                if this_validation_loss < best_validation_loss:
                    if this_validation_loss < best_validation_loss * improvement_threshold:
                        patience = max(patience, iter * patience_increase)
                    best_validation_loss = this_validation_loss
                    best_iter = iter

            if patience <= iter:
                done_looping = True
                break

    end_time = timeit.default_timer()
    print "Optimization complete."


if __name__ == '__main__':
    """ Let's just test it with 5000 training and 1000 validation instances, with 500 features. """
    X_train = np.random.rand(5000,500)
    y_train = np.random.rand(500,)
    X_val = np.random.rand(1000,500)
    y_val = np.random.rand(500,)
    data = [ (X_train,y_train) , (X_val,y_val) ]
    do_mlp(dataset=data)

Here is my output. The error I am getting is an input dimension mismatch. I've experimented with the values, and it looks like input[0].shape[1] = 1 is n_out and input[2].shape[1] = 30 is the batch size, but I have no idea why those should be equal.

dhcp-46-165:theano_deep_learning danielseita$ python neural_network_regression.py 
... building the model
... training
Traceback (most recent call last):
  File "neural_network_regression.py", line 257, in <module>
    do_mlp(dataset=data)
  File "neural_network_regression.py", line 225, in do_mlp
    minibatch_avg_cost = train_model(minibatch_index)
  File "/Users/danielseita/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/Users/danielseita/.local/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/Users/danielseita/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[2].shape[1] = 30)
Apply node that caused the error: Elemwise{Composite{((i0 + i1) - i2)}}[(0, 0)](Dot22.0, InplaceDimShuffle{x,0}.0, InplaceDimShuffle{x,0}.0)
Toposort index: 28
Inputs types: [TensorType(float32, matrix), TensorType(float32, row), TensorType(float32, row)]
Inputs shapes: [(30, 1), (1, 1), (1, 30)]
Inputs strides: [(4, 4), (4, 4), (120, 4)]
Inputs values: ['not shown', array([[ 0.]], dtype=float32), 'not shown']
Outputs clients: [[Elemwise{sqr,no_inplace}(Elemwise{Composite{((i0 + i1) - i2)}}[(0, 0)].0), Elemwise{Composite{((i0 * i1) / i2)}}[(0, 1)](TensorConstant{(1, 1) of 2.0}, Elemwise{Composite{((i0 + i1) - i2)}}[(0, 0)].0, Elemwise{mul,no_inplace}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Please let me know if I can clarify anything in this question.

Daniel Seita

Jan 20, 2016, 20:48:14

to theano-users

Ah, I noticed a few dumb errors in the code:

- In the main block, y_train and y_val should have lengths 5000 and 1000, not 500
- The comments for the HiddenLayer are old; I am using T.tanh, not T.nnet.sigmoid

But fixing these still results in the input dimension mismatch ...

Daniel Seita

Jan 22, 2016, 11:28:11

to theano-users

I think I solved it ... one has to extract the sole element in the prediction in order to treat it as a true scalar, e.g., from this other post, we have to use:

self.p_y_given_x = T.dot(input, self.W) + self.b
self.y_pred = self.p_y_given_x[:,0]

In other words, it's basically distinguishing between [x] and x in Python, and we want the latter, but my code did the former.

This solved the problem I was having.
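
The shape issue described above can be reproduced outside Theano. The following is a minimal NumPy sketch, not the code from this thread: the array names and the small n_hidden are made up for illustration, and NumPy is only a stand-in. One difference worth noting is that NumPy silently broadcasts the (30, 1) column against the (30,) target vector into a (30, 30) array, whereas Theano raised the ValueError shown in the traceback.

```python
import numpy as np

batch_size, n_hidden = 30, 5              # illustrative sizes (the post uses n_hidden=500)

rng = np.random.RandomState(0)
hidden = rng.rand(batch_size, n_hidden)   # stand-in for the hidden layer output
W = rng.rand(n_hidden, 1)                 # n_out = 1, so W has a single column
b = np.zeros(1)
y = rng.rand(batch_size)                  # targets are a plain vector of length 30

pred = hidden.dot(W) + b                  # shape (30, 1): a column, not a vector
bad_diff = pred - y                       # broadcasts (30, 1) against (30,) to (30, 30)

y_pred = pred[:, 0]                       # shape (30,): the "sole element" per example
good_diff = y_pred - y                    # elementwise difference, shape (30,)
mse = np.mean(good_diff ** 2)             # the intended mean squared error

print(pred.shape, bad_diff.shape, good_diff.shape)
```

Taking column 0 is what turns each one-element row [x] into the scalar x, so the prediction and the targets line up one to one.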

On Wednesday, January 20, 2016 at 5:44:46 PM UTC-8, Daniel Seita wrote:

...

Heinz Hemken

Jan 22, 2016, 14:12:45

to theano-users

Daniel,

Do you now have running code you might share?

Thanks!

Daniel Seita

Jan 22, 2016, 17:20:47

to theano-users

Yes, what I have in my original post, along with the change:

self.p_y_given_x = T.dot(input, self.W) + self.b
self.y_pred = self.p_y_given_x[:,0]

Then use self.y_pred in the "squared_errors" function. That should work.

Heinz Hemken

Jan 23, 2016, 18:19:16

to theano-users

Daniel,

I must not be getting something. I've attached the code updated with the changes you mention, and get this:

$ THEANO_FLAGS='floatX=float32,device=gpu0,nvcc.fastmath=True' python mlp-linreg-daniel-seita.py
Using gpu device 0: GeForce GTX TITAN X

... building the model
... training
Traceback (most recent call last):

File "mlp-linreg-daniel-seita.py", line 261, in <module>
    do_mlp(dataset=data)
File "mlp-linreg-daniel-seita.py", line 229, in do_mlp
    minibatch_avg_cost = train_model(minibatch_index)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[0] == 20, but the output's size on that axis is 30.
Apply node that caused the error: GpuElemwise{Sub}[(0, 0)](GpuSubtensor{::, int64}.0, GpuSubtensor{int64:int64:}.0)
Inputs types: [CudaNdarrayType(float32, vector), CudaNdarrayType(float32, vector)]
Inputs shapes: [(30,), (20,)]
Inputs strides: [(1,), (1,)]
Inputs values: ['not shown', 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Is it running correctly for you?

Thanks!

mlp-linreg-daniel-seita.py
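
One plausible reading of the (30,) versus (20,) shapes in this traceback, assuming the attached script still built y_train with 500 entries against 5000 rows of X_train (the attachment itself is not shown here, so this is a guess): slicing past the end of the shorter array returns a truncated batch. A pure-Python sketch of that slicing arithmetic, with a hypothetical helper name:

```python
def slice_length(n_rows, batch_size, index):
    """Length of rows[index * batch_size:(index + 1) * batch_size]; slices clip at the end."""
    rows = range(n_rows)
    return len(rows[index * batch_size:(index + 1) * batch_size])

batch_size = 30
n_x, n_y = 5000, 500                       # X rows vs. target length before the fix
n_train_batches = n_x // batch_size        # 166 batches, computed from X only

# Collect (batch index, x batch length, y batch length) wherever the two disagree.
mismatches = [
    (i, slice_length(n_x, batch_size, i), slice_length(n_y, batch_size, i))
    for i in range(n_train_batches)
    if slice_length(n_x, batch_size, i) != slice_length(n_y, batch_size, i)
]
first_mismatch = mismatches[0]
print(first_mismatch)
```

Under these assumptions the first disagreement is at batch index 16, where the X slice has 30 rows and the y slice has only 20, which matches the shapes reported in the error.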

Heinz Hemken

Jan 23, 2016, 18:44:17

to theano-users

The attached file runs without errors, but validation does not appear to be occurring.

mlp-linreg-daniel-seita.py

Heinz Hemken

Jan 23, 2016, 19:33:52

to theano-users

Validation is occurring, although I think I have the break criterion wrong here:

for i in xrange(n_valid_batches):
    if i >= n_valid_batches / batch_size:
        print "no more validation mini-batches"
        break
    validation_loss = validate_model(i)
    print "i %d, validation_loss %s" % (i, validation_loss)
    validation_losses.append(validation_loss)
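
A small sketch of why the break criterion above may be cutting validation short. It assumes n_valid_batches is already the number of mini-batches (1000 rows with batch_size 30, as in the original post), so dividing it by batch_size a second time leaves a cutoff of 1. The helper name is hypothetical, and // stands in for Python 2 integer division:

```python
def visited_batches(n_batches, cutoff=None):
    """Return the batch indices a loop visits, optionally breaking at i >= cutoff."""
    visited = []
    for i in range(n_batches):
        if cutoff is not None and i >= cutoff:
            break
        visited.append(i)
    return visited

batch_size = 30
n_valid_rows = 1000
n_valid_batches = n_valid_rows // batch_size        # 33 full mini-batches

with_check = visited_batches(n_valid_batches, cutoff=n_valid_batches // batch_size)
without_check = visited_batches(n_valid_batches)

print(with_check, len(without_check))
```

With the extra check only batch 0 is evaluated; without it, the loop bound alone already stops after the last full mini-batch. Note that 33 full batches cover 990 rows, so the final 10 validation rows are not used either way.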

mlp-linreg-daniel-seita.py

Daniel Seita

Jan 23, 2016, 19:45:11

to theano...@googlegroups.com

Heinz,

Here, I'm attaching some code that should work. The data is complete garbage (defined at the bottom of the code) but the code runs without errors, using the fix I mentioned. I'm not sure what happened with your code but hopefully this will be useful.


neural_network_regression.py

Heinz Hemken

Jan 23, 2016, 19:49:40

to theano-users

OK, that ran very nicely! That should keep me busy for a while.

Thanks!

Heinz Hemken

Feb 2, 2016, 14:58:33

to theano-users

FWIW, I've checked in the code I have here:

https://github.com/hhemken/deep_learning

I have attempted to add the ability for the output layer to have multiple columns and to add multiple hidden layers. While it runs, there are a few issues as listed in the README. It also doesn't work with dbn_reg.py, doing only the pre-training.

Any help would be very much appreciated!

Thanks!

Heinz


Ruben Janssen

May 31, 2016, 11:08:49

to theano-users

Hi,

I found your MLP regression quite useful as an intro to theano and neural networks!

I just have one question: after you've trained the network, how do you query it to make predictions? The self.p_y_given_x = T.dot(input, self.W) + self.b seems to be doing that already, although there seems to be quite some magic when processing the input (hence I can't get it to work).

I tried something along the lines of:
    run_forward = theano.function(inputs = [x],
                                  outputs = classifier.linRegressionLayer.y_pred,
                                 )

But that always seems to return the same output.

Thanks a lot!

Best regards,

Ruben
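
For reference, the prediction that a function like run_forward evaluates is the forward pass through the two layers. Below is a minimal NumPy sketch of that computation, assuming the architecture from the original post (one tanh hidden layer feeding a single linear output, followed by the [:, 0] fix). The function name predict and all weights here are made up for illustration; they are not trained values from this thread.

```python
import numpy as np

def predict(x, W_hidden, b_hidden, W_out, b_out):
    """Forward pass: tanh hidden layer, then a linear output layer with one unit."""
    hidden = np.tanh(x.dot(W_hidden) + b_hidden)    # shape (n_examples, n_hidden)
    p_y_given_x = hidden.dot(W_out) + b_out         # shape (n_examples, 1)
    return p_y_given_x[:, 0]                        # shape (n_examples,)

rng = np.random.RandomState(1234)
n_features, n_hidden = 4, 3                         # illustrative sizes only
W_hidden = rng.uniform(-1, 1, size=(n_features, n_hidden))
b_hidden = np.zeros(n_hidden)
W_out = rng.uniform(-1, 1, size=(n_hidden, 1))
b_out = np.zeros(1)

x_new = rng.rand(2, n_features)                     # two unseen examples
y_hat = predict(x_new, W_hidden, b_hidden, W_out, b_out)

# With the output weights at zero (the initial value used by LinearRegression in
# the original post), every input maps to the bias, so all predictions coincide.
y_flat = predict(x_new, W_hidden, b_hidden, np.zeros((n_hidden, 1)), b_out)

print(y_hat.shape, y_flat)
```

If a trained network returns the same value for every input, a few things may be worth checking (these are possibilities, not a diagnosis of the code above): whether the output weights moved away from their zero initial value, whether the tanh units are saturated because the inputs were not scaled, and whether the targets carry any signal at all. The sample data in this thread draws targets at random, independent of the inputs, in which case a nearly constant prediction would be expected.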
