How do I fix this error? "forward_prop_step takes incorrect number of arguments"


Abhishek Shivkumar

Oct 5, 2015, 10:29:19 AM
to theano-users

    I have the following piece of code that is supposed to implement a recurrent neural network with 2 hidden layers. It declares the input weights, the output weights, the recurrent weights for each of the 2 hidden layers, and the weights connecting the 2 hidden layers.

When I run the __theano_build__() method, it throws an error saying "TypeError: forward_prop_step() takes exactly 8 arguments (7 given)". Can you please point out the mistake and help me resolve it?

import numpy as np
import theano
import theano.tensor as T

class RNNTheano:
   
    def __init__(self, word_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.word_dim = word_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        U = np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), (hidden_dim, word_dim))
        V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (word_dim, hidden_dim))
        W1 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        W12 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        W2 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        # Theano: Created shared variables
        self.U = theano.shared(name='U', value=U.astype(theano.config.floatX))
        self.V = theano.shared(name='V', value=V.astype(theano.config.floatX))
        self.W1 = theano.shared(name='W1', value=W1.astype(theano.config.floatX))     
        self.W12 = theano.shared(name='W12', value=W12.astype(theano.config.floatX))     
        self.W2 = theano.shared(name='W2', value=W2.astype(theano.config.floatX))
        # We store the Theano graph here
        self.theano = {}
        self.__theano_build__()
   
    def __theano_build__(self):
        U, V, W1, W12, W2 = self.U, self.V, self.W1, self.W12, self.W2
        x = T.ivector('x')
        y = T.ivector('y')
        def forward_prop_step(x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
            s_t1 = T.tanh(U[:,x_t] + W1.dot(s_t1_prev))
            s_t2 = T.tanh(W12.dot(s_t1) + W2.dot(s_t2_prev))
            o_t = T.nnet.softmax(V.dot(s_t2))
            return [o_t[0], s_t1, s_t2]
        [o,s1,s2], updates = theano.scan(
            forward_prop_step,
            sequences=x,
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)
       
        prediction = T.argmax(o, axis=1)
        o_error = T.sum(T.nnet.categorical_crossentropy(o, y))
       
        # Gradients
        dU = T.grad(o_error, U)
        dV = T.grad(o_error, V)
        dW1 = T.grad(o_error, W1)
        dW12 = T.grad(o_error, W12)
        dW2 = T.grad(o_error, W2)
       
        # Assign functions
        self.forward_propagation = theano.function([x], o)
        self.predict = theano.function([x], prediction)
        self.ce_error = theano.function([x, y], o_error)
        self.bptt = theano.function([x, y], [dU, dV, dW1, dW12, dW2])
       
        # SGD
        learning_rate = T.scalar('learning_rate')
        self.sgd_step = theano.function([x,y,learning_rate], [],
                      updates=[(self.U, self.U - learning_rate * dU),
                              (self.V, self.V - learning_rate * dV),
                              (self.W1, self.W1 - learning_rate * dW1),
                              (self.W12, self.W12 - learning_rate * dW12),
                              (self.W2, self.W2 - learning_rate * dW2)])

Daniel Renshaw

Oct 5, 2015, 10:33:56 AM
to theano...@googlegroups.com
A bit of a guess, but maybe the sequences parameter needs to be a list?

So

 [o,s1,s2], updates = theano.scan(
            forward_prop_step,
            sequences=[x],
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)

instead of 

 [o,s1,s2], updates = theano.scan(
            forward_prop_step,
            sequences=x,
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)

Daniel



Abhishek Shivkumar

Oct 5, 2015, 10:38:48 AM
to theano-users
Hi Daniel,

   I still get the same error even after making x a list, [x]. Could it be something to do with the number of parameters in outputs_info?

Thanks
Abhishek S

Abhishek Shivkumar

Oct 5, 2015, 10:41:16 AM
to theano-users
FYI, the error traceback is as follows:


  File "rnn_theano.py", line 28, in __init__
    self.__theano_build__()

  File "rnn_theano.py", line 45, in __theano_build__
    strict=True)

Daniel Renshaw

Oct 5, 2015, 10:59:53 AM
to theano...@googlegroups.com
Could it be something to do with the "self" reference? You're currently defining the step function inside a class member function. Maybe try moving forward_prop_step into the outermost scope and, if it stays inside the class, marking it as a @staticmethod.
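For example, something along these lines (just a sketch of the idea; I'm assuming the rest of your class stays as posted):

class RNNTheano:
    # ... __init__ as before ...

    @staticmethod
    def forward_prop_step(x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
        # No `self` here, so the 8 arguments scan supplies map directly onto the 8 parameters
        s_t1 = T.tanh(U[:, x_t] + W1.dot(s_t1_prev))
        s_t2 = T.tanh(W12.dot(s_t1) + W2.dot(s_t2_prev))
        o_t = T.nnet.softmax(V.dot(s_t2))
        return [o_t[0], s_t1, s_t2]

    def __theano_build__(self):
        # ... unpack U, V, W1, W12, W2 and build x, y as before, then:
        [o, s1, s2], updates = theano.scan(
            self.forward_prop_step,   # a @staticmethod is not bound, so no extra `self` argument sneaks in
            sequences=[x],
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)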

Daniel

Abhishek Shivkumar

Oct 5, 2015, 11:17:46 AM
to theano-users
I tried that as well. I moved it outside as follows, but the error is the same. Maybe it is something to do with the computational graph structure itself.

    def forward_prop_step(self, x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
        s_t1 = T.tanh(U[:,x_t] + W1.dot(s_t1_prev))
        s_t2 = T.tanh(W12.dot(s_t1) + W2.dot(s_t2_prev))
        o_t = T.nnet.softmax(V.dot(s_t2))
        return [o_t[0], s_t1, s_t2]

    def __theano_build__(self):
        U, V, W1, W12, W2 = self.U, self.V, self.W1, self.W12, self.W2
        x = T.ivector('x')
        y = T.ivector('y')
       
        [o,s1,s2], updates = theano.scan(
            self.forward_prop_step,

Frédéric Bastien

Oct 6, 2015, 1:35:47 AM
to theano-users
The problem seems to be in that part of the scan:


            sequences=x,
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],

Mostly, this tells Theano that the inner scan function will have 1 sequence as input + 3 recurrent states + 5 non-sequences. Could you remove the first element of the outputs_info? That will create only 2 recurrent states.

Also I think it should be like this:



            sequences=[x],

but maybe both are accepted.

Fred

Daniel Renshaw

Oct 6, 2015, 3:23:12 AM
to theano...@googlegroups.com
But Fred, the step function intends to have three outputs, two of which should be recurrent, so the outputs_info needs to have three entries with a None for the output that should not be recurrent.
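For reference, roughly how scan composes the step function's arguments for this call (a sketch, worth double-checking against the scan docs):

# With this scan call:
#   sequences     = [x]                           -> 1 argument  (x_t)
#   outputs_info  = [None, dict(...), dict(...)]  -> 2 arguments (s_t1_prev, s_t2_prev);
#                                                    the None entry adds no argument
#   non_sequences = [U, V, W1, W12, W2]           -> 5 arguments
# so the step function should accept exactly 1 + 2 + 5 = 8 arguments, in that order:
def forward_prop_step(x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
    ...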

However, I just noticed that the outputs of the step function are all inside a Python list. Does it work if they are returned separately?

So try

return o_t[0], s_t1, s_t2

instead of

return [o_t[0], s_t1, s_t2]

Daniel

Abhishek Shivkumar

Oct 6, 2015, 4:44:15 AM
to theano-users
Ok, thanks for all your help. I finally got it running :)

Here is the code that trains a 2-hidden-layer recurrent neural network. The main changes from my original post are that forward_prop_step is now a class method (taking self) passed to scan as self.forward_prop_step, and sequences is wrapped in a list.

import numpy as np
import theano
import theano.tensor as T

class RNNTheano:
   
    def __init__(self, word_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.word_dim = word_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        U = np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), (hidden_dim, word_dim))
        V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (word_dim, hidden_dim))
        W1 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        W12 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        W2 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        # Theano: Created shared variables
        self.U = theano.shared(name='U', value=U.astype(theano.config.floatX))
        self.V = theano.shared(name='V', value=V.astype(theano.config.floatX))
        self.W1 = theano.shared(name='W1', value=W1.astype(theano.config.floatX))     
        self.W12 = theano.shared(name='W12', value=W12.astype(theano.config.floatX))     
        self.W2 = theano.shared(name='W2', value=W2.astype(theano.config.floatX))
        # We store the Theano graph here
        self.theano = {}
        self.__theano_build__()
   
    def forward_prop_step(self, x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
        s_t1 = T.tanh(U[:,x_t] + W1.dot(s_t1_prev))
        s_t2 = T.tanh(W12.dot(s_t1) + W2.dot(s_t2_prev))
        o_t = T.nnet.softmax(V.dot(s_t2))
        return [o_t[0], s_t1, s_t2]
       
    def __theano_build__(self):
        U, V, W1, W12, W2 = self.U, self.V, self.W1, self.W12, self.W2
        x = T.ivector('x')
        y = T.ivector('y')
       
        [o,s1,s2], updates = theano.scan(
            self.forward_prop_step,
            sequences=[x],
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)
       
        prediction = T.argmax(o, axis=1)
        o_error = T.sum(T.nnet.categorical_crossentropy(o, y))
       
        # Gradients
        dU = T.grad(o_error, U)
        dV = T.grad(o_error, V)
        dW1 = T.grad(o_error, W1)
        dW12 = T.grad(o_error, W12)
        dW2 = T.grad(o_error, W2)
       
        # Assign functions
        self.forward_propagation = theano.function([x], o)
        self.predict = theano.function([x], prediction)
        self.ce_error = theano.function([x, y], o_error)
        self.bptt = theano.function([x, y], [dU, dV, dW1, dW12, dW2])
       
        # SGD
        learning_rate = T.scalar('learning_rate')
        self.sgd_step = theano.function([x,y,learning_rate], [],
                      updates=[(self.U, self.U - learning_rate * dU),
                              (self.V, self.V - learning_rate * dV),
                              (self.W1, self.W1 - learning_rate * dW1),
                              (self.W12, self.W12 - learning_rate * dW12),
                              (self.W2, self.W2 - learning_rate * dW2)])
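In case it helps anyone reading later, usage would look something like this (a sketch; word_dim and the example sequences are placeholders, not my real data):

import numpy as np

model = RNNTheano(word_dim=8000, hidden_dim=100)
# x is a sequence of int32 word indices, y the same sequence shifted by one step
x_example = np.array([0, 51, 27, 16, 10], dtype='int32')
y_example = np.array([51, 27, 16, 10, 1], dtype='int32')

print(model.ce_error(x_example, y_example))   # cross-entropy loss before the update
model.sgd_step(x_example, y_example, 0.005)   # one SGD step with learning rate 0.005
print(model.ce_error(x_example, y_example))   # loss should usually go down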

Wang Zhiyang

Oct 28, 2015, 4:15:02 AM
to theano-users
Hi, Abhishek! I found your post via the RNN Tutorial Part 2 and came here! I ran into a similar error message to yours:

TypeError: forward_prop_step() takes exactly 5 arguments (6 given)

I also made some alterations to the original code. What I want to build is a one-hidden-layer RNN for regression instead of classification, so I changed the loss function from cross-entropy to MSE and the activation function of the output layer from softmax to linear. However, it does not work and reports the above TypeError. I don't think these changes should cause the error. Could anyone help me check my understanding? Thank you!

My data set is as follows:

Each input is a 1D time series with 1201 time steps.

The output is also a 1D time series of length 1201.

X_train and y_train are both lists, so X_train[i] and y_train[i] are also lists of length 1201.
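For example, the layout I mean is roughly this (a sketch with random numbers, not my real data; in my code these are plain Python lists rather than numpy arrays):

import numpy as np

# 10 examples, each a float32 series of 1201 time steps
X_train = [np.random.randn(1201).astype('float32') for _ in range(10)]
y_train = [np.random.randn(1201).astype('float32') for _ in range(10)]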

My full code is as follows. I highlighted the modified code in red for clarity (the highlighting may not come through here); the changed parts are the output activation in forward_prop_step and the MSE loss.

import numpy as np
import theano
import theano.tensor as T

class RNNRegressionTheano:

    def __init__(self, input_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        U = np.random.uniform(-np.sqrt(1./input_dim), np.sqrt(1./input_dim), (hidden_dim, input_dim))
        V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (input_dim, hidden_dim))
        W = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        # Theano: Created shared variables
        self.U = theano.shared(name='U', value=U.astype(theano.config.floatX))
        self.V = theano.shared(name='V', value=V.astype(theano.config.floatX))
        self.W = theano.shared(name='W', value=W.astype(theano.config.floatX)) 
        # We store the Theano graph here
        self.theano = {}
        self.__theano_build__()
    
    def forward_prop_step(x_t, s_t_prev, U, V, W):
        s_t = T.tanh(U.dot(x_t) + W.dot(s_t_prev))
        #s_t = U.dot(x_t) + W.dot(s_t_prev)
        o_t = V.dot(s_t)
        #o_t = T.nnet.softmax(V.dot(s_t))
        return [o_t[0], s_t]

    
    def __theano_build__(self):
        U, V, W = self.U, self.V, self.W
        x = T.fvector('x')
        y = T.fvector('y')
        
        [o,s], updates = theano.scan(
            self.forward_prop_step,
            sequences=[x],
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W],
            truncate_gradient=self.bptt_truncate,
            strict=True)

        prediction = o
        o_error = 0.5 * ((o - y)**2).mean().sum()

        # Gradients
        dU = T.grad(o_error, U)
        dV = T.grad(o_error, V)
        dW = T.grad(o_error, W)

        # Assign functions
        self.forward_propagation = theano.function([x], o)
        self.predict = theano.function([x], prediction)
        self.mse_error = theano.function([x, y], o_error)
        self.bptt = theano.function([x, y], [dU, dV, dW])

        # SGD
        learning_rate = T.scalar('learning_rate')
        self.sgd_step = theano.function([x,y,learning_rate], [], 
                      updates=[(self.U, self.U - learning_rate * dU),
                              (self.V, self.V - learning_rate * dV),
                              (self.W, self.W - learning_rate * dW)])

    def calculate_total_loss(self, X, Y):
        return np.sum([self.mse_error(x,y) for x,y in zip(X,Y)])

    def calculate_loss(self, X, Y):
        # Divide calculate_loss by the number of words
        num_words = np.sum([len(y) for y in Y])
        return self.calculate_total_loss(X,Y)/float(num_words)

 

On Monday, October 5, 2015 at 10:29:19 PM UTC+8, Abhishek Shivkumar wrote:

Abhishek Shivkumar

Oct 28, 2015, 6:43:28 AM
to theano-users
Hi Wang

  I think you need to add "self" to this method in your code


def forward_prop_step(x_t, s_t_prev, U, V, W):

should be

def forward_prop_step(self, x_t, s_t_prev, U, V, W):

... and it should run fine. Please let me know.
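Roughly why that works (a sketch, assuming your scan call stays as you posted it):

# scan supplies: 1 sequence element (x_t) + 1 recurrent state (s_t_prev)
#              + 3 non-sequences (U, V, W) = 5 arguments.
# Because the function is passed to scan as self.forward_prop_step (a bound method),
# Python fills in `self` automatically, so the method needs 1 + 5 = 6 parameters:
def forward_prop_step(self, x_t, s_t_prev, U, V, W):
    ...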

Wang Zhiyang

Oct 29, 2015, 9:24:09 PM
to theano-users
Thank you for your answer. I tried your advice but it didn't work. If I put the forward_prop_step function inside the __theano_build__ function, just as Denny suggests, the error is:

When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has 2 dimension(s), while the result of the inner function (`fn`) has 2 dimension(s) (should be one less than the initial state).

If I add "self" to forward_prop_step as the first argument, the error becomes:

TypeError: forward_prop_step() takes exactly 6 arguments (5 given)

If I separate the two functions and then add "self" to forward_prop_step as the first argument, the error goes back to:

When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has 2 dimension(s), while the result of the inner function (`fn`) has 2 dimension(s) (should be one less than the initial state). 



On Wednesday, October 28, 2015 at 6:43:28 PM UTC+8, Abhishek Shivkumar wrote:

Daniel Renshaw

Nov 2, 2015, 2:25:20 AM
to theano...@googlegroups.com
It's most likely that the "takes exactly X arguments (Y given)" is an earlier error than the "initial state ... has X dimension(s), while the result of the inner function ... has Y dimension(s)" error. Whatever you did to stop the earlier error appearing was a good change, keep it. You now need to solve the later error which is most likely unrelated.

That later error message (about scan initial value dimensions) has always confused me because I don't think it has ever been accurate. I think I always see that error when I make some error in the scan arrangement but it was not actually to do with initial value dimensionality. I can't say if that's true for you, but maybe look a little more generally at how you're doing your scan setup; simplify things until it works then gradually add the complexity back in. If you can demonstrate the problem in a simple bit of executable code then you could share it here and we may be able to help.
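As a possible starting point for the simplification, something like this minimal scan (one sequence, one recurrent state, one non-sequence) should run on its own; it is only a sketch to build back up from, not your model:

import numpy as np
import theano
import theano.tensor as T

def step(x_t, s_prev, W):
    # one scalar from the sequence, one recurrent vector, one shared matrix
    return T.tanh(x_t + W.dot(s_prev))

x = T.fvector('x')
W = theano.shared(np.eye(3, dtype=theano.config.floatX), name='W')

s, updates = theano.scan(step,
                         sequences=[x],
                         outputs_info=[T.zeros(3)],
                         non_sequences=[W],
                         strict=True)

f = theano.function([x], s)
print(f(np.ones(5, dtype='float32')))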

Daniel




