How do I fix this error? "forward_prop_step takes incorrect number of arguments"


Abhishek Shivkumar

Oct 5, 2015, 10:29:19 AM
to theano-users

    I have the following piece of code that is supposed to implement a recurrent neural network with 2 hidden layers. It declares the input weights, the output weights, the recurrent weights for each of the 2 hidden layers, and the weights connecting the 2 hidden layers.

When I run the __theano_build__() method, it throws an error saying "TypeError: forward_prop_step() takes exactly 8 arguments (7 given)". Can you please point out the mistake and help me resolve it?

import numpy as np
import theano
import theano.tensor as T

class RNNTheano:
   
    def __init__(self, word_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.word_dim = word_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        U = np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), (hidden_dim, word_dim))
        V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (word_dim, hidden_dim))
        W1 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        W12 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        W2 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        # Theano: Created shared variables
        self.U = theano.shared(name='U', value=U.astype(theano.config.floatX))
        self.V = theano.shared(name='V', value=V.astype(theano.config.floatX))
        self.W1 = theano.shared(name='W1', value=W1.astype(theano.config.floatX))     
        self.W12 = theano.shared(name='W12', value=W12.astype(theano.config.floatX))     
        self.W2 = theano.shared(name='W2', value=W2.astype(theano.config.floatX))
        # We store the Theano graph here
        self.theano = {}
        self.__theano_build__()
   
    def __theano_build__(self):
        U, V, W1, W12, W2 = self.U, self.V, self.W1, self.W12, self.W2
        x = T.ivector('x')
        y = T.ivector('y')
        def forward_prop_step(x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
            s_t1 = T.tanh(U[:,x_t] + W1.dot(s_t1_prev))
            s_t2 = T.tanh(W12.dot(s_t1) + W2.dot(s_t2_prev))
            o_t = T.nnet.softmax(V.dot(s_t2))
            return [o_t[0], s_t1, s_t2]
        [o,s1,s2], updates = theano.scan(
            forward_prop_step,
            sequences=x,
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)
       
        prediction = T.argmax(o, axis=1)
        o_error = T.sum(T.nnet.categorical_crossentropy(o, y))
       
        # Gradients
        dU = T.grad(o_error, U)
        dV = T.grad(o_error, V)
        dW1 = T.grad(o_error, W1)
        dW12 = T.grad(o_error, W12)
        dW2 = T.grad(o_error, W2)
       
        # Assign functions
        self.forward_propagation = theano.function([x], o)
        self.predict = theano.function([x], prediction)
        self.ce_error = theano.function([x, y], o_error)
        self.bptt = theano.function([x, y], [dU, dV, dW1, dW12, dW2])
       
        # SGD
        learning_rate = T.scalar('learning_rate')
        self.sgd_step = theano.function([x,y,learning_rate], [],
                      updates=[(self.U, self.U - learning_rate * dU),
                              (self.V, self.V - learning_rate * dV),
                              (self.W1, self.W1 - learning_rate * dW1),
                              (self.W12, self.W12 - learning_rate * dW12),
                              (self.W2, self.W2 - learning_rate * dW2)])

Daniel Renshaw

Oct 5, 2015, 10:33:56 AM
to theano...@googlegroups.com
A bit of a guess, but maybe the sequences parameter needs to be a list?

So

 [o,s1,s2], updates = theano.scan(
            forward_prop_step,
            sequences=[x],
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)

instead of 

 [o,s1,s2], updates = theano.scan(
            forward_prop_step,
            sequences=x,
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)

Daniel



Abhishek Shivkumar

Oct 5, 2015, 10:38:48 AM
to theano-users
Hi Daniel,

   I still get the same error even after making x a list, [x]. Could it be something to do with the number of parameters in outputs_info?

Thanks
Abhishek S

Abhishek Shivkumar

Oct 5, 2015, 10:41:16 AM
to theano-users
FYI, the error traceback is as follows:


  File "rnn_theano.py", line 28, in __init__
    self.__theano_build__()

  File "rnn_theano.py", line 45, in __theano_build__
    strict=True)

Daniel Renshaw

Oct 5, 2015, 10:59:53 AM
to theano...@googlegroups.com
Could it be something to do with the "self" reference? You're currently defining the step function inside a class member function. Maybe try moving forward_prop_step into the outermost scope and, if it stays inside the class, marking it as a @staticmethod.
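For example, something along these lines (just a sketch of the idea; I'm assuming the rest of your class stays as posted):

class RNNTheano:
    # ... __init__ as before ...

    @staticmethod
    def forward_prop_step(x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
        # No `self` here, so the 8 arguments scan supplies map directly onto the 8 parameters
        s_t1 = T.tanh(U[:, x_t] + W1.dot(s_t1_prev))
        s_t2 = T.tanh(W12.dot(s_t1) + W2.dot(s_t2_prev))
        o_t = T.nnet.softmax(V.dot(s_t2))
        return [o_t[0], s_t1, s_t2]

    def __theano_build__(self):
        # ... unpack U, V, W1, W12, W2 and build x, y as before, then:
        [o, s1, s2], updates = theano.scan(
            self.forward_prop_step,   # a @staticmethod is not bound, so no extra `self` argument sneaks in
            sequences=[x],
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)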

Daniel

Abhishek Shivkumar

Oct 5, 2015, 11:17:46 AM
to theano-users
I tried that as well. I moved it outside as follows, but the error is the same. Maybe it is something to do with the computational graph structure itself.

    def forward_prop_step(self, x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
        s_t1 = T.tanh(U[:,x_t] + W1.dot(s_t1_prev))
        s_t2 = T.tanh(W12.dot(s_t1) + W2.dot(s_t2_prev))
        o_t = T.nnet.softmax(V.dot(s_t2))
        return [o_t[0], s_t1, s_t2]

    def __theano_build__(self):
        U, V, W1, W12, W2 = self.U, self.V, self.W1, self.W12, self.W2
        x = T.ivector('x')
        y = T.ivector('y')
       
        [o,s1,s2], updates = theano.scan(
            self.forward_prop_step,

Frédéric Bastien

Oct 6, 2015, 1:35:47 AM
to theano-users
The problem seems to be in that part of the scan:


            sequences=x,
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],

Mostly, this tells Theano that the inner scan function will have 1 sequence as input + 3 recurrent states + 5 non-sequences. Could you remove the first element of the outputs_info? That will create only 2 recurrent states.

Also I think it should be like this:



            sequences=[x],

but maybe both are accepted.

Fred

Daniel Renshaw

Oct 6, 2015, 3:23:12 AM
to theano...@googlegroups.com
But Fred, the step function intends to have three outputs, two of which should be recurrent, so the outputs_info needs to have three entries with a None for the output that should not be recurrent.
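For reference, roughly how scan composes the step function's arguments for this call (a sketch, worth double-checking against the scan docs):

# With this scan call:
#   sequences     = [x]                           -> 1 argument  (x_t)
#   outputs_info  = [None, dict(...), dict(...)]  -> 2 arguments (s_t1_prev, s_t2_prev);
#                                                    the None entry adds no argument
#   non_sequences = [U, V, W1, W12, W2]           -> 5 arguments
# so the step function should accept exactly 1 + 2 + 5 = 8 arguments, in that order:
def forward_prop_step(x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
    ...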

However, I just noticed that the outputs of the step function are all inside a Python list. Does it work if they are returned separately?

So try

return o_t[0], s_t1, s_t2

instead of

return [o_t[0], s_t1, s_t2]

Daniel

Abhishek Shivkumar

Oct 6, 2015, 4:44:15 AM
to theano-users
Ok, thanks for all your help. I finally got it running :)

Here is the code that trains a 2-hidden-layer recurrent neural network. The main changes from my original post are that forward_prop_step is now a class method (taking self) passed to scan as self.forward_prop_step, and sequences is wrapped in a list.

import numpy as np
import theano
import theano.tensor as T

class RNNTheano:
   
    def __init__(self, word_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.word_dim = word_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        U = np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), (hidden_dim, word_dim))
        V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (word_dim, hidden_dim))
        W1 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        W12 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        W2 = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        # Theano: Created shared variables
        self.U = theano.shared(name='U', value=U.astype(theano.config.floatX))
        self.V = theano.shared(name='V', value=V.astype(theano.config.floatX))
        self.W1 = theano.shared(name='W1', value=W1.astype(theano.config.floatX))     
        self.W12 = theano.shared(name='W12', value=W12.astype(theano.config.floatX))     
        self.W2 = theano.shared(name='W2', value=W2.astype(theano.config.floatX))
        # We store the Theano graph here
        self.theano = {}
        self.__theano_build__()
   
    def forward_prop_step(self, x_t, s_t1_prev, s_t2_prev, U, V, W1, W12, W2):
        s_t1 = T.tanh(U[:,x_t] + W1.dot(s_t1_prev))
        s_t2 = T.tanh(W12.dot(s_t1) + W2.dot(s_t2_prev))
        o_t = T.nnet.softmax(V.dot(s_t2))
        return [o_t[0], s_t1, s_t2]
       
    def __theano_build__(self):
        U, V, W1, W12, W2 = self.U, self.V, self.W1, self.W12, self.W2
        x = T.ivector('x')
        y = T.ivector('y')
       
        [o,s1,s2], updates = theano.scan(
            self.forward_prop_step,
            sequences=[x],
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim)), dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W1, W12, W2],
            truncate_gradient=self.bptt_truncate,
            strict=True)
       
        prediction = T.argmax(o, axis=1)
        o_error = T.sum(T.nnet.categorical_crossentropy(o, y))
       
        # Gradients
        dU = T.grad(o_error, U)
        dV = T.grad(o_error, V)
        dW1 = T.grad(o_error, W1)
        dW12 = T.grad(o_error, W12)
        dW2 = T.grad(o_error, W2)
       
        # Assign functions
        self.forward_propagation = theano.function([x], o)
        self.predict = theano.function([x], prediction)
        self.ce_error = theano.function([x, y], o_error)
        self.bptt = theano.function([x, y], [dU, dV, dW1, dW12, dW2])
       
        # SGD
        learning_rate = T.scalar('learning_rate')
        self.sgd_step = theano.function([x,y,learning_rate], [],
                      updates=[(self.U, self.U - learning_rate * dU),
                              (self.V, self.V - learning_rate * dV),
                              (self.W1, self.W1 - learning_rate * dW1),
                              (self.W12, self.W12 - learning_rate * dW12),
                              (self.W2, self.W2 - learning_rate * dW2)])
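In case it helps anyone reading later, usage would look something like this (a sketch; word_dim and the example sequences are placeholders, not my real data):

import numpy as np

model = RNNTheano(word_dim=8000, hidden_dim=100)
# x is a sequence of int32 word indices, y the same sequence shifted by one step
x_example = np.array([0, 51, 27, 16, 10], dtype='int32')
y_example = np.array([51, 27, 16, 10, 1], dtype='int32')

print(model.ce_error(x_example, y_example))   # cross-entropy loss before the update
model.sgd_step(x_example, y_example, 0.005)   # one SGD step with learning rate 0.005
print(model.ce_error(x_example, y_example))   # loss should usually go down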

Wang Zhiyang

Oct 28, 2015, 4:15:02 AM
to theano-users
Hi, Abhishek! I found your post via the RNN Tutorial Part 2 and came here! I ran into a similar error message to yours:

TypeError: forward_prop_step() takes exactly 5 arguments (6 given)

I also made some alterations to the original code. What I want to build is a one-hidden-layer RNN for regression instead of classification, so I changed the loss function from cross-entropy to MSE and the activation function of the output layer from softmax to linear. However, it does not work and reports the above TypeError. I don't think these changes should cause the error. Could anyone help me check my understanding? Thank you!

My data set is as follows:

Each input is a 1D time series with 1201 time steps.

The output is also a 1D time series of length 1201.

X_train and y_train are both lists, so X_train[i] and y_train[i] are also lists of length 1201.
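For example, the layout I mean is roughly this (a sketch with random numbers, not my real data; in my code these are plain Python lists rather than numpy arrays):

import numpy as np

# 10 examples, each a float32 series of 1201 time steps
X_train = [np.random.randn(1201).astype('float32') for _ in range(10)]
y_train = [np.random.randn(1201).astype('float32') for _ in range(10)]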

My full code is as follows. I highlighted the modified code in red for clarity (the highlighting may not come through here); the changed parts are the output activation in forward_prop_step and the MSE loss.

import numpy as np
import theano
import theano.tensor as T

class RNNRegressionTheano:

    def __init__(self, input_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        U = np.random.uniform(-np.sqrt(1./input_dim), np.sqrt(1./input_dim), (hidden_dim, input_dim))
        V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (input_dim, hidden_dim))
        W = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))
        # Theano: Created shared variables
        self.U = theano.shared(name='U', value=U.astype(theano.config.floatX))
        self.V = theano.shared(name='V', value=V.astype(theano.config.floatX))
        self.W = theano.shared(name='W', value=W.astype(theano.config.floatX)) 
        # We store the Theano graph here
        self.theano = {}
        self.__theano_build__()
    
    def forward_prop_step(x_t, s_t_prev, U, V, W):
        s_t = T.tanh(U.dot(x_t) + W.dot(s_t_prev))
        #s_t = U.dot(x_t) + W.dot(s_t_prev)
        o_t = V.dot(s_t)
        #o_t = T.nnet.softmax(V.dot(s_t))
        return [o_t[0], s_t]

    
    def __theano_build__(self):
        U, V, W = self.U, self.V, self.W
        x = T.fvector('x')
        y = T.fvector('y')
        
        [o,s], updates = theano.scan(
            self.forward_prop_step,
            sequences=[x],
            outputs_info=[None, dict(initial=T.zeros(self.hidden_dim))],
            non_sequences=[U, V, W],
            truncate_gradient=self.bptt_truncate,
            strict=True)

        prediction = o
        o_error = 0.5 * ((o - y)**2).mean().sum()

        # Gradients
        dU = T.grad(o_error, U)
        dV = T.grad(o_error, V)
        dW = T.grad(o_error, W)

        # Assign functions
        self.forward_propagation = theano.function([x], o)
        self.predict = theano.function([x], prediction)
        self.mse_error = theano.function([x, y], o_error)
        self.bptt = theano.function([x, y], [dU, dV, dW])

        # SGD
        learning_rate = T.scalar('learning_rate')
        self.sgd_step = theano.function([x,y,learning_rate], [], 
                      updates=[(self.U, self.U - learning_rate * dU),
                              (self.V, self.V - learning_rate * dV),
                              (self.W, self.W - learning_rate * dW)])

    def calculate_total_loss(self, X, Y):
        return np.sum([self.mse_error(x,y) for x,y in zip(X,Y)])

    def calculate_loss(self, X, Y):
        # Divide calculate_loss by the number of words
        num_words = np.sum([len(y) for y in Y])
        return self.calculate_total_loss(X,Y)/float(num_words)

 

On Monday, October 5, 2015 at 10:29:19 PM UTC+8, Abhishek Shivkumar wrote:

Abhishek Shivkumar

Oct 28, 2015, 6:43:28 AM
to theano-users
Hi Wang

  I think you need to add "self" to this method in your code


def forward_prop_step(x_t, s_t_prev, U, V, W):

should be

def forward_prop_step(self, x_t, s_t_prev, U, V, W):

... and it should run fine. Please let me know.
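Roughly why that works (a sketch, assuming your scan call stays as you posted it):

# scan supplies: 1 sequence element (x_t) + 1 recurrent state (s_t_prev)
#              + 3 non-sequences (U, V, W) = 5 arguments.
# Because the function is passed to scan as self.forward_prop_step (a bound method),
# Python fills in `self` automatically, so the method needs 1 + 5 = 6 parameters:
def forward_prop_step(self, x_t, s_t_prev, U, V, W):
    ...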

Wang Zhiyang

Oct 29, 2015, 9:24:09 PM
to theano-users
Thank you for your answer. I tried your advice but it didn't work. If I put the forward_prop_step function inside the __theano_build__ function, just as Denny suggests, the error is:

When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has 2 dimension(s), while the result of the inner function (`fn`) has 2 dimension(s) (should be one less than the initial state).

If I add "self" to forward_prop_step as the first argument, the error becomes:

TypeError: forward_prop_step() takes exactly 6 arguments (5 given)

If I separate the two functions and then add "self" to forward_prop_step as the first argument, the error goes back to:

When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has 2 dimension(s), while the result of the inner function (`fn`) has 2 dimension(s) (should be one less than the initial state). 



On Wednesday, October 28, 2015 at 6:43:28 PM UTC+8, Abhishek Shivkumar wrote:

Daniel Renshaw

Nov 2, 2015, 2:25:20 AM
to theano...@googlegroups.com
It's most likely that the "takes exactly X arguments (Y given)" is an earlier error than the "initial state ... has X dimension(s), while the result of the inner function ... has Y dimension(s)" error. Whatever you did to stop the earlier error appearing was a good change, keep it. You now need to solve the later error which is most likely unrelated.

That later error message (about scan initial value dimensions) has always confused me because I don't think it has ever been accurate. I think I always see that error when I make some error in the scan arrangement but it was not actually to do with initial value dimensionality. I can't say if that's true for you, but maybe look a little more generally at how you're doing your scan setup; simplify things until it works then gradually add the complexity back in. If you can demonstrate the problem in a simple bit of executable code then you could share it here and we may be able to help.
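As a possible starting point for the simplification, something like this minimal scan (one sequence, one recurrent state, one non-sequence) should run on its own; it is only a sketch to build back up from, not your model:

import numpy as np
import theano
import theano.tensor as T

def step(x_t, s_prev, W):
    # one scalar from the sequence, one recurrent vector, one shared matrix
    return T.tanh(x_t + W.dot(s_prev))

x = T.fvector('x')
W = theano.shared(np.eye(3, dtype=theano.config.floatX), name='W')

s, updates = theano.scan(step,
                         sequences=[x],
                         outputs_info=[T.zeros(3)],
                         non_sequences=[W],
                         strict=True)

f = theano.function([x], s)
print(f(np.ones(5, dtype='float32')))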

Daniel




