Theano RNN Implementation

m_z...@sbox.tugraz.at

Nov 19, 2011, 4:03:56 AM
to theano-users
Hi ;)

As I'm relatively new to Theano, I was wondering whether anybody currently has
a working RNN implementation available (BPTT or RTRL).

It would be really nice if they could post their code for study purposes,
or some tips on how to start ;) I will post my code if I manage to do this ;)
(I'm thinking about a Hessian-free (Martens) implementation as well ;),
but at the moment I'm fighting with some language (Theano) related issues ;)

----------------------------------------------------------------------
# I think this should be the right way to start
----------------------------------------------------------------------
1. Op that takes three arguments (hidden_step, hidden_step_grad, params)
2. Op implements a transformation from (hidden_init, input) -> (output,).
3. Op contains the recurrence within itself.
----------------------------------------------------------------------

Thanks in advance!

..
Mat

Arnaud Bergeron

Nov 19, 2011, 1:24:45 PM
to theano...@googlegroups.com
2011/11/19 m_z...@sbox.tugraz.at <m_z...@sbox.tugraz.at>:

I have some code to do recurrence that is available as part of
http://code.google.com/p/pynnet/. It does BPTT only though.

But if you want to write your own thing, you should use scan [
http://deeplearning.net/software/theano/library/scan.html ] rather
than write a new recurrent op.
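
[For a first contact with scan, a minimal sketch, unrelated to pynnet: a running sum over a vector.]

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')

# s_t = s_{t-1} + x_t, starting from 0
s, _ = theano.scan(lambda x_t, s_tm1: s_tm1 + x_t,
                   sequences=x,
                   outputs_info=T.zeros_like(x[0]))

f = theano.function([x], s)
print(f(np.arange(5, dtype=theano.config.floatX)))   # [ 0.  1.  3.  6. 10.]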

Razvan Pascanu

Nov 19, 2011, 3:11:39 PM
to theano...@googlegroups.com
Vanilla RNNs are trivial to implement and you do not need to add any new op to Theano. I attached the code implementing it. I think it might even make sense to convert this into a tutorial on RNNs, which I will probably do soon.
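
[rnn.py itself is not reproduced in this archive. As a rough sketch of a vanilla RNN trained with BPTT through scan: the layer sizes (5 inputs, 50 hidden units, 5 outputs) and the (h0, u, t, lr) call signature below are assumptions, chosen to be consistent with the shapes reported later in this thread.]

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
n_in, n_hid, n_out = 5, 50, 5
rng = np.random.RandomState(0)

# shared parameters: input-to-hidden, hidden-to-hidden and hidden-to-output weights
W_in = theano.shared(rng.uniform(-0.1, 0.1, (n_in, n_hid)).astype(floatX), name='W_in')
W_rec = theano.shared(rng.uniform(-0.1, 0.1, (n_hid, n_hid)).astype(floatX), name='W_rec')
W_out = theano.shared(rng.uniform(-0.1, 0.1, (n_hid, n_out)).astype(floatX), name='W_out')

h0 = T.vector('h0')   # initial hidden state, length n_hid
u = T.matrix('u')     # input sequence, shape (time, n_in)
t = T.matrix('t')     # target sequence, shape (time, n_out)
lr = T.scalar('lr')   # learning rate

def step(u_t, h_tm1):
    h_t = T.tanh(T.dot(u_t, W_in) + T.dot(h_tm1, W_rec))
    y_t = T.dot(h_t, W_out)
    return h_t, y_t

[h, y], _ = theano.scan(step, sequences=u, outputs_info=[h0, None])

cost = ((y - t) ** 2).mean()
gW_in, gW_rec, gW_out = T.grad(cost, [W_in, W_rec, W_out])

# one SGD step per call; gradients flow back through the whole sequence (BPTT)
fn = theano.function([h0, u, t, lr], cost,
                     updates=[(W_in, W_in - lr * gW_in),
                              (W_rec, W_rec - lr * gW_rec),
                              (W_out, W_out - lr * gW_out)])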

Doing something different than BPTT might be an interesting extension of the scan op, though I think a revision of scan (which is on my TODO list) might be much more important (scan has some issues with 2nd order gradients ...). If you would be interested in helping with such a revision, let me know.

Regarding Hessian-free, there are some implementations around. When I first thought about it, I wanted to have the entire thing in Theano, which is not trivial to do (namely to have the linear conjugate gradient in Theano and all that). I haven't worked on that in a while, but there are some technical issues which are not quite solved yet.

If you are doing the linear conjugate gradient in Python though, and using Theano only to get the gradients and the Gauss-Newton approximation of the Hessian times a vector (i.e. using the Rop), then that should be straightforward.
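
[A minimal sketch of the Rop mentioned here, i.e. a Jacobian-times-vector product; the variables are illustrative and not taken from rnn.py.]

import numpy as np
import theano
import theano.tensor as T

W = T.matrix('W')
V = T.matrix('V')     # the direction (same shape as W) to multiply the Jacobian with
x = T.vector('x')
y = T.dot(x, W)       # some differentiable function of W
JV = T.Rop(y, W, V)   # Jacobian of y w.r.t. W, right-multiplied by V
f = theano.function([W, V, x], JV)

floatX = theano.config.floatX
print(f(np.eye(3, dtype=floatX), np.ones((3, 3), dtype=floatX), np.arange(3, dtype=floatX)))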

A first step would even be to try to use scipy's fmin_ncg (which I think is close in nature to Hessian-free, minus a few things) on RNNs.

HTH, 
Razvan

rnn.py

m_z...@sbox.tugraz.at

Nov 23, 2011, 4:21:22 AM
to theano-users
<snipped>

> > I have some code to do recurrence that is available as part of
> >http://code.google.com/p/pynnet/.  It does BPTT only though.
<snipped>

<snipped>


> Vanilla RNNs are trivial to implement and you do not need to add any new op
> to Theano. I attached the code implementing it. I think it might make sense
> even to convert this into a tutorial on RNNs .. which I will probably do
> soon.

<snipped>

Hi! =)

Big thanks for the code you offered! =)
It will help me a lot & save time ;)

<snipped>


> Doing something different then BPTT might be an interesting extension of
> the scan op, though I think a revision of scan (which is on my TODO list)
> might be much much more important (scan has some issues with 2nd order
> gradients ..). If you would be interested in helping with such a revision
> let me know.

<snipped>

Regarding the Hessian-free: I will try to implement a simple Python/Theano
version. If this works, and if there is enough free time, I will drop you a mail, as
I'm interested in a version running completely in Theano as well.

But at the moment I have to finish my work on some basic sequence
learning tasks. Maybe this would be a nice thing to have for the tutorial section as
well ;), but you will have to wait until the code is ready ;) (I'll drop you a mail
if you're interested in this)

thx. Mat

Razvan Pascanu

Apr 17, 2013, 1:09:14 PM
to theano...@googlegroups.com
It is possible. Say you have A->B->C->D, and say you want the recurrence to be between D and B. B has to be a special block since, besides the input from A, it needs an auxiliary input coming from D. There are several ways to do this.

My main observation is that you can construct your lambda function to pass to scan as follows:

def rnn(*args):
    outs = theano.clone(D.outs, replace=dict(zip(B.inputs, args)))
    return outs, {}


You do, however, need to decide what the recurrent input of B is before constructing this function. You can define a dummy input or something. You also need to ensure the order of args is correct somehow. I think this is doable, and I had several versions implemented, but they're defunct now.
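
[A minimal, self-contained sketch of the pattern described above, with made-up blocks: B gets a dummy recurrent input that theano.clone swaps for the real recurrent value inside the scan step.]

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX

# made-up "blocks": B has an external input x_t plus a dummy recurrent input,
# and D is the downstream block whose output feeds back into B
x_t = T.vector('x_t')
r_dummy = T.vector('r_dummy')
W_b = theano.shared(np.ones((3, 3), dtype=floatX), name='W_b')
W_d = theano.shared(np.ones((3, 3), dtype=floatX), name='W_d')

b_out = T.tanh(T.dot(x_t, W_b) + r_dummy)   # B's output graph
d_out = T.dot(b_out, W_d)                   # D's output graph (depends on B)

def rnn_step(x_seq_t, d_tm1):
    # rebuild D's graph for this time step, plugging in the real inputs
    return theano.clone(d_out, replace={x_t: x_seq_t, r_dummy: d_tm1})

x_seq = T.matrix('x_seq')   # full input sequence, shape (time, 3)
d0 = T.vector('d0')         # initial value of the recurrent signal
d_seq, _ = theano.scan(rnn_step, sequences=x_seq, outputs_info=d0)

f = theano.function([x_seq, d0], d_seq)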

HTH, Razvan

On Wednesday, April 17, 2013, Sarvi Shanmugham wrote:
I am trying to implement a modular Neural Network object library using Theano.
Implementing the non-recurrent network was simple, assuming each Block object and Connection object knows what it's connected to,
and each Block/Connection object offers a compile() function that returns a Theano variable calculated from the compile() of its input objects.

If I were to call compile() on the outblock below, the system walks its way back through
the compile() of connection3, hiddenblock2, connection2, hiddenblock1, connection1, and finally the input variable in inblock.
This works great for non-recurrent models.

<inblock> ----(connection1)-----> <hiddenblock1> ----(connection2)----> <hiddenblock2> ----(connection3)----> <outblock>

I am having trouble figuring out the best way to implement recurrent connections in a generic way.


From your example I sorta, kinda understood how to implement recurrence from one node back to itself.
A simplified version of your code segment:

def step(inputsignaldotW, hidden_tm1, recurrentW):
    return inputsignaldotW + T.dot(hidden_tm1, recurrentW)

hidden, _ = theano.scan(step,
                        sequences=T.dot(inputsignal, W),
                        outputs_info=[hidden0],
                        non_sequences=[recurrentW])

Is there a simple modular local way of doing this kind of recurrent connection when the connection is going back through multiple layers ?

Sarvi

Sarvi Shanmugham

Apr 17, 2013, 2:18:00 PM
to theano...@googlegroups.com
Let me see if I got this right.
clone helps clone a graph replacing parts of the graph with another graph?

But  I am not sure I understand your example though.

The solution I am hearing is that, for each block we should create the output variable of that block 
using the signal/output variable of all its non-recurrent connection objects and their inputs, as well as dummy variables for  recurrent connections.
And these variables representing recurrent connections are expected to be replaced through the theano.clone() function with graphs created for the recurrent connections through scan() ???
Do I have it right?

Are there tutorial examples on theano.clone()? I am trying to understand how clone identifies which inputs get replaced, how it relates to scan(), and how it applies to implementing recurrence.

Can you point me to these recurrent implementations you refer to, so I can take a look?

Thx,
Sarvi 

Razvan Pascanu

Apr 18, 2013, 3:53:35 AM
to theano...@googlegroups.com, theano...@googlegroups.com


On 2013-04-17, at 2:18 PM, Sarvi Shanmugham <sarv...@gmail.com> wrote:

Let me see if I got this right.
clone helps clone a graph replacing parts of the graph with another graph?


Yes

But  I am not sure I understand your example though.

The solution I am hearing is that, for each block we should create the output variable of that block 
using the signal/output variable of all its non-recurrent connection objects and their inputs, as well as dummy variables for  recurrent connections.
And these variables representing recurrent connections are expected to be replaced through the theano.clone() function with graphs created for the recurrent connections through scan() ???
Do I have it right?


Yes.

Are there tutorial examples on theano.clone()? I am trying to understand how clone identifies which inputs get replaced, how it relates to scan(), and how it applies to implementing recurrence.

Clone gets a replace dictionary where you specify what to replace (the key) by what (the value).
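
[A minimal illustration with made-up variables: the key is the variable to replace, the value is its replacement.]

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')
y = T.vector('y')
out = (x ** 2).sum()   # a graph built on x

# copy of the graph with y substituted wherever x appeared
out_on_y = theano.clone(out, replace={x: y})

f = theano.function([y], out_on_y)
print(f(np.asarray([1, 2, 3], dtype=theano.config.floatX)))   # 14.0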


Can you point me to these recurrent implementations you refer to, so I can take a look?

They are not online. I'll have to dig for them..


Ori Tal

May 28, 2015, 1:51:31 PM
to theano...@googlegroups.com
Hi
Sorry for being a "Theano newbie", but can you give some example that shows how to use the code? I really don't know how to debug it...

I tried:

import numpy as np
u1 = np.random.rand(10,5).astype('float32')
t1 = np.random.rand(10,5).astype('float32')
fn(np.asarray([0,0,0,0,0]).T.astype('float32'),u1,t1, 0.1)


and got an error
ValueError: dimension mismatch in args to gemv (50,50)x(5)->(50)
Apply node that caused the error: GpuGemv{no_inplace}(GpuGemv{inplace}.0, TensorConstant{1.0}, GpuDimShuffle{1,0}.0, <CudaNdarrayType(float32, vector)>, TensorConstant{1.0})
Inputs types: [CudaNdarrayType(float32, vector), TensorType(float32, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, vector), TensorType(float32, scalar)]

HINT: Use another linker then the c linker to have the inputs shapes and strides printed.
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Apply node that caused the error: forall_inplace,gpu,scan_fn&scan_fn}(Shape_i{0}.0, GpuSubtensor{int64:int64:int8}.0, GpuIncSubtensor{InplaceSet;:int64:}.0, GpuIncSubtensor{InplaceSet;:int64:}.0, Shape_i{0}.0, Shape_i{0}.0, <CudaNdarrayType(float32, matrix)>, <CudaNdarrayType(float32, matrix)>, <CudaNdarrayType(float32, matrix)>)
Inputs types: [TensorType(int64, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), TensorType(int64, scalar), TensorType(int64, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(), (10, 5), (1, 5), (11, 5), (), (), (5, 50), (50, 50), (50, 5)]
Inputs strides: [(), (5, 1), (0, 1), (5, 1), (), (), (50, 1), (50, 1), (5, 1)]
Inputs values: [array(10L, dtype=int64), 'not shown', <CudaNdarray object at 0x0000000022DD20F0>, 'not shown', array(10L, dtype=int64), array(10L, dtype=int64), 'not shown', 'not shown', 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Daniel Renshaw

May 29, 2015, 3:24:36 AM
to theano...@googlegroups.com
As a newbie I'd recommend working through a Theano tutorial, if you've not already done so, for example [1]. After that, the scan tutorial [2] will help get you up to speed with the Theano basics for implementing RNNs. Then there's a more advanced tutorial for an RNN-RBM [3].




Frédéric Bastien

May 29, 2015, 12:03:27 PM
to theano-users
The important part of the error message is:


ValueError: dimension mismatch in args to gemv (50,50)x(5)->(50)

This means that a dot between a matrix and a vector does not have inputs with compatible shapes.
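
[A guess based on the shapes printed in the traceback, (5, 50), (50, 50) and (50, 5): the hidden layer has 50 units, so the initial hidden state passed as the first argument should have length 50, not 5.]

import numpy as np

h0 = np.zeros(50, dtype='float32')   # hypothetical fix: h0 must match the hidden size
fn(h0, u1, t1, 0.1)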

There is a HINT that I suggest you use; normally, it tells you exactly where in your code this dot is coming from:



HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.

Fred





Sander Stepanov

Jan 24, 2016, 11:16:49 AM
to theano-users
Cool, thanks, but a tutorial with a full example would be great to share.

Sander Stepanov

Feb 26, 2016, 1:33:48 PM
to theano-users
I got this message:
C:\Sander\my_code\Feb12_Theano_Tutorial\rnn_Pascanu_theano_google_Feb26.py:51: UserWarning: The parameter 'updates' of theano.function() expects an OrderedDict, got <type 'dict'>. Using a standard dictionary here results in non-deterministic behavior. You should use an OrderedDict if you are using Python 2.7 (theano.compat.OrderedDict for older python), or use a list of (shared, update) pairs. Do not just convert your dictionary to this type before the call as the conversion will still be non-deterministic.
  W_out: W_out - lr * gW_out})


Sander

By the way, regarding "Regarding Hessian Free, there are some implementations around":
could you help me find some code example for Hessian-free, please?

Frédéric Bastien

Mar 8, 2016, 8:58:21 AM
to theano-users

I think the message is clear: replace the dict by an OrderedDict. Where did you get this code? It uses an older interface of Theano. Maybe the code could be updated at its origin to prevent other people from having this problem.
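
[A minimal sketch of the fix: pass updates as a list of (shared, update) pairs or as an OrderedDict; W_out, lr and gW_out below are stand-ins for the variables in that script.]

import theano
import theano.tensor as T

W_out = theano.shared(0.0, name='W_out')   # stand-in shared parameter
lr = T.scalar('lr')
gW_out = T.scalar('gW_out')                # stand-in for the computed gradient

# a list of (shared, update) pairs keeps the update order deterministic
f = theano.function([lr, gW_out], [],
                    updates=[(W_out, W_out - lr * gW_out)])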

Fred

Sander Stepanov

Mar 10, 2016, 12:44:39 PM
to theano-users
Razvan,
you used scan in this example code,
but there are some notes saying that scan is very slow (though it was written in the last Theano update that scan has improved).
Do you have some example RNN without scan, please?
And it would be great to have an example RNN with advanced tuning like Rprop.
Thanks,
Sander


On Saturday, November 19, 2011 at 3:11:39 PM UTC-5, Razvan Pascanu wrote:

Frédéric Bastien

Mar 10, 2016, 4:41:17 PM
to theano-users
Hi,

Scan got sped up a lot recently; we still need to write a notice about that. Be sure to use Theano 0.8rc1. It is much faster than 0.7 with scan.

There is this advanced LSTM example: http://deeplearning.net/tutorial/lstm.html

Do you have the URL to the example that gave you this error?

Fred
