Implementing recurrent encoder-decoder

Giulio Petrucci

Oct 12, 2015, 5:13:52 AM
to lasagn...@googlegroups.com
Hello there,

first post here: nice to meet you.
I am using Theano and Lasagne to implement a Recurrent Encoder-Decoder
as in this paper:

http://arxiv.org/abs/1406.1078

I am implementing Lasagne layers and I was wondering if the resulting
code could be a good contribution to the community. If the Lasagne
authors are reading: what do you think? I would be glad to
contribute to the project.

Thanks,
Giulio

--

goo...@jan-schlueter.de

Oct 12, 2015, 7:26:40 AM
to lasagne-users
Hello Giulio!


I am implementing Lasagne layers and I was wondering if the resulting
code could be a good contribution to the community. If the Lasagne
authors are reading: what do you think? I would be glad to
contribute to the project.

Sure! If you reimplemented the paper (or something like it with a working demo), the best idea would be to contribute your code to Lasagne/Recipes: https://github.com/Lasagne/Recipes
We can then see if and how to move some of your layer implementations directly to Lasagne. The recurrent layers are still a bit rough around the edges, and we're pondering a redesign: https://github.com/Lasagne/Lasagne/issues/425
In any case, I'd be interested in how you implemented the decoder, and adding your code to Lasagne/Recipes will help the discussion about lasagne.layers.recurrent.

Let us know if you need any help with sending a pull request!
Best, Jan

Giulio Petrucci

Oct 12, 2015, 9:04:52 AM
to lasagn...@googlegroups.com
Hi Jan,

thanks for your reply.

On Mon, Oct 12, 2015 at 1:26 PM, <goo...@jan-schlueter.de> wrote:
> Sure! If you reimplemented the paper (or something like it with a working
> demo), the best idea would be to contribute your code to Lasagne/Recipes:
> https://github.com/Lasagne/Recipes

Very good! I should finish everything by the beginning of next week.
I am also setting up a super-simple (artificial) problem to help me
get into the problem... to help myself, I mean. :-)
I could also write a short tutorial about it.

> We can then see if and how to move some of your layer implementations
> directly to Lasagne. The recurrent layers are still a bit rough on the
> edges, and we're pondering a redesign:
> https://github.com/Lasagne/Lasagne/issues/425

I see. Anyway, no big deal. I can push my code and then help you in
refactoring it according to the new design goals.

> In any case, I'd be interested in how you implemented the decoder, and
> adding your code to Lasagne/Recipes will help the discussion about
> lasagne.layers.recurrent.
> Let us know if you need any help with sending a pull request!

Got it. I am quite familiar with the GitHub/BitBucket workflow, but I
am pretty sure that I might need some kind of "mentorship" to check
that my code is 100% compliant with the standards, goals, conventions,
and so on. But I think that peer-reviewing code should not be that
painful. Should I fork the repository and start pushing stuff to my
copy?

Best,
Giulio

--

goo...@jan-schlueter.de

Oct 12, 2015, 11:22:04 AM
to lasagne-users
Hey,


Very good! I should finish everything by the beginning of next week.
I am also setting up a super-simple (artificial) problem to help me
get into the problem... to help myself, I mean. :-)
I could also write a short tutorial about it.

Sounds good! The two types of things we've got in Recipes so far are IPython notebooks and self-contained Python scripts, so it would be good if you could bring it into either of these forms (a self-contained Python script is a lot easier to create, maintain, and reuse, and a notebook is better for walking readers through an implementation along with some examples). Creating a subdirectory with a runnable Python script and an extra imported file with your layers would be fine as well.

I see. Anyway, no big deal. I can push my code and then help you in
refactoring it according to the new design goals.

Sounds good as well. We could definitely use some more RNN users for the discussion. We don't plan to throw away what we have, we're just thinking of a more flexible layout for the RNN layers to cater for additional use cases (the encoder/decoder being one of them).

Got it. I am quite familiar with the GitHub/BitBucket workflow, but I
am pretty sure that I might need some kind of "mentorship" to check
that my code is 100% compliant with the standards, goals, conventions,
and so on.

For Lasagne/Recipes, there are no strict standards, you should just follow PEP8. For Lasagne/Lasagne, we can guide you, but it will be much easier after seeing your code!
 
Should I fork the repository and start pushing stuff to my copy?

Sure, just go ahead: fork Lasagne/Recipes, clone it to your machine, create a branch, commit to it (probably in examples/; papers/ is meant for reproductions of published experimental results) and send a Pull Request! We can then easily mentor you via GitHub's pull request interface.

Cheers, Jan

Giulio Petrucci

Oct 12, 2015, 11:41:34 AM
to lasagn...@googlegroups.com
Hi Jan,

thanks for your reply.

On Mon, Oct 12, 2015 at 5:22 PM, <goo...@jan-schlueter.de> wrote:
> Sounds good! The two types of things we've got in Recipes so far are iPython
> notebooks and self-contained Python scripts, so it would be good if you
> could bring it into either of these forms (a self-contained Python script is
> a lot easier to create, maintain, and reuse, and a notebook is better for
> walking readers through an implementation along with some examples).
> Creating a subdirectory with a runnable Python script and an extra imported
> file with your layers would be fine as well.

Script + subdirectory, then.
I've never used IPython notebooks or anything similar.
(Maybe later.)

Let's keep in touch!

Ciao,
Giulio

--

Søren Sønderby

Oct 12, 2015, 1:07:10 PM
to lasagn...@googlegroups.com
Hi Giulio,

It is something we have discussed for a while :) e.g. here https://github.com/Lasagne/Lasagne/issues/391 and the related https://github.com/Lasagne/Lasagne/issues/425

I think the model you link is a vanilla encoder-decoder? If you are willing to use a GRU layer, you should be able to implement that with nearly no changes in Lasagne.

We did that for a summer school at DTU, see http://dtu-deeplearning.github.io/

We implemented a vanilla encoder-decoder in https://github.com/DTU-deeplearning/day3-RNN/blob/master/RNN.ipynb

At the bottom you’ll also find a hacky implementation of encoder/decoder with softmax attention.

Best regards, Søren

Giulio Petrucci

Oct 12, 2015, 5:49:20 PM
to lasagn...@googlegroups.com
Hi Søren,

Thanks for the reply! I am taking a look at the link that you sent
me... but it's 11:30 PM, so I will take a deeper look tomorrow
morning! :-)
Just some clarifications:

On Mon, Oct 12, 2015 at 3:11 PM, Søren Sønderby <skaaes...@gmail.com> wrote:
> I think the model you link is a vanilla encoder-decoder? If you are willing to use a GRU layer, you should be able to implement that with nearly no changes in Lasagne.

What do you mean by "vanilla"?
The model that I want to implement is the one described in the paper I
linked in my first mail. You can get a quick idea by looking at Figure
1 in that paper.
The issue is that at each decoding step y_t depends on y_tm1 and h_t,
while h_t depends on y_tm1 and h_tm1.
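
To make the dependency concrete, here is a minimal sketch in plain Theano of the kind of decoder step I mean (my own toy code, with made-up names and sizes; it is not the paper's full model and does not use Lasagne layers):

    import numpy as np
    import theano
    import theano.tensor as T

    n_hidden, n_out = 64, 32  # hypothetical sizes
    floatX = theano.config.floatX

    # toy weights, randomly initialized just for illustration
    W_yh = theano.shared(np.random.randn(n_out, n_hidden).astype(floatX))
    W_hh = theano.shared(np.random.randn(n_hidden, n_hidden).astype(floatX))
    W_hy = theano.shared(np.random.randn(n_hidden, n_out).astype(floatX))

    def decoder_step(h_tm1, y_tm1):
        # h_t depends on y_tm1 and h_tm1
        h_t = T.tanh(T.dot(y_tm1, W_yh) + T.dot(h_tm1, W_hh))
        # y_t depends on h_t (simplified: in the paper it also sees y_tm1 directly)
        y_t = T.nnet.softmax(T.dot(h_t, W_hy))
        return h_t, y_t

    h0 = T.matrix('h0')  # e.g. the encoder's last hidden state
    y0 = T.matrix('y0')  # initial output (e.g. a start-of-sequence symbol)
    (h_seq, y_seq), updates = theano.scan(decoder_step,
                                          outputs_info=[h0, y0],
                                          n_steps=10)

The point is that theano.scan can feed y_t back into the next step via outputs_info; it is this feedback that I don't see how to express with the current Lasagne recurrent layers.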

> We did that for a summer school at DTU, see http://dtu-deeplearning.github.io/

I am taking a look at:

https://github.com/DTU-deeplearning/day3-RNN/blob/master/RNN.ipynb

but I am not getting it; maybe it's just because it's late.
Anyway, tomorrow morning I will try to reproduce everything. If I have
any problems, I will post here.

Thanks!

Ciao,
Giulio

--

Søren Sønderby

Oct 13, 2015, 1:40:45 AM
to lasagn...@googlegroups.com

On 12 Oct 2015, at 23:48, Giulio Petrucci <giulio....@gmail.com> wrote:

> Hi Søren,
>
> Thanks for the reply! I am taking a look to the link that you sent
> me... but it's 11:30 PM so I will take a deeper look tomorrow morning!
> :-)
> Just some clarifications:
>
> On Mon, Oct 12, 2015 at 3:11 PM, Søren Sønderby <skaaes...@gmail.com> wrote:
>> I think the model you link is a vanilla encoder-decoder? If you are willing to use a GRU layer, you should be able to implement that with nearly no changes in Lasagne.
>
> What do you mean by "vanilla"?

That it's without attention.

> The model that I want to implement is the one described in the paper I
> linked in my first mail. You can get a quick idea by looking at Figure
> 1 in that paper.
> The issue is that at each decoding step y_t depends on y_tm1 and h_t,
> while h_t depends on y_tm1 and h_tm1.

Currently, having dependencies between y and the next timestep is not possible in Lasagne. I think the problem is explained in the issues I linked.
>
>> We did that for a summer school at DTU, see http://dtu-deeplearning.github.io/
>
> I am taking a look at:
>
> https://github.com/DTU-deeplearning/day3-RNN/blob/master/RNN.ipynb
>
> but I am not getting it; maybe it's just because it's late.
> Anyway, tomorrow morning I will try to reproduce everything. If I have
> any problems, I will post here.
>
> Thanks!
>
> Ciao,
> Giulio
>
> --
>

Giulio Petrucci

Oct 13, 2015, 3:30:47 AM
to lasagn...@googlegroups.com
Hi Søren,

thanks for your reply

On Tue, Oct 13, 2015 at 7:40 AM, Søren Sønderby <skaaes...@gmail.com> wrote:
> Currently, having dependencies between y and the next timestep is not possible in Lasagne. I think the problem is explained in the issues I linked.

I see. That's exactly what I am trying to achieve!

I will write back to you later.
Meanwhile, thanks.

Ciao,
Giulio

--

Giulio Petrucci

Oct 13, 2015, 10:43:35 AM
to lasagn...@googlegroups.com
Hi Søren,

sorry for the delay.
I finally managed to re-implement the day3 example.
I will try to fit my problem to this model... fingers crossed, let's
see what happens!
Anyway, I think that an implementation of a recurrent layer with the
output feeding back as an input could be useful... so @Jan, I'm not
giving up for now. But maybe it will take some more time.

Thanks,
Giulio

--

Giulio Petrucci

Oct 13, 2015, 11:43:52 AM
to lasagn...@googlegroups.com
Hi again,

On Mon, Oct 12, 2015 at 3:11 PM, Søren Sønderby <skaaes...@gmail.com> wrote:
[cut]
> We implemented a vanilla encoder-decoder in https://github.com/DTU-deeplearning/day3-RNN/blob/master/RNN.ipynb
[cut]

In my last reply I forgot to add this.
Setting "theano.config.warn_float64='raise'", the compilation phase
complains that the following line is trying to create a Float64
tensor:

output_decoder_train = lasagne.layers.get_output(l_out, inputs={l_in:
x_sym, l_mask_enc: xmask_sym}, deterministic=False)

I am kind of a newbie with Theano/Lasagne and I must admit that I
don't get everything about this typing stuff (and about the rationale
behind it... like why optimization works best with float32 while the
default type is float64, even if theano.config.floatX is set to
float32...), but can anyone help me understand what's going on?

Thanks,
Giulio

--

goo...@jan-schlueter.de

Oct 14, 2015, 6:59:58 AM
to lasagne-users
In my last reply I forgot to add this.
Setting "theano.config.warn_float64='raise'", the compilation phase
complains that the following line is trying to create a float64
tensor:

output_decoder_train = lasagne.layers.get_output(l_out, inputs={l_in:
x_sym, l_mask_enc: xmask_sym}, deterministic=False)

I am kind of a newbie with Theano/Lasagne and I must admit that I
don't get everything about this typing stuff (and about the rationale
behind it... like why optimization works best with float32 while the
default type is float64, even if theano.config.floatX is set to
float32...), but can anyone help me understand what's going on?

The problem about float64 is that Theano doesn't support double precision on GPU (I guess because the very first GPUs (compute capability < 1.3) only supported single precision, and because double precision is not really useful for neural networks, the main application area of Theano). So it's important to keep precision in the computation graph down to float32 to allow Theano to move everything to the GPU when compiling it into a function.
If floatX is set to "float32", then the default type for T.matrix(), T.tensor4(), T.vector() etc. is float32, and all the shared variables created by Lasagne are float32 (because we take care of the theano.config.floatX setting). However, it may happen that for some operation, only one of the operands is float32, and the result is float64 -- this is what the warning mode will catch. Usually, the second operand in such a case is an int64 or a float64 symbolic variable or numpy array.
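
As a toy illustration of that upcast (a made-up example, not taken from your code):

    import numpy as np
    import theano
    import theano.tensor as T

    # assume floatX = 'float32' (e.g. via THEANO_FLAGS=floatX=float32)
    x = T.matrix('x')          # dtype follows floatX -> float32
    w = np.random.randn(3, 3)  # numpy defaults to float64!
    print(T.dot(x, w).dtype)   # 'float64' -- the second operand upcast it
    print(T.dot(x, w.astype(theano.config.floatX)).dtype)  # 'float32'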

I haven't used the warning mode and am surprised that it catches the "lasagne.layers.get_output" line and not something inside that, i.e., the actual operation resulting in a float64 output. In any case, you should try to figure out where the upcast to float64 happens, otherwise at least part of your computation will happen on CPU instead of GPU.

Best, Jan

Giulio Petrucci

Oct 14, 2015, 9:46:17 AM
to lasagn...@googlegroups.com
Hi Jan,

thanks for your reply.

On Wed, Oct 14, 2015 at 12:59 PM, <goo...@jan-schlueter.de> wrote:
> The problem about float64 is that Theano doesn't support double precision on
> GPU (I guess because the very first GPUs (compute capability < 1.3) only
> supported single precision, and because double precision is not really
> useful for neural networks, the main application area of Theano). So it's
> important to keep precision in the computation graph down to float32 to
> allow Theano to move everything to the GPU when compiling it into a
> function.
> If floatX is set to "float32", then the default type for T.matrix(),
> T.tensor4(), T.vector() etc. is float32, and all the shared variables
> created by Lasagne are float32 (because we take care of the
> theano.config.floatX setting). However, it may happen that for some
> operation, only one of the operands is float32, and the result is float64 --
> this is what the warning mode will catch. Usually, the second operand in
> such a case is an int64 or a float64 symbolic variable or numpy array.

So, to ensure that everything is float32, should I use a
theano.tensor.cast() in such a scenario?

> I haven't used the warning mode and am surprised that it catches the
> "lasagne.layers.get_output" line and not something inside that, i.e., the
> actual operation resulting in a float64 output. In any case, you should try
> to figure out where the upcast to float64 happens, otherwise at least part
> of your computation will happen on CPU instead of GPU.

So, you say that I should "debug" Lasagne?

Thanks,
Giulio

--

goo...@jan-schlueter.de

Oct 14, 2015, 10:35:08 AM
to lasagne-users
Hey,


So, to ensure that everything is float32, should I use a
theano.tensor.cast() in such a scenario?

Yes, but as early in the computation as possible. And often it's caused by a numpy array being in a wrong dtype, so in that case you should convert the numpy array already. Note that Theano expressions have `.astype()` as a shortcut for `T.cast(...)`.
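
For example (hypothetical names, just to show both forms):

    import numpy as np
    import theano
    import theano.tensor as T

    # numpy side: fix the dtype before the array enters the graph
    mask_np = np.ones((4, 5)).astype(theano.config.floatX)

    # symbolic side: cast as early as possible
    i = T.imatrix('i')                   # int32 input
    x = T.cast(i, theano.config.floatX)  # or: i.astype(theano.config.floatX)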
 
So, you say that I should "debug" Lasagne?

I'm 95% sure the problem is somewhere in your code, not in Lasagne's, but it may be tricky to find. Doesn't the warn_float64 give you more detailed information about where the float64 occurs for the first time? It may help to print the dtypes of some variables and intermediate expressions (x_sym.dtype, xmask_sym.dtype, output_decoder_train.dtype etc.).

Best, Jan

Giulio Petrucci

Oct 15, 2015, 3:46:39 AM
to lasagn...@googlegroups.com
Hi Jan,

Thanks for the quick reply.

On Wed, Oct 14, 2015 at 4:35 PM, <goo...@jan-schlueter.de> wrote:
> Yes, but as early in the computation as possible. And often it's caused by a
> numpy array being in a wrong dtype, so in that case you should convert the
> numpy array already. Note that Theano expressions have `.astype()` as a
> shortcut for `T.cast(...)`.

I see. Thanks for pointing that out.

> I'm 95% sure the problem is somewhere in your code, not in Lasagne's, but it
> may be tricky to find. Doesn't the warn_float64 give you more detailed
> information about where the float64 occurs for the first time? It may help
> to print the dtypes of some variables and intermediate expressions
> (x_sym.dtype, xmask_sym.dtype, output_decoder_train.dtype etc.).

I just copied&pasted this example:

https://github.com/DTU-deeplearning/day3-RNN/blob/master/RNN.ipynb

just changing some name and bringing everything to be PEP8 compliant.
Maybe I changed something without realizing.
Anyway, please, find in attachment the .py file without any print
statement, ecc. I set the proper theano.config flags so if you just
run "python day3.py" you should see the same error.

Thanks in advance.

Have a nice day,
Giulio

--
day3.py

goo...@jan-schlueter.de

Oct 15, 2015, 7:05:07 AM
to lasagne-users
Hey,

I don't know what you did, but if I run:
    THEANO_FLAGS=warn_float64='raise' python day3.py

I get:
Traceback (most recent call last):
  File "day3.py", line 82, in <module>
    acc = T.mean(eq)

So obviously it's the T.mean() line, and the result is upcasted to float64 because `eq` is boolean. The solution is to replace it with:
    acc = T.mean(eq, dtype=theano.config.floatX)
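
You can check the dtypes directly if you want to see it happen (a minimal illustration, assuming floatX=float32):

    import theano
    import theano.tensor as T

    a = T.ivector('a')
    b = T.ivector('b')
    eq = T.eq(a, b)          # comparison result is int8
    print(T.mean(eq).dtype)  # 'float64' -- this is what warn_float64 catches
    print(T.mean(eq, dtype=theano.config.floatX).dtype)  # 'float32'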

Best, Jan

Giulio Petrucci

Oct 19, 2015, 3:58:21 AM
to lasagn...@googlegroups.com
Hi Jan,
Thanks for pointing that out. I added the dtype specification in
T.mean(), but I am still getting an error, a different one.
If I run:

:~$ THEANO_FLAGS=warn_float64='raise' python day3.py

on the attached file, I get:

Traceback (most recent call last):
  File "day3.py", line 59, in <module>
    output_decoder_train = lasagne.layers.get_output(l_out, inputs={
        l_in: x_sym, l_mask_enc: xmask_sym}, deterministic=False)

Please, could you double-check? I am going crazy... :-)

Thanks,
Giulio

--
day3.py

goo...@jan-schlueter.de

Oct 19, 2015, 11:29:36 AM
to lasagne-users
Thanks for pointing that out. I added the dtype specification in
T.mean(), but I am still getting an error, a different one.
If I run:

:~$ THEANO_FLAGS=warn_float64='raise' python day3.py

on the attached file, I have [...]

If I do that on the new file you sent, I again get


Traceback (most recent call last):
  File "day3b.py", line 64, in <module>
    acc = T.mean(eq)

If I add the dtype specification again, it passes again. Groundhog Day, anyone?

Giulio Petrucci

Oct 19, 2015, 11:34:16 AM
to lasagn...@googlegroups.com
Hi Jan,

On Mon, Oct 19, 2015 at 5:29 PM, <goo...@jan-schlueter.de> wrote:
> If I do that on the new file you sent, I again get
>
> Traceback (most recent call last):
> File "day3b.py", line 64, in <module>
> acc = T.mean(eq)

Very strange.

> Groundhog Day, anyone?

Uh?

Ciao,
Giulio

--

goo...@jan-schlueter.de

Oct 19, 2015, 12:14:27 PM
to lasagne-users
Very strange.

If you haven't done so already, update to the bleeding-edge version of Theano (and optionally Lasagne):
http://lasagne.readthedocs.org/en/latest/user/installation.html#bleeding-edge-version
That's what I use. Maybe it makes a difference (for Theano, probably not for Lasagne).
 
> Groundhog Day, anyone?  

Uh?

That movie about that guy who relives the same day over and over...

Cheers, Jan

Giulio Petrucci

Oct 20, 2015, 4:11:36 AM
to lasagn...@googlegroups.com
Hi Jan,

On Mon, Oct 19, 2015 at 6:14 PM, <goo...@jan-schlueter.de> wrote:
> If you didn't do so, update to the bleeding-edge version of Theano (and
> optionally Lasagne):
> http://lasagne.readthedocs.org/en/latest/user/installation.html#bleeding-edge-version
> That's what I use. Maybe it makes a difference (for Theano, probably not for
> Lasagne).

Problem solved.
;-)

Ciao and thank you,
Giulio

--

bell...@stethome.com

Aug 30, 2018, 9:15:58 AM
to lasagne-users
Hi there! None of the links included in this thread are working.
Some time has passed, but do you still have any working examples, repositories, or things like that to share?
I'm interested in implementing attention in Lasagne for a Polyphonic Event Detection application.

Thanks,
RB