Good day,
I have trained a model with a 128-filter ConvLSTM2D layer using 5x5 kernels. The layer returns weights with the following shapes:
1) kernel - (5, 5, 1, 512),
2) recurrent_kernel - (5, 5, 128, 512),
3) bias - (512,).
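To sanity-check the shape arithmetic, here is a small NumPy sketch with stand-in arrays of the reported shapes (these are not real trained weights; my assumption is that the trailing 512 stacks the four gates along the last axis):

```python
import numpy as np

filters, k, in_ch = 128, 5, 1  # 128 filters, 5x5 kernel, 1 input channel

# Stand-ins with the shapes the layer reports (illustrative, not real weights)
kernel = np.zeros((k, k, in_ch, 4 * filters))              # (5, 5, 1, 512)
recurrent_kernel = np.zeros((k, k, filters, 4 * filters))  # (5, 5, 128, 512)
bias = np.zeros(4 * filters)                               # (512,)

# If the last axis stacks the four gates, splitting there gives one
# (5, 5, 1, 128) input kernel per gate
w_xi, w_xf, w_xc, w_xo = np.split(kernel, 4, axis=-1)
print(w_xi.shape)  # (5, 5, 1, 128)
```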
I am trying to map these values onto the equations from the original paper (below), but I am having trouble finding the correspondence.
Keras docs state that the cell weights (W_ci, W_cf, W_co) are not currently implemented.
I think I have the biases figured out: there are 4 bias terms per filter and 128 filters, so 4 x 128 = 512. I suppose bias[0:4] would then give me the biases for the first filter.
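As a toy check of that bias counting (assuming the four gate biases are concatenated block-wise, in the order i, f, c, o, as in the standard Keras LSTM layout, rather than interleaved per filter):

```python
import numpy as np

filters = 128
bias = np.arange(4 * filters)  # stand-in for the (512,) bias vector

# Under a block-wise layout, each contiguous chunk of 128 belongs to one gate
b_i, b_f, b_c, b_o = np.split(bias, 4)

# The four biases for the *first* filter would then be strided across the
# blocks rather than sitting in bias[0:4]
first_filter_biases = bias[0::filters]
print(first_filter_biases)  # [  0 128 256 384]
```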
I fail to understand how the "kernel" and "recurrent_kernel" weights are mapped, though. My thinking is that "kernel" represents W_xi, W_hi, W_xf, and W_hf for 128 filters, which would produce 128 i_t and f_t gates. Each of these would in turn produce 128 C_t, o_t, and H_t values. That would be consistent with the shape of "recurrent_kernel" (representing W_xc, W_hc, W_xo, W_ho), but it would also result in 128 x 128 = 16384 final C_t and H_t values, right? Yet the final hidden states C_t and H_t only have 128 channels.
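For reference, here is a toy NumPy check of the channel arithmetic for one gate (using a 1x1 kernel so the convolution reduces to an einsum; a 5x5 kernel would additionally sum over the spatial window, but the channel counting is identical; all names are illustrative):

```python
import numpy as np

filters = 128
H, W = 8, 8

h_prev = np.random.rand(H, W, filters)         # hidden state H_{t-1}, 128 channels
w_hi = np.random.rand(1, 1, filters, filters)  # recurrent kernel slice for gate i

# The convolution sums over the input-channel axis (axis 2 of the kernel),
# so the 128 incoming channels collapse and the gate again has 128 channels,
# not 128 x 128
i_gate = np.einsum('xyc,ijcf->xyf', h_prev, w_hi)
print(i_gate.shape)  # (8, 8, 128)
```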
Am I missing something here? Is there some step where 16384 hidden states are reduced to the final 128? Any help to clear up this confusion would be greatly appreciated.
Take care and thank you for the assistance,
André