Good day,
I have trained a model with a 128-filter ConvLSTM2D layer using 5x5 kernels. The layer returns weights with the following shapes:
1) kernel - (5, 5, 1, 512),
2) recurrent_kernel - (5, 5, 128, 512),
3) bias - (512,).
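To sanity-check the shape arithmetic, here is a small NumPy sketch with stand-in arrays of the reported shapes (these are not real trained weights; my assumption is that the trailing 512 stacks the four gates along the last axis):

```python
import numpy as np

filters, k, in_ch = 128, 5, 1  # 128 filters, 5x5 kernel, 1 input channel

# Stand-ins with the shapes the layer reports (illustrative, not real weights)
kernel = np.zeros((k, k, in_ch, 4 * filters))              # (5, 5, 1, 512)
recurrent_kernel = np.zeros((k, k, filters, 4 * filters))  # (5, 5, 128, 512)
bias = np.zeros(4 * filters)                               # (512,)

# If the last axis stacks the four gates, splitting there gives one
# (5, 5, 1, 128) input kernel per gate
w_xi, w_xf, w_xc, w_xo = np.split(kernel, 4, axis=-1)
print(w_xi.shape)  # (5, 5, 1, 128)
```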
I am trying to map these values onto the equations from the original paper (below), but I am having trouble finding the correspondence.
Keras docs state that the cell weights (W_ci, W_cf, W_co) are not currently implemented.
I think I have the biases figured out: there are 4 bias terms per filter and 128 filters, so 4 x 128 = 512. I suppose bias[0:4] would then give me the biases for the first filter.
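As a toy check of that bias counting (assuming the four gate biases are concatenated block-wise, in the order i, f, c, o, as in the standard Keras LSTM layout, rather than interleaved per filter):

```python
import numpy as np

filters = 128
bias = np.arange(4 * filters)  # stand-in for the (512,) bias vector

# Under a block-wise layout, each contiguous chunk of 128 belongs to one gate
b_i, b_f, b_c, b_o = np.split(bias, 4)

# The four biases for the *first* filter would then be strided across the
# blocks rather than sitting in bias[0:4]
first_filter_biases = bias[0::filters]
print(first_filter_biases)  # [  0 128 256 384]
```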
I fail to understand how the "kernel" and "recurrent_kernel" weights are mapped, though. My thinking is that "kernel" represents W_xi, W_hi, W_xf, and W_hf for 128 filters, which would produce 128 i_t and f_t gates. Each of these would in turn produce 128 C_t, o_t, and H_t values. That would be consistent with the shape of "recurrent_kernel" (representing W_xc, W_hc, W_xo, W_ho), but it would also result in 128 x 128 = 16384 final C_t and H_t values, right? Yet the final hidden states C_t and H_t only have 128 channels.
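For reference, here is a toy NumPy check of the channel arithmetic for one gate (using a 1x1 kernel so the convolution reduces to an einsum; a 5x5 kernel would additionally sum over the spatial window, but the channel counting is identical; all names are illustrative):

```python
import numpy as np

filters = 128
H, W = 8, 8

h_prev = np.random.rand(H, W, filters)         # hidden state H_{t-1}, 128 channels
w_hi = np.random.rand(1, 1, filters, filters)  # recurrent kernel slice for gate i

# The convolution sums over the input-channel axis (axis 2 of the kernel),
# so the 128 incoming channels collapse and the gate again has 128 channels,
# not 128 x 128
i_gate = np.einsum('xyc,ijcf->xyf', h_prev, w_hi)
print(i_gate.shape)  # (8, 8, 128)
```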
Am I missing something here? Is there some step where 16384 hidden states are reduced to the final 128? Any help to clear up this confusion would be greatly appreciated.
Take care and thank you for the assistance,
André