Hi
If I understand correctly, when we use unchain_backward(), we are following the paradigm below, right? (Graphic from the blog post.)
The unchaining would occur right after we compute the "final state" of a segment. Then, after unchaining, we carry that state over as the starting state and continue computing with the RNN.
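To make this concrete, here is a minimal pure-Python sketch (no Chainer needed) of which time steps a loss can send gradient to when the graph is cut at regular intervals. The names backprop_window, unchain_steps, and chunk_len are hypothetical and only for illustration. It assumes the cut (the unchain_backward() call) happens right after the last step of each chunk of chunk_len steps, which is my reading of the paradigm and may not match the graphic exactly.

```python
def unchain_steps(seq_len, chunk_len):
    """Steps after which the graph would be cut (last step of each full chunk).

    Assumption: in Chainer this is where unchain_backward() would be called
    on the accumulated loss, after running backward().
    """
    return [t for t in range(seq_len) if (t + 1) % chunk_len == 0]


def backprop_window(t, chunk_len):
    """Time steps that the loss at step t can still reach.

    Everything before the start of the current chunk has been unchained,
    so gradient flows only from the chunk start up to step t.
    """
    chunk_start = (t // chunk_len) * chunk_len
    return list(range(chunk_start, t + 1))


if __name__ == "__main__":
    chunk_len = 3
    print("cut after steps:", unchain_steps(7, chunk_len))
    for t in range(6):
        print("loss at step", t, "reaches steps", backprop_window(t, chunk_len))
```

With chunk_len = 3, the loss at step 3 only reaches step 3, while the loss at step 5 reaches steps 3 through 5, so the depth of backpropagation depends on where a step falls inside its chunk.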
The blog post mentions this other approach:
My question is: is there a clean way of implementing these BPTT variations in Chainer?
Thanks!