CNN-TDNN chain model: two output heads


Mar 20, 2019, 1:55:44 PM
to kaldi-help

I noticed the CNN-TDNN chain model (obtained using a "local/chain/tuning/" script) has two output blocks, "prefinal-chain" and "prefinal-xent", both with the same input component, "prefinal-l". The first ends with an affine layer, while the second has a softmax as its final layer. In an older post here [1], I found that there are two output branches, one for training (prefinal-xent) and one for decoding (prefinal-chain).

Can you elaborate a bit on how they are used?
Why are two branches required?
It is unclear to me how prefinal-chain, which does not end with a softmax layer, could be used in the decoding step. I thought decoding required posterior probabilities for the acoustic states, and those are provided by the softmax.

Thank you,

Daniel Povey

Mar 20, 2019, 1:59:49 PM
to kaldi-help
The prefinal layers don't have any softmax in them, although the xent output layer does have a softmax.

Those layers are just part of the model topology; they just consist of a linear layer with an orthogonal constraint and a smallish output-dim followed by an affine layer then relu and batchnorm.  The reason for separating them is just that empirically it worked better.  Only the output layers would ever be used by larger parts of the program.  Actually the decoding only uses the output called 'output'.
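For reference, the tail of a typical chain xconfig from the tuning scripts looks roughly like this (a sketch, not your exact script: the dimensions and $-variables are illustrative values from common recipes):

```
# shared bottleneck feeding both branches
linear-component name=prefinal-l dim=256 orthonormal-constraint=-1.0
# chain (sequence-trained) branch: this "output" is what decoding uses
prefinal-layer name=prefinal-chain input=prefinal-l big-dim=1536 small-dim=256
output-layer name=output include-log-softmax=false dim=$num_targets
# cross-entropy branch: used only as a training-time regularizer
prefinal-layer name=prefinal-xent input=prefinal-l big-dim=1536 small-dim=256
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor
```

Note the include-log-softmax=false on the chain output, while output-xent keeps its log-softmax by default.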

Because chain models are trained with a sequence objective, the output layer called 'output' does not need softmax.
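To see why the missing softmax doesn't matter for decoding: a log-softmax only subtracts a per-frame constant (the log-sum-exp) from every score, so it never changes which path scores best. A stdlib-Python sketch with made-up scores:

```python
import math

# One frame of raw network outputs (the "output" head, no softmax)
# for four hypothetical pdf-ids; the values are made up.
raw = [2.0, -1.0, 0.5, 3.0]

# log-softmax subtracts the same per-frame constant (log-sum-exp) from each score.
lse = math.log(sum(math.exp(x) for x in raw))
log_softmax = [x - lse for x in raw]

# The offset is identical for every pdf-id in the frame ...
offsets = [r - s for r, s in zip(raw, log_softmax)]
assert all(abs(o - offsets[0]) < 1e-9 for o in offsets)

# ... so the ranking (and any Viterbi best path, which sums per-frame
# scores) is unchanged; the decoder can use the raw outputs directly.
best_raw = max(range(len(raw)), key=lambda i: raw[i])
best_ls = max(range(len(log_softmax)), key=lambda i: log_softmax[i])
assert best_raw == best_ls
```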



Mar 20, 2019, 2:05:48 PM
to kaldi-help
"The prefinal layers don't have any softmax in them, although the xent output layer does have a softmax."

Yeah, that is true. I meant to refer to "output.affine" vs. "output-xent.log-softmax": the first is the final component in the "prefinal-chain" branch and the second in the "prefinal-xent" branch.