CNN-TDNN chain model two heads output

lucg...@gmail.com

Mar 20, 2019, 1:55:44 PM
to kaldi-help
Hi,

I noticed that the CNN-TDNN chain model (obtained with the "local/chain/tuning/run_cnn_tdnn_1a.sh" script) has two output blocks, "prefinal-chain" and "prefinal-xent", both of which take the same input component, "prefinal-l". The first ends with an affine layer, while the second has a softmax as its final layer. In an older post here [1], I read that there are two output branches, one for training (prefinal-xent) and one for decoding (prefinal-chain).

Can you elaborate a bit on how they are used?
Why are two branches required?
It is a bit unclear to me how prefinal-chain, which does not end with a softmax layer, can be used in the decoding step. I thought that decoding requires posterior probabilities for the acoustic states, and that these are provided by the softmax.

Thank you,
Lucian

Daniel Povey

Mar 20, 2019, 1:59:49 PM
to kaldi-help
The prefinal layers don't have any softmax in them, although the xent output layer does have a softmax.

Those layers are just part of the model topology; they consist of a linear layer with an orthogonal constraint and a smallish output-dim, followed by an affine layer, then relu and batchnorm. The reason for separating them is just that empirically it worked better. Only the output layers would ever be used by larger parts of the program. Actually, decoding only uses the output called 'output'.

Because chain models are trained with a sequence objective, the output layer called 'output' does not need softmax.
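
To make that last point concrete, here is a small Python sketch. It is not Kaldi code and does not reflect how Kaldi's decoder is implemented; it assumes a toy 3-state left-to-right topology and made-up per-frame scores, and the names `best_path`, `raw`, and `allowed` are hypothetical. The idea it illustrates: log-softmax subtracts one constant per frame, so every path over the same frames shifts by the same total and the ranking of paths is unchanged.

```python
import math
from itertools import product


def log_softmax(row):
    """Subtract the per-frame log-sum-exp so that exp(row) sums to 1."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def best_path(frames, allowed):
    """Brute-force search: score every state sequence, keep the best allowed one."""
    num_states = len(frames[0])
    best_seq, best_score = None, -math.inf
    for seq in product(range(num_states), repeat=len(frames)):
        # Skip sequences that use a transition the toy topology forbids.
        if any((a, b) not in allowed for a, b in zip(seq, seq[1:])):
            continue
        score = sum(frames[t][s] for t, s in enumerate(seq))
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score


# Made-up unnormalized scores (4 frames x 3 states), standing in for 'output'.
raw = [
    [2.0, -1.0, 0.5],
    [0.1, 3.0, -2.0],
    [-0.5, 0.2, 4.0],
    [1.0, 1.5, 3.5],
]
# Toy left-to-right topology: self-loops and forward steps only.
allowed = {(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)}

normalized = [log_softmax(r) for r in raw]

path_raw, score_raw = best_path(raw, allowed)
path_norm, score_norm = best_path(normalized, allowed)

# Total amount removed by normalization: one constant per frame, summed.
shift = sum(r[0] - n[0] for r, n in zip(raw, normalized))

print(path_raw, path_norm)            # the same state sequence in both cases
print(score_raw - score_norm, shift)  # equal up to rounding
```

The sketch only shows that per-frame normalization is irrelevant to which path wins; it says nothing about how the scores were trained.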

Dan

lucg...@gmail.com

Mar 20, 2019, 2:05:48 PM
to kaldi-help
"The prefinal layers don't have any softmax in them, although the xent output layer does have a softmax."

Yeah, that is true. I meant to compare "output.affine" with "output-xent.log-softmax": the first is the final layer of the "prefinal-chain" branch and the second is the final layer of the "prefinal-xent" branch.
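
For readers following along, the difference between the two final components can be sketched in plain Python. This is only a toy illustration under loud assumptions: the dimensions and random weights are invented, the intermediate "prefinal-chain" and "prefinal-xent" layers are omitted entirely, and none of the function names correspond to Kaldi's API. It shows one shared vector (standing in for the output of "prefinal-l") feeding two heads, one ending in an affine transform and one ending in a log-softmax.

```python
import math
import random


def affine(x, weights, bias):
    """Plain affine transform: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def log_softmax(v):
    """Normalize scores so that their exponentials sum to 1."""
    m = max(v)
    lse = m + math.log(sum(math.exp(x - m) for x in v))
    return [x - lse for x in v]


def random_affine_params(rng, out_dim, in_dim):
    """Hypothetical helper: random weights and biases for a toy layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [rng.uniform(-1, 1) for _ in range(out_dim)]
    return weights, bias


rng = random.Random(0)
hidden_dim, num_targets = 4, 3  # invented sizes, far smaller than a real model

# Stand-in for the shared activations that both branches read from.
shared = [rng.uniform(-1, 1) for _ in range(hidden_dim)]

w_chain, b_chain = random_affine_params(rng, num_targets, hidden_dim)
w_xent, b_xent = random_affine_params(rng, num_targets, hidden_dim)

# Head 1, analogous to 'output': stops at the affine, scores stay unnormalized.
output = affine(shared, w_chain, b_chain)

# Head 2, analogous to 'output-xent': affine followed by log-softmax.
output_xent = log_softmax(affine(shared, w_xent, b_xent))

print("output      :", output)
print("output-xent :", output_xent)
print("sum(exp(output-xent)) =", sum(math.exp(v) for v in output_xent))
```

Only the second head yields values that exponentiate to a probability distribution; the first is left as raw scores, which matches the description of 'output' earlier in the thread.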