Hello,
I was wondering how the cross-entropy (CE) and output L2 regularization terms are tuned in Kaldi, say for a Librispeech chain recipe. In principle, the CE term should encourage the numerator of the LF-MMI objective, and the output L2 term should prevent extremely peaky posterior distributions and so help avoid overfitting, is that right? Could you share what strategy you followed when tuning the Librispeech recipe, and whether it could be further improved by tuning the CE and output L2 regularization scales?
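For reference, the knobs I am referring to are the ones passed to steps/nnet3/chain/train.py in the chain tuning scripts, roughly like the abridged excerpt below (the variable name output_l2_regularize is mine and the values are only illustrative, not the tuned ones):

    # abridged, in the style of local/chain/tuning/run_tdnn_*.sh
    xent_regularize=0.1           # scale on the cross-entropy regularization term
    output_l2_regularize=0.00005  # scale on the L2 penalty on the chain output

    steps/nnet3/chain/train.py \
      --chain.xent-regularize $xent_regularize \
      --chain.l2-regularize $output_l2_regularize \
      # ... (data, tree, egs and model options omitted here)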
Out of curiosity, could the LF-MMI loss become negative (i.e., the objective become positive) when the numerator likelihood becomes extremely large? I assume the numerator hypothesis is also included in the denominator graph, but with a much smaller probability under the n-gram LM...
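To make the question concrete, my (possibly mistaken) picture of the per-utterance objective is roughly

    F_u = \log p(X_u \mid G_u^{num}) - \log p(X_u \mid G^{den})

where G_u^{num} and G^{den} are just my shorthand for the utterance's numerator graph and the shared denominator graph. If every numerator path also appeared in the denominator graph with at least the same weight, then p(X_u \mid G_u^{num}) \le p(X_u \mid G^{den}), so F_u \le 0 and the loss -F_u would stay non-negative. So I suppose I am really asking whether the different weighting of the shared paths (numerator supervision vs. the n-gram LM on the denominator side) can break that inequality.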
Best,
- Marc