acoustic scale (acwt)


Ho Yin Chan

unread,
Jul 23, 2015, 11:50:06 AM7/23/15
to kaldi-developers
Hello,

   When combining the acoustic score with the language model score: since we know that an LM scale between 11 and 15 generally gives optimal performance in an ASR system, I just wonder why the acoustic scale (acwt) is set to 0.1 (1/10) in the default scripts like make_denlats.sh, train_mpe.sh, train_mmi.sh, train_smbr.sh, decode_nnet.sh. Any special reason for that?

Cheers,

Ricky

Jan Trmal

unread,
Jul 23, 2015, 12:37:27 PM7/23/15
to kaldi-de...@googlegroups.com
I think it was a rather arbitrary decision, just to get the weights into the same ballpark during lattice generation. I don't think setting it to the "correct" weight (figured out during decoding) would change the outcome much.
y. 

--
You received this message because you are subscribed to the Google Groups "kaldi-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-develope...@googlegroups.com.
To post to this group, send email to kaldi-de...@googlegroups.com.
Visit this group at http://groups.google.com/group/kaldi-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-developers/a267aa97-c3c8-45ce-9fc2-b13de0ee701d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Karel Veselý

unread,
Jul 23, 2015, 1:11:51 PM7/23/15
to kaldi-de...@googlegroups.com
Hi,
it was not arbitrary. The optimal acwt differs from system to system (it depends on the temporal context at the input and the number of tied states).
For DNNs I've seen optimal LM scales around 10-13, while for monophone DNNs it is around 6-8.
The 10 is a safe universal value, which has worked well in practice.

For the 'denlats' a unigram LM is used, so theoretically the optimal AM/LM scaling can be a bit different there.
(I can imagine the unigram LM scores could be down-scaled, so that even more AM errors get fixed by sMBR.)

Since you are interested in the topic, I'd suggest you run a few experiments
and verify that it does not lead to significant differences.
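A toy sketch of such an experiment (the hypothesis scores below are invented for illustration; a real sweep would rescore lattices and compute WER at each lmwt, as Kaldi's scoring scripts do):

```python
# Sweep the LM weight (in Kaldi scoring, acwt = 1/lmwt) over two competing
# hypotheses and watch where the 1-best choice flips. With these made-up
# (acoustic, LM) log-scores the winner changes between lmwt=11 and lmwt=12,
# i.e. the decision is stable over a fairly wide band of scales.
hyps = {
    "hyp_a": (-120.0, -8.0),   # better LM score, worse acoustic score
    "hyp_b": (-97.0, -10.0),   # better acoustic score, worse LM score
}

for lmwt in range(8, 16):
    best = max(hyps, key=lambda h: hyps[h][0] + lmwt * hyps[h][1])
    print(lmwt, best)
```

With real lattices the analogous check is whether WER changes when you move the scale a few steps away from 10; if the curve is flat, the default is doing no harm.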

Cheers,
Karel.


On 23. 7. 2015 at 9:37, Jan Trmal wrote:

Gaurav Kumar

unread,
Jul 23, 2015, 1:29:22 PM7/23/15
to kaldi-de...@googlegroups.com, kaldi-developers
I think the reason is that

AM + alpha * LM has the same behavior as (1/alpha) * AM + LM when using lattice weights; the second case is the scaling of weights in the lattice.
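Concretely (a minimal sketch with made-up scores; the point is the ranking, not the numbers):

```python
# The two combinations AM + alpha*LM and (1/alpha)*AM + LM differ only by
# the overall positive factor alpha, so hypothesis ranking -- and hence the
# 1-best path -- is identical either way.
hyps = [(-120.0, -8.0), (-118.0, -11.0), (-125.0, -5.0)]  # made-up (AM, LM) log-scores
alpha = 12.0  # hypothetical LM weight

rank_lm_scaled = sorted(range(len(hyps)),
                        key=lambda i: hyps[i][0] + alpha * hyps[i][1],
                        reverse=True)
rank_am_scaled = sorted(range(len(hyps)),
                        key=lambda i: (1.0 / alpha) * hyps[i][0] + hyps[i][1],
                        reverse=True)

assert rank_lm_scaled == rank_am_scaled  # same ordering either way
```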


Sent from Mailbox



Jan Trmal

unread,
Jul 23, 2015, 1:37:13 PM7/23/15
to kaldi-de...@googlegroups.com
You're right, Gaurav.
I think the original question was where the number acwt=0.1 (i.e. lmwt=10) used during lattice generation comes from, since the optimal lmwt/acwt varies and is seldom exactly 10. What I (confusingly) and Karel were explaining is that the number represents a reasonable compromise that works reliably and reasonably well for all possible systems, because it is not far from the optimal lmwt/acwt determined later by WER scoring.

y.

Ho Yin Chan

unread,
Jul 24, 2015, 6:06:44 AM7/24/15
to kaldi-de...@googlegroups.com
Thanks.

Optimal LM scales are generally around 12-15 in my GMM/DNN systems. Decoding with the "correct" acoustic scale (the inverse of the LM scale) for lattice generation and then re-scoring usually gives a 1% relative performance gain, compared with using the default acwt value (0.1) in the script.

Tuning the acoustic scale for lattice generation (and then applying discriminative training) with different values would take too long to decode (and too much storage space). I would just trust that using the default acwt does not lead to significant differences.

R


Daniel Povey

unread,
Jul 24, 2015, 3:09:25 PM7/24/15
to kaldi-de...@googlegroups.com
I suspect at least some of the performance gain you get from using a stronger LM scale during lattice generation is because you'll get more things in the beam.  To do a fair comparison you'd have to reduce the beam until the real-time factor is the same as with the default scale of 0.1.
Dan


Karel Veselý

unread,
Jul 24, 2015, 3:10:31 PM7/24/15
to kaldi-de...@googlegroups.com
Ok, this is interesting.
What is the task? Is there some particular difference compared to standard systems?
Also which beams were used? For DNN I usually use --beam=13 --lattice-beam=8.

It could be that the lower ACWT affects pruning, and that a similar improvement can
be achieved with the original ACWT of 0.1 and larger beams. Can you please try this?

I am a bit cautious about changing the default, as I've seen many DNNs
where the optimal scoring ACWT was 1/10 or 1/11. And usually the WER
is quite smooth around the optimum, say for acwt = 1/N' with N' in (N-2, N+2).

At the same time, it is not a problem to change the value
once we have all the info.

Best,
Karel.


On 24. 7. 2015 at 3:06, Ho Yin Chan wrote:

Ho Yin Chan

unread,
Jul 24, 2015, 11:32:57 PM7/24/15
to kaldi-de...@googlegroups.com
Perhaps the difference in the optimal acwt is due to the difference in beam widths. The decoding beam and lattice beam widths I use are 18.0 and 10.0 respectively (on mobile data).

R

eti...@accuvit.io

unread,
Jul 28, 2015, 2:21:03 PM7/28/15
to kaldi-developers, ricky.ho...@gmail.com
The optimal value is probably different for everyone. We played around with this parameter, and 0.07 was optimal, giving us a 5% WER reduction compared to the default (0.1). Our decoding and lattice beams are 12 and 6.