acoustic scale (acwt)


Ho Yin Chan

unread,
Jul 23, 2015, 11:50:06 AM7/23/15
to kaldi-developers
Hello,

   When combining the acoustic score with the language model score: since we know that an LM scale between 11 and 15 generally gives optimal performance in an ASR system, I just wonder why the acoustic scale (acwt) is set to 0.1 (1/10) in the default scripts like make_denlats.sh, train_mpe.sh, train_mmi.sh, train_smbr.sh, decode_nnet.sh. Any special reason for that?

Cheers,

Ricky

Jan Trmal

unread,
Jul 23, 2015, 12:37:27 PM7/23/15
to kaldi-de...@googlegroups.com
I think it was a rather arbitrary decision, just to get the weights into the same ballpark during lattice generation. I don't think setting it to the "correct" weight (figured out during decoding) would change the outcome much.
y. 

--
You received this message because you are subscribed to the Google Groups "kaldi-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-develope...@googlegroups.com.
To post to this group, send email to kaldi-de...@googlegroups.com.
Visit this group at http://groups.google.com/group/kaldi-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-developers/a267aa97-c3c8-45ce-9fc2-b13de0ee701d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Karel Veselý

unread,
Jul 23, 2015, 1:11:51 PM7/23/15
to kaldi-de...@googlegroups.com
Hi,
it was not arbitrary. The optimal acwt differs from system to system (it depends on the temporal context at the input and the number of tied states).
For DNNs I've seen optimal LM scales around 10-13, while for monophone DNNs it is around 6-8.
The 10 is a safe universal value, which has worked well in practice.

For the 'denlats' a unigram LM is used, so theoretically the optimal AM/LM scaling can be a bit different there.
(I can imagine the unigram LM scores could be down-scaled, so that even more AM errors get fixed by sMBR.)

Since you are interested in the topic, I'd suggest you run a few experiments
and verify that it does not lead to significant differences.
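A toy sketch of such an experiment (the hypothesis scores below are invented for illustration; a real sweep would rescore lattices and compute WER at each lmwt, as Kaldi's scoring scripts do):

```python
# Sweep the LM weight (in Kaldi scoring, acwt = 1/lmwt) over two competing
# hypotheses and watch where the 1-best choice flips. With these made-up
# (acoustic, LM) log-scores the winner changes between lmwt=11 and lmwt=12,
# i.e. the decision is stable over a fairly wide band of scales.
hyps = {
    "hyp_a": (-120.0, -8.0),   # better LM score, worse acoustic score
    "hyp_b": (-97.0, -10.0),   # better acoustic score, worse LM score
}

for lmwt in range(8, 16):
    best = max(hyps, key=lambda h: hyps[h][0] + lmwt * hyps[h][1])
    print(lmwt, best)
```

With real lattices the analogous check is whether WER changes when you move the scale a few steps away from 10; if the curve is flat, the default is doing no harm.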

Cheers,
Karel.


On 23. 7. 2015 at 9:37, Jan Trmal wrote:

Gaurav Kumar

unread,
Jul 23, 2015, 1:29:22 PM7/23/15
to kaldi-de...@googlegroups.com, kaldi-developers
I think the reason is that

AM + alpha * LM has the same behavior as (1/alpha) * AM + LM when using lattice weights; the second case is the scaling of weights in the lattice.
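Concretely (a minimal sketch with made-up scores; the point is the ranking, not the numbers):

```python
# The two combinations AM + alpha*LM and (1/alpha)*AM + LM differ only by
# the overall positive factor alpha, so hypothesis ranking -- and hence the
# 1-best path -- is identical either way.
hyps = [(-120.0, -8.0), (-118.0, -11.0), (-125.0, -5.0)]  # made-up (AM, LM) log-scores
alpha = 12.0  # hypothetical LM weight

rank_lm_scaled = sorted(range(len(hyps)),
                        key=lambda i: hyps[i][0] + alpha * hyps[i][1],
                        reverse=True)
rank_am_scaled = sorted(range(len(hyps)),
                        key=lambda i: (1.0 / alpha) * hyps[i][0] + hyps[i][1],
                        reverse=True)

assert rank_lm_scaled == rank_am_scaled  # same ordering either way
```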


Sent from Mailbox



Jan Trmal

unread,
Jul 23, 2015, 1:37:13 PM7/23/15
to kaldi-de...@googlegroups.com
You're right, Gaurav.
I think the original question was where the number acwt=0.1 (i.e. lmwt=10) used during lattice generation comes from, since the optimal lmwt/acwt varies and is seldom exactly 10. What I (confusingly) and Karel were explaining is that the number represents a reasonable compromise that works reliably and reasonably well for all possible systems, because it is not far from the optimal lmwt/acwt determined later by WER scoring.

y.

Ho Yin Chan

unread,
Jul 24, 2015, 6:06:44 AM7/24/15
to kaldi-de...@googlegroups.com
Thanks.

Optimal LM scales are generally around 12-15 in my GMM/DNN systems. Decoding with the "correct" acoustic scale (the inverse of the LM scale) for lattice generation and then re-scoring usually gives a 1% relative performance gain, compared with using the default acwt value (0.1) in the script.

Tuning the acoustic scale for lattice generation (and then applying discriminative training) with different values would take too long to decode (and too much storage space). I would just trust that using the default acwt does not lead to significant differences.

R


Daniel Povey

unread,
Jul 24, 2015, 3:09:25 PM7/24/15
to kaldi-de...@googlegroups.com
I suspect at least some of the performance gain you get from using a stronger LM scale during lattice generation is because you'll get more things in the beam.  To do a fair comparison you'd have to reduce the beam until the real-time factor is the same as with the default scale of 0.1.
Dan


Karel Veselý

unread,
Jul 24, 2015, 3:10:31 PM7/24/15
to kaldi-de...@googlegroups.com
Ok, this is interesting.
What is the task? Is there some particular difference compared to standard systems?
Also which beams were used? For DNN I usually use --beam=13 --lattice-beam=8.

It could be that the lower ACWT affects pruning, and that a similar improvement can
be achieved with the original ACWT of 0.1 and larger beams. Can you please try this?

I am a bit cautious about changing the default, as I've seen many DNNs
where the optimal scoring ACWT was 1/10 or 1/11. And usually the WER
is quite smooth around the optimum, say for acwt = 1/N' with N' in (N-2, N+2).

At the same time, it is not a problem to change the value
once we have all the info.

Best,
Karel.


On 24. 7. 2015 at 3:06, Ho Yin Chan wrote:

Ho Yin Chan

unread,
Jul 24, 2015, 11:32:57 PM7/24/15
to kaldi-de...@googlegroups.com
Perhaps the difference in the optimal acwt is due to the difference in beam widths. The decoding beam and lattice beam widths I use are 18.0 and 10.0 respectively (on mobile data).

R

eti...@accuvit.io

unread,
Jul 28, 2015, 2:21:03 PM7/28/15
to kaldi-developers, ricky.ho...@gmail.com
The optimal value is probably different for everyone. We played around with this parameter, and 0.07 was optimal, giving us a 5% WER reduction compared to the default (0.1). Our decoding and lattice beams are 12 and 6.