RTF in lm rescoring

Armando

Apr 4, 2016, 3:09:41 AM
to kaldi-help
Hi all

I tried to rescore with a 3-gram LM the lattices produced by a Kaldi decoding based on a 2-gram HCLG fst. I used mode 4, which is said to be exact. I then used MBR decoding to produce the one-best; I did this both for the lattices produced by the decoder itself and for the lattices produced by the 3-gram rescoring.
The WER improvement is remarkable, but the processing time looks very high for the LM rescoring: it is about 1x RT, which is not that far from the RT of the decoding itself, where the search space is obviously much larger.
Is that the expected processing time, or am I missing something, like some parameter tuning?
The line below is the command pipe; which task takes the longest to complete?

gunzip -c exp/trisatmmi/decode/lat.1.gz | lattice-scale --lm-scale=0.0 ark:- ark:- | lattice-to-phone-lattice exp/trisatmmi/final.mdl ark:- ark:- | lattice-compose ark:- exp/trisatmmi/decode/decode_lmrescore_3g//Ldet.fst ark:- | lattice-determinize --max-mem=50000000 ark:- ark:- | lattice-compose --phi-label=45669 ark:- data/eval_3g//G.fst ark:- | lattice-add-trans-probs --transition-scale=1.0 --self-loop-scale=0.1 exp/trisatmmi/final.mdl ark:- ark:- | gzip -c >exp/trisatmmi/decode/decode_lmrescore_3g//lat.1.gz


thanks in advance

Daniel Povey

Apr 4, 2016, 12:20:46 PM
to kaldi-help
It could be slow because you have bigger-than-normal lattices, but I think the 'const-arpa' rescoring is going to be faster than this.  Look for steps/lmrescore_const_arpa.sh in the example scripts.
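Roughly, the usage is along these lines (the directory names below are just placeholders, substitute your own; the first lang directory should be the one your decoding graph was built from):

utils/build_const_arpa_lm.sh /path/to/lm.3g.arpa data/lang_2g data/lang_3g_carpa
steps/lmrescore_const_arpa.sh data/lang_2g data/lang_3g_carpa data/your_test_set exp/trisatmmi/decode exp/trisatmmi/decode_3g_carpa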
I don't know which stage is slow for sure (I expect lattice-compose with G.fst), but you could tell from 'top' while running it.
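If 'top' is not conclusive, you could also break the pipe and time the stages one at a time on a single lattice archive, reusing your own commands above and writing the intermediate lattices to temporary files, e.g. roughly:

gunzip -c exp/trisatmmi/decode/lat.1.gz | lattice-scale --lm-scale=0.0 ark:- ark:s1.lats
time lattice-to-phone-lattice exp/trisatmmi/final.mdl ark:s1.lats ark:s2.lats
time lattice-compose ark:s2.lats exp/trisatmmi/decode/decode_lmrescore_3g/Ldet.fst ark:s3.lats
time lattice-determinize --max-mem=50000000 ark:s3.lats ark:s4.lats
time lattice-compose --phi-label=45669 ark:s4.lats data/eval_3g/G.fst ark:s5.lats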
Dan


Armando

Apr 5, 2016, 10:05:50 AM
to kaldi-help, dpo...@gmail.com
Hi

I decoded about 10h of speech with the bigram setup, getting about 160 MB of gzipped lattices. The 3-gram rescoring produced 2.2 GB of lattices, and with MBR and lm_weight 14 I got a 22% relative WER improvement (very high!) at about 1x RT (very slow for LM lattice rescoring).

On the other hand, I tried what you suggested, using steps/lmrescore_const_arpa.sh after generating the G.carpa.
Initially, I just launched the script with the parameters hardcoded there, which means:
lattice-lmrescore-const-arpa --lm-scale=1.0 ark:- data/eval_3g//G.carpa 'ark,t:|gzip -c>exp/trisatmmi/decode/decode_lmrescore_3g_constarpa//lat.1.gz'
lattice-lmrescore --lm-scale=-1.0 'ark:gunzip -c exp/trisatmmi/decode/lat.1.gz|' 'fstproject --project_output=true data/eval/G.fst |' ark:-

RT was about 0.15, but the results were terrible.

So I launched that same rescoring with lm_scale=0.0, as in the original rescoring by steps/lmrescore.sh, and got RT=0.003 (extremely fast) and a relative WER improvement of 15.7%.
All the one-bests are obtained with MBR and lm_weight = 14.

It looks like the lm_scale parameter of lattice-lmrescore is very important, both for the results and for the processing time. I thought it was just the inverse of the acoustic scale of the decoder, in my case 1/12 = 0.083333 (which seems to be the case for MBR, where the lm_weight values are in that range), but clearly it's not. Do you have any tips or rules of thumb about it?

Thanks as always for the relentless work on Kaldi and for the support.

Armando

Apr 5, 2016, 12:48:15 PM
to kaldi-help, dpo...@gmail.com
No, I retract my previous mail; indeed, setting lm_scale to 0.0 does not apply the 3-gram LM scores at all. The reason I thought I was seeing a WER improvement is that the initial WER from the bigram decoding was in reality better than the one I was comparing to; rescoring this way gives no improvement, as expected.
For now, I have only observed that
steps/lmrescore_const_arpa.sh
has degraded the results. I'll check whether everything is correct in the generation of the G.carpa.

Daniel Povey

Apr 5, 2016, 12:53:39 PM
to Armando, kaldi-help
Make sure the vocabulary is not mismatched.  I am flying today, so no time to respond further.
Dan

Armando

Apr 11, 2016, 1:34:21 PM
to kaldi-help, armando.m...@gmail.com, dpo...@gmail.com
Hi
I thought I'd try out all the different modes of lmrescore:

          WER     lattice size    RTF
mode 1    38.6    167 MB          0.32
mode 2    38.6    143 MB          0.31
mode 3    38.6    137 MB          0.27
mode 4    39.7    2.2 GB          ~1

The first 3 modes give the same results in terms of WER, with comparable sizes of the 3-gram-rescored lattices and comparable processing times; mode 4 is the worst performer in both WER and processing time.

Since the command line of mode 1 of lmrescore looks about the same as the one in lmrescore_const_arpa.sh, I expected to gain in processing time by using the latter while keeping the same performance; but I just cannot make it work: WER goes up to 70%, as if a big mistake occurs somewhere, like a mismatch. However, I don't see where it would occur; it seems to me the only additional step is the generation of G.carpa.
There cannot be a vocabulary mismatch, as the old lang directory is copied into the new one.
Building G.carpa warns that about 11 million OOVs were found and not included in G.carpa, i.e. words that are in the LM but not in the vocabulary; but those OOVs simply correspond to n-grams containing the unk symbol, which is not in the lexicon. The unk symbol was also used when generating the bigram LM used for the HCLG decoding, so there is no difference in this regard between the HCLG decoding and the G.carpa LM rescoring.
Besides the generation of the G.carpa, I don't see which other computation could cause a problem.


This is the command to generate G.carpa:
./utils/build_const_arpa_lm.sh ~/resources/lm/lm.tg.arpa data/eval data/eval_3g/

These are the command lines from lmrescore_const_arpa.sh:

lattice-lmrescore --lm-scale=-1.0 'ark:gunzip -c exp/trisatmmi/decode/lat.1.gz|' 'fstproject --project_output=true data/eval/G.fst |' ark:-
lattice-lmrescore-const-arpa --lm-scale=1.0 ark:- data/eval_3g//G.carpa 'ark,t:|gzip -c>exp/trisatmmi/decode/decode_lmrescore_3g_constarpa//lat.1.gz'

I don't really have any warnings or logs to show.


Armando

Daniel Povey

Apr 11, 2016, 1:56:15 PM
to Armando, kaldi-help
That is very odd.  Make sure your graph is not older than your lang directory.  And run all 4 modes of rescoring with rescoring to your original (bigram) language model, which should make no difference, and show me the WERs.
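Something along these lines, with your actual directory names substituted (the paths below are just guesses; the point is that the old and new lang directories are both the bigram one):

for mode in 1 2 3 4; do
  steps/lmrescore.sh --mode $mode data/lang_2g data/lang_2g data/your_test_set exp/trisatmmi/decode exp/trisatmmi/decode_rescore_2g_mode$mode
done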
Also see whether there are any systematic differences in the decoded output between, e.g., the mode-1 and mode-4 rescoring (you may need to use int2sym.pl -f 2- data/lang_something/words.txt something.tra to view the words).
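For instance, roughly (the lattice path, words.txt location and lm-scale here are placeholders to adapt to your setup):

lattice-best-path --lm-scale=14 'ark:gunzip -c exp/trisatmmi/decode/decode_lmrescore_3g/lat.1.gz|' ark,t:- 2>/dev/null | utils/int2sym.pl -f 2- data/eval_3g/words.txt > mode4_text.txt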
Dan

Armando

Apr 13, 2016, 6:45:37 AM
to kaldi-help, armando.m...@gmail.com, dpo...@gmail.com


On Monday, April 11, 2016 at 7:56:15 PM UTC+2, Dan Povey wrote:
That is very odd. Make sure your graph is not older than your lang directory.

I checked; it's not older

 
And run all 4 modes of rescoring with rescoring to your original (bigram) language model, which should make no difference, and show me the WERs.


WER for the first 3 modes is 42.9, whereas the bigram HCLG decoding yielded 43.0, so it stays essentially the same. Mode 4 is still running, but by producing the 1-best from the partial lattices I can see that the results are effectively very similar to the other modes; I'd expect the WER to be the same or maybe slightly worse, for reasons I'll explain below.
On the other hand, as in the 3-gram rescoring, the processing time is slow and the lattices are much bigger than the input lattices.
 
Also see whether there are any systematic differences in the decoded output between, e.g., the mode-1 and mode-4 rescoring (you may need to use int2sym.pl -f 2- data/lang_something/words.txt something.tra to view the words).
Dan



I was looking at the alignment between hypotheses and references: it looks to me that the first 3 modes produce basically identical outputs, hence the same WER. There are, though, some (slight) differences with mode 4 that probably explain what looked like a worse WER.
What I consistently observe is that mode 4 produces output like:
"J' ai" or "Je t' appelle" (which respectively mean in French "I have" and "I call you")
while the first three modes usually generate something like:
"Je ai" and "Je tu appelle"
They are the same expressions, no big deal, but it's definitely more correct in French to say it the first way (it's the usual contraction of "Je" and "tu" into "J'" and "t'" before a following vowel). The strange thing is that the manual transcriber has consistently written the references in the second way, so mode 4 looks worse in those cases, which are relatively frequent, while it should look better IMO.
I also see some consistent difference in the handling of "hesitations". Since I am decoding spontaneous telephone conversations, there are frequent expressions like "uhm... ben..." ("ben" is used in French roughly like "uhm... well..." in English).
In mode 4 I see it more frequently than in all the other modes, and most of the time it is not correct, in the sense that "ben" has not really been uttered, though there is some kind of hesitation sound from the speaker, which is usually not transcribed at all by the other modes; so it's really the language model doing something there, because acoustically there cannot be a good score for "ben".
Besides that, I can see rare differences for other words (and I must say, my perception is that mode 4 is doing a bit better), but the consistent differences are only those mentioned above.
In general, it looks to me that mode 3 does the same as modes 1 and 2; mode 4 does not really look worse, but the processing time is not reasonable.
Actually my bigger concern as of now is to make lmrescore_const_arpa work, because it looks like mode 1 and should be faster and more memory efficient. There's definitely no vocabulary mismatch, because I can see that many words are correctly recognized, though the word insertion rate is very high: I have a total of 145215 words in the hyp and only 100191 in the ref, whereas for the other modes with lmrescore I have a more reasonable 103k.

Indeed, this is what NIST ctm scoring gives me for mode-1 lmrescore and for lmrescore_const_arpa:

            Corr    Sub     Del     Ins     Err
const       63.3    32.8    3.9     33.7    70.4
mode 1      68.8    20.7    10.5    7.4     38.6


Joanna Równicka

Nov 22, 2017, 6:09:29 AM
to kaldi-help
Hi.

I found a similar behaviour when rescoring lattices with a 4-gram LM, namely a difference in WER between modes 1-3 and mode 4, although in my case mode 4 is the best one (almost 5% relative improvement!). I was using steps/lmrescore.sh. The WERs for my experiments are:

mode 1    33.0%
mode 2    33.0%
mode 3    33.1%
mode 4    31.6%

I was also looking at the decoded outputs from all of the modes but I didn't notice any systematic differences. Did you find out what was the reason for those discrepancies?

Joanna

Armando

Nov 22, 2017, 6:16:17 AM
to kaldi-help
I'm embarrassed to say it, but the reason was just that I hadn't updated to the latest version; that exchange continued privately and rather wasted Dan's time, given that keeping the code up to date should be a must.
By the way, have you tried rescoring with const arpa? It's the recommended approach nowadays.

bye

Daniel Povey

Nov 22, 2017, 12:25:26 PM
to kaldi-help
What's different about mode 4 from the others is that instead of rescoring by subtracting the old LM scores, it does it by reconstructing all the scores from scratch.  So if you provided to the script an <old-lang-dir> which did not match the one that you originally used to create the graph that you decoded with, mode 4 would be the only one that would be correct.
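Schematically, per lattice arc and ignoring the scaling details, it is roughly:

modes 1-3:  new graph cost = old graph cost - old-LM cost + new-LM cost
mode 4:     new graph cost = lexicon/transition cost + new-LM cost   (the old graph costs are discarded and rebuilt)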



Joanna Równicka

Nov 24, 2017, 8:04:01 AM
to kaldi-help
Yes, this was the case. Thank you.

Joanna