Regarding RNNLM training script

Alim Misbullah

Jul 22, 2018, 8:53:07 AM
to kaldi-help
Hi,

My *.counts files produce a total that is too large to be handled as an integer, because my training data is very large (about 300 million).

How should I deal with this issue?

num_splits=$(rnnlm/get_num_splits.sh 5000000 data/rnnlm/text_zh-hant-gc exp/rnnlm_zh-hant-gc_lstm_tdnn_bs_1a/config/data_weights.txt)
rnnlm/get_num_splits.sh: line 69: [: 4.20253e+09: integer expression expected
rnnlm/get_num_splits.sh: there were no counts in counts file data/rnnlm/text_zh-hant-gc/gc.counts

Thanks for any suggestions.

Best,
Alim

Daniel Povey

Jul 22, 2018, 1:36:31 PM
to kaldi-help
Can you figure out which program, script, or inline command originally produced the 4.20e+09? We can change how the formatting is done.


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/518a5b1c-398f-4790-9925-329f35e946d9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alim Misbullah

Jul 22, 2018, 5:15:12 PM
to kaldi-help
Hi,

Thanks for the response.

I found it in rnnlm/get_num_splits.sh, around lines 68-69:

for f in $text/*.counts; do
   if [ "$f" != "$text/dev.counts" ]; then
     this_tot=$(cat $f | awk '{tot += $2} END{print tot}')
     if ! [ $this_tot -gt 0 ]; then
       echo "$0: there were no counts in counts file $f" 1>&2
       exit 1
     fi
     # weight by the data multiplicity which is the second field of the weights file.
     multiplicity=$(basename $f | sed 's:.counts$::' | utils/apply_map.pl $multiplicities)
     if ! [ $multiplicity -eq $multiplicity ]; then
       echo "$0: error getting multiplicity for data-source $f, check weights file $weights_file"
       exit 1
     fi
     tot_orig=$[tot_orig+this_tot]
     tot_with_multiplicities=$[tot_with_multiplicities+(this_tot*multiplicity)]
   fi
done

Best,
Alim

Daniel Povey

Jul 22, 2018, 5:45:31 PM
to kaldi-help
See if changing
 print tot
to
printf("%d", tot)
fixes it, and please make a PR if it does.
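
To illustrate the suggestion above: the sketch below sums the second field of a counts file and prints the total through an explicit printf format, so that the shell's integer test never sees scientific notation. It uses "%.0f" rather than "%d" as a cautious variant, on the assumption that some awk builds may limit "%d" to a narrower integer range. The function name sum_counts and the demo data are made up for illustration and are not part of get_num_splits.sh.

```shell
#!/bin/sh
# Sum the second field of a counts file and print the total as a plain integer.
# "%.0f" formats the floating-point total without exponent notation.
sum_counts() {
  awk '{ tot += $2 } END { printf("%.0f\n", tot) }' "$1"
}

# Hypothetical demo data: two entries whose counts sum to more than 2^32.
demo=$(mktemp)
printf 'a 4202530000\nb 1\n' > "$demo"

this_tot=$(sum_counts "$demo")
echo "total: $this_tot"

# The same kind of integer test that failed on 4.20253e+09 now succeeds.
if [ "$this_tot" -gt 0 ]; then
  echo "ok: total is a usable integer"
fi
rm -f "$demo"
```

Whether "%d" or "%.0f" is the better choice depends on the awk in use; both avoid the default output format that produced 4.20253e+09.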



Alim Misbullah

Jul 23, 2018, 9:35:35 PM
to kaldi-help
Hi,

I also found that when I train an RNNLM, memory usage increases significantly while rnnlm-compute-prob is running.

Each new rnnlm-compute-prob process starts without waiting for the previous one to finish.

When I check the processes with "htop", there are many rnnlm-compute-prob processes, each for a different RNNLM model, that have not finished yet.

How can I solve this, so that these processes do not consume so much memory?

Best,
Alim


[Attachment: rnnlm-compute-prob.png]

Daniel Povey

Jul 23, 2018, 9:43:16 PM
to kaldi-help
You could maybe modify train_rnnlm.sh to not compute diagnostics on every iteration.
Those scripts are really intended to work with a proper queue manager like GridEngine.  run.pl is just a kind of hack to get it to work when you don't have that.

Dan
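
One way to picture the first suggestion, as a rough sketch only: run the diagnostic job on every Nth iteration and wait for the previous one before starting the next, so that at most one is alive at a time. Nothing here is taken from train_rnnlm.sh; run_diagnostics, should_run_diagnostics, diag_interval and num_iters are hypothetical names, and the echo stands in for the real rnnlm-compute-prob command.

```shell
#!/bin/sh
# Hypothetical stand-in for the real diagnostic command.
run_diagnostics() {
  echo "diagnostics for iteration $1"
}

# Succeeds (exit status 0) when iteration $1 is a multiple of interval $2.
should_run_diagnostics() {
  [ $(( $1 % $2 )) -eq 0 ]
}

diag_interval=5   # assumed setting: diagnostics on every 5th iteration
num_iters=12      # assumed number of training iterations
diag_pid=""
x=0
while [ "$x" -lt "$num_iters" ]; do
  if should_run_diagnostics "$x" "$diag_interval"; then
    # Wait for the previous diagnostic job, so at most one runs at a time.
    if [ -n "$diag_pid" ]; then wait "$diag_pid"; fi
    run_diagnostics "$x" &
    diag_pid=$!
  fi
  # ... the training step for iteration x would go here ...
  x=$((x + 1))
done
wait   # let the last diagnostic job finish
```

With these assumed values the diagnostics run on iterations 0, 5 and 10 only. Waiting on the previous job trades some training throughput for a bounded memory footprint.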



Alim Misbullah

Jul 23, 2018, 9:53:10 PM
to kaldi-help
Thanks for the suggestion.

I also have GridEngine on my server, but it does not work for RNNLM training.

I have no idea why it fails for RNNLM training, since it works well for acoustic model training.

Best,
Alim

Daniel Povey

Jul 23, 2018, 9:56:41 PM
to kaldi-help
OK.  "does not work" is kind of a vague problem.



Alim Misbullah

Jul 25, 2018, 3:48:26 PM
to kaldi-help
Hi,

Suppose I add new words to the dictionary and then run first-pass decoding.

As far as I understand, the RNNLM then has to be re-trained, otherwise its words.txt will not match.

How can we handle new words in the dictionary without re-training the RNNLM?

This situation can come up with a Mandarin dictionary.

Thanks,
Alim

Daniel Povey

Jul 25, 2018, 3:58:25 PM
to kaldi-help, Hainan Xu, Xiaohui Zhang
If the RNNLM was trained with letter-based features, it is possible to extend the vocabulary without retraining the RNNLM.  Hainan or Samuel, is there an example script for this?
I don't know how well this would work for Mandarin though.

Dan



Alim Misbullah

Aug 4, 2018, 10:00:29 AM
to kaldi-help
Hi,

I can now use the RNNLM for rescoring and get impressive results.

I tried to integrate the RNNLM rescoring code from src/latbin/lattice-lmrescore-kaldi-rnnlm-pruned.cc into our own decoder.

The idea is to do online decoding and rescoring together, so that we get the RNNLM-rescored output directly.

The code compiles without any errors, but I get a segmentation fault when I run the decoder.

I used gdb to debug it and got the following:

[New Thread 0x7fffedffb700 (LWP 12911)]

Thread 258 "dnn_batch" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffedffb700 (LWP 12911)]
0x00007ffff78a8ad7 in ATL_sdot_xp1yp1aXbX () from src/mykaldi/build_master/libDavinciCsr.so

Typing "bt" gives the following backtrace:

#0  0x00007ffff78a8ad7 in ATL_sdot_xp1yp1aXbX () from src/mykaldi/build_master/libDavinciCsr.so
#1  0x00007ffff79ed08b in kaldi::rnnlm::RnnlmComputeState::LogProbOfWord(int) const ()
   from src/mykaldi/build_master/libDavinciCsr.so
#2  0x00007ffff79ee879 in kaldi::rnnlm::KaldiRnnlmDeterministicFst::GetArc(int, int, fst::ArcTpl<fst::TropicalWeightTpl<float> >*) ()
   from src/mykaldi/build_master/libDavinciCsr.so
#3  0x00007ffff73be2d7 in fst::ScaleDeterministicOnDemandFst::GetArc(int, int, fst::ArcTpl<fst::TropicalWeightTpl<float> >*) ()
   from src/mykaldi/build_master/libDavinciCsr.so
#4  0x00007ffff740c2a7 in fst::ComposeDeterministicOnDemandFst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::GetArc(int, int, fst::ArcTpl<fst::TropicalWeightTpl<float> >*) () from src/mykaldi/build_master/libDavinciCsr.so
#5  0x00007ffff777c340 in kaldi::PrunedCompactLatticeComposer::ProcessTransition(int, int) ()
   from src/mykaldi/build_master/libDavinciCsr.so
#6  0x00007ffff777cfa8 in kaldi::PrunedCompactLatticeComposer::ProcessQueueElement(int) ()
   from src/mykaldi/build_master/libDavinciCsr.so
#7  0x00007ffff777dd0a in kaldi::PrunedCompactLatticeComposer::Compose() ()
   from src/mykaldi/build_master/libDavinciCsr.so
#8  0x00007ffff777e050 in kaldi::ComposeCompactLatticePruned(kaldi::ComposeLatticePrunedOptions const&, fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, fst::VectorState<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, std::allocator<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > > > const&, fst::DeterministicOnDemandFst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >*, fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, fst::VectorState<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, std::allocator<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > > >*) () from src/mykaldi/build_master/libDavinciCsr.so
#9  0x00007ffff741c273 in RNNLM_Rescore(fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, fst::VectorState<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, std::allocator<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > > >, fst::VectorFst<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, fst::VectorState<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> >, std::allocator<fst::ArcTpl<fst::CompactLatticeWeightTpl<fst::LatticeWeightTpl<float>, int> > > > >&, kaldi::rnnlm::KaldiRnnlmDeterministicFst*, fst::ScaleDeterministicOnDemandFst*, kaldi::ComposeLatticePrunedOptions, float, float, pthread_mutex_t*) () from /var/speech/DavinciCSR/src/mykaldi/build_master/libDavinciCsr.so
#10 0x00007ffff741d9c1 in DNN_PostSearch(void*, char const*, long*, bool) ()
   from src/mykaldi/build_master/libDavinciCsr.so
#11 0x00007ffff736376d in kaldi_dnn_post_search(void*, char const*, long*, bool) ()
   from src/mykaldi/build_master/libDavinciCsr.so
#12 0x0000000000414a6c in ConcreteThread::Process() ()
#13 0x0000000000409c91 in AbstractThread::CallThreadFunc(void*) ()
#14 0x00007ffff70aa6ba in start_thread (arg=0x7fffedffb700) at pthread_create.c:333
#15 0x00007ffff653f41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


Any suggestions?

Thanks,
Alim



Daniel Povey

Aug 4, 2018, 1:24:22 PM
to kaldi-help
Looks to me like the word index might be out of range; if you compile with -g you should be able to figure out what it is.
E.g. maybe a mismatch in words.txt between the RNNLM and the first-pass decoding.

Dan
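
A quick way to test the mismatch hypothesis is to compare the two symbol tables directly. The sketch below assumes both files use the usual two-column "word id" layout of words.txt, and it reports every word of the first-pass table that is missing from the RNNLM table or has a different id there. The function name check_symtab and the file arguments are hypothetical, and a few special symbols may legitimately differ between the two tables.

```shell
#!/bin/sh
# Usage: check_symtab <first-pass words.txt> <rnnlm words.txt>
# Exit status is 0 when every first-pass word has the same id in the RNNLM table.
check_symtab() {
  awk '
    # First file read by awk: the RNNLM table; remember each word id.
    NR == FNR { id[$1] = $2; next }
    # Second file read by awk: the first-pass table; compare against the ids above.
    !($1 in id) { print "missing in RNNLM table:", $1; bad = 1; next }
    id[$1] != $2 { print "id mismatch:", $1, $2, id[$1]; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$2" "$1"
}

# Example call with hypothetical paths:
# check_symtab data/lang/words.txt exp/rnnlm/config/words.txt
```

A clean run does not prove that the ids reaching the RNNLM code are in range, but any line it prints is a concrete candidate for an out-of-range word index.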



Alim Misbullah

Aug 4, 2018, 3:55:22 PM
to kaldi-help
Hi, 

I can run the original Kaldi code, i.e. lattice-lmrescore-kaldi-rnnlm-pruned, without any problem with words.txt.

I think I need to check how I defined the CuMatrix that holds word_embedding.final.mat.

Thanks a lot for the clue.

Alim



Alim Misbullah

Aug 8, 2018, 6:54:34 AM
to kaldi-help
Hi,

I solved the issue from my last message.

Now I have a question about TDNN-F:

Can I use the TDNN-F network structure to train an RNNLM?

If yes, what would the structure look like?

I mean the network.config.

Thanks.

Alim

Daniel Povey

Aug 8, 2018, 3:44:33 PM
to kaldi-help
You can never use a model that has right-context in an RNNLM, or it
would see the future.
So you can't use TDNN-F directly, it would require modification.

Simon Vandieken

Sep 23, 2020, 11:57:40 PM
to kaldi-help
Hi everyone,

Sorry to unearth this old thread but I'm having a similar problem while trying to implement the pruned RNNLM rescoring.
Everything works fine using the Kaldi scripts and binaries, but my adaptation of it into other code segfaults in kaldi::rnnlm::RnnlmComputeState::LogProbOfWord(int).
I would appreciate it if the original poster Alim could tell me how they solved the issue.

Best regards,
Simon

Daniel Povey

Sep 24, 2020, 4:56:55 AM
to kaldi-help
I doubt it's the same issue.  I suspect you are using it in a way the interface doesn't allow, e.g. in terms of multiple threads or using non-defined constructors or assignment operators or maybe just supplying an out-of-range integer.
