Re: [kaldi-help] tdnn3 model training problem


Daniel Povey

Jul 22, 2021, 9:20:22 AM
to kaldi...@googlegroups.com
Responded in another thread; that script seems to deal poorly with the case where you retrain the base system. We will fix it.


On Thursday, July 22, 2021, Sergio Ornaque <sergio.orna...@gmail.com> wrote:
I'm trying to run the mini_librispeech/s5/run.sh recipe to train a model and familiarize myself with the scripts, but it fails at the last step, while training the model with local/chain2/run_tdnn.sh.

I'm getting the following error:
run.pl: job failed, log is in exp/chain2/tdnn1a_sp/den_fsts/log/make_den_fst.log

This is the error log 'make_den_fst.log':
Number of states and arcs in phone-LM FST is 6342 and 41194
Number of states and arcs in context-dependent LM FST is 6342 and 41194
ERROR TransitionModel::TupleToTransitionState, tuple not found. (incompatible tree and model?)

The only thing I've changed is replacing 'queue.pl' with 'run.pl' in the cmd.sh file.

Any help is appreciated.


Gwel DG

Dec 1, 2021, 7:53:12 AM
to kaldi-help
Hi,

I stumbled on the same problem while running the mini_librispeech recipe on my own tiny dataset, with the "--use-gpu no" option set (for the steps/chain2/train.sh script) and with --nj set to 1 everywhere.
Any progress on that front?

(Sorry, I couldn't find the other thread you mentioned.)

Thanks!

Srikanth R Madikeri

Dec 1, 2021, 10:26:40 AM
to kaldi...@googlegroups.com
Hello,

Is the problem that you are not able to rerun the script, or that it doesn't run with run.pl?

Srikanth

Gwel DG

Dec 3, 2021, 4:55:14 PM
to kaldi-help
Hi,

The script runs smoothly up to the chain model training stage (the last stage in run.sh).
It fails in the "run_tdnn.sh" script (my copy, "run_tdnn_nogpu.sh", is modified to set the "--use-gpu no" option) at stage 16:

local/chain2/run_tdnn_nogpu.sh: creating denominator FST

run.pl: job failed, log is in exp/chain2/tdnn1a_sp/den_fsts/log/make_den_fst.log

Here's the error in exp/chain2/tdnn1a_sp/den_fsts/log/make_den_fst.log:

LOG (chain-make-den-fst[5.5.990~1-6e03a]:CreateDenominatorFst():chain-den-graph.cc:306) Number of states and arcs in phone-LM FST is 3730 and 12551
LOG (chain-make-den-fst[5.5.990~1-6e03a]:CreateDenominatorFst():chain-den-graph.cc:335) Number of states and arcs in context-dependent LM FST is 3730 and 12551
ERROR (chain-make-den-fst[5.5.990~1-6e03a]:TupleToTransitionState():transition-model.cc:262) TransitionModel::TupleToTransitionState, tuple not found. (incompatible tree and model?)

It seems like the exact same problem Sergio Ornaque had at the beginning of this thread.
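Both logs stop at the same lookup. As a rough illustration of what "incompatible tree and model" means, here is a toy Python sketch (not Kaldi code): chain-make-den-fst reads a tree and a transition model, and the model only knows the tuples that existed when it was created, so a tree rebuilt later can ask for tuples the model has never seen. The tuple layout is simplified to (phone, pdf_id), whereas Kaldi's real tuples also carry the HMM state and separate forward/self-loop pdfs; the tree dictionaries and function names are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

PhonePdf = Tuple[int, int]  # simplified tuple: (phone, pdf_id)


def build_transition_index(tuples: Iterable[PhonePdf]) -> Dict[PhonePdf, int]:
    """Assign 1-based transition-state ids to the tuples known when the model is created."""
    return {t: i + 1 for i, t in enumerate(sorted(set(tuples)))}


def tuple_to_transition_state(index: Dict[PhonePdf, int], t: PhonePdf) -> int:
    """Look up a tuple; fail the way the Kaldi error message describes if it is unknown."""
    try:
        return index[t]
    except KeyError:
        raise LookupError(f"tuple {t} not found (incompatible tree and model?)") from None


# Made-up data: the model was built from the ORIGINAL tree (phone -> pdf_id) ...
old_tree = {1: 0, 2: 1, 3: 2}
# ... but the tree was later rebuilt (e.g. an earlier stage was re-run), renumbering pdfs.
new_tree = {1: 0, 2: 2, 3: 1, 4: 3}

model_index = build_transition_index(old_tree.items())


def missing_tuples(tree: Dict[int, int]) -> List[PhonePdf]:
    """Return the tuples implied by `tree` that the (old) model cannot resolve."""
    missing = []
    for phone, pdf in tree.items():
        try:
            tuple_to_transition_state(model_index, (phone, pdf))
        except LookupError:
            missing.append((phone, pdf))
    return missing


if __name__ == "__main__":
    print("old tree, missing:", missing_tuples(old_tree))
    print("new tree, missing:", missing_tuples(new_tree))
```

With the matching tree every lookup succeeds; with the rebuilt tree the lookups fail, which is the same class of mismatch the log reports.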

Thanks!

Daniel Povey

Dec 4, 2021, 11:35:11 PM
to kaldi-help
Usually this would be some issue where you re-ran an earlier stage, overwriting something a later stage was using, like alignments, without re-running some intermediate stage.
So look at file times and the "--stage" parameter/variable.
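A minimal sketch of the "look at file times" check, in Python: given output files listed in the order the stages write them, it reports any file that is older than something an earlier stage produced, which suggests it was left over from before a re-run. Which files to list is up to you; the function name and the command-line usage are hypothetical and not part of Kaldi.

```python
import os
from typing import List, Optional, Sequence, Tuple


def find_stale_outputs(paths_in_stage_order: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (upstream, downstream) pairs where `downstream` is older than the
    newest file written by an earlier stage. Missing paths are skipped."""
    stale: List[Tuple[str, str]] = []
    newest: Optional[Tuple[str, float]] = None  # (path, mtime) of newest file seen so far
    for path in paths_in_stage_order:
        if not os.path.exists(path):
            continue
        mtime = os.path.getmtime(path)
        if newest is not None and mtime < newest[1]:
            # A later-stage output predates an earlier-stage output: likely stale.
            stale.append((newest[0], path))
        else:
            newest = (path, mtime)
    return stale


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_stale.py PATH1 PATH2 ...  (paths in stage order)
    for upstream, downstream in find_stale_outputs(sys.argv[1:]):
        print(f"{downstream} is older than {upstream}; re-run the stages that produce it")
```

Comparing against the newest earlier file, rather than only the immediately preceding one, catches the case Dan describes, where an intermediate stage was skipped after an earlier one was re-run.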


Gwel DG

Dec 14, 2021, 9:35:40 AM
to kaldi-help

Thank you Dan, that helped!

I could get one stage further after cleaning all the intermediate data manually. I thought the script already did that at stage 0, but then noticed that I had commented that line out and forgotten about it...