# Trying the larger dictionary ("big-dict"/bd) + locally produced LM.
utils/mkgraph.sh data/lang_nosp_test_bd_tgpr \
exp/tri3b exp/tri3b/graph_nosp_bd_tgpr || exit 1;
--
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
local/cstr_wsj_data_prep.sh $corpus
if [ -f data/local/dict/lexiconp.txt ];then
rm data/local/dict/lexiconp.txt
fi
local/wsj_prepare_dict.sh --dict-suffix "_nosp" || exit 1;
utils/prepare_lang.sh data/local/dict_nosp \
"<SPOKEN_NOISE>" data/local/lang_tmp_nosp data/lang_nosp || exit 1;
local/wsj_format_data.sh --lang-suffix "_nosp" || exit 1;
wsj/s5/data> ls
dev_dt_05 lang_nosp lang_nosp_test_bg_5k lang_nosp_test_tg_5k lang_nosp_test_tgpr_5k test_dev93 test_eval92 test_eval93 train_si284
dev_dt_20 lang_nosp_test_bg lang_nosp_test_tg lang_nosp_test_tgpr local test_dev93_5k test_eval92_5k test_eval93_5k
local/wsj_extend_dict.sh --dict-suffix "_nosp" $wsj1/13-32.1
gzip: /13-32.1/wsj1/doc/lng_modl/lm_train/np_data/87/*.z: No such file or directory
gzip: /13-32.1/wsj1/doc/lng_modl/lm_train/np_data/88/*.z: No such file or directory
gzip: /13-32.1/wsj1/doc/lng_modl/lm_train/np_data/89/*.z: No such file or directory
local/wsj_extend_dict.sh --dict-suffix "_nosp" $wsj1
Expecting the argument to this script to end in 13-32.1
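The gzip errors above show paths that begin at /13-32.1, which suggests that $wsj1 expanded to an empty string in the first command. In the second command the argument was non-empty but did not end in 13-32.1. A minimal sketch of a guard that could run before calling the script; the function name check_wsj1_root is hypothetical and not part of the recipe:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Kaldi recipe): check that the WSJ1 root
# variable is non-empty and that it contains a 13-32.1 subdirectory.
check_wsj1_root() {
  local root="$1"
  # An empty value would turn "$wsj1/13-32.1" into "/13-32.1".
  if [ -z "$root" ]; then
    echo "WSJ1 root is empty; is \$wsj1 set?" >&2
    return 1
  fi
  # The script expects an argument ending in 13-32.1, so that directory must exist.
  if [ ! -d "$root/13-32.1" ]; then
    echo "no such directory: $root/13-32.1" >&2
    return 1
  fi
  return 0
}
```

Possible usage: check_wsj1_root "$wsj1" && local/wsj_extend_dict.sh --dict-suffix "_nosp" "$wsj1/13-32.1"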
utils/mkgraph.sh data/lang_nosp_test_bd_tgpr \
exp/tri3b exp/tri3b/graph_nosp_bd_tgpr
fsttablecompose data/lang_nosp_test_bd_tgpr/L_disambig.fst data/lang_nosp_test_bd_tgpr/G.fst
fstminimizeencoded
fstdeterminizestar --use-log=true
fstisstochastic data/lang_nosp_test_bd_tgpr/tmp/LG.fst
0.000488639 -1.4022
[info]: LG not stochastic.
fstcomposecontext --context-size=3 --central-position=1 --read-disambig-syms=data/lang_nosp_test_bd_tgpr/phones/disambig.int --write-disambig-syms=data/lang_nosp_test_bd_tgpr/tmp/disambig_ilabels_3_1.int data/lang_nosp_test_bd_tgpr/tmp/ilabels_3_1
fstisstochastic data/lang_nosp_test_bd_tgpr/tmp/CLG_3_1.fst
0.000488639 -1.4022
[info]: CLG not stochastic.
make-h-transducer --disambig-syms-out=exp/tri3b/graph_nosp_bd_tgpr/disambig_tid.int --transition-scale=1.0 data/lang_nosp_test_bd_tgpr/tmp/ilabels_3_1 exp/tri3b/tree exp/tri3b/final.mdl
ERROR (make-h-transducer:TopologyForPhone():hmm-topology.cc:279) TopologyForPhone(), phone 88 not covered.
ERROR (make-h-transducer:TopologyForPhone():hmm-topology.cc:279) TopologyForPhone(), phone 88 not covered.
[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::HmmTopology::TopologyForPhone(int) const
kaldi::GetHmmAsFst(std::vector<int, std::allocator<int> >, kaldi::ContextDependencyInterface const&, kaldi::TransitionModel const&, kaldi::HTransducerConfig const&, std::tr1::unordered_map<std::pair<int, std::vector<int, std::allocator<int> > >, fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >*, kaldi::HmmCacheHash, std::equal_to<std::pair<int, std::vector<int, std::allocator<int> > > >, std::allocator<std::pair<std::pair<int, std::vector<int, std::allocator<int> > > const, fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >*> > >*)
kaldi::GetHTransducer(std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, kaldi::ContextDependencyInterface const&, kaldi::TransitionModel const&, kaldi::HTransducerConfig const&, std::vector<int, std::allocator<int> >*)
make-h-transducer(main+0x383) [0x59bb60]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f3e441c58c5]
make-h-transducer(_start+0x29) [0x59b719]
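The error says that the HMM topology stored in exp/tri3b/final.mdl has no entry for phone 88. One plausible cause is that the phone table in the graph's lang directory differs from the one the model was trained with. A minimal sketch of a comparison, assuming data/lang_nosp is the lang directory used for training exp/tri3b; phones_match is a hypothetical helper, not a Kaldi script:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): succeed only if two phones.txt
# symbol tables are byte-for-byte identical.
phones_match() {
  local train_phones="$1" test_phones="$2"
  if [ ! -f "$train_phones" ] || [ ! -f "$test_phones" ]; then
    echo "missing phones.txt: $train_phones or $test_phones" >&2
    return 2
  fi
  if ! cmp -s "$train_phones" "$test_phones"; then
    echo "phone tables differ; first differences:" >&2
    # Show only the first few differing lines.
    diff "$train_phones" "$test_phones" | head -n 10 >&2 || true
    return 1
  fi
  return 0
}
```

Possible usage: phones_match data/lang_nosp/phones.txt data/lang_nosp_test_bd_tgpr/phones.txt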
# sMBR training (1+4 iterations, lattices+alignment updated after 1st iteration)
%WER 9.56 exp/dnn5b_pretrain-dbn_dnn_smbr_i1lats/decode_nosp_bd_tgpr_dev93_iter4/wer_13_0.5
%WER 6.57 exp/dnn5b_pretrain-dbn_dnn_smbr_i1lats/decode_nosp_bd_tgpr_eval92_iter4/wer_12_1.0
# sMBR training (1+4 iterations, lattices+alignment updated after 1st iteration)
%WER 6.15 exp/dnn5b_pretrain-dbn_dnn_smbr_i1lats/decode_bd_tgpr_dev93_iter4/wer_11
%WER 3.56 exp/dnn5b_pretrain-dbn_dnn_smbr_i1lats/decode_bd_tgpr_eval92_iter4/wer_13
np_data/87/w7_001.z
...
np_data/87/w7_126.z
np_data/88/w8_001.z
...
np_data/88/w8_107.z
np_data/89/w9_01.z
...
np_data/89/w9_41.z
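Since the gzip errors complained that no *.z files were found under np_data/87, 88 and 89, it may help to confirm that the files in the listing above are visible at the path the script uses. A minimal sketch; count_np_data is a hypothetical helper, and the np_data path in the usage line is taken from the gzip error messages:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Kaldi recipe): print how many *.z files
# sit directly under np_data/87, np_data/88 and np_data/89.
count_np_data() {
  local np_data="$1" year f n
  for year in 87 88 89; do
    n=0
    for f in "$np_data/$year"/*.z; do
      # If nothing matches, the glob stays literal and the -e test fails.
      if [ -e "$f" ]; then
        n=$((n + 1))
      fi
    done
    echo "$year $n"
  done
}
```

Possible usage: count_np_data "$wsj1/13-32.1/wsj1/doc/lng_modl/lm_train/np_data"
A count of 0 for any year would be consistent with the "No such file or directory" messages.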
>local/wsj_extend_dict.sh --dict-suffix "_nosp" $corpus (Output in output_commands.txt)
>utils/prepare_lang.sh data/local/dict_nosp_larger "<SPOKEN_NOISE>" data/local/lang_tmp_nosp_larger data/lang_nosp_bd (Output in output_commands.txt)
>local/wsj_train_lms.sh --dict-suffix "_nosp" (Output in wsj_trains_lms.txt)
>local/wsj_format_local_lms.sh --lang-suffix "_nosp" (Output in output_commands.txt)
Yes, I'm using the script local/cstr_wsj_data_prep.sh, which handles the old WSJ format.
I ran the whole recipe without the big dictionary and got the following results:
I linked the lang_nosp_test_bg folder to the lang_nosp_test_bd_tgpr folder.
Update: I see in the script that there *is* a script to prepare the big-dict from the older WSJ format:
# NOTE: If you have a setup corresponding to the older cstr_wsj_data_prep.sh style,
# use local/cstr_wsj_extend_dict.sh --dict-suffix "_nosp" $corpus/wsj1/doc/ instead.