Convert dynamic graph to static graph (Vosk models)

Aleks Navratil

Aug 9, 2021, 5:15:35 PM

to kaldi-help

Main question:

Does anyone know of documentation describing the most expedient way to convert a model's dynamic graph (e.g. a model distributed with HCLr.fst and Gr.fst, but without HCLG.fst) into a static graph (i.e. how to obtain HCLG.fst, given those two FSTs)?

Example of such a model:

An example of such a dynamic graph/lookahead model is here (note that this model is not distributed with words.txt), and many of the sibling models on the Vosk models page are similar.

Semi-relevant docs:

There is some discussion of how to go in the reverse direction, from static to dynamic, using mkgraph_lookahead.sh. So presumably mkgraph.sh or mkgraph_lookahead.sh are relevant to this problem, but I can't find a description of exactly how.

The help file for mkgraph_lookahead.sh shows a tantalizing message related to --compose-graph, which sounds promising:

root@d1512f83e4a5:/opt/kaldi/egs/wsj/s5/utils# ./mkgraph_lookahead.sh
Usage: ./mkgraph_lookahead.sh [options] <lang-dir> <model-dir> [<arpa_file>] <graphdir>
e.g.: ./mkgraph_lookahead.sh data/lang data/local/lm.gz exp/tri1 db/trigram.lm.gz exp/tri1/lgraph
Options:
... omitting some lines ...
--compose-graph # Compile composed graph for testing with other decoders (default: false)

Unfortunately these Vosk models don't seem to be distributed with anything resembling a lang dir, and when fiddling with mkgraph.sh and mkgraph_lookahead.sh, I end up with errors of the form "expected /workspace/models/vosk-model-small-pt-0.3/words.txt to exist". But of course it's possible I'm just using the wrong syntax.

Motivation/background info:

It seems that dynamic graphs work with only some subset of decoders (and as far as I can tell the relevant-to-me *-cuda2 family of decoders requires a static graph).

nshm...@gmail.com

Aug 9, 2021, 5:50:41 PM

to kaldi-help

To load fst module:

export LD_LIBRARY_PATH=$KALDI_ROOT/tools/openfst/lib/fst

To create composed graph:

kaldi/tools/openfst/bin/fstcompose HCLr.fst Gr.fst > HCLG.fst

To dump words.txt from Gr.fst:

kaldi/tools/openfst/bin/fstsymbols --save_osymbols=words.txt Gr.fst > /dev/null
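
A small aside on the export above: written that way, it replaces whatever LD_LIBRARY_PATH already held, which may hide other libraries (CUDA, for instance) in the same shell. Here is a minimal sketch that prepends instead. The helper name prepend_ld_path is hypothetical, and /opt/kaldi is only an assumed fallback for KALDI_ROOT.

```shell
# Hypothetical helper: print a new LD_LIBRARY_PATH with $1 placed in front of
# the current value, without leaving a trailing ':' when it was empty.
prepend_ld_path() {
  printf '%s%s\n' "$1" "${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Assumption: KALDI_ROOT points at your Kaldi checkout (falls back to /opt/kaldi).
KALDI_ROOT="${KALDI_ROOT:-/opt/kaldi}"
LD_LIBRARY_PATH="$(prepend_ld_path "$KALDI_ROOT/tools/openfst/lib/fst")"
export LD_LIBRARY_PATH
```

After this, the fstcompose and fstsymbols commands can be run unchanged in the same shell.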

Aleks Navratil

Aug 10, 2021, 4:42:27 PM

to kaldi-help

Thanks for your fast reply :)

Your instructions definitely make sense; the commands work immediately and emit files of roughly the right size.

Graph/Model mismatch failure:

However, when I start decoding with the newly-built HCLG.fst, it fails immediately like this: "Likely graph/model mismatch (graph built from wrong model?)"

Is it intended that the resulting HCLG.fst will be usable as-is with the out-of-the-box model, for arbitrary decoders? Or maybe more adjustments are necessary?

I tried applying several of the options from fstcompose --help, such as --compose_filter and --fst_align. Some of these take a surprising amount of RAM but don't seem to help.

fstcompose verbose output:

Here's what fstcompose looks like when running in verbose mode on my inputs (with no extra flags):

/opt/kaldi/tools/openfst/bin/fstcompose --v=100 HCLr.fst Gr.fst > HCLG.fst

INFO: FstImpl::ReadHeader: source: HCLr.fst, fst_type: olabel_lookahead, arc_type: standard, version: 1, flags: 0
INFO: FstImpl::ReadHeader: source: HCLr.fst, fst_type: const, arc_type: standard, version: 2, flags: 0
INFO: memorymap: false source: "HCLr.fst" size: 3273260 offset: 145
INFO: Read 3273260 bytes. 0 remaining
INFO: memorymap: false source: "HCLr.fst" size: 8777040 offset: 3273405
INFO: Read 8777040 bytes. 0 remaining
INFO: FstImpl::ReadHeader: source: Gr.fst, fst_type: ngram, arc_type: standard, version: 4, flags: 3
INFO: ComposeFstImpl: Match type: 3
INFO: # of calls: 3.54821e+07
INFO: # of intervals/call: 18.08

Full error dump and decoder invocation:

Here's an attempt to decode, along with the resulting failure and accompanying stack trace:

root@d1512f83e4a5:/workspace/models/vosk-model-small-pt-0.3# /opt/kaldi/src/cudadecoderbin/batched-wav-nnet3-cuda2 \
--num-channels=300 \
--cuda-use-tensor-cores=true \
--main-q-capacity=30000 \
--aux-q-capacity=400000 \
--cuda-memory-proportion=.5 \
--max-batch-size=200 \
--cuda-worker-threads=16 \
--cuda-decoder-copy-threads=2 \
--frame-subsampling-factor=3 \
--frames-per-chunk=153 \
--max-mem=100000000 \
--beam=10 \
--lattice-beam=7 \
--acoustic-scale=1.0 \
--determinize-lattice=true \
--max-active=10000 \
--iterations=1 \
--file-limit=500 \
--config=$MODELS/$MODEL/online.conf \
/workspace/models/vosk-model-small-pt-0.3/final.mdl \
/workspace/models/vosk-model-small-pt-0.3/HCLG.fst \
scp:/workspace/wav.scp \
'ark:|gzip -c > /workspace/lattice_test.gz'

/opt/kaldi/src/cudadecoderbin/batched-wav-nnet3-cuda2 --num-channels=300 --cuda-use-tensor-cores=true --main-q-capacity=30000 --aux-q-capacity=400000 --cuda-memory-proportion=.5 --max-batch-size=200 --cuda-worker-threads=16 --cuda-decoder-copy-threads=2 --frame-subsampling-factor=3 --frames-per-chunk=153 --max-mem=100000000 --beam=10 --lattice-beam=7 --acoustic-scale=1.0 --determinize-lattice=true --max-active=10000 --iterations=1 --file-limit=500 --config=/workspace/models/vosk-model-small-pt-0.3/online.conf /workspace/models/vosk-model-small-pt-0.3/final.mdl /workspace/models/vosk-model-small-pt-0.3/HCLG.fst scp:/workspace/wav.scp 'ark:|gzip -c > /workspace/lattice_test.gz'

WARNING (batched-wav-nnet3-cuda2[5.5]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode. Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode

LOG (batched-wav-nnet3-cuda2[5.5]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG (batched-wav-nnet3-cuda2[5.5]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla T4 free:14989M, used:120M, total:15109M, free/total:0.992058
LOG (batched-wav-nnet3-cuda2[5.5]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.992058
LOG (batched-wav-nnet3-cuda2[5.5]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG (batched-wav-nnet3-cuda2[5.5]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.992058
LOG (batched-wav-nnet3-cuda2[5.5]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla T4 free:14561M, used:548M, total:15109M, free/total:0.963732 version 7.5
LOG (batched-wav-nnet3-cuda2[5.5]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (batched-wav-nnet3-cuda2[5.5]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (batched-wav-nnet3-cuda2[5.5]:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (batched-wav-nnet3-cuda2[5.5]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda2[5.5]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda2[5.5]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda2[5.5]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda2[5.5]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda2[5.5]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda2[5.5]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda2[5.5]:ComputeDerivedVars():ivector-extractor.cc:204) Done.

ASSERTION_FAILED (batched-wav-nnet3-cuda2[5.5]:TransitionIdToPdf():hmm/transition-model.h:328) Assertion failed: (static_cast<size_t>(trans_id) < id2pdf_id_.size() && "Likely graph/model mismatch (graph built from wrong model?)")

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7fa32171e183]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)+0x72) [0x7fa32171eb84]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::CudaFst::ApplyTransitionModelOnIlabels(kaldi::TransitionModel const&)+0x73) [0x7fa322f5d3bf]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::CudaFst::Initialize(fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&, kaldi::TransitionModel const*)+0x9e) [0x7fa322f5e980]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::AllocateAndInitializeData(fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&)+0x94f) [0x7fa322f5fdf1]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::Initialize(fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&)+0x20) [0x7fa322f6273c]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::BatchedThreadedNnet3CudaOnlinePipeline(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipelineConfig const&, fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&, kaldi::nnet3::AmNnetSimple const&, kaldi::TransitionModel const&)+0xa12) [0x7fa322f8cb7a]
/opt/kaldi/src/lib/libkaldi-cudadecoder.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline2::BatchedThreadedNnet3CudaPipeline2(kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline2Config const&, fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&, kaldi::nnet3::AmNnetSimple const&, kaldi::TransitionModel const&)+0x48) [0x7fa322f847ae]
/opt/kaldi/src/cudadecoderbin/batched-wav-nnet3-cuda2(main+0xe36) [0x56397b090c23]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa320d3d0b3]
/opt/kaldi/src/cudadecoderbin/batched-wav-nnet3-cuda2(_start+0x2e) [0x56397b08d1ae]

Aborted (core dumped)

nshm...@gmail.com

Aug 10, 2021, 5:22:25 PM

to kaldi-help

I forgot you also need fstrmsymbols and, optionally, fstconvert:

cat HCLG.fst | fstrmsymbols $dir/disambig_tid.int | fstconvert --fst_type=const > HCLG_final.fst
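
The assertion reported earlier fires when an input label in the graph is not a valid transition-id for the model. That is presumably what happens while the disambiguation symbols listed in disambig_tid.int are still on the arcs, since fstrmsymbols by default replaces the listed symbols on the input side with epsilon. One rough way to check, sketched under assumptions: dump the graph as text with fstprint (arc lines read "src dst ilabel olabel [weight]") and count the arcs whose input label appears in disambig_tid.int. The function name count_disambig_arcs is hypothetical.

```shell
# Hypothetical helper: count arcs in fstprint text (read from stdin) whose
# input label is listed in the disambig file given as $1 (one integer per line).
count_disambig_arcs() {
  awk 'NR == FNR { if ($1 != "") disambig[$1] = 1; next }  # first file: disambig ids
       NF >= 4 && ($3 in disambig) { n++ }                  # arc lines only
       END { print n + 0 }' "$1" -
}

# Usage (assumes fstprint from OpenFst is on PATH and the files are in the cwd):
#   fstprint HCLG.fst | count_disambig_arcs disambig_tid.int
```

On the graph produced after fstrmsymbols, one would expect this count to be 0. Note that fstprint output for a full HCLG can be large.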

Aleks Navratil

Aug 10, 2021, 6:24:24 PM

to kaldi-help

Thank you for the fast and accurate reply :) It works like a charm.

Breadcrumbs for posterity and future web searchers:
Here is an all-in-one-place explanation, derived entirely from the thread above, of how to convert a "dynamic graph" Kaldi model (aka "lookahead model" or "dynamic FST model"), such as one from the Vosk (aka Alphacephei) project, into a "static graph" Kaldi model (aka static FST).

This explanation assumes you have already downloaded a dynamic graph model containing both HCLr.fst and Gr.fst, unzipped it into some directory, and that you're now working in that directory. The content here is basically the same as the above; it just fixes a few minor typos in the answers upthread which cause crashes, and expands out all the paths into full absolute paths (to reduce ambiguity).
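
Since everything that follows depends on those input files being in the working directory, a quick preflight check may save a confusing error. This is a minimal sketch: the function name missing_inputs is hypothetical, and the file list is simply the three inputs used in the commands in this thread. If your model archive keeps them in a subdirectory, pass that directory instead.

```shell
# Hypothetical preflight check: list any of the three inputs used in this thread
# that are missing from the model directory $1 (defaults to the current directory).
missing_inputs() {
  dir="${1:-.}"
  for f in HCLr.fst Gr.fst disambig_tid.int; do
    [ -f "$dir/$f" ] || printf '%s\n' "$f"
  done
}

# Report on the current directory; prints nothing when all three files are present.
missing_inputs .
```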

One-step method, without storing intermediate FSTs (try this first):

export LD_LIBRARY_PATH=/opt/kaldi/tools/openfst/lib/fst/

/opt/kaldi/tools/openfst/bin/fstcompose HCLr.fst Gr.fst | /opt/kaldi/src/fstbin/fstrmsymbols disambig_tid.int | /opt/kaldi/tools/openfst/bin/fstconvert --fst_type=const > HCLG.fst

Two-step method, duplicating the results of the method above, but storing the intermediate FST on disk for possible inspection or debugging:

export LD_LIBRARY_PATH=/opt/kaldi/tools/openfst/lib/fst/

/opt/kaldi/tools/openfst/bin/fstcompose HCLr.fst Gr.fst > HCLG_unfinished.fst

cat HCLG_unfinished.fst | /opt/kaldi/src/fstbin/fstrmsymbols disambig_tid.int | /opt/kaldi/tools/openfst/bin/fstconvert --fst_type=const > HCLG_final.fst

The latter method is basically the same as the former; they differ only in that the composed FST is first written to HCLG_unfinished.fst, and the final result is named HCLG_final.fst rather than HCLG.fst.

How to make words.txt:

After doing either or both of the methods above, you'll also want to make a words.txt as follows; this is necessary for getting a human-readable transcript.

/opt/kaldi/tools/openfst/bin/fstsymbols --save_osymbols=words.txt Gr.fst > /dev/null
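
To illustrate what words.txt is for: each line pairs a symbol with its integer id ("<symbol> <id>"), and Kaldi lattices carry the integer ids. The sketch below maps a line of ids back to words. The helper name ids_to_words and the "<unk-id:N>" placeholder are hypothetical; in a real Kaldi setup you would more likely hand words.txt to the decoder or use Kaldi's own utilities.

```shell
# Hypothetical helper: translate space-separated word ids on stdin into symbols
# using a words.txt-style table ($1) whose lines are "<symbol> <id>".
ids_to_words() {
  awk 'NR == FNR { sym[$2] = $1; next }
       { out = ""
         for (i = 1; i <= NF; i++) {
           w = ($i in sym) ? sym[$i] : "<unk-id:" $i ">"
           out = out (i > 1 ? " " : "") w
         }
         print out }' "$1" -
}

# Usage (ids here are made up for illustration):
#   echo "12 345 6" | ids_to_words words.txt
```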

Caveats:

Your $KALDI_ROOT might not be the same as mine (/opt/kaldi/), for example if you're not using the same Dockerized Kaldi as me. In that case, find your $KALDI_ROOT and substitute it into the commands above.
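
One way to make that substitution less error-prone is to generate the command from a single root path. This is a sketch only: the helper name static_graph_cmd is hypothetical, and it assumes the same tools/openfst/bin and src/fstbin layout under your Kaldi root as in the commands above. It only prints the one-step pipeline, so you can inspect it before running anything.

```shell
# Hypothetical helper: print the one-step conversion pipeline from this thread
# for the Kaldi root given as $1 (a trailing slash is tolerated).
static_graph_cmd() {
  root="${1%/}"
  printf '%s/tools/openfst/bin/fstcompose HCLr.fst Gr.fst | ' "$root"
  printf '%s/src/fstbin/fstrmsymbols disambig_tid.int | ' "$root"
  printf '%s/tools/openfst/bin/fstconvert --fst_type=const > HCLG.fst\n' "$root"
}

# Dry run: show the command for your root (falls back to /opt/kaldi if unset).
static_graph_cmd "${KALDI_ROOT:-/opt/kaldi}"
```

Remember that LD_LIBRARY_PATH needs the matching tools/openfst/lib/fst directory under the same root.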
