Very high memory usage in mkgraph_lookahead

Yangang Cao

Jan 25, 2024, 6:35:58 AM
to kaldi-help
config:
lm=tgmed
am=exp/nnet3_cleaned/tdnn_sp (an nnet3 model I trained to compute GOP; under 100 MB, no i-vectors)
testset=test_clean

When I ran ./local/lookahead/run_lookahead_no_ivector.sh, it got stuck in utils/mkgraph_lookahead.sh, which used more than 450 GB of memory on the server and was then killed. (I want to do grammar-limited ASR like Vosk and compute GOP at the same time. I am not sure whether this is theoretically feasible; if it is not, please tell me. Thanks!)

Is it possible to consume this much memory if the program is running properly?

Yangang Cao

Jan 25, 2024, 6:44:41 AM
to kaldi-help
I use OpenFst 1.7.2. Every call to utils/mkgraph_lookahead.sh in ./local/lookahead/run_lookahead_no_ivector.sh gets stuck in fstdeterminizestar.

Yangang Cao

Jan 25, 2024, 6:57:10 AM
to kaldi-help
When I change am=exp/nnet3_cleaned/tdnn_sp to am=exp/chain_cleaned/tdnn_1d_sp, HCLr.fst and Gr.fst are built normally. But the chain model is bad at GOP. Any idea how to get fstdeterminizestar to work with the nnet3 model?

Yangang Cao

Mar 19, 2024, 4:30:11 AM
to kaldi-help
I found that the problem is in src/fstext/determinize-star-inl.h:
void Determinize(bool *debug_ptr) {
  assert(!determinized_);
  // This determinizes the input fst but leaves it in the "special format"
  // in "output_arcs_".
  InputStateId start_id = ifst_->Start();
  if (start_id == kNoStateId) { determinized_ = true; return; } // Nothing to do.
  else { // Insert start state into hash and queue.
    Element elem;
    elem.state = start_id;
    elem.weight = Weight::One();
    elem.string = repository_.IdOfEmpty(); // Id of empty sequence.
    std::vector<Element> vec;
    vec.push_back(elem);
    OutputStateId cur_id = SubsetToStateId(vec);
    assert(cur_id == 0 && "Do not call Determinize twice.");
  }
  while (!Q_.empty()) {
    std::pair<std::vector<Element>*, OutputStateId> cur_pair = Q_.front();
    Q_.pop_front();
    ProcessSubset(cur_pair);
    if (debug_ptr && *debug_ptr) Debug(); // will exit.
    if (max_states_ > 0 && output_arcs_.size() > max_states_) {
      if (allow_partial_ == false) {
        KALDI_ERR << "Determinization aborted since passed " << max_states_
                  << " states";
      } else {
        KALDI_WARN << "Determinization terminated since passed " << max_states_
                   << " states, partial results will be generated";
        is_partial_ = true;
        break;
      }
    }
  }
  determinized_ = true;
}
Q_.pop_front() pops one element on every iteration, but Q_.size() does not shrink, so the loop never ends.
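
To make that observation concrete, here is a toy sketch (not Kaldi code) of a worklist loop whose queue never shrinks. The quoted code shows that SubsetToStateId() puts the start subset into the queue, so it seems plausible that ProcessSubset() enqueues newly discovered subsets the same way; if every processed subset yields at least one unseen subset, each pop is cancelled by a push. The names ToyResult, ToySuccessor and ExploreWithCap are made up for this illustration, states are plain integers rather than weighted subsets, and the cap only mimics the max_states_ check in the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>

// Toy stand-in for a subset-construction worklist (hypothetical, not Kaldi).
struct ToyResult {
  std::size_t states_created;  // number of distinct states discovered
  bool hit_cap;                // true if exploration stopped at max_states
};

// Assumed successor function: every state has one never-seen-before
// successor, modelling an input whose determinization never closes.
inline int ToySuccessor(int state) { return state + 1; }

// Explores states breadth-first and stops once more than max_states states
// exist. With max_states == 0 (no cap) this toy would never terminate.
ToyResult ExploreWithCap(std::size_t max_states) {
  std::deque<int> queue;
  std::unordered_set<int> seen;
  queue.push_back(0);
  seen.insert(0);

  while (!queue.empty()) {
    int cur = queue.front();
    queue.pop_front();                // the queue shrinks by one ...
    int next = ToySuccessor(cur);
    if (seen.insert(next).second) {
      queue.push_back(next);          // ... and grows by one again
    }
    assert(queue.size() <= seen.size());  // every queued state was seen
    if (max_states > 0 && seen.size() > max_states) {
      return {seen.size(), true};     // analogous to the max_states_ abort
    }
  }
  return {seen.size(), false};
}
```

In this toy, ExploreWithCap(1000) stops after creating 1001 states with hit_cap set, while the queue itself never holds more than one element. So a queue of constant size does not by itself show that the loop is stuck on the same element; it can also mean the state space keeps growing, which would match the memory growth reported earlier in this thread.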

I'm not sure whether lookahead is theoretically impossible for all nnet3 (non-chain) models. Does anyone have any tips?