Hello,
Based on the online2bin/online2-wav-nnet3-latgen-incremental.cc example, I am trying to run inference on a raw nnet3 VAD model in order to select the feature vectors that are fed to the online decoder. I am using a separate DecodableNnetLoopedOnline object, set up as follows:
// at initialization time...
// load nnet3 into vad_nnet
vad_decodable_opts.frame_subsampling_factor = 1 ;
vad_decodable_opts.acoustic_scale = 1.0 ;
vad_decodable_opts.frames_per_chunk = same_as_decoder_decodable_object ;
vad_decodable_info = new DecodableNnetSimpleLoopedInfo(vad_decodable_opts, vad_nnet) ;
// create decodable object, with FBANK features from feature_pipeline
vad_decodable = new DecodableNnetLoopedOnline(*vad_decodable_info, feature_pipeline->InputFeature(), NULL) ;
// compute feature vectors for current chunk
feature_pipeline->AcceptWaveform(samp_freq, audio_data) ;
.
.
.
BaseFloat *vad_output_data = vad_output.Data() ;
vad_output_data[0] = vad_decodable->LogLikelihood(i, 1) ;
vad_output_data[1] = vad_decodable->LogLikelihood(i, 2) ;
The three lines above never compute any output, and NumFramesReady() always returns 0.
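To make the intended access pattern concrete, here is a minimal, self-contained sketch of what I am trying to do. FakeDecodable and CollectVadScores are hypothetical names made up for illustration; they are not Kaldi classes, and the fake only mimics the two calls used above (NumFramesReady and LogLikelihood with the same 1 and 2 indices as in my code). It does not reproduce how Kaldi decides that frames are ready.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the decodable object (NOT the Kaldi class).
// It serves precomputed two-class scores so the loop below can be run alone.
class FakeDecodable {
 public:
  explicit FakeDecodable(std::vector<std::pair<float, float>> scores)
      : scores_(std::move(scores)) {}

  // Number of frames whose scores can be read.
  int NumFramesReady() const { return static_cast<int>(scores_.size()); }

  // 'index' is 1 or 2, mirroring the calls in my snippet above.
  float LogLikelihood(int frame, int index) const {
    assert(frame >= 0 && frame < NumFramesReady());
    assert(index == 1 || index == 2);
    return index == 1 ? scores_[frame].first : scores_[frame].second;
  }

 private:
  std::vector<std::pair<float, float>> scores_;
};

// Reads both scores for every frame in [first_frame, NumFramesReady()).
// Returns an empty vector when no new frame is ready, which is the
// situation I currently always end up in.
template <typename Decodable>
std::vector<std::pair<float, float>> CollectVadScores(Decodable &decodable,
                                                      int first_frame) {
  std::vector<std::pair<float, float>> out;
  const int ready = decodable.NumFramesReady();
  for (int i = first_frame; i < ready; ++i) {
    out.emplace_back(decodable.LogLikelihood(i, 1),
                     decodable.LogLikelihood(i, 2));
  }
  return out;
}
```

In my real code, the decodable argument would be vad_decodable, and first_frame would be the number of frames already consumed in previous chunks.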
My guess is that, because this DNN has no transition model, I may not be able to use LogLikelihood to access the pseudo-likelihoods or posteriors. The network is based on the egs/sad_rats recipe.
More generally, is there a simple way to compute the raw VAD scores for each feature vector produced by feature_pipeline?
Thank you for any ideas on this.
Best,
Marc