Dear David,
Thank you so much for your answer. I was looking at those recipes, but I think they are designed for one-to-one speaker verification, or for gender identification at the utterance level. That is, these recipes do not include a process that segments an audio signal and then classifies each segment as one thing or another (a kind of diarization). Moreover, these scripts would have to be run before recognition, as a separate process; we would like to obtain this information during decoding.
Some weeks ago, we downloaded an acoustic model from
http://kaldi-asr.org/downloads/all/egs/fisher_english/s5/exp/nnet2_online
and, testing it, we realized that this model produces information such as [noise] and [laughter] during the recognition process. Do you know who the main author of these English models trained on the Fisher corpus might be? Maybe this person could have the key.
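For reference, we tested the model with a command along these lines, following the online nnet2 decoding example in the Kaldi documentation (the wav file name, utterance id, and exact paths are from our local setup, so please take this as a sketch rather than the exact command):

    # Decode a single wav file with the downloaded Fisher English online nnet2 model
    online2-wav-nnet2-latgen-faster --online=true --do-endpointing=false \
      --config=exp/nnet2_online/nnet_a_gpu_online/conf/online_nnet2_decoding.conf \
      --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=0.1 \
      --word-symbol-table=exp/tri5a/graph/words.txt \
      exp/nnet2_online/nnet_a_gpu_online/final.mdl exp/tri5a/graph/HCLG.fst \
      "ark:echo utt1 utt1|" "scp:echo utt1 test.wav|" ark:/dev/null

The transcript printed to stderr includes the [noise] and [laughter] tokens, presumably because they are part of the words.txt in the model's decoding graph.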
Anyway, many thanks for your support.