Not getting transcription in speaker diarization


Jaskaran Singh Puri


May 8, 2019, 10:49:29 AM

to kaldi-help

I'm using the M6 model for speaker diarization. I get the following output, but not the transcripts that go with these segments.

I also don't see any option in the callhome_diarization scripts for configuring a model that would produce transcripts.

What's the correct way to approach this? Is montreal-forced-aligner good enough to explore?

Moreover, these setups use acoustic models for transcription; can we use the ASpIRE model here?

Please guide

Matthew Maciejewski


May 9, 2019, 12:47:02 AM


The diarization task itself does not include transcription, and so our diarization recipes do not include it.

As far as I know, all of the Kaldi ASR recipes can use utterance segmentation as input, which is what the diarization system produces as output. It will require some simple data manipulation to ensure everything is in the right format, however.
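As a rough illustration of that data manipulation, the Python sketch below turns diarization output in RTTM format (which, as far as I know, is what the callhome_diarization recipe writes) into the Kaldi `segments` and `utt2spk` files an ASR recipe reads. It assumes the standard NIST RTTM field order, with the speaker label in the eighth column. The utterance-ID scheme and the file name `rttm_to_kaldi.py` are hypothetical choices, picked so that every speaker ID is a prefix of its utterance IDs, as the Kaldi data-preparation docs recommend.

```python
import sys
from pathlib import Path


def rttm_to_kaldi(rttm_lines):
    """Return sorted Kaldi 'segments' and 'utt2spk' lines from RTTM lines.

    Assumes the NIST RTTM layout:
    SPEAKER <reco-id> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    segments, utt2spk = [], []
    for line in rttm_lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        reco_id = fields[1]
        start = float(fields[3])
        end = start + float(fields[4])
        # Diarization labels are only unique within one recording, so prefix
        # them with the recording ID to get a corpus-wide speaker ID.
        spk_id = f"{reco_id}-{fields[7]}"
        # Hypothetical naming scheme: speaker ID + start/end in centiseconds,
        # which keeps the speaker ID a prefix of every utterance ID.
        utt_id = f"{spk_id}-{round(start * 100):07d}-{round(end * 100):07d}"
        segments.append(f"{utt_id} {reco_id} {start:.2f} {end:.2f}")
        utt2spk.append(f"{utt_id} {spk_id}")
    # Kaldi expects sorted files (C-locale order, which plain string sorting
    # matches for ASCII IDs).
    return sorted(segments), sorted(utt2spk)


def write_data_dir(rttm_path, out_dir):
    """Write 'segments' and 'utt2spk' derived from rttm_path into out_dir."""
    lines = Path(rttm_path).read_text().splitlines()
    segments, utt2spk = rttm_to_kaldi(lines)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "segments").write_text("\n".join(segments) + "\n")
    (out / "utt2spk").write_text("\n".join(utt2spk) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python rttm_to_kaldi.py <rttm-file> <output-data-dir>
    write_data_dir(sys.argv[1], sys.argv[2])
```

The recording IDs in the RTTM file need to match those in the original `wav.scp`, which can be copied into the same directory. From there, Kaldi's `utils/utt2spk_to_spk2utt.pl` and `utils/fix_data_dir.sh` can complete the data directory before feature extraction and decoding.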

The ASpIRE model is probably fine, but it depends on what data you are running it on. The same is true of the diarization system: for example, the callhome_diarization data is all narrowband telephone speech.
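One quick mismatch check is the sampling rate. The callhome_diarization data is 8 kHz telephone audio, and my understanding is that the ASpIRE model was also trained on 8 kHz telephone speech; treat that as an assumption to verify against the model's own configuration files. A minimal sketch using Python's standard `wave` module, assuming uncompressed PCM WAV input:

```python
import sys
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as w:
        return w.getframerate()


def is_narrowband(path, expected_rate=8000):
    """True if the file is sampled at expected_rate (assumed 8 kHz here)."""
    return wav_sample_rate(path) == expected_rate


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in sys.argv[1:]:
        rate = wav_sample_rate(p)
        status = "ok" if rate == 8000 else "needs resampling to 8 kHz"
        print(f"{p}: {rate} Hz ({status})")
```

Files at other rates could be resampled beforehand, for example with `sox in.wav -r 8000 out.wav`. A matching sample rate only rules out one kind of mismatch; channel and recording conditions still matter.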

It may also be the case that you should not be using diarization at all, or only in limited cases. We do not have a recipe for multi-speaker ASR transcription, and although feeding a diarization system's output into an ASR system will work, it will not work as well as a system that is built to do both jointly, or at least one that is tuned for that task. For example, diarization errors that produce incorrect segments may cause ASR errors that would not arise if diarization were not performed first. It might be fine, but I still would not expect the end result to be as good as that of a system tuned for this specific task.