The diarization task itself does not include transcription, and so our diarization recipes do not include it.
As far as I know, all of the Kaldi ASR recipes can use utterance segmentation as input, which is what the diarization system produces as output. It will require some simple data manipulation to ensure everything is in the right format, however.
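As a rough illustration of that data manipulation, here is a minimal Python sketch. It assumes the diarization output is an RTTM file with lines of the form `SPEAKER <recording-id> <channel> <start> <duration> <NA> <NA> <speaker> <NA> <NA>`, and that the ASR recipe expects the usual Kaldi `segments` (`<utt-id> <recording-id> <start> <end>`) and `utt2spk` (`<utt-id> <speaker-id>`) files. The function name `rttm_to_kaldi` and the utterance-ID scheme are my own choices for illustration, not something taken from a recipe.

```python
def rttm_to_kaldi(rttm_lines):
    """Convert RTTM SPEAKER lines to Kaldi-style segments and utt2spk lines.

    Hypothetical helper: the ID naming scheme is an assumption, chosen so
    that each utterance ID starts with its speaker ID.
    """
    segments, utt2spk = [], []
    for line in rttm_lines:
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        rec_id = fields[1]
        start = float(fields[3])
        end = start + float(fields[4])  # RTTM stores duration, not end time
        # Prefix the speaker label with the recording ID so that speaker
        # IDs are unique across recordings.
        spk_id = f"{rec_id}-{fields[7]}"
        # Times in centiseconds, zero-padded so string sort matches time order.
        utt_id = f"{spk_id}-{int(round(start * 100)):07d}-{int(round(end * 100)):07d}"
        segments.append(f"{utt_id} {rec_id} {start:.2f} {end:.2f}")
        utt2spk.append(f"{utt_id} {spk_id}")
    # Kaldi data directories are expected to be sorted.
    return sorted(segments), sorted(utt2spk)


if __name__ == "__main__":
    example = [
        "SPEAKER rec1 0 0.00 2.50 <NA> <NA> 1 <NA> <NA>",
        "SPEAKER rec1 0 2.50 1.25 <NA> <NA> 2 <NA> <NA>",
    ]
    segs, u2s = rttm_to_kaldi(example)
    print("\n".join(segs))
    print("\n".join(u2s))
```

You would write the two lists to `segments` and `utt2spk` in a data directory alongside the original `wav.scp`, and then check the result with the usual Kaldi data-directory validation before extracting features.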
The ASpIRE model is probably fine, but it depends on what data you are running on. The same is true of the diarization system, though. For example, the callhome_diarization recipe is built entirely on narrowband telephone speech.
It is also possible that you should not be using diarization at all, or should use it only in limited cases. We do not have a recipe for multi-speaker ASR transcription, and though using a diarization system as input to an ASR system will work, it will not work as well as a system that is built to do both jointly, or that is at least tuned for that task. For example, diarization errors that lead to incorrect segments may cause ASR transcription errors that would not arise if diarization were not performed ahead of time. It might be fine, but I still would not expect the end result to be as good as if the system were tuned for this specific task.
—Matt