Kaldi CPU Only


Gisel Caicedo

Sep 30, 2024, 3:44:18 PM
to kaldi-developers
Hi all, I'm wondering whether I could run this toolkit on CPU only, not just for training but also for inference. I am currently using Whisper for my transcriptions, but its GPU consumption makes it impractical for the large volumes of transcriptions I expect in the future.

I know that this model runs mainly on GPU, but I would like to know how effective it could be on CPU alone compared to Whisper.

Daniel Povey

Oct 3, 2024, 7:14:17 AM
to kaldi-de...@googlegroups.com
If you mean for inference, there are many other models (not Whisper) that are very efficient even on CPU.
Whisper has an autoregressive decoder that makes it slow.
In the "sherpa" project (see k2-fsa/sherpa on GitHub) we have some support for inference of our own models and also
various public models. But there are other solutions for this, e.g. SpeechBrain or Hugging Face.
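As a rough illustration of what CPU-only decoding with sherpa looks like, here is a hedged sketch using the sherpa-onnx Python package. The model and token file paths are placeholders for a pretrained transducer model downloaded separately; the exact entry points should be verified against the sherpa-onnx documentation.

```python
def transcribe_with_sherpa(wav_path: str) -> str:
    """Decode one 16-bit mono wav file on CPU.

    Hypothetical sketch; requires `pip install sherpa-onnx numpy` and a
    pretrained transducer model (the file names below are placeholders).
    """
    import wave

    import numpy as np
    import sherpa_onnx

    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder="encoder.onnx",  # placeholder model files
        decoder="decoder.onnx",
        joiner="joiner.onnx",
        tokens="tokens.txt",
        num_threads=4,  # CPU threads for ONNX Runtime
    )

    # Read the waveform and normalize int16 samples to [-1, 1] floats.
    with wave.open(wav_path) as f:
        sample_rate = f.getframerate()
        samples = np.frombuffer(
            f.readframes(f.getnframes()), dtype=np.int16
        ).astype(np.float32) / 32768.0

    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

For batch transcription of large volumes, one recognizer can be constructed once and reused across many files, so the model-loading cost is paid only on startup.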




Gisel Caicedo

Oct 3, 2024, 10:35:24 AM
to kaldi-developers
Thank you for your response. Given CPU-only inference times when processing large volumes of data, would Kaldi itself then be out of the question, compared to the model options you mention?

Desh Raj

Oct 4, 2024, 1:41:39 PM
to kaldi-de...@googlegroups.com
You can check out the distil-whisper model on faster-whisper (https://github.com/SYSTRAN/faster-whisper). Distil-whisper has only 2 decoder layers, so it's much faster for ASR than Whisper, and the faster-whisper implementation uses techniques like 8-bit quantization to make it even faster.
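For example, a minimal sketch of CPU-only transcription with faster-whisper (assuming `pip install faster-whisper`; the model name and audio path are illustrative, and the first call downloads the model weights):

```python
def transcribe_with_faster_whisper(
    audio_path: str, model_size: str = "distil-small.en"
) -> str:
    """CPU-only transcription via faster-whisper; sketch, not a benchmark."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # device="cpu" forces CPU inference; compute_type="int8" enables the
    # 8-bit quantization mentioned above, which cuts memory and speeds up
    # decoding at a small accuracy cost.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus metadata.
    segments, _info = model.transcribe(audio_path)
    return " ".join(seg.text.strip() for seg in segments)
```

As with sherpa, the `WhisperModel` should be constructed once and reused when transcribing many files in a batch.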

Desh
