Google Colab and Kaldi


Hanna Jasmine

Apr 5, 2020, 4:27:59 AM
to kaldi-help
How do we get started with Kaldi on Google Colab? We want to run a DNN on a GPU. Is there any documentation for this?

Daniel Povey

Apr 6, 2020, 12:24:41 AM
to kaldi-help
Kaldi isn't Python-based, so it won't work with Colab because that's a Python environment (AFAIK).



Tim

Apr 6, 2020, 3:17:02 AM
to kaldi-help
I'm sure it's possible on Colab.
A Jupyter notebook can run bash or other shell commands using the "%%bash" cell magic or a leading "!" (exclamation mark).
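
Both forms hand the command to a shell on the VM. The same thing can be done from plain Python, which is handy when a step's exit code needs to be checked. This is only a minimal sketch; the helper name run_shell is hypothetical and not part of Colab or Kaldi:

```python
import subprocess

def run_shell(command: str) -> str:
    """Run a command through bash and return its stdout; raise on failure."""
    result = subprocess.run(
        ["bash", "-c", command],
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
    )
    return result.stdout

print(run_shell("echo hello from bash"))
```

Because check=True raises on failure, a broken step stops the notebook cell instead of silently continuing.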

1. Prepare the Kaldi binaries.
 - Compile the Kaldi binaries and copy them to a cloud service. (Compiling is very slow, and the compiler is sometimes killed by the system.)
2. Set up the Kaldi environment.
 - Clone the Kaldi repository.
 - Download the Kaldi binaries and set up PATH.
3. Copy or download your train/test files or models from your cloud storage (Drive, Dropbox, your own NAS, etc.).
4. Change your recipe to fit Colab and save it on permanent storage such as Drive or Dropbox.
 - Remove download tasks.
 - Remove CPU jobs.
 - Remove LM graph generation...
5. Run your recipe.
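
The "set up PATH" part of step 2 could look roughly like the sketch below. It assumes the repository was cloned to /content/kaldi (an illustrative location) and that the prebuilt binaries were unpacked into the src/ tree, in directories whose names end in "bin". The helper names are hypothetical:

```python
import os
from pathlib import Path
from typing import List, MutableMapping

def kaldi_bin_dirs(kaldi_root: str) -> List[str]:
    """Return the src/*bin directories found under a Kaldi checkout, sorted."""
    src = Path(kaldi_root) / "src"
    if not src.is_dir():
        return []
    return sorted(str(p) for p in src.glob("*bin") if p.is_dir())

def prepend_to_path(dirs: List[str], env: MutableMapping[str, str]) -> None:
    """Put dirs in front of env['PATH'], skipping entries already present."""
    current = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    new = [d for d in dirs if d not in current]
    env["PATH"] = os.pathsep.join(new + current)

# Assumption: /content/kaldi is where the repository was cloned.
prepend_to_path(kaldi_bin_dirs("/content/kaldi"), os.environ)
```

Changing os.environ this way affects subprocesses started later from the same Python process, so shell commands launched afterwards should see the binaries.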

But I don't recommend it, for several reasons.

Colab offers only 2 Xeon CPU cores, so CPU jobs are very slow and take a long time.
So you need to run all the CPU-related tasks of your Kaldi recipe on another machine and upload the results to your Google Drive.
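
To see what the VM actually provides, the core count can be checked from Python and used to cap the number of parallel jobs (many Kaldi scripts take a job count such as --nj). A small sketch, with choose_num_jobs as a hypothetical helper:

```python
import os

def choose_num_jobs(requested: int) -> int:
    """Cap a requested job count at the number of CPU cores the VM reports."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))

print("cores:", os.cpu_count(), "-> jobs:", choose_num_jobs(8))
```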

Colab is really unstable when you run long jobs, because the frontend occasionally loses its connection to the Jupyter kernel. (This happens even while compiling the Kaldi binaries.)
So the output is frequently missing.
Thankfully, unlike the original Jupyter notebook, Colab lets you close your browser while code is running.
Sometimes the Jupyter kernel or the VM crashes and is reattached to your notebook.
If this happens, you need to run everything again from the beginning.
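
One way to soften this is to record finished stages on permanent storage and skip them on the next run. The sketch below is not something from Kaldi itself; run_stages and the ".done" marker files are hypothetical, and the marker directory is assumed to live somewhere that survives a VM reset (for example a mounted Drive folder):

```python
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

def run_stages(stages: Sequence[Tuple[str, Callable[[], None]]],
               marker_dir: str) -> List[str]:
    """Run stages in order, skipping any whose marker file already exists.

    Returns the names of the stages that were actually executed.
    """
    markers = Path(marker_dir)
    markers.mkdir(parents=True, exist_ok=True)
    executed = []
    for name, func in stages:
        marker = markers / f"{name}.done"
        if marker.exists():
            continue        # finished in an earlier session
        func()              # an exception here leaves no marker behind
        marker.touch()      # record success only after the stage returns
        executed.append(name)
    return executed
```

After a crash, calling run_stages again with the same marker directory resumes at the first stage that has no marker. Each stage still has to write its own outputs to permanent storage for this to be useful.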

Maybe Colab Pro helps here, but I'm not sure.
If you are using the free version of Colab, you can't get a new GPU runtime after about 8 hours of use.
Google says you can lose your VM even when you use Colab Pro.

A big problem is that Google recommends Drive for use with Colab, but actually limits access to Drive.
For example, yesterday I was able to extract an archive over 10 GiB. Today I couldn't handle archives over 8 GiB, whether newly uploaded or cloned.
Also, right now I can't copy the Kaldi binaries (~550 MiB) into the Colab VM.

You'd better use another cloud service or your own NAS to download files into your Colab VM.
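
Pulling a file from such a server into the VM can be done with the standard library alone. A minimal sketch, assuming the file is reachable at a plain URL without authentication; fetch is a hypothetical helper and the URL in the comment is a placeholder:

```python
import shutil
import urllib.request
from pathlib import Path

def fetch(url: str, dest: str, chunk_size: int = 1 << 20) -> int:
    """Stream url to dest in chunks and return the number of bytes written."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest_path.with_name(dest_path.name + ".part")
    # Write to a temporary name first so an interrupted download
    # never looks like a complete file.
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    tmp.replace(dest_path)
    return dest_path.stat().st_size

# Example call (placeholder URL, not a real server):
# fetch("https://nas.example.org/kaldi-bin.tar.gz", "/content/kaldi-bin.tar.gz")
```

Streaming in chunks keeps memory use flat even for multi-gigabyte archives.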



