cuda-gpu-available: command not found


PDX Girl

Nov 8, 2015, 4:37:33 PM
to kaldi-help
Hello,

So I've been trying to use the tedlium recipe on the latest Kaldi trunk.  I finally got pretty far after letting it run for several days, but now I get to this point:

steps/nnet/pretrain_dbn.sh : Pre-training Deep Belief Network as a stack of RBMs
         dir       : exp/dnn4_pretrain-dbn
         Train-set : data-fmllr-tri3/train

steps/nnet/pretrain_dbn.sh: line 103: cuda-gpu-available: command not found

Reading through forum posts, it appears that pretrain_dbn.sh was designed to work ONLY with a GPU, but my understanding is that I can still use the tedlium recipe without one.  Perhaps I need to disable the deep neural net part?  I'm just trying to get a basic system going for testing and don't necessarily need deep neural net training yet.  This particular server is a VM with two sockets, each with two virtual processors (four total), 8 GB RAM, and 15K SAS disks.  I also have a Standard D14 v2 VM (16 CPU cores, 112 GB RAM, 1 TB SSD) in Azure that I can compile Kaldi on and train with, since I have $150/mo in free Azure credits, but I can only run it for a few days before hitting the $150 mark, and none of their VMs have GPUs anyway.  Also, would the language models built there be compatible with a different server (running the same OS version)?

I don't mind training taking several days to a week or two, but I'm probably making this more complex than necessary.  If I can't (or wouldn't want to) run pretrain_dbn.sh without a GPU, how do I skip the deep neural net part?  I thought it had completed enough training by now to be at least SOMEWHAT useful, but maybe not.  Down the road I'll care more about high accuracy and will obtain at least a Tesla GPU, but right now I'm just trying to get something working for testing.

Any insight that can point me in the right direction?


Thanks!

Rhiannon

PDX Girl

Nov 8, 2015, 5:20:10 PM
to kaldi-help
By the way, I'm seeing at the bottom of
    http://cmusphinx.sourceforge.net/2014/09/lium-releases-tedlium-corpus-version-2/
that it states: "The problem comes here that quite powerful training clusters will be required to work with such databases, it is not possible to train model on a single server in acceptable amount of time."  I realize that much has been said to this effect, but I had thought tedlium was something that could be experimented with on a single server.  Is this statement true because of the size of the tedlium corpus?  And if so, what would be a recommended setup for reasonable training?  And once trained, can the actual speech-to-text decoding run on a single, regular server?  I'm wondering which of these options might be better for the training:

  • A small cluster (say, 8) of moderately-powered VMs (say 4 CPUs, 8 GB RAM) in Azure set up with a grid engine
  • A smaller cluster (say, 4) of moderately-powered GPU-based VMs (say 4 CPUs, 8 GB RAM) in Amazon AWS set up with grid engine
  • A smaller cluster (say, 4) of hardware servers (say dual socket, quad core 3 GHz Xeons with 16-32 GB RAM each) with Tesla GPU boards


I can reasonably set those up for training, especially in the cloud, since I would only need the VMs for a short time.  I just don't want to keep wasting time on what appears not to be working with my 4 CPU, 8 GB RAM VMware VM running on premise.  I'm also wondering whether a Tesla M2090 GPU (6 GB of 1.8 GHz memory, 512 cores at 1.3 GHz) would be sufficient.  If I go with GPUs, I don't think any hypervisor will pass them through, so I'd need dedicated hardware; that makes me wonder whether I'd be better off with a couple of GPU-based hardware servers versus a larger number of VMs without GPUs (say, 3 hardware GPU-based servers vs. 16 quad-core VMs in the public cloud).


I really appreciate your input!



Thanks,


Rhiannon

Daniel Povey

Nov 8, 2015, 6:44:36 PM
to kaldi-help
Your immediate problem may be a path issue, or you may need to compile.  cuda-gpu-available should be in src/nnetbin/; check that it's there (you may have to recompile).
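For example, a quick check might look like this (just a sketch; I'm assuming your Kaldi checkout is at ~/kaldi -- adjust the path to your install):

    ls ~/kaldi/src/nnetbin/cuda-gpu-available   # was the binary built?
    cd ~/kaldi/src/nnetbin && make              # if it's missing, rebuild just that directory
    # the recipe then finds it via the PATH set up by egs/tedlium/s5/path.sh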
You can't train the neural net models in a reasonable amount of time on a cluster of CPUs -- at least not with those scripts.  With a cluster of CPUs it is possible to train neural nets using the nnet2 recipes, but not with nnet1 -- and I don't particularly recommend trying it, as it will still be slow.

Regarding GPUs in the cloud, any cloud service that provides GPUs (e.g. ec2) will give you a way to access them from the VM, otherwise they would be useless.  We have used them on ec2.

Dan





PDX Girl

Nov 10, 2015, 5:03:48 PM
to kaldi-help, dpo...@gmail.com
Hi Dan,

I had thought I was using nnet2; I didn't catch that I was using nnet1.  What's the best way to change it so it is?  I haven't been able to find which script to modify.

Right, I've been thinking of using one of Amazon's GPU VMs for now and eventually moving to a small hardware cluster with GPU boards; I'm just trying to find a reasonable way of getting things running as quickly as possible for testing, and refine it as time goes on.

Thanks!

Daniel Povey

Nov 10, 2015, 5:05:15 PM
to PDX Girl, kaldi-help
The tedlium script may not have an example for nnet2 (or the more recent nnet3) -- you'll have to find an example that does; they are usually named local/online/run_nnet2.sh or similar.
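For instance, run from the top of the Kaldi tree, something like this should list the recipes that have one (just a sketch, not part of the recipes):

    find egs -name 'run_nnet2*.sh' | sort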
Dan

PDX Girl

Nov 10, 2015, 5:32:55 PM
to kaldi-help, rhiann...@risingcables.com, dpo...@gmail.com
Dan,

I will look and see.  Can run_nnet2.sh be run independently?  I was thinking tedlium might be a good recipe to get something basic going for now, but I keep finding things pointing me in a million other directions, so this might not be the easiest way.

Thanks

Daniel Povey

Nov 10, 2015, 5:36:39 PM
to PDX Girl, kaldi-help

I see that tedlium has such a script:

local/online/run_nnet2_ms.sh 

It looks like you have to run the run.sh at least until the tri3 stage before you can run that.

Dan


PDX Girl

Nov 10, 2015, 5:45:47 PM
to kaldi-help, rhiann...@risingcables.com, dpo...@gmail.com
I'm going to give run_nnet2_ms.sh a shot once the current tedlium run script finishes the tri3 stage.  It's still running - has been for many days - and the last few lines show:

add-self-loops --self-loop-scale=0.1 --reorder=true exp/tri3/final.mdl
steps/make_denlats.sh: feature type is lda
steps/make_denlats.sh: using fMLLR transforms from exp/tri3_ali

So I guess it hasn't finished the tri3 stage.  I'll try this once it looks like it's moved beyond it.  I thought a 4-processor VM with 8 GB and 15k SAS drives would move a lot faster, but it sounds like it would never finish running with nnet1.

What should I look for to know it's completed the tri3 stage?  And should I break out of the running run.sh once it has, and then run the run_nnet2_ms.sh script?  I initially thought I wouldn't even need to run nnet training just to have something basic to test with, but maybe I'm not entirely understanding this.  Eventually, yes, I want the most accurate language models, but I'd like to get something basic working for testing quickly.

Thanks

Daniel Povey

Nov 10, 2015, 5:52:21 PM
to PDX Girl, kaldi-help
It has finished the basic tri3 stage; it is now generating denominator lattices for MMI training after tri3.  That is expected to be slow if you don't have a lot of CPUs, and it isn't needed for the neural-net training anyway.
So you could stop that and run run_nnet2_ms.sh.
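(To double-check that tri3 really finished, look for its final model; the path below assumes the standard s5 directory layout:)

    [ -f exp/tri3/final.mdl ] && echo "tri3 done"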
You should set the GPUs to exclusive mode using
nvidia-smi -c 3
and limit both --num-jobs-initial and --num-jobs-final to the number of GPUs you have on your machine (e.g. 2 or 4).
If running nvidia-smi does not show that you have GPUs, you have issues with your hardware.  Bear in mind that to use GPUs, you need the right type of virtualization (I think hvm).  Only the right images will give you access to the GPU hardware, and you have to install the drivers.  Unfortunately I don't have time to walk you through each step of the installation.
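Putting that together, a minimal sequence might look like this (just a sketch; I'm assuming you run it from egs/tedlium/s5, and if the script doesn't accept the job-count options on the command line, edit the corresponding variables inside it):

    sudo nvidia-smi -c 3           # exclusive compute mode on all GPUs (needs root)
    nvidia-smi                     # confirm the GPUs and driver are visible
    local/online/run_nnet2_ms.sh   # keep num-jobs-initial/num-jobs-final at your GPU count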
Dan

PDX Girl

Nov 10, 2015, 6:30:04 PM
to kaldi-help, rhiann...@risingcables.com, dpo...@gmail.com
Well, the whole reason I was going to try nnet2 is that this server does not have any GPUs, and you'd mentioned it would run a bit faster than nnet1 on a non-GPU system.  I will eventually have a GPU server cluster on premise, but I'm not there yet.  I have been planning to build a GPU VM in Amazon for training if I can't reasonably do it on prem.  I've also been looking for pre-built language models that might work for now.

I'd gladly pay someone to help me get a working Kaldi installation going quickly, giving me time to learn and optimize it, but I haven't had luck on that front either.  The project requires that the decoding be done on-prem rather than outsourced to a provider via an API, though that approach might be necessary for initial testing until I can get something working.

I had asked previously but didn't get an answer - would a dual socket, quad core 3 GHz Xeon server (non VM) with 32 GB RAM and a lower-end / older GPU like a Tesla M2090 (6 GB, 512 cores) work reasonably well for training?  I might go that route if it will and be done with it.

Thanks

Daniel Povey

Nov 10, 2015, 6:39:02 PM
to PDX Girl, kaldi-help

I really wouldn't recommend doing any nnet training on CPU at this point-- certainly not on a single machine.  Back when we used to do it, we'd use something like 16 machines each with 16 computing threads-- never just one machine.

> Would a dual socket, quad core 3 GHz Xeon server (non VM) with 32 GB RAM and a lower-end / older GPU like a Tesla M2090 (6 GB, 512 cores) work reasonably well for training?

It might work OK, but having more than one GPU is preferable.  I never train using just one GPU.

Normally this type of work is done by people with years of experience in the field and a PhD in speech recognition.  Even just from a system administration point of view it's not trivial, with GPUs involved.

There are people who do this type of consulting work without requiring you to use their own platform (e.g. Nagendra Goel; Nickolay Shmyrev; maybe Tony Robinson, though I suspect this project might be too small for him), but they are generally pretty busy.

Dan


PDX Girl

Nov 10, 2015, 6:53:25 PM
to kaldi-help, rhiann...@risingcables.com, dpo...@gmail.com
Right, I keep reading everywhere that this is not trivial, and the PhD in speech recognition gets mentioned often.  I've had great success with enormously complex IT projects over the past few decades, so I figured I could get something working as a proof of concept and open the door for more funding.  This really isn't a small project, but to get to the point where we can hire a bunch of people dedicated to this (who know it much better than I do), I've got to get something going first.

Thanks for your help

jame...@gmail.com

Nov 10, 2015, 10:35:04 PM
to kaldi-help, rhiann...@risingcables.com, dpo...@gmail.com

I wouldn't normally use this board for a commercial plug, but this is not the first "plea for help" that I've seen in recent months, and in the spirit of offering help...

In addition to the Kaldi consulting groups Dan mentions, I should mention our own company, Cobalt Speech and Language.  We have 10 scientists and engineers with considerable Kaldi knowledge and experience, and are happy to consider consulting jobs of any size.  Contact in...@cobaltspeech.com for more information.

- Jeff Adams

Daniel Povey

Nov 11, 2015, 1:12:44 AM
to jame...@gmail.com, kaldi-help, PDX Girl
Sorry, should have mentioned you too...
Dan

Ms. Rhiannon Ball

Nov 11, 2015, 1:38:07 PM
to jame...@gmail.com, kaldi...@googlegroups.com, kaldi-help
Thank you!

