How to do offline decoding with nnet1


shun

Jul 23, 2019, 2:04:36 AM7/23/19
to kaldi-help
Dear all,

I have been playing with Kaldi for 4 months and am familiar with creating DNN acoustic models (nnet1) using "kaldi/egs/csj/s5/run.sh".

I started with no knowledge of speech recognition, but I am grateful that I have been able to learn so much from Kaldi. Thank you very much.

Now, I want to build an offline speech recognition system using Kaldi.

Please tell me how to decode a new wav file using the nnet1 model (*.nnet etc.) I have created.

The offline speech recognition system we have in mind would follow this procedure:

・1. Record the audio file to be decoded (*.wav).
・2. Decode the audio file (*.wav) recorded in step 1 with Kaldi (*.nnet etc.).
・3. Receive the recognition result (text) decoded in step 2.

Thank you and best regards,

Shun

●Supplementary Information (versions etc.)
 ・【OS】
   Ubuntu 16.04 LTS

 ・【recipe】
   Corpus of Spontaneous Japanese (CSJ)
   (kaldi/egs/csj/s5)

 ・【toolkit for language models】
   IRSTLM

Daniel Povey

Jul 23, 2019, 4:45:10 PM7/23/19
to kaldi-help
You can in general figure it out by looking at the decoding log files and tracing back to see where various things came from and what commands the associated log files contain.  You'd generally replace the utt2spk or spk2utt files with a dummy that says just
dummy dummy
or something like that, so `dummy` would be the utterance-id and also the speaker-id.
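Concretely, the dummy files Dan describes could be set up like this (a minimal sketch; the data directory name and wav path are placeholders, not taken from this thread):

```shell
# Build a one-utterance Kaldi data directory where the utterance-id and
# speaker-id are both "dummy".  All paths here are illustrative.
mkdir -p data/test_one
echo "dummy dummy" > data/test_one/utt2spk
echo "dummy dummy" > data/test_one/spk2utt   # inverse mapping; trivial for one speaker
echo "dummy /path/to/input.wav" > data/test_one/wav.scp
```

With a single utterance the utt2spk and spk2utt files happen to be identical; for real multi-utterance data you would generate spk2utt from utt2spk with utils/utt2spk_to_spk2utt.pl.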
If it's one of the nnet1 scripts that uses fMLLR adaptation, it may be a little complicated -- there are various stages in fMLLR estimation.
You'd do better to use nnet3 scripts and use the online decoding setup, where it really is just a single binary.
Look at mini_librispeech/s5/local/chain/run_tdnn.sh for an up-to-date setup.
The results will be better, also.
Dan

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/4a6d9fee-84d1-4d52-bbad-1a09c7fcff79%40googlegroups.com.

shun

Jul 23, 2019, 9:50:00 PM7/23/19
to kaldi-help
Dear Dan-san,

Thank you for your prompt response.

I understand that offline speech recognition can be complicated with nnet1.

I had the impression that online recognition was more difficult than offline, so
I was planning to implement online recognition as the next step after offline.

I also understand that the nnet3 model gives better results than the nnet1 model.

I also want to use it as a speech recognition engine for robots in the future, so
I would like to create an nnet3 model and try online speech recognition.

If I run into any problems, I may need to contact you again. Thank you in advance.

Best regards,

Shun


shun

Aug 7, 2019, 8:57:21 PM8/7/19
to kaldi-help
Dear Dan-san,

As you advised, I am currently creating the nnet3 model with "local/chain/run_tdnn.sh".

Once the nnet3 model is created, is it easy to do the following?
I would like to evaluate the recognition accuracy on already-recorded Japanese audio files.

● wav file recognition
・1. Record the audio file to be decoded (*.wav).
  → For example, input.wav (a recording of "おはよう")
・2. Decode the audio file (*.wav) recorded in step 1 with Kaldi (*.nnet etc.).
  → For example, ****.sh ***.nnet input.wav output.txt
・3. Receive the recognition result (text) decoded in step 2.
  → For example, output.txt (containing "おはよう")
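As a hedged sketch of what step 3 can look like in Kaldi (a generic pattern, not specific to this thread): the best path is read out of the decoded lattice and word-ids are mapped back to words. lattice-best-path and utils/int2sym.pl are standard Kaldi tools, but every path below is a placeholder, and the command is guarded so the sketch is a no-op without a Kaldi installation.

```shell
# Hypothetical step 3: turn a decoded lattice into plain text.
# The lattice and word-symbol-table locations are placeholders.
lat="exp/chain/tdnn_online/decode_test/lat.1.gz"
words="exp/chain/tree_sp/graph/words.txt"

if command -v lattice-best-path >/dev/null 2>&1; then
  lattice-best-path --word-symbol-table="$words" \
    "ark:gunzip -c $lat|" ark,t:- |
    utils/int2sym.pl -f 2- "$words" > output.txt
fi
```

Each line of output.txt would then be an utterance-id followed by the recognized words, which matches the "output.txt" shape in the list above.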

Thank you and best regards,

Shun


Daniel Povey

Aug 7, 2019, 10:16:09 PM8/7/19
to kaldi-help
That would be easiest to do with the online-decoding binary, since that takes in the wav files.  Look at the last stage in typical chain training scripts where it mentions 'online'.  Look at the command line in one of those decode logs; you'd just have to modify it to load only one wav file.  Once you understand Kaldi's I/O mechanisms it's fairly easy... basically, whatever rspecifier reads in the wav files would be replaced with "scp:echo foo /my/dir/wav1.scp|"
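Putting Dan's pieces together, a single-file decode might look like the sketch below. online2-wav-nnet3-latgen-faster is the online-decoding binary he refers to, but the model, graph, and config paths are placeholders copied from a typical chain/online layout, not from this thread; adapt them from your own decode log. The invocation is guarded so the sketch is harmless without a Kaldi installation.

```shell
# Rspecifiers built the way Dan describes: a shell pipe that expands to a
# one-line table mapping utterance-id -> value.
spk2utt_rspec='ark:echo dummy dummy|'
wav_rspec='scp:echo dummy /my/dir/wav1.wav|'

# Hypothetical single-file decode; all paths below are placeholders.
if command -v online2-wav-nnet3-latgen-faster >/dev/null 2>&1; then
  online2-wav-nnet3-latgen-faster \
    --config=exp/chain/tdnn_online/conf/online.conf \
    --word-symbol-table=exp/chain/tree_sp/graph/words.txt \
    exp/chain/tdnn_online/final.mdl \
    exp/chain/tree_sp/graph/HCLG.fst \
    "$spk2utt_rspec" "$wav_rspec" ark:/dev/null
fi
```

With --word-symbol-table given, the recognized words for the utterance are printed to the log, so for a quick test you need not keep the lattice (hence ark:/dev/null as the lattice wspecifier).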


Dan

