Arabic ASR development

vinithaba...@gmail.com

Sep 15, 2018, 7:25:30 AM
to kaldi-help
Hi all,
How can I generate an Arabic lexicon to build my Arabic ASR system? Can anyone suggest a tool for generating the lexicon? I got the data from http://www.openslr.org/46/.



Thanks in Advance


Regards,
Vinitha

John Morgan

Sep 15, 2018, 10:28:50 AM
to kaldi...@googlegroups.com
--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/c79e37f6-153b-4590-8f82-a85221326794%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daniel Povey

Sep 15, 2018, 11:46:15 AM
to kaldi-help
It's also common to build Arabic systems based on 'graphemic' lexicons, meaning you just treat the letters as phones. Of course that means the system has to guess at the short vowels, but Arabic text (e.g. as used for language modeling) doesn't typically have the short-vowel information anyway, so in practice it has to guess even when you have a lexicon, because a particular written word can have many different vowelizations.
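As a rough illustration of the graphemic approach, here is a minimal Python sketch that writes each word followed by its letters, one letter per "phone". The function name and the decision to drop combining marks (such as short-vowel diacritics) from the pronunciation are assumptions for illustration; this is not taken from any Kaldi recipe.

```python
import unicodedata


def graphemic_lexicon(words):
    """Return lexicon lines of the form '<word> <letter> <letter> ...'.

    Assumption: combining marks (e.g. Arabic short-vowel diacritics) are
    left out of the pronunciation, so each remaining character is one unit.
    """
    lines = []
    for word in sorted(set(words)):
        graphemes = [ch for ch in word if not unicodedata.combining(ch)]
        if graphemes:
            lines.append(word + " " + " ".join(graphemes))
    return lines


if __name__ == "__main__":
    # An example Arabic word, written with escapes to keep the file ASCII.
    for line in graphemic_lexicon(["\u0645\u0631\u062d\u0628\u0627"]):
        print(line)
```

The resulting lines have the same "word, then space-separated units" shape as a phonetic lexicon, so the rest of the data preparation would not need to know the difference.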

Dan

vinithaba...@gmail.com

Sep 15, 2018, 12:11:04 PM
to kaldi-help
Hi Dan,
For example, this is an Arabic word: مرحبا. How do I generate the phonetic transcription of the word? The link where I got the data doesn't contain a lexicon. If possible, can you please share the file if you have it?

Thanks in Advance


Regards
Vinitha

Daniel Povey

Sep 15, 2018, 12:29:48 PM
to kaldi-help
There is a .bz2 file there (you have to unzip it with bunzip2), and it seems to contain a Buckwalter-encoded Arabic lexicon:


Some lines near the end:

zyz z i y z
zyz z i y z a
zyz z i y z i n


Look up the Buckwalter encoding.
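For readers who want to see what the encoding amounts to, here is a small Python sketch that maps Buckwalter characters to Arabic letters using the standard Buckwalter table. It assumes the first field of each lexicon line is the Buckwalter-encoded word and that fields are separated by spaces, as in the lines quoted above; the archive may use a variant of the table, so treat this as a starting point to check against the data.

```python
# Standard Buckwalter transliteration table (ASCII -> Arabic code point).
BUCKWALTER_TO_ARABIC = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062a", "v": "\u062b", "j": "\u062c",
    "H": "\u062d", "x": "\u062e", "d": "\u062f", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063a", "_": "\u0640", "f": "\u0641",
    "q": "\u0642", "k": "\u0643", "l": "\u0644", "m": "\u0645",
    "n": "\u0646", "h": "\u0647", "w": "\u0648", "Y": "\u0649",
    "y": "\u064a", "F": "\u064b", "N": "\u064c", "K": "\u064d",
    "a": "\u064e", "u": "\u064f", "i": "\u0650", "~": "\u0651",
    "o": "\u0652", "`": "\u0670", "{": "\u0671",
}


def buckwalter_to_arabic(text):
    """Map each Buckwalter character to Arabic; unknown characters pass through."""
    return "".join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)


def convert_lexicon_line(line):
    """Convert only the word (first field); the phone symbols stay untouched."""
    word, _, phones = line.strip().partition(" ")
    return (buckwalter_to_arabic(word) + " " + phones).strip()
```

For example, `convert_lexicon_line("zyz z i y z")` keeps the phones `z i y z` and replaces only the word `zyz` with its Arabic spelling.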



John Morgan

Sep 15, 2018, 12:37:12 PM
to kaldi...@googlegroups.com
I am working on a recipe that uses the Tunisian Accented MSA corpus.
You can clone the repo at:

I convert the Buckwalter encoding to UTF-8 with a Perl module.

J

vinitha baskaran

Sep 17, 2018, 7:06:04 AM
to kaldi...@googlegroups.com
Hi John, I have downloaded the recipe from the above link. While running run.sh I get the errors below. Are there any changes I need to make in run.sh?
local/prepare_data.sh: looking for wav files for devtest/CTELLONE/Recordings_Arabic/6.
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/devtest/CTELLONE/Recordings_Arabic/6': No such file or directory
problems with /home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/transcripts/devtest/recordings.tsv at local/devtest_recordings_make_lists.pl line 25.
cat: data/local/tmp/tunis/devtest/CTELLONE/Recordings_Arabic/6/wav.scp: No such file or directory
cat: data/local/tmp/tunis/devtest/CTELLONE/Recordings_Arabic/6/utt2spk: No such file or directory
cat: data/local/tmp/tunis/devtest/CTELLONE/Recordings_Arabic/6/text: No such file or directory
local/prepare_data.sh: looking for wav files for devtest/CTELLTHREE/Recordings_Arabic/10.
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/devtest/CTELLTHREE/Recordings_Arabic/10': No such file or directory
problems with /home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/transcripts/devtest/recordings.tsv at local/devtest_recordings_make_lists.pl line 25.
cat: data/local/tmp/tunis/devtest/CTELLTHREE/Recordings_Arabic/10/wav.scp: No such file or directory
cat: data/local/tmp/tunis/devtest/CTELLTHREE/Recordings_Arabic/10/utt2spk: No such file or directory
cat: data/local/tmp/tunis/devtest/CTELLTHREE/Recordings_Arabic/10/text: No such file or directory
fix_data_dir.sh: no utterances remained: not proceeding further.
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLONE': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLTWO': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLTHREE': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLFOUR': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLFIVE': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLONE': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLTWO': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLTHREE': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLFOUR': No such file or directory
find: '/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/train/CTELLFIVE': No such file or directory
Can't open /home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/transcripts/train/answers.tsv: No such file or directory at local/answers_make_lists.pl line 36.
fix_data_dir.sh: no utterances remained: not proceeding further.
Can't open /home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/transcripts/train/recordings.tsv: No such file or directory at local/recordings_make_lists.pl line 36.
fix_data_dir.sh: no utterances remained: not proceeding further.
fix_data_dir.sh: no utterances remained: not proceeding further.
fix_data_dir.sh: no utterances remained: not proceeding further.
find: ‘/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/test/Libyan_MSA/cls’: No such file or directory
local/prepare_data.sh: making recordings list for cls
problems with /home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/test/Libyan_MSA/cls/data/transcripts/recordings/cls_recordings.tsv at local/test_recordings_make_lists.pl line 25.
find: ‘/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/test/Libyan_MSA/lfi’: No such file or directory
local/prepare_data.sh: making recordings list for lfi
problems with /home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/test/Libyan_MSA/lfi/data/transcripts/recordings/lfi_recordings.tsv at local/test_recordings_make_lists.pl line 25.
find: ‘/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/test/Libyan_MSA/srj’: No such file or directory
local/prepare_data.sh: making recordings list for srj
problems with /home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/test/Libyan_MSA/srj/data/transcripts/recordings/srj_recordings.tsv at local/test_recordings_make_lists.pl line 25.
find: ‘/home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/speech/test/mbt’: No such file or directory
local/prepare_data.sh: making recordings list for mbt
problems with /home/hduser1/kaldi/egs/tunisian_msa-master/s5/Tunisian_MSA/data/transcripts/test/mbt/recordings/mbt_recordings.tsv at local/test_recordings_make_lists.pl line 25.
cat: data/local/tmp/libyan/cls/recordings/wav.scp: No such file or directory
cat: data/local/tmp/libyan/cls/recordings/utt2spk: No such file or directory
cat: data/local/tmp/libyan/cls/recordings/text: No such file or directory
cat: data/local/tmp/libyan/lfi/recordings/wav.scp: No such file or directory
cat: data/local/tmp/libyan/lfi/recordings/utt2spk: No such file or directory
cat: data/local/tmp/libyan/lfi/recordings/text: No such file or directory
cat: data/local/tmp/libyan/srj/recordings/wav.scp: No such file or directory
cat: data/local/tmp/libyan/srj/recordings/utt2spk: No such file or directory
cat: data/local/tmp/libyan/srj/recordings/text: No such file or directory
cat: data/local/tmp/tunis/mbt/recordings/wav.scp: No such file or directory
cat: data/local/tmp/tunis/mbt/recordings/utt2spk: No such file or directory
cat: data/local/tmp/tunis/mbt/recordings/text: No such file or directory
fix_data_dir.sh: no utterances remained: not proceeding further.




Thanks in Advance
Regards,
Vinitha B

John Morgan

Sep 17, 2018, 9:21:59 AM
to kaldi...@googlegroups.com
Vinitha,
Did the data download scripts run?
I put them in stage -1.
You could try:
run.sh --stage -1
Or you could try running the download scripts separately:
local/tamsa_download.sh http://www.openslr.org/resources/46/Tunisian_MSA.tar.gz
Similarly for qcri and subs.
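For context, Kaldi recipes usually guard each block of run.sh with a test of the form `[ $stage -le N ]`, so a block numbered -1 only runs when the starting stage is -1 or lower. Whether this particular run.sh starts at stage 0 by default is an assumption here. A minimal Python sketch of that gating logic, with a hypothetical function name:

```python
def blocks_that_run(start_stage, block_stages):
    """Return the block numbers whose guard 'start_stage <= N' is true.

    Assumption: each block in run.sh is guarded by [ $stage -le N ],
    the usual convention in Kaldi recipes.
    """
    return [n for n in block_stages if start_stage <= n]


if __name__ == "__main__":
    print(blocks_that_run(0, [-1, 0, 1, 2]))   # the download block -1 is skipped
    print(blocks_that_run(-1, [-1, 0, 1, 2]))  # the download block -1 is included
```

Under that assumption, passing `--stage -1` is what makes the download block run.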
J

vinitha baskaran

Sep 17, 2018, 12:45:33 PM
to kaldi...@googlegroups.com
Hi John, the data download script is not running.



Thanks in Advance
Regards 


VINITHA B

entn-at

Sep 17, 2018, 12:51:19 PM
to kaldi-help
I think the problem is that the data on OpenSLR does not include the Libyan test data and that the CTELL{ONE,TWO,THREE,FOUR,FIVE} data is organized differently from what local/prepare_data.sh expects (e.g., https://github.com/johnjosephmorgan/tunisian_msa/blob/cd30faf2e582a2cbe46b0ab41ed8540a2cda290a/s5/local/prepare_data.sh#L64 looks for CTELL* in $data_dir/speech/train, but there is no train directory).

If you like, I can create a PR with a version of prepare_data.sh that works with the data provided in Tunisian_MSA.tar.gz.
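A quick way to confirm this kind of mismatch before running the recipe is to check the unpacked archive for the paths the scripts ask for. The sketch below is a hypothetical helper, not part of the recipe; the path list is taken from the error messages earlier in this thread and may not be complete.

```python
import os

# Paths taken from the error messages earlier in this thread, relative to
# the unpacked Tunisian_MSA directory. The list is illustrative only.
EXPECTED_PATHS = [
    "data/speech/devtest",
    "data/speech/train",
    "data/speech/test",
    "data/transcripts/devtest/recordings.tsv",
    "data/transcripts/train/recordings.tsv",
    "data/transcripts/train/answers.tsv",
]


def missing_paths(corpus_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under corpus_root."""
    return [p for p in expected
            if not os.path.exists(os.path.join(corpus_root, p))]


if __name__ == "__main__":
    import sys
    root = sys.argv[1] if len(sys.argv) > 1 else "Tunisian_MSA"
    for path in missing_paths(root):
        print("missing:", path)
```

If the script reports `data/speech/train` as missing, the archive has the older layout described above.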

Daniel Povey

Sep 17, 2018, 12:53:45 PM
to kaldi-help
Let's wait till we hear from John. Possibly the wrong version of the data was uploaded there; if so we should change it.


John Morgan

Sep 17, 2018, 3:48:06 PM
to kaldi...@googlegroups.com
Yes, the problem is the data on openslr is an older version of the corpus.
I put a newer version on the CLSP cluster at:
/export/a05/jjm/tunisianmsa.tar

I am assuming Dan or Yenda has to move that archive to openslr.

If not, do you want me to put it there somehow?
J

Daniel Povey

Sep 17, 2018, 3:57:48 PM
to kaldi-help
John, I'm wondering whether we should make a new filename on openslr.  Were there previous working recipes given the old format?  What I'm getting at is, will it inconvenience anyone if we overwrite the old data?

John Morgan

Sep 17, 2018, 6:48:01 PM
to kaldi...@googlegroups.com
I would be surprised if anyone had a recipe.
What is involved?
Do you mean just storing the old archive and the new one under the same resource?
That sounds like a good idea.
John


vinithaba...@gmail.com

Sep 17, 2018, 11:40:20 PM
to kaldi-help
Hi all, so where can I get the newer version of the data?

Daniel Povey

Sep 18, 2018, 1:52:35 AM
to kaldi-help, John Morgan
I am uploading the data to here
under a separate filename (_v2), but it's not fully transferred yet.
John, please let me know how the new version differs, and also please make a PR so the recipe will get the correct location.  The new version is Tunisian_MSA_v2.tar.gz instead of Tunisian_MSA.tar.gz.  However, if there's no reason why someone would want the old version, and you think it won't be disruptive, I can change this and just overwrite the old one.  What I'm doing is just provisional.

Dan



John Morgan

Sep 18, 2018, 9:13:03 AM
to dpo...@gmail.com, kaldi-help
After looking closer at the old version and the recipe I had written,
it does not make sense to keep the old version.
The old version only had training data.
The new version has a new level of directories for the devtest, train, and
test splits.
I'll work on a PR.
Thanks,
John

Ou Ou Khin

Sep 18, 2018, 9:25:24 AM
to kaldi...@googlegroups.com
Hi,
I am new to Kaldi.
I would like to know how to run and test the example scripts in Kaldi.
Could someone please point me to good tutorial videos, websites, or PDFs?
Thank you.

Daniel Povey

Sep 18, 2018, 1:57:10 PM
to John Morgan, kaldi-help
OK, I overwrote the old one; its filename is now the same as before.

vinithaba...@gmail.com

Sep 19, 2018, 1:15:48 AM
to kaldi-help
Hi all
Thanks, Dan. After the changes made at http://www.openslr.org/46/, local/tamsa_download.sh now runs when I run run.sh. But the links to download the other two files, qcri and subs, are not specified anywhere in the code. Because the links are not specified, I get the following errors:


wget: missing URL
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
bzcat: Can't open input file qcri.txt.bz2: No such file or directory.

Thanks in Advance


Regards
Vinitha B

John Morgan

Sep 19, 2018, 9:18:10 AM
to kaldi...@googlegroups.com
I recently put the links in the run.sh file, so maybe you need to do a
git pull to get the latest version.


vinitha baskaran

Sep 25, 2018, 12:17:14 AM
to kaldi...@googlegroups.com
Hi all,
While running run_tdnn.sh I get the following errors:

run.pl: job failed, log is in exp/chain/tree_sp/log/compile_questions.log
run_tdnn.sh: creating neural net configs using the xconfig parser
tree-info exp/chain/tree_sp/tree
ERROR (tree-info[5.4.264~1-f788]:Input():kaldi-io.cc:756) Error opening input stream exp/chain/tree_sp/tree

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::Input::Input(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool*)
void kaldi::ReadKaldiObject<kaldi::ContextDependency>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, kaldi::ContextDependency*)
main
__libc_start_main
_start

steps/nnet3/xconfig_to_configs.py --xconfig-file exp/chain/tdnn1b_sp/configs/network.xconfig --config-dir exp/chain/tdnn1b_sp/configs/
***Exception caught while parsing the following xconfig line:
***   relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.03 dropout-proportion=0.0 dropout-per-dim-continuous=true dim=768

Traceback (most recent call last):
  File "steps/nnet3/xconfig_to_configs.py", line 250, in <module>
    main()
  File "steps/nnet3/xconfig_to_configs.py", line 243, in main
    all_layers = xparser.read_xconfig_file(args.xconfig_file)
  File "steps/libs/nnet3/xconfig/parser.py", line 69, in read_xconfig_file
    this_layer = xconfig_line_to_object(line, all_layers)
  File "steps/libs/nnet3/xconfig/parser.py", line 50, in xconfig_line_to_object
    raise e
RuntimeError: No such layer type 'relu-batchnorm-dropout-layer'
2018-09-24 23:00:49,278 [steps/nnet3/chain/train.py:33 - <module> - INFO ] Starting chain model trainer (train.py)
steps/nnet3/chain/train.py --stage=-10 --cmd=run.pl --feat.online-ivector-dir=exp/nnet3/ivectors_train_sp_hires --feat.cmvn-opts=--norm-means=false --norm-vars=false --chain.xent-regularize 0.1 --chain.leaky-hmm-coefficient=0.1 --chain.l2-regularize=0.0 --chain.apply-deriv-weights=false --chain.lm-opts=--num-extra-lm-states=2000 --trainer.dropout-schedule 0,0...@0.20,0...@0.50,0 --trainer.add-option=--optimization.memory-compression-level=2 --trainer.srand=0 --trainer.max-param-change=2.0 --trainer.num-epochs=8 --trainer.frames-per-iter=3000000 --trainer.optimization.num-jobs-initial=2 --trainer.optimization.num-jobs-final=5 --trainer.optimization.initial-effective-lrate=0.001 --trainer.optimization.final-effective-lrate=0.0001 --trainer.num-chunk-per-minibatch=128,64 --egs.chunk-width=140,100,160 --egs.dir= --egs.opts=--frames-overlap-per-eg 0 --cleanup.remove-egs=true --use-gpu=true --reporting.email= --feat-dir=data/train_sp_hires --tree-dir=exp/chain/tree_sp --lat-dir=exp/chain/tri3b_train_sp_lats --dir=exp/chain/tdnn1b_sp
['steps/nnet3/chain/train.py', '--stage=-10', '--cmd=run.pl', '--feat.online-ivector-dir=exp/nnet3/ivectors_train_sp_hires', '--feat.cmvn-opts=--norm-means=false --norm-vars=false', '--chain.xent-regularize', '0.1', '--chain.leaky-hmm-coefficient=0.1', '--chain.l2-regularize=0.0', '--chain.apply-deriv-weights=false', '--chain.lm-opts=--num-extra-lm-states=2000', '--trainer.dropout-schedule', '0,0...@0.20,0...@0.50,0', '--trainer.add-option=--optimization.memory-compression-level=2', '--trainer.srand=0', '--trainer.max-param-change=2.0', '--trainer.num-epochs=8', '--trainer.frames-per-iter=3000000', '--trainer.optimization.num-jobs-initial=2', '--trainer.optimization.num-jobs-final=5', '--trainer.optimization.initial-effective-lrate=0.001', '--trainer.optimization.final-effective-lrate=0.0001', '--trainer.num-chunk-per-minibatch=128,64', '--egs.chunk-width=140,100,160', '--egs.dir=', '--egs.opts=--frames-overlap-per-eg 0', '--cleanup.remove-egs=true', '--use-gpu=true', '--reporting.email=', '--feat-dir=data/train_sp_hires', '--tree-dir=exp/chain/tree_sp', '--lat-dir=exp/chain/tri3b_train_sp_lats', '--dir=exp/chain/tdnn1b_sp']
usage: train.py [-h] [--feat.online-ivector-dir ONLINE_IVECTOR_DIR]
                [--feat.cmvn-opts CMVN_OPTS]
                [--egs.chunk-left-context CHUNK_LEFT_CONTEXT]
                [--egs.chunk-right-context CHUNK_RIGHT_CONTEXT]
                [--egs.chunk-left-context-initial CHUNK_LEFT_CONTEXT_INITIAL]
                [--egs.chunk-right-context-final CHUNK_RIGHT_CONTEXT_FINAL]
                [--egs.transform_dir TRANSFORM_DIR] [--egs.dir EGS_DIR]
                [--egs.stage EGS_STAGE] [--egs.opts EGS_OPTS]
                [--trainer.srand SRAND]
                [--trainer.shuffle-buffer-size SHUFFLE_BUFFER_SIZE]
                [--trainer.add-layers-period ADD_LAYERS_PERIOD]
                [--trainer.max-param-change MAX_PARAM_CHANGE]
                [--trainer.samples-per-iter SAMPLES_PER_ITER]
                [--trainer.lda.rand-prune RAND_PRUNE]
                [--trainer.lda.max-lda-jobs MAX_LDA_JOBS]
                [--trainer.presoftmax-prior-scale-power PRESOFTMAX_PRIOR_SCALE_POWER]
                [--trainer.optimization.num-jobs-initial NUM_JOBS_INITIAL]
                [--trainer.optimization.num-jobs-final NUM_JOBS_FINAL]
                [--trainer.optimization.max-models-combine MAX_MODELS_COMBINE]
                [--trainer.optimization.combine-sum-to-one-penalty COMBINE_SUM_TO_ONE_PENALTY]
                [--trainer.optimization.momentum MOMENTUM]
                [--trainer.dropout-schedule DROPOUT_SCHEDULE] [--stage STAGE]
                [--exit-stage EXIT_STAGE] [--cmd COMMAND]
                [--egs.cmd EGS_COMMAND] [--use-gpu {true,false}]
                [--cleanup {true,false}] [--cleanup.remove-egs {true,false}]
                [--cleanup.preserve-model-interval PRESERVE_MODEL_INTERVAL]
                [--reporting.email EMAIL]
                [--reporting.interval REPORTING_INTERVAL]
                [--background-polling-time BACKGROUND_POLLING_TIME]
                [--egs.chunk-width CHUNK_WIDTH] [--chain.lm-opts LM_OPTS]
                [--chain.l2-regularize L2_REGULARIZE]
                [--chain.xent-regularize XENT_REGULARIZE]
                [--chain.right-tolerance RIGHT_TOLERANCE]
                [--chain.left-tolerance LEFT_TOLERANCE]
                [--chain.leaky-hmm-coefficient LEAKY_HMM_COEFFICIENT]
                [--chain.apply-deriv-weights {true,false}]
                [--chain.frame-subsampling-factor FRAME_SUBSAMPLING_FACTOR]
                [--chain.alignment-subsampling-factor ALIGNMENT_SUBSAMPLING_FACTOR]
                [--chain.left-deriv-truncate LEFT_DERIV_TRUNCATE]
                [--trainer.num-epochs NUM_EPOCHS]
                [--trainer.frames-per-iter FRAMES_PER_ITER]
                [--trainer.num-chunk-per-minibatch NUM_CHUNK_PER_MINIBATCH]
                [--trainer.optimization.initial-effective-lrate INITIAL_EFFECTIVE_LRATE]
                [--trainer.optimization.final-effective-lrate FINAL_EFFECTIVE_LRATE]
                [--trainer.optimization.shrink-value SHRINK_VALUE]
                [--trainer.optimization.shrink-saturation-threshold SHRINK_SATURATION_THRESHOLD]
                [--trainer.deriv-truncate-margin DERIV_TRUNCATE_MARGIN]
                --feat-dir FEAT_DIR --tree-dir TREE_DIR --lat-dir LAT_DIR
                --dir DIR
train.py: error: unrecognized arguments: --trainer.add-option=--optimization.memory-compression-level=2




And the log file contains the following:
# compile-questions --leftmost-questions-truncate=-1 --context-width=2 --central-position=1 data/lang_chain/topo exp/chain/tree_sp/questions.int exp/chain/tree_sp/questions.qst
# Started at Mon Sep 24 23:00:49 CDT 2018
#

Compile questions
Usage:  compile-questions [options] <topo> <questions-text-file> <questions-out>
e.g.:
 compile-questions questions.txt questions.qst

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --central-position          : Central position in phone context window [must match acc-tree-stats] (int, default = 1)
  --context-width             : Context window size [must match acc-tree-stats]. (int, default = 3)
  --num-iters-refine          : Number of iters of refining questions at each node.  >0 --> questions not refined (int, default = 0)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was: compile-questions --leftmost-questions-truncate=-1 --context-width=2 --central-position=1 data/lang_chain/topo exp/chain/tree_sp/questions.int exp/chain/tree_sp/questions.qst
ERROR (compile-questions[5.4.264~1-f788]:Read():parse-options.cc:372) Invalid option --leftmost-questions-truncate=-1

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::ParseOptions::Read(int, char const* const*)
main
__libc_start_main
_start

# Accounting: time=0 threads=1
# Ended (code 255) at Mon Sep 24 23:00:49 CDT 2018, elapsed time 0 seconds

Thanks in Advance

Regards,
Vinitha B

Daniel Povey

Sep 25, 2018, 12:51:07 AM
to kaldi-help
I think that's a combination of running the TDNN script from a too-late stage,
plus trying to combine a new example script with an old version of Kaldi scripts and code.
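One quick way to test the version-mismatch part is to check whether the local copy of the Kaldi scripts knows the layer type named in the error. The sketch below is a hypothetical helper, not a Kaldi tool; it assumes the xconfig parser lists supported layer types as quoted strings, so a plain text search is only a first check.

```python
def layer_type_is_known(parser_path, layer_type):
    """Return True if layer_type appears as a quoted string in parser_path.

    Assumption: the xconfig parser lists supported layer types as string
    literals, so a plain text search is a reasonable first check.
    """
    with open(parser_path, encoding="utf-8") as handle:
        text = handle.read()
    return ("'" + layer_type + "'") in text or ('"' + layer_type + '"') in text


if __name__ == "__main__":
    import os
    path = "steps/libs/nnet3/xconfig/parser.py"  # path from the traceback above
    if os.path.exists(path):
        print(layer_type_is_known(path, "relu-batchnorm-dropout-layer"))
```

If this prints False, the scripts under steps/ are likely older than the example script that uses the layer, and updating the whole Kaldi checkout (scripts and binaries together) would be the thing to try.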


vinithaba...@gmail.com

Sep 25, 2018, 4:45:46 AM
to kaldi-help
Hi Dan,
How can I solve the issue?

Jesus_Is_Lord

Feb 2, 2019, 1:08:29 PM
to kaldi-help

I'm building an Arabic ASR system with the tunisian_msa recipe and tried to fix some issues using the mini_librispeech recipe. While run.sh successfully generates files in data/lang_test/tmp:
CLG_1_0.fst
CLG_3_1.fst
LG.fst
disambig_ilabels_1_0.int
disambig_ilabels_3_1.int
ilabels_1_0
ilabels_3_1

sudo ./local/chain/run_tdnn.sh --stage 15 complains about the missing file ilabels_2_1 during graph construction. I'm trying to understand why run.sh skips creating ilabels_2_1 when it creates ilabels_3_1 and ilabels_1_0. In the meantime, do you have any idea?

tree-info exp/chain/tree_sp/tree
tree-info exp/chain/tree_sp/tree
fstcomposecontext --context-size=2 --central-position=1 --read-disambig-syms=data/lang_test/phones/disambig.int --write-disambig-syms=data/lang_test/tmp/disambig_ilabels_2_1.int data/lang_test/tmp/ilabels_2_1.18313 data/lang_test/tmp/LG.fst

mv: cannot stat 'data/lang_test/tmp/CLG_2_1.fst.18313': No such file or directory
mv: cannot stat 'data/lang_test/tmp/ilabels_2_1.18313': No such file or directory
fstisstochastic data/lang_test/tmp/CLG_2_1.fst
ERROR: FstHeader::Read: Bad FST header: data/lang_test/tmp/CLG_2_1.fst


Thanks in advance!

Daniel Povey

Feb 2, 2019, 1:10:56 PM
to kaldi-help
The relu-batchnorm-dropout-layer thing is likely due to using an older version of the Kaldi repository with newer example scripts.
The 'mv' thing, I'm not sure.
It's odd that you are using 'sudo' to run that stuff. Might be related.
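One possible reading of the file names, judging only from the file listing and the fstcomposecontext command quoted above: the two numbers look like the tree's context width and central position. If so, the existing 1_0 and 3_1 files belong to graphs built for other trees, and the chain tree, with --context-size=2 --central-position=1, needs its own 2_1 files, which the failed fstcomposecontext step was supposed to write. A small Python sketch of that naming assumption (the function name is hypothetical):

```python
def graph_tmp_files(context_width, central_position):
    """File names keyed by context width N and central position P.

    Assumption: the pattern <name>_<N>_<P> is inferred from the file
    listing and the fstcomposecontext command quoted in this thread.
    """
    suffix = "%d_%d" % (context_width, central_position)
    return {
        "clg": "CLG_%s.fst" % suffix,
        "ilabels": "ilabels_%s" % suffix,
        "disambig": "disambig_ilabels_%s.int" % suffix,
    }


if __name__ == "__main__":
    print(graph_tmp_files(2, 1))
```

Under this reading, run.sh did not skip anything; the 2_1 files are simply produced at graph-building time for the chain tree, so the real question is why fstcomposecontext failed.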

Jesus_Is_Lord

Feb 12, 2019, 10:23:27 PM
to kaldi-help
How do you set the number of hidden layers to be transferred in s5/local/chain/tuning/run_tdnn_wsj_rm_1a.sh?

thanks in advance!

Daniel Povey

Feb 12, 2019, 10:24:15 PM
to kaldi-help
You can't set it; it transfers all of them (it's going to retrain them anyway). But it recreates the output nodes, which may have a different size than before.

Daniel Povey

Feb 12, 2019, 10:25:36 PM
to kaldi-help
... but that tdnn6.renorm in the script is something that you would have to adapt to the specific model. You'd have to figure out the name of the last hidden layer and the corresponding last node in the nnet config, which might be something like tdnnf11.noop (use nnet3-info to see it).

Jesus_Is_Lord

Feb 13, 2019, 2:26:24 PM
to kaldi-help
Ok, thanks!

Do you have any idea how varying the number of hidden layers was managed in http://www.danielpovey.com/files/2017_asru_transfer_learning.pdf ?

Daniel Povey

Feb 13, 2019, 2:27:50 PM
to kaldi-help
It was probably done by setting the learning rates per layer. You can use the --edits option of nnet3-am-copy; search the code for set-learning-rate using git grep and you will see how it's done. You can double-check with nnet3-info that it worked.
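To make the idea concrete, here is a Python sketch that only builds the command line, without running anything. The directive syntax (`set-learning-rate name=<pattern> learning-rate=<value>`, with directives joined by semicolons) and the component-name patterns in the example are assumptions; confirm both against the Kaldi source and the output of nnet3-info before relying on them.

```python
def learning_rate_edits(layer_rates):
    """Build an edits string with one set-learning-rate directive per pattern.

    layer_rates maps a component-name pattern to a learning rate. The
    directive syntax is an assumption; check it against the Kaldi source
    (e.g. with git grep set-learning-rate) before using it.
    """
    directives = [
        "set-learning-rate name=%s learning-rate=%g" % (pattern, rate)
        for pattern, rate in layer_rates.items()
    ]
    return ";".join(directives)


def am_copy_command(src_mdl, dst_mdl, layer_rates):
    """Return the argument list for nnet3-am-copy without running it."""
    return ["nnet3-am-copy", "--edits=" + learning_rate_edits(layer_rates),
            src_mdl, dst_mdl]


if __name__ == "__main__":
    # Hypothetical patterns: freeze the first layer, train the second slowly.
    print(am_copy_command("src.mdl", "dst.mdl",
                          {"tdnn1*": 0.0, "tdnn2*": 0.001}))
```

Setting a layer's learning rate to zero would keep its transferred parameters fixed, which is one way the number of effectively retrained layers could be varied.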

Dan
