What is the fastest way to adapt a chain model to a person?


Arkadi

Jan 12, 2020, 5:13:33 AM
to kaldi-help
Hi Dan, all


I have read a lot about adaptation issues related to chain models.
I know the best way is to merge the adaptation data with the original data and train from scratch. I also know about the transfer-learning experiment done with wsj-rm. Both methods take too much time.

My task is to adapt a chain model to a speaker for whom I have about 45 minutes of recordings. What is the fastest way to do it?
My data set consists of ~300 hours of recordings, and I used the swbd recipe to train the DNN.

Arkadi

Daniel Povey

Jan 12, 2020, 5:54:28 AM
to kaldi-help
The plan to do this kind of thing easily is to build it on top of this PR
by adding a script there (if it doesn't already exist) that can combine two existing egs dirs.
(In those scripts, the egs are ingested as script (scp) files, so they are easier to combine.)
If you could work on that, it would be great.

Dan

--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/11692e00-042f-4fa6-acfd-249786ee860f%40googlegroups.com.

Arkadi Gurevich

Jan 13, 2020, 3:21:53 AM
to kaldi...@googlegroups.com
I am not familiar with the chain2 project yet. Can you give me a reference to read about it?
Then I would be happy to work on it.

Arkadi


Kirill Katsnelson

Jan 13, 2020, 10:15:10 PM
to kaldi-help
The PR is probably the best reference there is.

 -kkm

Arkadi

Feb 16, 2021, 8:18:10 AM
to kaldi-help
Hi Dan,

I trained an acoustic model with the chain2 scripts on data set A. Now I want to retrain it with an additional data set B.
I think I need to do the following steps (please correct me if I'm wrong):
1. Extract features for data set B
2. Generate alignment lattices for data set B using the GMM model trained on data set A
3. Get the raw egs for data set B
4. Combine the raw egs of data set B with those of data set A (I'm not sure whether there is a script for this)
5. Continue from step 18 onward in mini_librispeech/s5/local/chain2/tuning/run_tdnn_1a.sh

Did I miss something?
Thank you




Daniel Povey

Feb 16, 2021, 8:37:48 AM
to kaldi-help, Srikanth R Madikeri
In principle it should be easier to combine different dumped egs with chain2 than chain, because the randomization is done at the scp-file level.
However, it looks to me like the script steps/chain2/combine_egs.sh is intended for the multilingual case and is not intended for one language.
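
To illustrate why scp-level randomization makes combining easier, here is a minimal Python sketch. It assumes each dumped example is referenced by one text line of the usual Kaldi scp form `key file.ark:offset`; the function name, the keys, and the archive names are hypothetical and are not taken from the chain2 scripts. Merging and re-randomizing then only touches text lines, never the archives themselves.

```python
import random


def merge_and_shuffle_scp(scp_line_lists, seed=0):
    """Concatenate several lists of scp lines and shuffle them reproducibly.

    Hypothetical helper for illustration; not part of Kaldi.
    """
    merged = [line for lines in scp_line_lists for line in lines]
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(merged)
    return merged


# Made-up scp lines standing in for the egs of data sets A and B.
egs_a = ["uttA1-0 egs_a.ark:17", "uttA2-0 egs_a.ark:912"]
egs_b = ["uttB1-0 egs_b.ark:17"]

for line in merge_and_shuffle_scp([egs_a, egs_b], seed=1):
    print(line)
```

The archives that hold the actual examples stay where they are; only the list of pointers is rewritten.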

I think we could create a combine_raw_egs.sh script which would combine the raw_egs directories to an output raw_egs directory.
Srikanth, might you have time to do this?

From looking at the current scripts, I think we just need to do as follows:

  raw_egs_dir/all.scp  [append]
  raw_egs_dir/misc/utt2spk [append and sort]
  raw_egs_dir/misc/utt2uniq [append and sort]
  raw_egs_dir/misc/*.tree  [make sure the filenames are the same and the files are identical; copy]
  raw_egs_dir/misc/*.train_mdl  [make sure the filenames are the same and the files are identical; copy]

For raw_egs_dir/info.txt, some fields should be checked to make sure they are identical, and others should be summed or weighted-averaged; which is which should be obvious.
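
The steps listed above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the combine_raw_egs.sh script itself: the function names are hypothetical, info.txt is assumed to hold one whitespace-separated `key value` pair per line, and which info.txt fields are summed or averaged is left to the caller because the thread does not list them. The duplicate-key check on all.scp is an extra safeguard that the thread does not ask for.

```python
import filecmp
import shutil
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a text file, or [] if it does not exist."""
    path = Path(path)
    if not path.is_file():
        return []
    return [ln for ln in path.read_text().splitlines() if ln.strip()]


def write_lines(path, lines):
    """Write lines to path, one per line."""
    Path(path).write_text("".join(ln + "\n" for ln in lines))


def combine_raw_egs(src_dirs, out_dir):
    """Combine raw_egs directories into out_dir (hypothetical helper)."""
    srcs = [Path(d) for d in src_dirs]
    out_misc = Path(out_dir) / "misc"
    out_misc.mkdir(parents=True, exist_ok=True)

    # all.scp: append. Rejecting duplicate keys is an added assumption.
    scp = [ln for s in srcs for ln in read_lines(s / "all.scp")]
    keys = [ln.split(None, 1)[0] for ln in scp]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate example keys across all.scp files")
    write_lines(Path(out_dir) / "all.scp", scp)

    # misc/utt2spk and misc/utt2uniq: append and sort (code-point order,
    # which matches a C-locale sort); identical duplicate lines are collapsed.
    for name in ("utt2spk", "utt2uniq"):
        merged = sorted({ln for s in srcs for ln in read_lines(s / "misc" / name)})
        if merged:
            write_lines(out_misc / name, merged)

    # misc/*.tree and misc/*.train_mdl: same names, identical contents; copy.
    for pattern in ("*.tree", "*.train_mdl"):
        names = sorted(p.name for p in (srcs[0] / "misc").glob(pattern))
        for s in srcs[1:]:
            if sorted(p.name for p in (s / "misc").glob(pattern)) != names:
                raise ValueError(f"{pattern} filenames differ in {s}")
            for n in names:
                if not filecmp.cmp(srcs[0] / "misc" / n, s / "misc" / n, shallow=False):
                    raise ValueError(f"misc/{n} differs in {s}")
        for n in names:
            shutil.copyfile(srcs[0] / "misc" / n, out_misc / n)


def read_info(path):
    """Parse an info.txt file, assumed to contain 'key value' lines."""
    info = {}
    for ln in read_lines(path):
        parts = ln.split(None, 1)
        if len(parts) == 2:
            info[parts[0]] = parts[1].strip()
    return info


def merge_info(infos, sum_keys=(), avg_keys=(), weight_key=None):
    """Merge parsed info.txt dicts.

    Keys in sum_keys are added as integers, keys in avg_keys are averaged
    with weights taken from weight_key, and every other key must be identical.
    """
    merged = {}
    for key in infos[0]:
        values = [info[key] for info in infos]
        if key in sum_keys:
            merged[key] = str(sum(int(v) for v in values))
        elif key in avg_keys:
            weights = [float(info[weight_key]) for info in infos]
            total = sum(float(v) * w for v, w in zip(values, weights))
            merged[key] = str(total / sum(weights))
        elif len(set(values)) == 1:
            merged[key] = values[0]
        else:
            raise ValueError(f"info.txt field {key!r} differs: {values}")
    return merged
```

A call such as `combine_raw_egs(["raw_egs_A", "raw_egs_B"], "raw_egs_AB")` (directory names made up) would then produce the combined directory, and `merge_info` would be applied to the two parsed info.txt files with whichever field names actually need summing or averaging.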



Srikanth R Madikeri

Feb 16, 2021, 10:06:21 AM
to kaldi...@googlegroups.com
Hi Dan,

Yes, I can look into this. It seems simple to implement.

Srikanth
