Transfer learning output dim mismatch


Ravikiran

Mar 29, 2019, 6:15:50 AM
to kaldi-help
I have trained an nnet3 model on Common Voice US-accent data (~100 hours of audio) and am trying to apply transfer learning to Indian-accent data (~10 hours).
I followed the Librispeech recipe for the main model (the same TDNN architecture; config attached). For transfer learning, I am following a recipe similar to wsj-rm-1c (train the last layer on the target dataset with a higher learning rate and the other layers with a smaller learning rate).

I get the following error; the full log file is attached.

ASSERTION_FAILED (nnet3-chain-train[5.5.235~2-8cbd5]:GenericNumeratorComputation():chain-generic-numerator.cc:43) Assertion failed: (supervision.num_sequences * supervision.frames_per_sequence == nnet_output.NumRows() && supervision.label_dim == nnet_output.NumCols())


The egs/info/num_pdfs value in my source model directory is 5344, which matches the output dimension of the network (please refer to the attached config). However, for my target data, egs/info/num_pdfs is 2088. How do I ensure that num_pdfs for my target data is the same as for my source data?

I followed the same procedure for my target dataset: train a GMM/HMM model (LDA+MLLT+SAT) and align. I even tried aligning the target dataset with the source model, and num_pdfs is still not 5344.
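(For reference, the mismatch the assertion complains about boils down to this check; the demo/ paths below are placeholders standing in for the real source and target egs/info directories:)

```shell
# Placeholder layout standing in for the real egs/info/num_pdfs files.
mkdir -p demo/src demo/tgt
echo 5344 > demo/src/num_pdfs   # value from the source model's egs/info
echo 2088 > demo/tgt/num_pdfs   # value from the target data's egs/info

src_pdfs=$(cat demo/src/num_pdfs)
tgt_pdfs=$(cat demo/tgt/num_pdfs)
if [ "$src_pdfs" -ne "$tgt_pdfs" ]; then
  echo "num_pdfs mismatch: source=$src_pdfs target=$tgt_pdfs"
fi
```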


xconfig
train.0.1.log

Vimal Manohar

Mar 29, 2019, 10:29:29 AM
to kaldi-help
You might be rebuilding the tree somewhere. You have to *reuse* the same source model's tree. If you followed the steps correctly in https://github.com/kaldi-asr/kaldi/blob/master/egs/rm/s5/local/chain/tuning/run_tdnn_wsj_rm_1b.sh (using GMM alignments) or https://github.com/kaldi-asr/kaldi/blob/master/egs/rm/s5/local/chain/tuning/run_tdnn_wsj_rm_1c.sh (using nnet3 alignments), you will not have the problem.
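A quick sanity check for this (paths are hypothetical; substitute your own directories): the tree the target egs were built from must be byte-identical to the source model's tree, e.g.

```shell
# Hypothetical paths -- substitute your source model dir and the dir
# that was used when dumping the target egs.
src_tree=exp/chain/tdnn_src/tree
tgt_tree=exp/chain/tri_tgt/tree
if cmp -s "$src_tree" "$tgt_tree"; then
  echo "trees match"
else
  echo "trees differ: the target egs were built with a rebuilt tree"
fi
```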

Vimal


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/b63cbc02-9ac1-4717-95cc-c1701b57d2e7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Vimal Manohar
PhD Student
Center for Language and Speech Processing
Johns Hopkins University
Baltimore, MD

Daniel Povey

Mar 29, 2019, 11:46:35 AM
to kaldi-help
Also, I recommend just combining both data sources and training a single model, instead of doing transfer learning.

Mike Kim

May 12, 2020, 11:04:16 PM
to kaldi-help
If you have 5000 hours of native speakers and 10 hours of non-native speakers, and your target users are non-native speakers, which would be better: combining both data sources and training a single model, or doing transfer learning on the target-user data?



Daniel Povey

May 12, 2020, 11:32:20 PM
to kaldi-help
I'd duplicate the non-native speakers some number of times (5? 10?) and combine that with the native data.



Mike Kim

May 15, 2020, 4:45:06 AM
to kaldi-help
Is there a command to copy a data dir and create new utterance IDs? Combining the data will otherwise just remove the duplicates.

Mike Kim

May 15, 2020, 4:51:42 AM
to kaldi-help
Found one that works:
./utils/copy_data_dir.sh --spk-prefix 2- --utt-prefix 2- data/train data/train_2
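To make the several copies Dan suggested and then pool them, the same tool can be looped. The data-dir names below are hypothetical, and the echo is left in as a dry run (drop it to actually execute the commands inside a Kaldi egs directory):

```shell
# Hypothetical data dirs; drop the "echo" to run the commands for real.
n=5                        # number of copies of the small set (tune empirically)
dirs="data/train_native"
for i in $(seq 1 $n); do
  echo utils/copy_data_dir.sh --spk-prefix ${i}- --utt-prefix ${i}- \
       data/train_nonnative data/train_nonnative_$i
  dirs="$dirs data/train_nonnative_$i"
done
echo utils/combine_data.sh data/train_combined $dirs
```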