--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/6ee2e8f4-b669-4a07-95a9-a582ef9cc00b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I was also thinking of concatenating training utterances. However, treating it as an augmentation strategy by including both the singular and merged forms, as you suggest, would be better. I will set this running before the weekend. Thanks.
One option you have when concatenating utterances is, instead of just concatenating like:

hello whats your name
my name's dave

to

hello whats your name my name's dave

do instead:

hello whats your name </s> my name's dave

That is, separated by EOS characters. The option to do this was part of the plan from the start, but it was never actually implemented before (there might still be bugs). IIRC I made sure that the validation script would accept this type of data. The idea is that the model will predict the EOS, and the model itself can learn that, when seen as history, EOS basically behaves like a BOS character, except that the preceding context should not be ignored, as it's part of the same stream of text.
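The augmentation scheme described above could be sketched roughly as follows. This is a minimal illustration, not Kaldi's actual rnnlm data-preparation code; the function name, the `pair_fraction` parameter, and the literal `</s>` symbol are assumptions for the example.

```python
import random

def concatenate_with_eos(utterances, pair_fraction=0.5, eos="</s>", seed=0):
    """Augment LM training text: always keep each utterance in its singular
    form, and additionally emit some randomly paired utterances joined by an
    explicit EOS symbol, so the model can learn that EOS behaves like a BOS
    whose preceding context is still part of the same stream of text."""
    rng = random.Random(seed)
    out = list(utterances)  # singular forms are always kept
    shuffled = list(utterances)
    rng.shuffle(shuffled)
    n_pairs = int(len(shuffled) // 2 * pair_fraction)
    for i in range(n_pairs):
        a, b = shuffled[2 * i], shuffled[2 * i + 1]
        out.append(f"{a} {eos} {b}")  # merged form with explicit EOS
    return out

lines = ["hello whats your name", "my name's dave"]
augmented = concatenate_with_eos(lines, pair_fraction=1.0)
```

With `pair_fraction=1.0` on two utterances, the output contains both singular lines plus one merged line separated by the EOS symbol.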
Dan
On Tue, Feb 20, 2018 at 6:29 PM, David <henria...@gmail.com> wrote:
I was also thinking of concatenating training utterances. However, treating it as an augmentation strategy by including both the singular and merged forms, as you suggest, would be better. I will set this running before the weekend. Thanks.
Out of academic interest, has anyone used this RNNLM (or others with letter features) to deal with capitalised training texts? Three approaches I can think of are:

1) Treating, say, 'b' and 'B' independently is the obvious approach. Thus buffalo and Buffalo would share many features, but not all, though we effectively double the number of letters, which has a cost.

2) Ignoring case altogether and relying on the underlying n-gram model to determine case. Neither necessarily needs capitalisation, and it could improve training robustness and perhaps improve the orthogonality between the two.

3) Mapping letter n-gram features to lower case only and introducing a new capitalisation feature (or features), say num-prefix-capitals, which would be 0 for 'buffalo', 1 for 'Buffalo' or 4 for 'NATO'. I thought this might make the feature space more concise and help robustness to real-world LM training data (as opposed to corpus transcriptions), where capitalisation is inconsistently applied.

I would be interested to hear if anyone has compared these approaches, or has indeed tried anything like the latter.
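Approach 3 could be sketched as a per-word feature extractor like the following. This is only an illustration of the idea; the function name and the returned dictionary layout are made up for the example and do not correspond to Kaldi's actual rnnlm word-feature format, though `num-prefix-capitals` is the feature name suggested above.

```python
def word_features(word, ngram_order=2):
    """Approach 3: case-insensitive letter n-gram features plus a single
    count of leading capital letters, so 'buffalo', 'Buffalo' and 'NATO'
    share the same letter n-grams but differ in the capitalisation feature."""
    # Count leading capitals: 0 for 'buffalo', 1 for 'Buffalo', 4 for 'NATO'.
    num_prefix_capitals = 0
    for ch in word:
        if ch.isupper():
            num_prefix_capitals += 1
        else:
            break
    # Extract letter n-grams from the lower-cased word only.
    lower = word.lower()
    ngrams = set()
    for n in range(1, ngram_order + 1):
        for i in range(len(lower) - n + 1):
            ngrams.add(lower[i:i + n])
    return {"num-prefix-capitals": num_prefix_capitals, "ngrams": ngrams}
```

Under this scheme 'buffalo' and 'Buffalo' produce identical letter n-gram sets and differ only in the capitalisation count, which is what keeps the letter feature space from doubling.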
I've been running more experiments in the background regarding the instability issue above.

Firstly, as over-fitting looked to be an issue, I ran with the same model schema but padded out with ten times more data. Disappointingly, this did not resolve the instability, and performance again deteriorated after a few tens of words.
Secondly, I tried converting the text to purely lower case. The resulting RNNLM models performed as well as I originally expected and were perfectly stable -- performance improved (until convergence) as the left context increased. This was true using the Fisher data alone as well as the larger padded training set. My model builds still included parameter settings from previous attempts to remedy the problem, i.e. an aggressive decay time of 5 and a chunk length of 64. Running the smaller model build without the decay-time restriction surprisingly (to me) made performance a little worse, though not unstable.

I'm not sure why capitalisation proved problematic for me. I'd guess it's due to a doubled letter-space sparsity issue rather than something intrinsic to capitalisation in the code. Ideally I would have experimented with L2 regularisation to keep the weights in check, but I haven't the time and resources at the moment to determine sensible values for the hyper-parameters.
David Pye