Hi, I have a text corpus of about 500M words (about 5 GB, UTF-8 encoded) for training a language model. Should I prune the language model or reduce the text corpus?
Also, my lexicon is too small, about 20k words. What do you suggest for extending the lexicon for the Pashto language?
--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/8364ab3d-47d6-46a9-94fd-c22de1850de8%40googlegroups.com.
I'd suggest just using a grapheme-based lexicon so you don't have to rely on human annotations.
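As a rough illustration of what a grapheme-based lexicon looks like, here is a minimal sketch in Python. It assumes one lexicon entry per line in the form "word unit1 unit2 ...", and it treats every Unicode code point of the NFC-normalized word as one unit. The function names are hypothetical and this is not taken from any Kaldi recipe; a real setup may want to handle diacritics or special characters differently.

```python
import unicodedata


def grapheme_lexicon(words):
    """Map each word to the list of its characters (its "graphemes").

    Assumption: one Unicode code point per unit, after NFC normalization.
    """
    lexicon = {}
    for word in words:
        word = unicodedata.normalize("NFC", word.strip())
        if word:  # skip empty entries
            lexicon[word] = list(word)
    return lexicon


def format_lexicon(lexicon):
    """Return sorted lines in the form "word g1 g2 ..."."""
    return [f"{word} {' '.join(units)}" for word, units in sorted(lexicon.items())]


if __name__ == "__main__":
    # Two Pashto words used purely as sample input.
    for line in format_lexicon(grapheme_lexicon(["کور", "ښار"])):
        print(line)
```

Since the pronunciation is derived from the spelling, any word seen in the LM training text can be added to the lexicon this way, so the lexicon is no longer limited to the 20k hand-annotated words.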
Is there any example of a grapheme-based model? (wsj/s5/local/chain/e2e/run_tdnnf_flatstart_char.sh)
So I think these models are a bit worse than phone models?
What lexicon size do you suggest for these models?
Or look at gale_arabic/s5, which I think is grapheme-based; s5b is a BPE word-piece model where the phonetic units are still graphemes.
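To make the word-piece idea concrete, here is a toy sketch of byte-pair-encoding (BPE) merge learning in Python. It is not the tooling used by the gale_arabic recipe; the function name `learn_bpe` and the tie-breaking rule are assumptions made for illustration. It shows how frequent character pairs are merged into larger pieces, while each piece can still be spelled out as graphemes in the lexicon.

```python
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: count} mapping.

    Returns (merges, vocab), where vocab maps tuples of pieces to counts.
    """
    # Start with every word split into single characters.
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent piece pairs, weighted by word frequency.
        pairs = Counter()
        for pieces, count in vocab.items():
            for a, b in zip(pieces, pieces[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair merged into one piece.
        new_vocab = {}
        for pieces, count in vocab.items():
            out, i = [], 0
            while i < len(pieces):
                if i + 1 < len(pieces) and (pieces[i], pieces[i + 1]) == best:
                    out.append(pieces[i] + pieces[i + 1])
                    i += 2
                else:
                    out.append(pieces[i])
                    i += 1
            key = tuple(out)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges, vocab


if __name__ == "__main__":
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(counts, 2)
    for a, b in merges:
        piece = a + b
        # Lexicon-style entry: the piece is spelled with its graphemes.
        print(piece, " ".join(piece))
```

With this kind of setup the language model is trained over pieces rather than whole words, which keeps the vocabulary small even for a very large corpus.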
Maxent 3grams
-------------------
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc