Hello.
I'm trying to build a language model (LM) from a slice of the Google Books Ngram exports (https://storage.googleapis.com/books/ngrams/books/datasetsv3.html). However, I ran into a problem: to use an LM in Kaldi I also need a lexicon.txt file that lists every word in the vocabulary together with its pronunciation. Google's data uses tags that mark a word's part of speech (e.g. run_VERB), and these tags could help reduce the size of the final model. So my question is: is there any way to represent the data in this tagged form when training an LM for Kaldi?
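To make the question concrete, here is a minimal Python sketch (not an official Google or Kaldi tool) of one way to prepare such a slice: strip the part-of-speech suffixes, sum the per-year counts into plain n-gram counts, and collect the vocabulary that lexicon.txt would have to cover. It assumes each export line looks like "ngram<TAB>year,match_count,volume_count<TAB>..." and that tagged tokens look like run_VERB; the POS_TAGS set, the pron_dict pronunciation table and the "<UNK> SPN" entry (a common recipe convention) are assumptions to adjust to the real data and recipe.

```python
# Sketch: Google Books Ngram v3 lines -> untagged n-gram counts -> lexicon.txt lines.
# ASSUMPTIONS: line format "ngram\tyear,match_count,volume_count\t...",
# tagged tokens like "run_VERB", and the tag set below.
from collections import Counter

POS_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
            "ADP", "NUM", "CONJ", "PRT", "X"}


def strip_tag(token):
    """Drop a trailing _TAG; return None for bare tag tokens like "_NOUN_"."""
    if len(token) > 1 and token.startswith("_") and token.endswith("_"):
        return None
    word, sep, tag = token.rpartition("_")
    if sep and word and tag in POS_TAGS:
        return word
    return token


def collapse_counts(lines, merge_tagged=False):
    """Sum match_count over all years for each untagged n-gram.

    With merge_tagged=False, n-grams containing tagged tokens are skipped,
    assuming the export also lists the untagged n-gram with its total count
    (merging both would double count). Set it to True if the slice only
    contains tagged entries.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        tokens = fields[0].split()
        words = [strip_tag(t) for t in tokens]
        if any(w is None for w in words):
            continue  # n-gram contains a placeholder tag such as _NOUN_
        if words != tokens and not merge_tagged:
            continue  # tagged variant of an n-gram
        counts[" ".join(words)] += sum(
            int(rec.split(",")[1]) for rec in fields[1:] if rec)
    return counts


def vocabulary(counts):
    """Every distinct word in the n-grams: the set lexicon.txt must cover."""
    return sorted({w for ngram in counts for w in ngram.split()})


def lexicon_lines(vocab, pron_dict, unk_entry="<UNK> SPN"):
    """Build "word phone1 phone2 ..." lines; return (lines, missing_words).

    pron_dict is a hypothetical word -> list-of-phone-lists mapping
    (e.g. loaded from CMUdict); missing words would need a G2P model.
    """
    lines, missing = [unk_entry], []
    for word in vocab:
        prons = pron_dict.get(word)
        if not prons:
            missing.append(word)
            continue
        lines.extend(word + " " + " ".join(phones) for phones in prons)
    return lines, missing


if __name__ == "__main__":
    sample = [
        "run fast\t1990,5,2\t2000,8,3",
        "run_VERB fast\t1990,5,2\t2000,7,3",
        "_NOUN_ fast\t2000,9,4",
    ]
    counts = collapse_counts(sample)
    for ngram, n in sorted(counts.items()):
        print(f"{ngram}\t{n}")
    lex, oov = lexicon_lines(vocabulary(counts), {"run": [["R", "AH1", "N"]]})
    print(lex, oov)
```

Note that this removes the tags rather than keeping them: the lexicon and the LM must agree on word symbols, so keeping tokens like run_VERB would require a lexicon entry (with a pronunciation) for each tagged form.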
I'm also curious how word-form search is implemented in Google Ngrams: for example, the query "run_INF" (an inflection search) finds the word forms "run", "ran", "running", and "runs".
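Google does not describe how the Ngram Viewer implements this, so the following Python snippet is only a guess at one plausible approach: invert a form-to-lemma mapping (for example, the output of a lemmatizer) into a lemma-to-forms table, expand an "_INF" query into all forms of its lemma, and sum their counts. The form_to_lemma data and all function names are hypothetical.

```python
# Hypothetical sketch of "_INF" (inflection) query expansion; not Google's code.
from collections import defaultdict


def build_inflection_table(form_to_lemma):
    """Invert {form: lemma} into {lemma: sorted list of forms}."""
    table = defaultdict(set)
    for form, lemma in form_to_lemma.items():
        table[lemma].add(form)
    return {lemma: sorted(forms) for lemma, forms in table.items()}


def expand_query(query, table):
    """Expand e.g. "run_INF" to every known form of "run"; pass others through."""
    if query.endswith("_INF"):
        lemma = query[: -len("_INF")]
        return table.get(lemma, [lemma])
    return [query]


def query_count(query, counts, table):
    """Total count of every word form the query expands to."""
    return sum(counts.get(form, 0) for form in expand_query(query, table))


if __name__ == "__main__":
    form_to_lemma = {"run": "run", "ran": "run", "running": "run",
                     "runs": "run", "walk": "walk"}
    table = build_inflection_table(form_to_lemma)
    print(expand_query("run_INF", table))
    print(query_count("run_INF", {"run": 3, "ran": 2, "walk": 9}, table))
```

The hard part in practice is the form-to-lemma mapping itself (irregular forms such as "ran" cannot be derived by suffix rules), which is why a lookup table is assumed here.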
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/e45264ff-fc61-4685-8c12-f4fff3260db3n%40googlegroups.com.