Trouble with training G2P

BAI YE

Mar 17, 2016, 5:02:54 AM

to kaldi-help

Hi, everyone!

When I tried to train a G2P model, I ran into a problem. The log shows this message:

Traceback (most recent call last):
  File "/data1/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur.py", line 662, in run
    shouldStop = self.iterate(context)
  File "/data1/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur.py", line 575, in iterate
    self.shallUseMaximumApproximation)
  File "/data1/baiye/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur.py", line 260, in evidence
    for eg in self.graphs(model):
  File "/data1/baiye/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur.py", line 202, in makeGraphs
    eg = self.builder.create(left, right)
  File "/data1/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur_.py", line 145, in create
    def create(self, *args): return _sequitur_.EstimationGraphBuilder_create(self, *args)
ValueError: symbol out of range: 256
iteration failed.
failed to estimate or load model
# Accounting: time=72 threads=1
# Ended (code 1) at Thu Mar 17 23:58:36 CST 2016, elapsed time 72 seconds

What does "symbol out of range" mean?

Thank you!

Jan Trmal

Mar 17, 2016, 5:19:43 AM

to kaldi-help

No idea. How many phonemes and how many words do you have? Is this one of the standard Kaldi egs?
Y.
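
One way to answer the question about phoneme and word counts is to count the inventory directly. The following is a minimal sketch, assuming the training lexicon is a plain-text file with one entry per line in the form "word phone1 phone2 ..." and that each character of a word counts as one grapheme. The function name lexicon_stats and the sample entries are hypothetical; they are not part of Kaldi or sequitur.

```python
from collections import Counter


def lexicon_stats(lines):
    """Count words, distinct graphemes, and distinct phonemes in a lexicon.

    Each non-empty line is assumed to look like: word phone1 phone2 ...
    """
    words = set()
    graphemes = Counter()
    phonemes = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        words.add(word)
        graphemes.update(word)   # one grapheme per character (assumption)
        phonemes.update(phones)  # one phoneme per whitespace-separated token
    return {
        "words": len(words),
        "graphemes": len(graphemes),
        "phonemes": len(phonemes),
    }


if __name__ == "__main__":
    # Hypothetical sample entries; replace with lines read from a real lexicon.
    sample = ["hello HH AH L OW", "help HH EH L P", "", "low L OW"]
    print(lexicon_stats(sample))
```

For a real lexicon, pass an open file handle instead of the sample list, for example lexicon_stats(open("lexicon.txt", encoding="utf-8")), where the file name is a placeholder. Since the error message mentions symbol 256, grapheme or phoneme counts in that range would be worth reporting, although whether that number is the actual limit is only a guess from the message.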


BAI YE

Mar 17, 2016, 5:38:21 AM

to kaldi-help

Thank you for your answer!

It is not one of the standard Kaldi egs. I will check the number of phonemes and words.


Jan Trmal

Mar 17, 2016, 6:00:02 AM

to kaldi-help

Thanks. My feeling is that you have either too many phonemes or too many words in the training lexicon.
Sequitur has a compile-time variable that controls the size of its static structures, but I think we set it to a sufficient value if you use the script extras/install_sequitur.sh.
I will have to check this later, I'm not at the computer right now.
Y.

Jan Trmal

Mar 17, 2016, 7:41:26 AM

to kaldi-help

I've just committed a fix for this. Please go to your Kaldi tools directory and delete the sequitur-g2p directory.

Then install it again by calling extras/install_sequitur.sh

It was indeed MULTIGRAM_SIZE, which was set to 2. That is generally fine for alphabetic scripts, but I assume you are either working with an ideographic/logographic script or doing something special.

y.

BAI YE

Mar 17, 2016, 8:19:36 AM

to kaldi-help

Thank you very much!

I reinstalled sequitur, and the problem is solved. My experiment is running now.


BAI YE

Mar 17, 2016, 10:06:12 PM

to kaldi-help

Hi Yenda,

Today, when I examined the G2P training log, I found a runtime error. The log shows this message:

Estimation.cc:296 63.7685 63.7685 63.7685 53.8788 0.508154 9.38157 -2.84217e-14

Traceback (most recent call last):
  File "/data1/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur.py", line 662, in run
    shouldStop = self.iterate(context)
  File "/data1/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur.py", line 595, in iterate
    newModel.sequenceModel = self.sequenceModel(evidence, newModel.discount)
  File "/data1/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur.py", line 512, in sequenceModel
    evidence = evidence.makeSequenceModelEstimator()
  File "/data1/KeywordSearch/kaldi-trunk/tools/sequitur-g2p/sequitur_.py", line 189, in makeSequenceModelEstimator
    def makeSequenceModelEstimator(self): return _sequitur_.EvidenceStore_makeSequenceModelEstimator(self)
RuntimeError: std::bad_alloc
memory usage: virtual 435.9 MB resident 321.6 MB

The end of the log reads:

optimal discount: [ 0.51929295  0.69953466  0.81342148  0.89166478]
max. rel. change: 1.0

iteration failed.
failed to estimate or load model

# Accounting: time=396 threads=1
# Ended (code 1) at Thu Mar 17 23:00:50 CST 2016, elapsed time 396 seconds

I think memory allocation failed. How can I solve this?

Thank you!

BAI


Daniel Povey

Mar 17, 2016, 10:08:40 PM

to kaldi-help

Run on a bigger machine!

Or run fewer iterations of model re-estimation in the script (I think it adds a new n-gram order each time).
Dan
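
As a rough illustration of why fewer iterations give a smaller model: Sequitur G2P's documented training procedure starts from a first model and then retrains with --ramp-up, which raises the model order by one on each pass. The sketch below only builds the g2p.py command lines for a given number of iterations and does not run them. The helper name g2p_training_commands, the file names, and the "5%" held-out split are assumptions for illustration and may differ from what the training script in your setup actually does.

```python
def g2p_training_commands(lexicon, model_prefix, iterations, devel="5%"):
    """Build one g2p.py command per training iteration (nothing is executed)."""
    if iterations < 1:
        raise ValueError("need at least one iteration")
    commands = []
    for i in range(1, iterations + 1):
        cmd = ["g2p.py"]
        if i > 1:
            # Later passes start from the previous model and raise its order.
            cmd += ["--model", f"{model_prefix}-{i - 1}", "--ramp-up"]
        cmd += ["--train", lexicon, "--devel", devel,
                "--write-model", f"{model_prefix}-{i}"]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Placeholder lexicon and model names; three iterations as an example.
    for cmd in g2p_training_commands("train.lex", "model", 3):
        print(" ".join(cmd))
```

Lowering the iterations argument stops the chain at a lower model order, which is the lever suggested above for reducing memory use.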


BAI YE

Mar 18, 2016, 1:47:14 AM

to kaldi-help, dpo...@gmail.com

Thank you, Dan. I am trying to reduce the number of iterations.


Jan Trmal

Mar 18, 2016, 11:08:35 AM

to kaldi-help, Dan Povey

Is your Python really 64-bit? Anyway, as Dan said, this is most probably not an issue with sequitur itself, and you haven't provided any details that would suggest otherwise.

y.
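
A quick way to check the 64-bit question is to ask the interpreter for its pointer size. This is a minimal sketch using only the standard library; the function name python_bitness is made up for illustration. It should be run with the same Python executable that runs g2p.py, since a machine can have several interpreters installed.

```python
import platform
import struct
import sys


def python_bitness():
    """Return the pointer width, in bits, of the running interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("interpreter: %s" % sys.executable)
    print("pointer size: %d bits" % python_bitness())
    # On a 64-bit build, sys.maxsize exceeds 2**32.
    print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2**32))
    print("machine: %s" % platform.machine())
```

A 64-bit interpreter reports a pointer size of 64 bits; a 32-bit build on a 64-bit machine reports 32 and is limited to a few gigabytes of address space.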

BAI YE

Mar 21, 2016, 12:46:22 AM

to kaldi-help, dpo...@gmail.com

Hi, Yenda,

I just checked my machine. My Python really is 64-bit.

I ran my experiment last week with fewer iterations, and it completed successfully.

Maybe the number of iterations should be smaller?

Thank you!


Jan Trmal

Mar 21, 2016, 8:35:48 AM

to kaldi-help, Dan Povey

I'm sorry, but you still haven't provided any information that would allow me to draw a conclusion.

y.

BAI YE

Mar 21, 2016, 9:44:31 PM

to kaldi-help, dpo...@gmail.com

I trained my G2P model with 5 iterations, and it completed successfully. I could not reproduce the bug.

Now I'm training another model with 8 iterations. I will let you know if I reproduce the bug.