with Tensor2tensor when making a transformer training.......


陳裕政

Sep 4, 2019, 9:21:30 PM
to tensor2tensor
Dear All:

I am building a Transformer project with Tensor2Tensor.
When I train it, I get the following messages (they look like errors?).

2019-08-09 07:15:08.760886: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 1 of 1024
2019-08-09 07:15:08.761055: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 2 of 1024
2019-08-09 07:15:08.761087: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 3 of 1024
2019-08-09 07:15:11.678924: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 27 of 1024
2019-08-09 07:15:25.788535: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 87 of 1024
2019-08-09 07:15:30.636249: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 119 of 1024
2019-08-09 07:15:41.116180: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 517 of 1024
2019-08-09 07:15:53.008911: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 693 of 1024
2019-08-09 07:16:00.311780: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 725 of 1024

What do they mean?
How can I fix this?
Training can, however, be continued by restarting it.

Eugene Kuznetsov

Sep 5, 2019, 10:47:24 PM
to tensor2tensor
This is not an error. It is normal behavior. Just wait for it to finish.
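
To illustrate why these messages appear before any training output, here is a minimal pure-Python sketch of a buffered shuffle. It is a simplified model written for this thread, not TensorFlow's actual implementation; the name `buffered_shuffle` and its parameters are made up for illustration. The point is that nothing can be emitted until the buffer has been filled, so a slow data source shows up as a long "Filling up shuffle buffer" phase.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(source: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items from `source` in pseudo-random order using a fixed-size buffer.

    Hypothetical helper for illustration only; it mimics the general idea of a
    shuffle buffer, not the TensorFlow code that prints the log lines above.
    """
    rng = random.Random(seed)
    it = iter(source)
    buffer: List[T] = []

    # Phase 1: fill the buffer. Nothing is yielded during this phase, which
    # corresponds to the "Filling up shuffle buffer" messages.
    for item in it:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            break

    # Phase 2: emit a random buffered item, then refill its slot from the source.
    for item in it:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item

    # Phase 3: the source is exhausted, so drain the remainder in random order.
    rng.shuffle(buffer)
    yield from buffer
```

With `buffer_size=1024`, the first training example only becomes available after 1024 items have been read, which is why the counter in the log climbs toward "1024 of 1024" before the first `loss = ...` line can appear.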


陳裕政

Sep 6, 2019, 4:02:04 AM
to tensor2tensor
First of all, thank you for your answer.

Is it really not an error?
After these messages are shown, the training process is interrupted!
How can I keep the training from being interrupted?



On Friday, September 6, 2019 at 10:47:24 AM UTC+8, Eugene Kuznetsov wrote:

Eugene Kuznetsov

Sep 6, 2019, 7:26:46 PM
to tensor2tensor
It is impossible to say without more information about your project.

At this point, it has not started training yet. Normal output looks like this:

2019-09-06 16:18:46.366759: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:111] Filling up shuffle buffer (this may take a while): 114 of 512
2019-09-06 16:18:56.352787: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:111] Filling up shuffle buffer (this may take a while): 240 of 512
2019-09-06 16:19:06.360418: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:111] Filling up shuffle buffer (this may take a while): 366 of 512
2019-09-06 16:19:16.582945: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:111] Filling up shuffle buffer (this may take a while): 503 of 512
2019-09-06 16:19:17.137535: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:162] Shuffle buffer filled.
I0906 16:19:17.733114 140166232520512 basic_session_run_hooks.py:262] loss = 10.384769, step = 0

One possibility is that you have a corrupted dataset. You can try regenerating it with t2t-datagen.
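
Before regenerating, a quick sanity check on the data files may help. The sketch below assumes the generated files are in the TFRecord format (a little-endian 8-byte length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload). It only checks that the framing is intact and that no file is truncated; it does not verify the CRC values, so it cannot catch every kind of corruption. The function name `count_tfrecord_frames` is made up for this example.

```python
import os
import struct


def count_tfrecord_frames(path: str) -> int:
    """Count the records in a TFRecord file; raise ValueError if it is truncated.

    Framing check only: the two CRC fields of each record are skipped, not verified.
    """
    size = os.path.getsize(path)
    count = 0
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError(f"{path}: truncated length field after record {count}")
            (length,) = struct.unpack("<Q", header)
            # Record layout: 8-byte length, 4-byte CRC, payload, 4-byte CRC.
            end = offset + 8 + 4 + length + 4
            if end > size:
                raise ValueError(f"{path}: record {count} runs past the end of the file")
            f.seek(end)
            offset = end
            count += 1
    return count
```

Running this over every file in the data directory and looking for a `ValueError` is a cheap first test. A file that passes can still contain bad bytes, so regenerating the dataset remains the more thorough option.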

陳裕政

Oct 1, 2019, 3:45:20 AM
to tensor2tensor
I think the dataset is fine, because the training could be continued.

The following includes the lines that came before these messages:

INFO:tensorflow:global_step/sec: 4.93983
INFO:tensorflow:loss = 0.2815519, step = 397470 (2.024 sec)
INFO:tensorflow:global_step/sec: 5.1179
INFO:tensorflow:loss = 0.25325972, step = 397480 (1.956 sec)
INFO:tensorflow:global_step/sec: 5.05863
INFO:tensorflow:loss = 0.26533574, step = 397490 (1.975 sec)
2019-08-09 07:15:08.760886: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 1 of 1024
2019-08-09 07:15:08.761055: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 2 of 1024
2019-08-09 07:15:08.761087: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 3 of 1024
2019-08-09 07:15:11.678924: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 27 of 1024
2019-08-09 07:15:25.788535: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 87 of 1024
2019-08-09 07:15:30.636249: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 119 of 1024
2019-08-09 07:15:41.116180: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 517 of 1024
2019-08-09 07:15:53.008911: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 693 of 1024
2019-08-09 07:16:00.311780: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 725 of 1024

Actually, before the "Filling up shuffle buffer" messages appeared, the training had already reached step 397,490!
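
For anyone comparing logs like these, a small script can report the last training step and whether the log ends in the middle of a buffer fill. This is only a sketch: the regular expressions are written against the log lines quoted in this thread, and `summarize_log` is a hypothetical helper, not part of Tensor2Tensor or TensorFlow.

```python
import re
from typing import Iterable, Optional, Tuple

# Patterns modelled on the log lines quoted in this thread.
STEP_RE = re.compile(r"loss = [\d.]+, step = (\d+)")
FILL_RE = re.compile(r"Filling up shuffle buffer \(this may take a while\): (\d+) of (\d+)")


def summarize_log(lines: Iterable[str]) -> Tuple[Optional[int], Optional[Tuple[int, int]]]:
    """Return (last training step, pending shuffle-buffer fill as (filled, capacity)).

    The second value is None when the last relevant line is a training step,
    meaning any earlier fill had completed by then.
    """
    last_step: Optional[int] = None
    pending_fill: Optional[Tuple[int, int]] = None
    for line in lines:
        step_match = STEP_RE.search(line)
        if step_match:
            last_step = int(step_match.group(1))
            pending_fill = None  # training progressed, so no fill is pending
            continue
        fill_match = FILL_RE.search(line)
        if fill_match:
            pending_fill = (int(fill_match.group(1)), int(fill_match.group(2)))
    return last_step, pending_fill
```

Applied to the excerpt above, it would report step 397490 with the buffer at 725 of 1024, which makes it easy to see that the process stopped while the buffer was being refilled rather than during a training step.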