ResourceExhaustedError


Mohammad Mumin

Jul 13, 2019, 10:51:47 AM
to Nematus Support
Dear Sir,
In May 2017, I installed Nematus with Theano on Ubuntu 16.04 LTS without difficulty and ran my models easily. Then, in March 2019, Nematus suddenly started failing with a pygpu error.

I decided to switch to Nematus with TensorFlow on Ubuntu 18.04 LTS and start fresh. I struggled to install it locally, so I am using the Docker image, but I have not been successful yet. Recently, I ran the following command and got the error copied in the attached file:

root@fe8ceffb9f72:/playground/nematus# CUDA_VISIBLE_DEVICES=0 python3 nematus/train.py \
    --model /playground/nmt-en-bn/model/model \
    --source_dataset /playground/corpus/training/training.clean.en \
    --target_dataset /playground/corpus/training/training.clean.bn \
    --valid_source_dataset /playground/corpus/dev/dev.clean.en \
    --valid_target_dataset /playground/corpus/dev/dev.clean.bn \
    --dictionaries /playground/corpus/training/training.clean.en.json /playground/corpus/training/training.clean.bn.json \
    --valid_script /playground/nmt-en-bn/script/validate.sh \
    --embedding_size 512 \
    --state_size 1024 \
    --source_vocab_sizes 33386 \
    --target_vocab_size 46952 \
    --maxlen 60 \
    --valid_batch_size 40 \
    --saveFreq 10000 \
    --rnn_use_dropout \
    --patience 40 \
    --tie_decoder_embeddings \
    --rnn_layer_normalisation \
    --rnn_enc_depth 4 \
    --rnn_dec_depth 4


I seek your kind help in this regard.
Should I continue with TensorFlow, or switch back to Theano? After repeated failures with TensorFlow, I am hesitant.

Thanks in advance.
Attachment: error

Rico Sennrich

Jul 15, 2019, 12:15:50 PM
to nematus...@googlegroups.com
Hello Mohammad,

I suggest that you continue with the TensorFlow version, since it is better supported.

It looks like your training command exceeds the memory available on your GPU. It is possible that the TensorFlow implementation requires slightly more memory than the Theano implementation. The good news is that Nematus now has more options to deal with memory problems:

- you can spread your batch over multiple GPUs: https://github.com/EdinburghNLP/nematus/blob/master/doc/multi_gpu_training.md
- you can now also define batch size in terms of number of tokens (--token_batch_size), which makes better use of space: the number of sentences in a batch is scaled to be inversely proportional to sentence length.
- you can reduce your batch size, but this may negatively affect quality.
- to get the benefits of smaller minibatches without quality loss, you can use the argument "--max_sentences_per_device" (or "--max_tokens_per_device"), and set this to a value that is small enough to run on your device. The trainer will then do error backpropagation on small minibatches, and aggregate the gradients until "--batch_size" sentences (or "--token_batch_size" tokens) have been processed before doing an update. This corresponds to training with a larger minibatch, but trades off training speed for space. An example command is sketched below.
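
For illustration only, your command from the first message could be adjusted along these lines to combine token-based batching with per-device gradient aggregation. The values 4096 and 1024 are placeholder assumptions, not recommendations; tune them to your GPU's memory:

# Sketch only: reuse the dataset, dictionary, and model arguments from the
# command above (elided here as [...]); the batch-size values below are
# illustrative assumptions.
CUDA_VISIBLE_DEVICES=0 python3 nematus/train.py \
    --model /playground/nmt-en-bn/model/model \
    [... same dataset, dictionary, and model arguments as above ...] \
    --token_batch_size 4096 \
    --max_tokens_per_device 1024

With such a setup, the trainer backpropagates on per-device chunks of at most 1024 tokens and only applies a parameter update once roughly 4096 tokens have been processed, as described in the last point above.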

best wishes,
Rico
