UnicodeDecodeError in build_dictionary.py

Mohammad Mumin

Jul 25, 2019, 3:11:54 AM

to Nematus Support

Dear Sir,

I am facing the following problem when building the dictionary:

root@3af2692e7098:/playground/nematus/data# python3 build_dictionary.py /playground/corpus/training/training.clean.en
Processing /playground/corpus/training/training.clean.en
Traceback (most recent call last):
File "build_dictionary.py", line 45, in <module>
    main()
File "build_dictionary.py", line 16, in main
    for line in f:
File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 1551: ordinal not in range(128)
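The traceback shows Python decoding the file with the ascii codec. In Python 3, open() without an encoding argument falls back to locale.getpreferredencoding(False), and on Python 3.5 (as in this traceback) that is likely ASCII when the container has no locale configured (C/POSIX locale). The minimal sketch below reproduces the same kind of error; the sample text, the temporary file and the read_tokens helper are illustrative assumptions, not code from build_dictionary.py.

```python
import os
import tempfile

# Illustrative sample line (not from the user's corpus): the Bengali word
# encodes to UTF-8 bytes starting with 0xe0, like the byte in the traceback.
sample = "hello আমি\n"
raw = sample.encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

def read_tokens(path, encoding):
    """Hypothetical helper: read whitespace-separated tokens using `encoding`."""
    tokens = []
    with open(path, encoding=encoding) as f:
        for line in f:
            tokens.extend(line.split())
    return tokens

# Decoding as ASCII, as a C/POSIX locale default would, fails on byte 0xe0.
ascii_error = None
try:
    read_tokens(path, "ascii")
except UnicodeDecodeError as err:
    ascii_error = err
    print("ascii failed:", err.reason, hex(err.object[err.start]))

# Passing the encoding explicitly makes the result independent of the locale.
utf8_tokens = read_tokens(path, "utf-8")
print(utf8_tokens)
os.remove(path)
```

Passing encoding="utf-8" explicitly (or setting a UTF-8 locale such as LANG=C.UTF-8 in the container) avoids depending on the environment's default.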

I seek your kind assistance.

Thanks in advance.

Philip Williams

Jul 26, 2019, 4:45:59 AM

to Mohammad Mumin, Nematus Support

Hi Mohammad,

I've just committed a change that should fix this. The updated code assumes that your training files are encoded in UTF-8. If they are not, you will have to convert them before running build_dictionary.py.
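If a training file is in some other encoding, one way to check and convert it is sketched below in Python, assuming you know the source encoding. Here cp1252 is only an example, and is_utf8 and convert_to_utf8 are hypothetical helper names, not part of Nematus. The standard iconv command-line tool can do the same conversion.

```python
import os
import tempfile

def is_utf8(path):
    """Hypothetical helper: True if the whole file decodes cleanly as UTF-8."""
    try:
        with open(path, encoding="utf-8") as f:
            for _ in f:
                pass
        return True
    except UnicodeDecodeError:
        return False

def convert_to_utf8(src, dst, source_encoding):
    """Hypothetical helper: re-encode `src` (in `source_encoding`) as UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=source_encoding, newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)

# Demo with a made-up cp1252 (Windows Latin-1) file containing "café".
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "corpus.cp1252.en")
dst = os.path.join(tmpdir, "corpus.utf8.en")
with open(src, "wb") as f:
    f.write("café\n".encode("cp1252"))

print(is_utf8(src))  # False: byte 0xe9 followed by a newline is not valid UTF-8
convert_to_utf8(src, dst, "cp1252")
print(is_utf8(dst))  # True
```

For a real corpus you would run the conversion on each training file (source and target side) before calling build_dictionary.py.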

Best wishes,

Phil

--
You received this message because you are subscribed to the Google Groups "Nematus Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nematus-suppo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nematus-support/0d2d9a56-2d2d-47b5-afe3-1662d2a246fb%40googlegroups.com.

Mohammad Mumin

Jul 27, 2019, 6:32:07 AM

to Nematus Support

Thank you very much, Sir.

It's working now with your fix.

But I am still facing the problem I had before, "ValueError: too many values to unpack (expected 3)":

2019-07-27 16:19:12.423100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 10093 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO: Building model...
/home/mumin-cse/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO: Initializing model parameters from scratch...
INFO: Done
INFO: Reading data...
INFO: Done
INFO: Initial uidx=0
INFO: Starting epoch 0

Traceback (most recent call last):

File "/home/mumin-cse/nematus//nematus/train.py", line 454, in <module>
    train(config, sess)
File "/home/mumin-cse/nematus//nematus/train.py", line 177, in train
    write_summary_for_this_batch)
File "/home/mumin-cse/nematus/nematus/model_updater.py", line 112, in update
    global_step, apply_grads, mean_loss_per_sent = session.run(fetches)

ValueError: too many values to unpack (expected 3)
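For reference: when fetches is a list, session.run(fetches) returns a list with one value per fetch, so this error means at least four values came back where the code unpacks three. The thread does not say what the actual bug was. The sketch below only illustrates the failure mode in plain Python, using a stand-in for session.run and assuming, hypothetically, that an extra fetch such as a summary op is appended when write_summary_for_this_batch is set. fake_session_run, build_fetches and summary_op are invented names, not Nematus or TensorFlow code.

```python
def fake_session_run(fetches):
    """Stand-in for tf.Session.run: returns one result per fetch, in order."""
    return [f"value_of_{name}" for name in fetches]

def build_fetches(write_summary):
    """Hypothetical: three core fetches plus an optional summary op."""
    fetches = ["global_step", "apply_grads", "mean_loss_per_sent"]
    if write_summary:
        fetches.append("summary_op")
    return fetches

# Reproduces the reported failure mode when the extra fetch is included.
unpack_error = None
try:
    global_step, apply_grads, mean_loss_per_sent = fake_session_run(build_fetches(True))
except ValueError as err:
    unpack_error = err
    print(err)  # prints the "too many values to unpack" message

# One defensive pattern: unpack the fixed fetches and collect any extras.
global_step, apply_grads, mean_loss_per_sent, *extras = fake_session_run(build_fetches(True))
print(mean_loss_per_sent, extras)
```

The starred target in the last unpacking keeps the three core results and gathers anything extra into a list, so the call tolerates optional fetches.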

Sir, do you have any idea what is causing this error?

I seek your kind assistance.

Thanks in advance.

Philip Williams

Jul 28, 2019, 6:04:56 AM

to Mohammad Mumin, Nematus Support

Hi Mohammad,

could you try running ./test_train.sh in the nematus/test directory? Do you see the same error?

Best wishes,

Phil

Mohammad Mumin

Jul 30, 2019, 4:08:39 AM

to Nematus Support

I ran ./test_train.sh in the nematus/test directory and didn't see any errors.

One observation: when "ValueError: too many values to unpack (expected 3)" occurs, an "events.out.tfevents.1564471655.shu-anubad" file is produced at the beginning of the execution. This file is attached.

My "train.sh" file is also attached.

Here is the error again:

INFO: Building model...
/home/mumin-cse/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO: Initializing model parameters from scratch...
INFO: Done
INFO: Reading data...
INFO: Done
INFO: Initial uidx=0
INFO: Starting epoch 0

Traceback (most recent call last):

File "/home/mumin-cse/nematus//nematus/train.py", line 454, in <module>
    train(config, sess)
File "/home/mumin-cse/nematus//nematus/train.py", line 177, in train
    write_summary_for_this_batch)
File "/home/mumin-cse/nematus/nematus/model_updater.py", line 112, in update
    global_step, apply_grads, mean_loss_per_sent = session.run(fetches)
ValueError: too many values to unpack (expected 3)

Thanks for your cooperation, Sir.

I have been using Nematus since its legacy Theano version. Nematus is the best.

But when I planned to switch to the TensorFlow version, I got stuck at this point.

I seek your kind assistance.

Thanks in advance.

Attachments: events.out.tfevents.1564471655.shu-anubad, train_bpe.sh

Philip Williams

Jul 30, 2019, 4:48:56 AM

to Mohammad Mumin, Nematus Support

Hi Mohammad,

thanks for the detailed report. I think I've fixed the bug. Could you pull the latest changes and try again?

Best wishes,

Phil

Mohammad Mumin

Jul 31, 2019, 3:05:09 AM

to Nematus Support

YES Sir, you have fixed the bug.

Thanks a lot, Sir.

Best wishes to Nematus and its team.
