Not showing the output


Shantanu Nath

Apr 23, 2020, 4:57:10 AM
to Nematus Support
Dear Sir,
I trained on the given test data and obtained a model, but when I translate with translate.py it shows nothing, as in the following image.

Capture_1.JPG



What is the reason behind this problem?

Best regards,
Shantanu Nath

R S

Apr 29, 2020, 6:04:57 AM
to Nematus Support
Hello Shantanu,

this looks like there was no error - the most probable translation output was simply empty. I suggest that you:

- use length normalization ("-n 1" as an argument to translate.py) to reward longer translations; see the example command after this list.
- train a stronger model. I see that your model was only trained for 500 updates; a more typical number would be in the range of 10,000-500,000 updates, depending on the amount of data, batch size, and other factors. Rather than fixing the length of training beforehand, you can also pass a validation set to the training script; it will then keep track of model improvements and save the best checkpoint, so you can see when training is converging.
- make sure there are no mismatches between training and test time, for example in the preprocessing of the input.
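
For reference, a minimal sketch of a translate.py call with length normalization is below; the model path, input/output filenames, and beam size are placeholders rather than your actual setup.

```
# Hypothetical example: beam-search decoding with length normalization (-n 1).
# Replace the model checkpoint and the input/output paths with your own files.
python nematus/translate.py \
    -m model/model.npz \
    -i test.en \
    -o test.out \
    -k 12 \
    -n 1
```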

best wishes,
Rico

Rico Sennrich

May 3, 2020, 3:54:41 AM
to Shantanu Nath, nematus...@googlegroups.com
Hello Shantanu,

this indicates that one of your text files, in this case the source validation set (small_test_data/trainDevEn.txt), is not encoded in UTF-8. Nematus requires UTF-8 formatted text.
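
If you want to track down and fix the offending file, something along the following lines should work (a sketch assuming GNU file and iconv are available; byte 0x91 is the curly opening quote in Windows-1252, so that encoding is a likely suspect):

```
# Report the detected encoding of each data file
file --mime-encoding small_test_data/*.txt

# iconv exits with an error at the first byte that is not valid UTF-8
iconv -f UTF-8 -t UTF-8 small_test_data/trainDevEn.txt > /dev/null

# If the file turns out to be Windows-1252, convert it to UTF-8
# (the output filename here is just an example)
iconv -f WINDOWS-1252 -t UTF-8 small_test_data/trainDevEn.txt > trainDevEn.utf8.txt
```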

best wishes,
Rico

On 30/04/2020 20:16, Shantanu Nath wrote:
Dear Sir, 
First of all, thank you for your response. I am sorry to say that I am a novice, but I am enthusiastic about this field. Following your scripts gives me more hope to move forward.
Unfortunately, I have encountered a new problem:
"UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 2889: invalid start byte"

Here is the command I ran:

```
python nematus/train.py \
    --datasets small_test_data/TrainEn.txt small_test_data/trainBn.txt \
    --dictionaries orginal_data/training.en.json orginal_data/training.bn.json \
    --valid_source_dataset small_test_data/trainDevEn.txt \
    --valid_target_dataset small_test_data/trainDevBn.txt \
    --dim_word 256 --dim 512 \
    --n_words_src 30000 --n_words 30000 \
    --maxlen 50 \
    --optimizer adam --lrate 0.0001 \
    --batch_size 40 \
    --no_shuffle \
    --dispFreq 500 \
    --finish_after 10000
```
I have attached three files (source, target, and dictionary) which may help you find the actual problem.

Here are the screenshots of what I got after running the command:

Capture1.JPG
Capture2.JPG

On Wed, Apr 29, 2020 at 11:49 PM Rico Sennrich <rico.s...@gmx.ch> wrote:
Hello Shantanu,

this is just a toy dataset with 1000 sentences, not enough to build a strong translation model.

I suggest you have a look at https://github.com/EdinburghNLP/wmt17-transformer-scripts/tree/master/training , which gives instructions on how to train a well-performing system for English-German.

best wishes,
Rico


On 29/04/2020 18:43, Shantanu Nath wrote:
Dear Sir, 
I want to train on the data set in the Test/en-de folder. It creates a model, but when I try to translate, nothing is shown.
I kept the same parameters you provided in "train.sh".
Should I increase the value of "finish_after" from 500 to 10,000?

Best regards,
Shantanu Nath


Shantanu Nath

May 3, 2020, 5:43:22 AM
to Rico Sennrich, nematus...@googlegroups.com
Thank you for the clarification. 