Hi,
please post your training configurations and the commands you are using for translation.
Marcin
--
You received this message because you are subscribed to the Google Groups "marian-nmt" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marian-nmt+...@googlegroups.com.
To post to this group, send email to maria...@googlegroups.com.
Visit this group at https://groups.google.com/group/marian-nmt.
To view this discussion on the web visit https://groups.google.com/d/msgid/marian-nmt/a42a0fa4-7fc9-445f-9fe4-61491b0914af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hm, are you sure these are all your settings? After 8,000 iterations you should not even have a saved model, since by default a model is saved only after 10,000 iterations. Are you training on the CPU? Usually 8,000 iterations take at most about an hour, and much less on my GPUs. A model becomes usable after maybe 50,000 iterations, more likely 100,000.
Can you check whether corpus.en and corpus.ro have the same number of lines, and post the first few lines of both corpora as well as the first few lines of both vocabularies?
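A minimal sketch of the check requested above, assuming UTF-8 text files. The helper names (`count_lines`, `head`, `check_parallel`) are hypothetical and not part of Marian; the file names in the usage note are the ones mentioned in the message.

```python
from itertools import islice


def count_lines(path):
    """Count lines in a file by reading raw bytes, so encoding does not matter."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def head(path, n=5):
    """Return the first n lines of a UTF-8 text file without their line endings."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def check_parallel(source_path, target_path, n=5):
    """Compare the line counts of two corpus files and collect their first lines."""
    source_count = count_lines(source_path)
    target_count = count_lines(target_path)
    return {
        "source_lines": source_count,
        "target_lines": target_count,
        "same_length": source_count == target_count,
        "source_head": head(source_path, n),
        "target_head": head(target_path, n),
    }
```

Calling `check_parallel("corpus.en", "corpus.ro")` returns a dictionary that can be printed and pasted into a reply; `head` can be called the same way on each vocabulary file.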
OK, can you still let me have a look? Especially at the vocabs. If there is something wrong it might cause the issue you are seeing.
On 2018-07-06 14:32, Marcin Junczys-Dowmunt wrote:
I took a look at the files you made available. I am not sure why there is basically empty output, but a few observations:
1) There is no preprocessing at all, no tokenization etc. Marian expects preprocessed, at least tokenized, data. You should also familiarize yourself with BPE subwords: https://github.com/rsennrich/subword-nmt
2) It seems your corpora have Windows line endings. Linux uses different line endings; I am not sure how much harm that does, but it's probably not a good idea.
3) Because you have no preprocessing, your vocabularies are huge (about 200,000 items each). This may cause data sparsity and prevent learning.
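Observations 2 and 3 can be checked with a short script. This is a sketch under assumptions: the files are UTF-8 (or another ASCII-compatible encoding), and the function names are hypothetical helpers, not Marian tools. Counting distinct whitespace-separated tokens is only a rough proxy for the vocabulary size Marian would build from untokenized text.

```python
def count_crlf_lines(path):
    """Return (crlf_lines, total_lines), reading raw bytes."""
    crlf = total = 0
    with open(path, "rb") as handle:
        for line in handle:
            total += 1
            if line.endswith(b"\r\n"):
                crlf += 1
    return crlf, total


def normalize_line_endings(source_path, target_path):
    """Copy source_path to target_path, rewriting CRLF line endings as LF."""
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        for line in source:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
            target.write(line)


def count_types(path):
    """Count distinct whitespace-separated tokens in a UTF-8 text file."""
    types = set()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            types.update(line.split())
    return len(types)
```

If `count_crlf_lines` reports a non-zero first value, `normalize_line_endings` writes a cleaned copy to a new path and leaves the original untouched. Running `count_types` before and after tokenization or BPE shows how much preprocessing shrinks the vocabulary.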
I recommend trying one of the examples from https://github.com/marian-nmt/marian-examples, maybe starting with training-basics, which is a full example with proper preprocessing.
On 2018-07-06 14:16, Peyman Passban wrote:
Here is the result from another model:
root@a7874f8a1673:/Marian/tep++# /home/ml/Marian/marian/build/marian-decoder -m en-tr.npz -v vocab.en vocab.fa --allow-unk --n-best <<<"hi how are you"
[2018-07-06 21:11:49] [config] allow-unk: true
[2018-07-06 21:11:49] [config] beam-size: 12
[2018-07-06 21:11:49] [config] best-deep: false
[2018-07-06 21:11:49] [config] clip-gemm: 0
[2018-07-06 21:11:49] [config] cpu-threads: 0
[2018-07-06 21:11:49] [config] dec-cell: gru
[2018-07-06 21:11:49] [config] dec-cell-base-depth: 2
[2018-07-06 21:11:49] [config] dec-cell-high-depth: 1
[2018-07-06 21:11:49] [config] dec-depth: 1
[2018-07-06 21:11:49] [config] devices:
[2018-07-06 21:11:49] [config] - 0
[2018-07-06 21:11:49] [config] dim-emb: 512
[2018-07-06 21:11:49] [config] dim-rnn: 1024
[2018-07-06 21:11:49] [config] dim-vocabs:
[2018-07-06 21:11:49] [config] - 60588
[2018-07-06 21:11:49] [config] - 91728
[2018-07-06 21:11:49] [config] enc-cell: gru
[2018-07-06 21:11:49] [config] enc-cell-depth: 1
[2018-07-06 21:11:49] [config] enc-depth: 1
[2018-07-06 21:11:49] [config] enc-type: bidirectional
[2018-07-06 21:11:49] [config] ignore-model-config: false
[2018-07-06 21:11:49] [config] input:
[2018-07-06 21:11:49] [config] - stdin
[2018-07-06 21:11:49] [config] interpolate-env-vars: false
[2018-07-06 21:11:49] [config] layer-normalization: false
[2018-07-06 21:11:49] [config] log-level: info
[2018-07-06 21:11:49] [config] max-length: 1000
[2018-07-06 21:11:49] [config] max-length-crop: false
[2018-07-06 21:11:49] [config] max-length-factor: 3
[2018-07-06 21:11:49] [config] maxi-batch: 1
[2018-07-06 21:11:49] [config] maxi-batch-sort: none
[2018-07-06 21:11:49] [config] mini-batch: 1
[2018-07-06 21:11:49] [config] mini-batch-words: 0
[2018-07-06 21:11:49] [config] models:
[2018-07-06 21:11:49] [config] - en-tr.iter10000.npz
[2018-07-06 21:11:49] [config] n-best: true
[2018-07-06 21:11:49] [config] normalize: 0
[2018-07-06 21:11:49] [config] optimize: false
[2018-07-06 21:11:49] [config] port: 8080
[2018-07-06 21:11:49] [config] quiet: false
[2018-07-06 21:11:49] [config] quiet-translation: false
[2018-07-06 21:11:49] [config] relative-paths: false
[2018-07-06 21:11:49] [config] right-left: false
[2018-07-06 21:11:49] [config] seed: 0
[2018-07-06 21:11:49] [config] skip: false
[2018-07-06 21:11:49] [config] skip-cost: false
[2018-07-06 21:11:49] [config] tied-embeddings: false
[2018-07-06 21:11:49] [config] tied-embeddings-all: false
[2018-07-06 21:11:49] [config] tied-embeddings-src: false
[2018-07-06 21:11:49] [config] transformer-aan-activation: swish
[2018-07-06 21:11:49] [config] transformer-aan-depth: 2
[2018-07-06 21:11:49] [config] transformer-aan-nogate: false
[2018-07-06 21:11:49] [config] transformer-decoder-autoreg: self-attention
[2018-07-06 21:11:49] [config] transformer-dim-aan: 2048
[2018-07-06 21:11:49] [config] transformer-dim-ffn: 2048
[2018-07-06 21:11:49] [config] transformer-ffn-activation: swish
[2018-07-06 21:11:49] [config] transformer-ffn-depth: 2
[2018-07-06 21:11:49] [config] transformer-heads: 8
[2018-07-06 21:11:49] [config] transformer-no-projection: false
[2018-07-06 21:11:49] [config] transformer-postprocess: dan
[2018-07-06 21:11:49] [config] transformer-postprocess-emb: d
[2018-07-06 21:11:49] [config] transformer-preprocess: ""
[2018-07-06 21:11:49] [config] type: amun
[2018-07-06 21:11:49] [config] version: v1.5.0+1582f99
[2018-07-06 21:11:49] [config] vocabs:
[2018-07-06 21:11:49] [config] - vocab.en
[2018-07-06 21:11:49] [config] - vocab.fa
[2018-07-06 21:11:49] [config] word-penalty: 0
[2018-07-06 21:11:49] [config] workspace: 512
[2018-07-06 21:11:49] [config] Model created with Marian v1.5.0+1582f99
[2018-07-06 21:11:49] [data] Loading vocabulary from text file vocab.en
[2018-07-06 21:11:49] [data] Setting vocabulary size for input 0 to 60588
[2018-07-06 21:11:49] [data] Loading vocabulary from text file vocab.fa
[2018-07-06 21:11:52] [memory] Extending reserved space to 512 MB (device gpu0)
[2018-07-06 21:11:52] Loading scorer of type amun as feature F0
[2018-07-06 21:11:52] Loading model from en-tr.iter10000.npz
[2018-07-06 21:11:52] [memory] Reserving 606 MB, device gpu0
[2018-07-06 21:11:53] Best translation 0 : <unk> <unk> <unk> <unk>
0 ||| <unk> <unk> <unk> <unk> ||| F0= -0.813782 ||| -0.813782
0 ||| <unk> <unk> <unk> <unk> <unk> ||| F0= -1.54337 ||| -1.54337
0 ||| <unk> <unk> <unk> ||| F0= -1.67674 ||| -1.67674
0 ||| <unk> <unk> ||| F0= -2.53435 ||| -2.53435
0 ||| <unk> <unk> <unk> <unk> <unk> <unk> ||| F0= -2.98951 ||| -2.98951
0 ||| <unk> <unk> <unk> <unk> <unk> <unk> <unk> ||| F0= -4.42481 ||| -4.42481
0 ||| <unk> ||| F0= -4.81378 ||| -4.81378
0 ||| ||| F0= -5.97186 ||| -5.97186
0 ||| <unk> <unk> <unk> <unk> <unk> jybHay: 21939 ||| F0= -22.214 ||| -22.214
0 ||| <unk> <unk> <unk> <unk> <unk> vlyCk: 88410 ||| F0= -22.218 ||| -22.218
0 ||| <unk> <unk> <unk> <unk> <unk> ahmganH: 1006 ||| F0= -22.2183 ||| -22.2183
0 ||| <unk> <unk> <unk> <unk> <unk> anSatvn: 54540 ||| F0= -22.2186 ||| -22.2186
[2018-07-06 21:11:53] Total time: 0.067736s wall, 0.030000s user + 0.030000s system = 0.060000s CPU (88.6%)
Cheers
-P
On Fri, Jul 6, 2018 at 3:52 PM, Peyman Passban <pe....@gmail.com> wrote:
Sorry for spamming, I forgot to attach this:
/home/ml/Marian/marian/build/marian --after-batches 50000 \
  --train-sets ./tep++/train.en ./tep++/train.fa \
  --model ./tep++/en-tr.npz --vocabs ./tep++/vocab.en ./tep++/vocab.fa >log-training-en-fa.txt
/home/ml/Marian/marian/build/marian-decoder -m ./en-tr.npz -v vocab.en vocab.fa <en.txt >out.txt
Cheers
-P
On Fri, Jul 6, 2018 at 3:50 PM, Peyman Passban <pe....@gmail.com> wrote:
Hi Marcin,
I've trained a new model for translating from En to Farsi. I trained the model for 4 hours and the dataset size is 500K. Now I've just tried to translate the first 10 lines of the training set, but it generated empty lines again. For some unknown reason it also puts "xxxx:8253" in the last line of the translation file. "xxxx" is a Farsi word in the vocab.fa file and 8253 is its frequency. Do you have any idea?
Cheers
-P
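The "xxxx:8253" token described above and the "jybHay: 21939" style entries in the n-best list both look like a vocabulary entry followed by a number, and the decoder log reports loading each vocabulary "from text file". A minimal sketch for checking how many lines of a vocabulary file have that shape follows; `find_numbered_entries` is a hypothetical helper, and whether such lines are the cause of the empty output is an assumption that this thread does not confirm.

```python
import re

# A token, a colon, optional whitespace, then a trailing integer,
# for example "jybHay: 21939".
NUMBERED_ENTRY = re.compile(r"^(?P<token>.+?):\s*(?P<number>\d+)$")


def find_numbered_entries(path, limit=10):
    """Return (matching_lines, non_empty_lines, first_examples) for a vocabulary file."""
    matches = total = 0
    examples = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            entry = line.strip()
            if not entry:
                continue  # blank lines are not counted
            total += 1
            if NUMBERED_ENTRY.match(entry):
                matches += 1
                if len(examples) < limit:
                    examples.append(entry)
    return matches, total, examples
```

Running `find_numbered_entries("vocab.fa")` and posting the counts together with the first few examples would make it easier for others to see what the vocabulary file actually contains.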
On Fri, Jul 6, 2018 at 11:30 AM, Peyman Passban <pe....@gmail.com> wrote:
Hey Marcin,
thanks a million for your attention.
Here is the link to the files: https://drive.google.com/open?id=1wSDZ0_TMyJHe5gcBd9BDGnXhcwFYEaUY
Using this dataset, I tried to train a simple chatbot. Both source and target languages are English, which might be a problem, though I have no idea whether it is for Marian. Line 9 in the vocab file also looks a bit strange, which could be another reason.
Now I'm training a new model for translating from English to Farsi (Persian). I'll let you know the result.
Cheers
-P
On Thu, Jul 5, 2018 at 6:11 PM, Marcin Junczys-Dowmunt <jun...@amu.edu.pl> wrote:
Can you try one of the prepared examples from marian-examples, for instance training-basics?
Otherwise if you make the training data available I can give it a try. It should just work, so I am not sure why that would happen.