Command : JOSHUA/bin/pipeline.pl --rundir 1 --corpus input/train --tune input/tune --test input/test --aligner berkeley --lm-gen srilm --lm-order 3 --source en --target hi
version : joshua v6.0.1
Size of the data set
lines : Input file
268000 train.en
268000 train.hi
5000 test.en
5000 test.hi
1000 tune.en
1000 tune.hi
Output :
[train-copy-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/input/train.hi [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.hi.gz [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/input/train.hi | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/train/train.hi.gz
took 16 seconds (16s)
[train-copy-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/input/train.en [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.en.gz [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/input/train.en | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/train/train.en.gz
took 3 seconds (3s)
[train-tokenize-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.hi.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.hi.gz [NOT FOUND]
cmd=/home/smt/joshua-v6.0.1/scripts/training/scat /home/smt/HindiMachineTranslationSystem/1/data/train/train.hi.gz | /home/smt/joshua-v6.0.1/scripts/training/normalize-punctuation.pl hi | /home/smt/joshua-v6.0.1/scripts/training/penn-treebank-tokenizer.perl -l hi 2> /dev/null | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.hi.gz
took 20 seconds (20s)
[train-tokenize-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.en.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.en.gz [NOT FOUND]
cmd=/home/smt/joshua-v6.0.1/scripts/training/scat /home/smt/HindiMachineTranslationSystem/1/data/train/train.en.gz | /home/smt/joshua-v6.0.1/scripts/training/normalize-punctuation.pl en | /home/smt/joshua-v6.0.1/scripts/training/penn-treebank-tokenizer.perl -l en 2> /dev/null | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.en.gz
took 13 seconds (13s)
[train-trim] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.hi.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.en.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.hi.gz [NOT FOUND]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.en.gz [NOT FOUND]
cmd=paste <(gzip -cd /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.hi.gz) <(gzip -cd /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.en.gz) | /home/smt/joshua-v6.0.1/scripts/training/trim_parallel_corpus.pl 50 | /home/smt/joshua-v6.0.1/scripts/training/split2files.pl /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.hi.gz /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.en.gz
took 10 seconds (10s)
[train-lowercase-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.hi.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.lc.hi [NOT FOUND]
cmd=gzip -cd /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.hi.gz | /home/smt/joshua-v6.0.1/scripts/lowercase.perl > /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.lc.hi
took 2 seconds (2s)
[train-lowercase-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.en.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.lc.en [NOT FOUND]
cmd=gzip -cd /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.en.gz | /home/smt/joshua-v6.0.1/scripts/lowercase.perl > /home/smt/HindiMachineTranslationSystem/1/data/train/train.tok.50.lc.en
took 0 seconds (0s)
[train-vocab-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/vocab.hi [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi | /home/smt/joshua-v6.0.1/scripts/training/build-vocab.pl > /home/smt/HindiMachineTranslationSystem/1/data/train/vocab.hi
took 2 seconds (2s)
[train-vocab-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/corpus.en [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/vocab.en [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/1/data/train/corpus.en | /home/smt/joshua-v6.0.1/scripts/training/build-vocab.pl > /home/smt/HindiMachineTranslationSystem/1/data/train/vocab.en
took 1 seconds (1s)
[tune-copy-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/input/tune.hi [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.hi.gz [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/input/tune.hi | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.hi.gz
took 0 seconds (0s)
[tune-copy-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/input/tune.en [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.en.gz [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/input/tune.en | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.en.gz
took 0 seconds (0s)
[tune-tokenize-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.hi.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.hi.gz [NOT FOUND]
cmd=/home/smt/joshua-v6.0.1/scripts/training/scat /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.hi.gz | /home/smt/joshua-v6.0.1/scripts/training/normalize-punctuation.pl hi | /home/smt/joshua-v6.0.1/scripts/training/penn-treebank-tokenizer.perl -l hi 2> /dev/null | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.hi.gz
took 0 seconds (0s)
[tune-tokenize-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.en.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.en.gz [NOT FOUND]
cmd=/home/smt/joshua-v6.0.1/scripts/training/scat /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.en.gz | /home/smt/joshua-v6.0.1/scripts/training/normalize-punctuation.pl en | /home/smt/joshua-v6.0.1/scripts/training/penn-treebank-tokenizer.perl -l en 2> /dev/null | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.en.gz
took 0 seconds (0s)
[tune-lowercase-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.hi.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.lc.hi [NOT FOUND]
cmd=gzip -cd /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.hi.gz | /home/smt/joshua-v6.0.1/scripts/lowercase.perl > /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.lc.hi
took 0 seconds (0s)
[tune-lowercase-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.en.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.lc.en [NOT FOUND]
cmd=gzip -cd /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.en.gz | /home/smt/joshua-v6.0.1/scripts/lowercase.perl > /home/smt/HindiMachineTranslationSystem/1/data/tune/tune.tok.lc.en
took 0 seconds (0s)
[tune-vocab-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/corpus.hi [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/vocab.hi [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/1/data/tune/corpus.hi | /home/smt/joshua-v6.0.1/scripts/training/build-vocab.pl > /home/smt/HindiMachineTranslationSystem/1/data/tune/vocab.hi
took 0 seconds (0s)
[tune-vocab-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/corpus.en [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/tune/vocab.en [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/1/data/tune/corpus.en | /home/smt/joshua-v6.0.1/scripts/training/build-vocab.pl > /home/smt/HindiMachineTranslationSystem/1/data/tune/vocab.en
took 0 seconds (0s)
[test-copy-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/input/test.hi [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.hi.gz [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/input/test.hi | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/test/test.hi.gz
took 1 seconds (1s)
[test-copy-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/input/test.en [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.en.gz [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/input/test.en | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/test/test.en.gz
took 0 seconds (0s)
[test-tokenize-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.hi.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.hi.gz [NOT FOUND]
cmd=/home/smt/joshua-v6.0.1/scripts/training/scat /home/smt/HindiMachineTranslationSystem/1/data/test/test.hi.gz | /home/smt/joshua-v6.0.1/scripts/training/normalize-punctuation.pl hi | /home/smt/joshua-v6.0.1/scripts/training/penn-treebank-tokenizer.perl -l hi 2> /dev/null | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.hi.gz
took 0 seconds (0s)
[test-tokenize-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.en.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.en.gz [NOT FOUND]
cmd=/home/smt/joshua-v6.0.1/scripts/training/scat /home/smt/HindiMachineTranslationSystem/1/data/test/test.en.gz | /home/smt/joshua-v6.0.1/scripts/training/normalize-punctuation.pl en | /home/smt/joshua-v6.0.1/scripts/training/penn-treebank-tokenizer.perl -l en 2> /dev/null | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.en.gz
took 0 seconds (0s)
[test-lowercase-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.hi.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.lc.hi [NOT FOUND]
cmd=gzip -cd /home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.hi.gz | /home/smt/joshua-v6.0.1/scripts/lowercase.perl > /home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.lc.hi
took 1 seconds (1s)
[test-lowercase-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.en.gz [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.lc.en [NOT FOUND]
cmd=gzip -cd /home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.en.gz | /home/smt/joshua-v6.0.1/scripts/lowercase.perl > /home/smt/HindiMachineTranslationSystem/1/data/test/test.tok.lc.en
took 0 seconds (0s)
[test-vocab-hi] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/corpus.hi [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/vocab.hi [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/1/data/test/corpus.hi | /home/smt/joshua-v6.0.1/scripts/training/build-vocab.pl > /home/smt/HindiMachineTranslationSystem/1/data/test/vocab.hi
took 0 seconds (0s)
[test-vocab-en] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/corpus.en [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/test/vocab.en [NOT FOUND]
cmd=cat /home/smt/HindiMachineTranslationSystem/1/data/test/corpus.en | /home/smt/joshua-v6.0.1/scripts/training/build-vocab.pl > /home/smt/HindiMachineTranslationSystem/1/data/test/vocab.en
took 0 seconds (0s)
[source-numlines] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/corpus.en [CHANGED]
cmd=cat /home/smt/HindiMachineTranslationSystem/1/data/train/corpus.en | wc -l
took 0 seconds (0s)
[source-numlines] retrieved cached result => 222915
[berkeley-aligner-chunk-0] rebuilding...
dep=alignments/0/word-align.conf [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/splits/corpus.en.0 [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/splits/corpus.hi.0 [CHANGED]
dep=alignments/0/training.align [NOT FOUND]
cmd=java -d64 -Xmx10g -jar /home/smt/joshua-v6.0.1/lib/berkeleyaligner.jar ++alignments/0/word-align.conf
took 1809 seconds (30m9s)
[aligner-combine] rebuilding...
dep=alignments/0/training.align [CHANGED]
dep=alignments/training.align [NOT FOUND]
cmd=cat alignments/0/training.align > alignments/training.align
took 1 seconds (1s)
[thrax-input-file] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/corpus.en [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi [CHANGED]
dep=alignments/training.align [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/thrax-input-file [NOT FOUND]
cmd=paste /home/smt/HindiMachineTranslationSystem/1/data/train/corpus.en /home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi alignments/training.align | perl -pe 's/\t/ ||| /g' | grep -v '()' | grep -v '||| \+$' > /home/smt/HindiMachineTranslationSystem/1/data/train/thrax-input-file
took 1 seconds (1s)
[thrax-run] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/thrax-input-file [CHANGED]
dep=thrax-hiero.conf [CHANGED]
dep=grammar.gz [NOT FOUND]
cmd=hadoop/bin/hadoop jar /home/smt/joshua-v6.0.1/thrax/bin/thrax.jar -D mapred.child.java.opts='-Xmx2g' thrax-hiero.conf thrax > thrax.log 2>&1; rm -f grammar grammar.gz; hadoop/bin/hadoop fs -getmerge thrax/final/ grammar.gz; hadoop/bin/hadoop fs -rmr thrax
took 393 seconds (6m33s)
[lm-sort-uniq] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi [CHANGED]
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi.uniq [NOT FOUND]
cmd=/home/smt/joshua-v6.0.1/scripts/training/scat /home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi | sort -u -T /tmp -S 2G | gzip -9n > /home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi.uniq
took 29 seconds (29s)
[srilm] rebuilding...
dep=/home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi.uniq [CHANGED]
dep=lm.gz [NOT FOUND]
cmd=/home/smt/pj/srilm-1.7.1/bin/i686-m64/ngram-count -order 3 -interpolate -kndiscount -unk -gt3min 1 -gt4min 1 -gt5min 1 -text /home/smt/HindiMachineTranslationSystem/1/data/train/corpus.hi.uniq -lm lm.gz
JOB FAILED (return code 1)
one of modified KneserNey discounts is negative
error in discount estimator for order 1
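For context on the message above: as far as I understand, modified Kneser-Ney smoothing (Chen and Goodman) estimates its discounts from count-of-counts, where n1..n4 are the numbers of distinct n-grams seen exactly 1, 2, 3 and 4 times. The following is only a minimal sketch of those textbook formulas, not SRILM's actual code, and the function name and the sample numbers are made up for illustration.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Textbook modified Kneser-Ney discounts (Chen & Goodman).

    n1..n4 are count-of-counts: how many distinct n-grams of one order
    occur exactly 1, 2, 3 and 4 times. Hypothetical helper, not SRILM code.
    """
    if min(n1, n2, n3) <= 0:
        raise ValueError("n1, n2 and n3 must be positive")
    y = n1 / (n1 + 2.0 * n2)
    d1 = 1.0 - 2.0 * y * n2 / n1       # discount for n-grams seen once
    d2 = 2.0 - 3.0 * y * n3 / n2       # discount for n-grams seen twice
    d3_plus = 3.0 - 4.0 * y * n4 / n3  # discount for n-grams seen 3+ times
    return d1, d2, d3_plus


# Made-up counts that fall off smoothly, as in natural text: all positive.
print(modified_kn_discounts(1000, 400, 200, 120))

# Made-up counts where n3 is larger than n2: the second discount is negative.
print(modified_kn_discounts(1000, 100, 300, 50))
```

With these formulas the first discount simplifies to n1 / (n1 + 2 * n2) and is always positive, so a negative value has to come from the second or third one, that is, from count-of-counts that do not decrease the way they normally do in natural text.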
Any hints as to why this error happens? Please help me.
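One thing I noticed in the log: the [lm-sort-uniq] step pipes its output through gzip but writes it to corpus.hi.uniq, a name without a .gz suffix, and that file is then passed to ngram-count with -text. I do not know whether ngram-count recognises compressed input from the file name or from the content, so this may or may not matter. Below is a small sketch for checking what the file really contains; the helper names are my own, and only the path comes from the log.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def count_lines(path):
    """Count lines, decompressing first if the content is gzip."""
    opener = gzip.open if is_gzip(path) else open
    with opener(path, "rb") as fh:
        return sum(1 for _ in fh)


if __name__ == "__main__":
    import sys

    # Usage: python check_gzip.py <path to corpus.hi.uniq>
    for p in sys.argv[1:]:
        print(p, "gzip" if is_gzip(p) else "plain", count_lines(p))
```

If the file is reported as gzip, decompressing it to a plain-text copy and running the same ngram-count command on that copy would show whether the compression is related to the error.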