Modification of non-creatable array value attempted


Lewis John Mcgibbney

Feb 18, 2016, 11:50:34 PM
to Joshua Developers
Hi Folks,
I am using the Joshua master branch.
I am getting a rather cryptic error when running the training pipeline below: "Modification of non-creatable array value attempted, subscript -1 at ../bin/pipeline.pl line 868."
Any ideas what this is? I am going to look through the code right now, but I thought I would throw the question out to the community and see what it yields.
The entire log is below.
Thanks
Lewis

[lmcgibbn@crawl experiments]$ ../bin/pipeline.pl  --rundir . --type hiero --corpus input/commoncrawl.ru-en --tune input/commoncrawl.ru-en --test input/commoncrawl.ru-en --source en --target ru --rundir experiment_1/1 --readme "Experiment 1 run 1 of ru --> en model" --mbr
[train-copy-and-filter] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.ru [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/paste /data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.en /data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.ru | /home/lmcgibbn/joshua/scripts/training/filter-empty-lines.pl | /home/lmcgibbn/joshua/scripts/training/split2files.pl /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.en /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.ru
  took 18 seconds (18s)
[train-tokenize-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.en [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/scat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.en | /home/lmcgibbn/joshua/scripts/training/normalize-punctuation.pl en | /home/lmcgibbn/joshua/scripts/training/penn-treebank-tokenizer.perl -l en 2> /dev/null > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.en
  took 1 seconds (1s)
[train-tokenize-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.ru [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/scat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.ru | /home/lmcgibbn/joshua/scripts/training/normalize-punctuation.pl ru | /home/lmcgibbn/joshua/scripts/training/penn-treebank-tokenizer.perl -l ru 2> /dev/null > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.ru
  took 1 seconds (1s)
[train-trim] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.ru [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/paste /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.en /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.ru | /home/lmcgibbn/joshua/scripts/training/trim_parallel_corpus.pl 50 | /home/lmcgibbn/joshua/scripts/training/split2files.pl /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.en /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.ru
  took 0 seconds (0s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[train-lowercase-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.lc.en [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.en | /home/lmcgibbn/joshua/scripts/lowercase.perl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.lc.en
  took 0 seconds (0s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[train-lowercase-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.lc.ru [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.ru | /home/lmcgibbn/joshua/scripts/lowercase.perl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/train.tok.50.lc.ru
  took 0 seconds (0s)
[train-vocab-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/corpus.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/vocab.en [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/corpus.en | /home/lmcgibbn/joshua/scripts/training/build-vocab.pl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/vocab.en
  took 0 seconds (0s)
[train-vocab-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/corpus.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/vocab.ru [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/corpus.ru | /home/lmcgibbn/joshua/scripts/training/build-vocab.pl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/vocab.ru
  took 0 seconds (0s)
[tune-copy-and-filter] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.ru [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/paste /data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.en /data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.ru | /home/lmcgibbn/joshua/scripts/training/split2files.pl /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.en /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.ru
  took 17 seconds (17s)
[tune-tokenize-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.en [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/scat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.en | /home/lmcgibbn/joshua/scripts/training/normalize-punctuation.pl en | /home/lmcgibbn/joshua/scripts/training/penn-treebank-tokenizer.perl -l en 2> /dev/null > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.en
  took 0 seconds (0s)
[tune-tokenize-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.ru [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/scat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.ru | /home/lmcgibbn/joshua/scripts/training/normalize-punctuation.pl ru | /home/lmcgibbn/joshua/scripts/training/penn-treebank-tokenizer.perl -l ru 2> /dev/null > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.ru
  took 1 seconds (1s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[tune-lowercase-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.lc.en [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.en | /home/lmcgibbn/joshua/scripts/lowercase.perl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.lc.en
  took 0 seconds (0s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[tune-lowercase-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.lc.ru [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.ru | /home/lmcgibbn/joshua/scripts/lowercase.perl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/tune.tok.lc.ru
  took 0 seconds (0s)
[tune-vocab-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/corpus.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/vocab.en [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/corpus.en | /home/lmcgibbn/joshua/scripts/training/build-vocab.pl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/vocab.en
  took 0 seconds (0s)
[tune-vocab-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/corpus.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/vocab.ru [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/corpus.ru | /home/lmcgibbn/joshua/scripts/training/build-vocab.pl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/tune/vocab.ru
  took 0 seconds (0s)
[test-copy-and-filter] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.ru [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/paste /data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.en /data/home/lmcgibbn/joshua/experiments/input/commoncrawl.ru-en.ru | /home/lmcgibbn/joshua/scripts/training/split2files.pl /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.en /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.ru
  took 17 seconds (17s)
[test-tokenize-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.en [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/scat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.en | /home/lmcgibbn/joshua/scripts/training/normalize-punctuation.pl en | /home/lmcgibbn/joshua/scripts/training/penn-treebank-tokenizer.perl -l en 2> /dev/null > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.en
  took 0 seconds (0s)
[test-tokenize-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.ru [CHANGED]
  cmd=/home/lmcgibbn/joshua/scripts/training/scat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.ru | /home/lmcgibbn/joshua/scripts/training/normalize-punctuation.pl ru | /home/lmcgibbn/joshua/scripts/training/penn-treebank-tokenizer.perl -l ru 2> /dev/null > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.ru
  took 1 seconds (1s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[test-lowercase-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.lc.en [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.en | /home/lmcgibbn/joshua/scripts/lowercase.perl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.lc.en
  took 0 seconds (0s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[test-lowercase-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.lc.ru [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.ru | /home/lmcgibbn/joshua/scripts/lowercase.perl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/test.tok.lc.ru
  took 0 seconds (0s)
[test-vocab-en] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/corpus.en [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/vocab.en [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/corpus.en | /home/lmcgibbn/joshua/scripts/training/build-vocab.pl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/vocab.en
  took 0 seconds (0s)
[test-vocab-ru] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/corpus.ru [CHANGED]
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/vocab.ru [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/corpus.ru | /home/lmcgibbn/joshua/scripts/training/build-vocab.pl > /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/test/vocab.ru
  took 0 seconds (0s)
[source-numlines] rebuilding...
  dep=/data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/corpus.en [CHANGED]
  cmd=cat /data/home/lmcgibbn/joshua/experiments/experiment_1/1/data/train/corpus.en | wc -l
  took 0 seconds (0s)
[source-numlines] retrieved cached result => 0
Modification of non-creatable array value attempted, subscript -1 at ../bin/pipeline.pl line 868.

Lewis John Mcgibbney

Feb 18, 2016, 11:54:00 PM
to Joshua Developers
Here is the code block where it is failing (the formatting looks a bit odd as well):

https://github.com/joshua-decoder/joshua/blob/master/scripts/training/pipeline.pl#L867-L871

Lewis John Mcgibbney

Feb 19, 2016, 12:04:53 AM
to Joshua Developers
Running the exact same Joshua master code on the exact same parallel input data, with the exact same pipeline invocation, on my Mac OS X 10.9.5 machine, all seems to be going well. As you can see below, the cached result is 817944 lines, so execution progresses to the GIZA++ invocation.

The code is failing on CentOS 6.6.

...
[test-vocab-ru] rebuilding...
  dep=/usr/local/joshua/experiments/experiment_1/1/data/test/corpus.ru
  dep=/usr/local/joshua/experiments/experiment_1/1/data/test/vocab.ru [NOT FOUND]
  cmd=cat /usr/local/joshua/experiments/experiment_1/1/data/test/corpus.ru | /usr/local/joshua/scripts/training/build-vocab.pl > /usr/local/joshua/experiments/experiment_1/1/data/test/vocab.ru
  took 20 seconds (20s)
[source-numlines] cached, skipping...
[source-numlines] retrieved cached result =>   817944
[giza-0] rebuilding...
  dep=/usr/local/joshua/experiments/experiment_1/1/data/train/splits/corpus.en.0 [CHANGED]
  dep=/usr/local/joshua/experiments/experiment_1/1/data/train/splits/corpus.ru.0 [CHANGED]
  dep=alignments/0/model/aligned.grow-diag-final [NOT FOUND]
  cmd=rm -f alignments/0/corpus.0-0.*; /usr/local/joshua/scripts/training/run-giza.pl --root-dir alignments/0 -e ru.0 -f en.0 -corpus /usr/local/joshua/experiments/experiment_1/1/data/train/splits/corpus -merge grow-diag-final -parallel > alignments/0/giza.log 2>&1

Matt Post

Feb 19, 2016, 7:17:12 AM
to joshua_d...@googlegroups.com
Probably your alignment pieces failed, so there is nothing to assemble. Are the GIZA binaries in bin/?

(Can look more later.)

matt (from hand computer)
--
You received this message because you are subscribed to the Google Groups "Joshua Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to joshua_develop...@googlegroups.com.
To post to this group, send email to joshua_d...@googlegroups.com.
Visit this group at https://groups.google.com/group/joshua_developers.
For more options, visit https://groups.google.com/d/optout.

Lewis John Mcgibbney

Feb 19, 2016, 3:47:21 PM
to joshua_d...@googlegroups.com
Argh, I am seeing the following when building:

giza:
     [exec] make: Entering directory `/work/03702/tg830018/wrangler/joshua/ext/giza-pp'
     [exec] cc1plus: error: unrecognized command line option "-std=c++11"
     [exec] /work/03702/tg830018/wrangler/joshuacc1plus: error: unrecognized command line option "-std=c++11"
     [exec]
     [exec] /work/03702/tg830018/wrangler/joshua
     [exec] cc1plus: error: unrecognized command line option "-std=c++11"
     [exec] make -C GIZA++-v2
     [exec] make -C mkcls-v2

This is annoying. I will try my best to go back and see if I can get this sorted out. I assume this is the source of the issue.
Thanks
Lewis

--
Lewis

Lewis John Mcgibbney

Feb 19, 2016, 3:57:22 PM
to joshua_d...@googlegroups.com
When building KenLM I was also getting:

kenlm:
     [exec] -- The C compiler identification is GNU 4.4.7
     [exec] -- The CXX compiler identification is GNU 4.4.7
     [exec] -- Check for working C compiler: /usr/bin/cc
     [exec] -- Check for working C compiler: /usr/bin/cc -- works
     [exec] -- Detecting C compiler ABI info
     [exec] -- Detecting C compiler ABI info - done
     [exec] -- Check for working CXX compiler: /usr/bin/g++
     [exec] -- Check for working CXX compiler: /usr/bin/g++ -- works
     [exec] -- Detecting CXX compiler ABI info
     [exec] -- Detecting CXX compiler ABI info - done
     [exec] -- Boost version: 1.41.0
...
     [exec] [100%] Building CXX object lm/filter/CMakeFiles/phrase_table_vocab.dir/phrase_table_vocab_main.cc.o
     [exec] [100%] Built target filter
     [exec] Linking CXX executable ../../bin/phrase_table_vocab
     [exec] [100%] Built target phrase_table_vocab
     [exec] cc1plus: error: unrecognized command line option "-std=gnu++11"
     [exec] Result: 1

It looks like both of the following are causing major issues:


     [exec] cc1plus: error: unrecognized command line option "-std=c++11"
and
     [exec] cc1plus: error: unrecognized command line option "-std=gnu++11"

...investigating
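
Both errors come from `cc1plus`, and the KenLM log identifies the compiler as GNU 4.4.7. GCC only recognizes `-std=c++11` and `-std=gnu++11` from release 4.7 onward; older releases, including 4.4, spell the draft standard `-std=c++0x` instead. A quick way to confirm what a given compiler accepts is to syntax-check an empty program with each flag. A minimal sketch, assuming a GCC-compatible driver that understands `-x c++` and `-fsyntax-only`; `accepts_std_flag` is a hypothetical helper name, and `CXX` defaults to `g++`.

```sh
#!/bin/sh
# Hypothetical helper: succeed if compiler $1 accepts the language-standard flag $2.
# It only syntax-checks a trivial C++ program read from stdin, so nothing is written.
accepts_std_flag() {
    cxx=$1
    flag=$2
    printf 'int main() { return 0; }\n' |
        "$cxx" "$flag" -x c++ -fsyntax-only - >/dev/null 2>&1
}

compiler=${CXX:-g++}
for flag in -std=c++0x -std=c++11 -std=gnu++11; do
    if accepts_std_flag "$compiler" "$flag"; then
        echo "$compiler accepts $flag"
    else
        echo "$compiler rejects $flag"
    fi
done
```

Setting `CXX` to another compiler and re-running shows whether it accepts the flags the build requests, provided that compiler understands the same driver options.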

--
Lewis

Lewis John Mcgibbney

Feb 19, 2016, 4:52:25 PM
to joshua_d...@googlegroups.com
Hi Matt,
OK, I got some feedback from the sysadmins on the system I was working on.
Their feedback is to make Joshua flexible enough to use the Intel compiler if the GNU compiler is not available.
I am going to log a ticket for this and submit a patch.
Thanks
--
Lewis

Lewis John Mcgibbney

Feb 19, 2016, 7:53:37 PM
to Joshua Developers
Hi Matt,
You asked "Are the GIZA binaries in bin/?"
Yes, please see below:

login1.wrangler(35)$ ls ../bin/
bleu            build_binary    decoder         extract-1best   GIZA++          joshua-decoder  lmplz           meteor          mkcls           pipeline.pl     query           snt2cooc.out

I am still getting
Modification of non-creatable array value attempted, subscript -1 at /tmp/slurmd/job11120/slurm_script line 868.
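
The check Matt asked about ("Are the GIZA binaries in bin/?") can also be scripted so that it gives an explicit pass/fail. A minimal sketch; the helper name `check_giza_bins` is hypothetical, and the binary names are the three GIZA-related entries in the `ls ../bin/` listing above. Passing only shows the files exist and are executable, not that they run correctly on this machine.

```sh
#!/bin/sh
# Hypothetical helper: report whether the GIZA++-related binaries are present
# and executable in directory $1. Returns non-zero if any of them is missing.
check_giza_bins() {
    bin_dir=$1
    missing=0
    for b in GIZA++ mkcls snt2cooc.out; do
        if [ -x "$bin_dir/$b" ]; then
            echo "ok:      $bin_dir/$b"
        else
            echo "MISSING: $bin_dir/$b"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_giza_bins ../bin
```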

Matt Post

Feb 20, 2016, 9:41:31 AM
to joshua_d...@googlegroups.com
What are the line counts of

data/train/corpus.{ru,en}
alignments/training.align

The first line is the corpora; those are probably fine and of equal length. I'm guessing alignment failed. The pipeline splits the corpus into 1M-line chunks and aligns each one separately (see the subdirectories under alignments/). The results are then concatenated into alignments/training.align. I'm guessing some or all of these failed.

matt
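
The checks Matt asks for can be scripted. The sketch below, meant to be run from the pipeline's run directory, prints the line counts of the files he names and then shows the chunk arithmetic he describes: with 1M-line chunks, a corpus of n lines yields ceil(n / 1000000) chunks, so the 817944-line corpus from the Mac run gives one chunk and an empty corpus gives zero. The helper names `count_lines` and `num_chunks` are hypothetical, and the 1,000,000 chunk size comes from Matt's description, not from reading pipeline.pl.

```sh
#!/bin/sh
# Hypothetical helper: chunks needed to cover $1 lines at $2 lines per chunk
# (integer ceiling division, so 0 lines -> 0 chunks).
num_chunks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical helper: print "<lines> <file>" for each existing file, flag missing ones.
# tr strips the leading padding that BSD/macOS wc adds to its count.
count_lines() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            printf '%s %s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
        else
            printf 'MISSING %s\n' "$f"
        fi
    done
}

count_lines data/train/corpus.ru data/train/corpus.en alignments/training.align

num_chunks 817944 1000000   # prints 1
num_chunks 0 1000000        # prints 0
```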

Lewis John Mcgibbney

Feb 22, 2016, 11:49:46 PM
to joshua_d...@googlegroups.com
Hi Matt,

On Sat, Feb 20, 2016 at 6:41 AM, Matt Post <po...@cs.jhu.edu> wrote:
> What are the line counts of
>
> data/train/corpus.{ru,en}

There is nothing in either of these files.

> alignments/training.align

This file does not exist!

> The first line is the corpora, those are probably good and equal. I'm guessing alignment failed.

I think training has failed even before alignment.

> The pipeline splits them into 1m line chunks and aligns each separately, see subdirs under alignments. They are then concated into alignments/training.align. I'm guessing some / all of these failed.

Yes, it would appear so. I am looking through the logs in detail now to see if I can spot anything that went wrong.
I am running off the master branch again as of today.
Lewis

Lewis John Mcgibbney

Feb 22, 2016, 11:56:49 PM
to joshua_d...@googlegroups.com
Here is an example of my current log from trying to run a pipeline.
Two things jump out at me:
  1. The warning "Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011." keeps appearing. Is this normal?
  2. Some stages take 0s to complete, which does not look good to me.

Any insight into what may be going wrong here would be great.

Thanks


in/train.tok.50.en /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.ru

  took 0 seconds (0s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[train-lowercase-en] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.en
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.lc.en [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.en | /home/lmcgibbn/src/joshua/scripts/lowercase.perl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.lc.en

  took 0 seconds (0s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[train-lowercase-ru] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.lc.ru [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.ru | /home/lmcgibbn/src/joshua/scripts/lowercase.perl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/train.tok.50.lc.ru

  took 0 seconds (0s)
[train-vocab-en] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/corpus.en
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/vocab.en [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/corpus.en | /home/lmcgibbn/src/joshua/scripts/training/build-vocab.pl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/vocab.en

  took 0 seconds (0s)
[train-vocab-ru] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/corpus.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/vocab.ru [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/corpus.ru | /home/lmcgibbn/src/joshua/scripts/training/build-vocab.pl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/train/vocab.ru

  took 0 seconds (0s)
[tune-copy-and-filter] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/input/commoncrawl.ru-en.en
  dep=/home/lmcgibbn/src/joshua/experiments/input/commoncrawl.ru-en.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.en [NOT FOUND]
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.ru [NOT FOUND]
  cmd=/home/lmcgibbn/src/joshua/scripts/training/paste /home/lmcgibbn/src/joshua/experiments/input/commoncrawl.ru-en.en /home/lmcgibbn/src/joshua/experiments/input/commoncrawl.ru-en.ru | /home/lmcgibbn/src/joshua/scripts/training/split2files.pl /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.en /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.ru
  took 20 seconds (20s)
[tune-tokenize-en] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.en
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.en [NOT FOUND]
  cmd=/home/lmcgibbn/src/joshua/scripts/training/scat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.en | /home/lmcgibbn/src/joshua/scripts/training/normalize-punctuation.pl en | /home/lmcgibbn/src/joshua/scripts/training/penn-treebank-tokenizer.perl -l en 2> /dev/null > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.en
  took 1 seconds (1s)
[tune-tokenize-ru] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.ru [NOT FOUND]
  cmd=/home/lmcgibbn/src/joshua/scripts/training/scat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.ru | /home/lmcgibbn/src/joshua/scripts/training/normalize-punctuation.pl ru | /home/lmcgibbn/src/joshua/scripts/training/penn-treebank-tokenizer.perl -l ru 2> /dev/null > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.ru

  took 1 seconds (1s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[tune-lowercase-en] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.en
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.lc.en [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.en | /home/lmcgibbn/src/joshua/scripts/lowercase.perl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.lc.en

  took 0 seconds (0s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[tune-lowercase-ru] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.lc.ru [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.ru | /home/lmcgibbn/src/joshua/scripts/lowercase.perl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/tune.tok.lc.ru

  took 0 seconds (0s)
[tune-vocab-en] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/corpus.en
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/vocab.en [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/corpus.en | /home/lmcgibbn/src/joshua/scripts/training/build-vocab.pl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/vocab.en

  took 0 seconds (0s)
[tune-vocab-ru] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/corpus.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/vocab.ru [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/corpus.ru | /home/lmcgibbn/src/joshua/scripts/training/build-vocab.pl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/tune/vocab.ru

  took 0 seconds (0s)
[test-copy-and-filter] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/input/commoncrawl.ru-en.en
  dep=/home/lmcgibbn/src/joshua/experiments/input/commoncrawl.ru-en.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.en [NOT FOUND]
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.ru [NOT FOUND]
  cmd=/home/lmcgibbn/src/joshua/scripts/training/paste /home/lmcgibbn/src/joshua/experiments/input/commoncrawl.ru-en.en /home/lmcgibbn/src/joshua/experiments/input/commoncrawl.ru-en.ru | /home/lmcgibbn/src/joshua/scripts/training/split2files.pl /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.en /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.ru
  took 21 seconds (21s)
[test-tokenize-en] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.en
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.en [NOT FOUND]
  cmd=/home/lmcgibbn/src/joshua/scripts/training/scat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.en | /home/lmcgibbn/src/joshua/scripts/training/normalize-punctuation.pl en | /home/lmcgibbn/src/joshua/scripts/training/penn-treebank-tokenizer.perl -l en 2> /dev/null > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.en
  took 1 seconds (1s)
[test-tokenize-ru] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.ru [NOT FOUND]
  cmd=/home/lmcgibbn/src/joshua/scripts/training/scat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.ru | /home/lmcgibbn/src/joshua/scripts/training/normalize-punctuation.pl ru | /home/lmcgibbn/src/joshua/scripts/training/penn-treebank-tokenizer.perl -l ru 2> /dev/null > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.ru

  took 1 seconds (1s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[test-lowercase-en] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.en
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.lc.en [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.en | /home/lmcgibbn/src/joshua/scripts/lowercase.perl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.lc.en

  took 0 seconds (0s)
Use of uninitialized value $line in pattern match (m//) at ../bin/pipeline.pl line 2011.
[test-lowercase-ru] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.lc.ru [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.ru | /home/lmcgibbn/src/joshua/scripts/lowercase.perl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/test.tok.lc.ru
  took 1 seconds (1s)
[test-vocab-en] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/corpus.en
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/vocab.en [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/corpus.en | /home/lmcgibbn/src/joshua/scripts/training/build-vocab.pl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/vocab.en

  took 0 seconds (0s)
[test-vocab-ru] rebuilding...
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/corpus.ru
  dep=/home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/vocab.ru [NOT FOUND]
  cmd=cat /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/corpus.ru | /home/lmcgibbn/src/joshua/scripts/training/build-vocab.pl > /home/lmcgibbn/src/joshua/experiments/experiment1/run1/data/test/vocab.ru
  took 0 seconds (0s)
[source-numlines] cached, skipping...
[source-numlines] retrieved cached result => 0

Modification of non-creatable array value attempted, subscript -1 at ../bin/pipeline.pl line 868.
--
Lewis

Lewis John Mcgibbney

Feb 23, 2016, 12:07:58 AM
to joshua_d...@googlegroups.com
I am trying to run pipelines on a variety of machines. The only pipelines I can get to run successfully are local ones on my Mac OS X laptop.
I'm having teething issues on every other machine I've tried.
--
Lewis

Matt Post

Feb 23, 2016, 9:50:46 AM
to joshua_d...@googlegroups.com
So this means you have no training data somehow, which explains why it wasn't aligned. What was the pipeline command you used? What Joshua are you using (dev?)
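The final error is consistent with that reading. Perl raises "Modification of non-creatable array value attempted, subscript -1" when code assigns to the last element (`$a[-1]`) of an array that is empty, and the log shows `[source-numlines]` returning 0 just before it. A quick pre-flight check that both sides of a corpus exist, are non-empty, and have matching line counts can rule out bad input before a long run. This is a minimal sketch; `check_parallel_corpus` is a hypothetical helper, not part of Joshua, and it assumes the `<prefix>.<lang>` file naming that the `--corpus` argument uses in the commands above.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines, splitting only on '\n' (like wc -l / paste), not on stray '\r'."""
    with path.open(encoding="utf-8", newline="\n") as f:
        return sum(1 for _ in f)


def check_parallel_corpus(prefix: str, source: str, target: str) -> int:
    """Hypothetical pre-flight check: both sides exist, are non-empty, and align.

    Returns the shared line count, or raises if the corpus looks unusable.
    """
    src = Path(f"{prefix}.{source}")
    tgt = Path(f"{prefix}.{target}")
    for path in (src, tgt):
        if not path.is_file():
            raise FileNotFoundError(f"missing corpus file: {path}")

    n_src, n_tgt = count_lines(src), count_lines(tgt)
    if n_src == 0 or n_tgt == 0:
        # An empty side is the situation that leaves downstream arrays empty.
        raise ValueError(f"empty corpus side: {src}={n_src}, {tgt}={n_tgt}")
    if n_src != n_tgt:
        raise ValueError(f"line counts differ: {src}={n_src}, {tgt}={n_tgt}")
    return n_src
```

For example, `check_parallel_corpus("input/commoncrawl.ru-en", "en", "ru")` should return the shared line count for the corpus used in this thread. Note that this only checks the raw input; in this thread the empty result appeared later, after a preprocessing step failed, so the `.err` files written by the pipeline are the next place to look.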

matt


Lewis John Mcgibbney

Feb 24, 2016, 12:47:07 PM
to joshua_d...@googlegroups.com
Hi Matt,

On Tue, Feb 23, 2016 at 6:50 AM, Matt Post <po...@cs.jhu.edu> wrote:
So this means you have no training data somehow, which explains why it wasn't aligned.

My training data looks as follows:

[lmcgibbn@scispark1 experiments]$ head -4 input/commoncrawl.ru-en.ru
iron cement - это готовая к использованию паста, которая наносится шпателем или пальцами в виде закругленного перехода в углы сталелитейного кокиля.
После отверждения iron cement защищает кокиль от горячего абразивного стального литья.
Перед каждой новой заливкой необходимо заново нанести iron cement слоем толщиной ~ 2-3 мм.
огнеупорной ремонтной шпаклевки для топочных установок, печей и т.д.
...

[lmcgibbn@scispark1 experiments]$ head -4 input/commoncrawl.ru-en.en
iron cement is a ready for use paste which is laid as a fillet by putty knife or finger in the mould edges (corners) of the steel ingot mould.
iron cement protects the ingot against the hot, abrasive steel casting process.
iron cement is freshly applied after each steel pour in a coating thickness of approx. ~ 2-3 mm.
a fire restant repair cement for fire places, ovens, open fireplaces etc.
...

There are 817944 lines in each file.
 
What was the pipeline command you used?

[lmcgibbn@scispark1 experiments]$ ../bin/pipeline.pl  --rundir . --type hiero --corpus input/commoncrawl.ru-en --tune input/commoncrawl.ru-en --test input/commoncrawl.ru-en --source en --target ru --rundir experiment1/run1 --readme "Experiment 1 Run 1 of ru --> en model training" --mbr
 
What Joshua are you using (dev?)


I am always running directly off of master.
What baffles me is that the exact same pipeline, with the exact same command, runs on my laptop.
I just deleted Joshua and the input commoncrawl parallel corpus, downloaded everything from scratch and tried it again. Exact same result!
I am baffled!
Thanks for your persistent help Matt, I appreciate it.
Lewis
 

Matt Post

Feb 24, 2016, 2:07:48 PM
to joshua_d...@googlegroups.com
Two notes:

- The fault is mine; a recent push broke the data preparation. I'll fix this now.

- You need to use different train, tune, and test sets. Tune and test should be no more than 10k lines, and typically 3–5k.

matt
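Carving disjoint tune and test sets out of a single parallel corpus can be done with a small script. The sketch below is one possible approach, not Joshua's own tooling: it randomly holds out a fixed number of sentence pairs (3,000 each by default, within the 3–5k range suggested above) and keeps the rest for training, preserving the original line order within each split. The function names and the seed are illustrative assumptions. It does not deduplicate, so identical sentence pairs that occur more than once in a web-crawled corpus can still land in both train and test.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def read_lines(path: str) -> List[str]:
    """Read a corpus side, splitting only on '\n' so both sides stay aligned."""
    with open(path, encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


def split_parallel(src: Sequence[str], tgt: Sequence[str],
                   tune_size: int = 3000, test_size: int = 3000,
                   seed: int = 1) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Randomly hold out disjoint tune and test sets; everything else is train."""
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of lines")
    if tune_size + test_size >= len(src):
        raise ValueError("corpus too small for the requested tune/test sizes")

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    held_out = rng.sample(range(len(src)), tune_size + test_size)
    tune_idx = set(held_out[:tune_size])
    test_idx = set(held_out[tune_size:])

    train: List[Pair] = []
    tune: List[Pair] = []
    test: List[Pair] = []
    for i, pair in enumerate(zip(src, tgt)):
        if i in tune_idx:
            tune.append(pair)
        elif i in test_idx:
            test.append(pair)
        else:
            train.append(pair)
    return train, tune, test


def write_split(pairs: Sequence[Pair], prefix: str, source: str, target: str) -> None:
    """Write <prefix>.<source> and <prefix>.<target>, one sentence per line."""
    with open(f"{prefix}.{source}", "w", encoding="utf-8", newline="\n") as fs, \
         open(f"{prefix}.{target}", "w", encoding="utf-8", newline="\n") as ft:
        for s, t in pairs:
            fs.write(s + "\n")
            ft.write(t + "\n")
```

To use it, read both sides with `read_lines`, call `split_parallel`, and write each split with `write_split` to three different prefixes (for example, hypothetical `input/train`, `input/tune`, and `input/test`), then pass those prefixes to `--corpus`, `--tune`, and `--test` respectively.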


Matt Post

Feb 24, 2016, 2:15:56 PM
to joshua_d...@googlegroups.com
Actually, this may not have been my problem. What version of Perl are you using? Joshua requires Perl >= 5.12. I'm guessing the file .cachepipe/train-tokenize-ru/err contains a complaint about this.

Matt Post

Feb 24, 2016, 2:26:53 PM
to joshua_d...@googlegroups.com
Okay, I'm betting it's your Perl version.

One of the helper scripts has a Perl >= 5.12 requirement, due to tokenization inconsistencies in older versions. I added this requirement to pipeline.pl, so that users will be made aware of it sooner. I also modified a few Perl scripts to execute "/usr/bin/env perl" instead of "/usr/bin/perl", so you can use a non-standard Perl.
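To check a machine before launching a run, you can ask Perl for its version. `perl -e 'print $]'` prints the version as a decimal number, for example `5.016003` for Perl 5.16.3 and `5.010001` for 5.10.1. The sketch below wraps that in Python; the helper names are hypothetical, and the 5.12 threshold comes from this thread. Because it resolves `perl` through `PATH`, it tests the same interpreter that a `#!/usr/bin/env perl` script would run, which may differ from `/usr/bin/perl`.

```python
import shutil
import subprocess
from typing import Tuple

# Minimum Perl stated in this thread: 5.12.0.
MIN_PERL: Tuple[int, int, int] = (5, 12, 0)


def parse_perl_version(text: str) -> Tuple[int, int, int]:
    """Turn Perl's $] output (e.g. '5.016003') into (major, minor, patch)."""
    major, _, frac = text.strip().partition(".")
    frac = frac.ljust(6, "0")  # $] uses three digits each for minor and patch
    return int(major), int(frac[:3]), int(frac[3:6])


def perl_meets_minimum(perl: str = "perl",
                       minimum: Tuple[int, int, int] = MIN_PERL) -> bool:
    """Return True if the first `perl` on PATH is at least `minimum`."""
    exe = shutil.which(perl)
    if exe is None:
        raise FileNotFoundError(f"{perl!r} not found on PATH")
    out = subprocess.run([exe, "-e", "print $]"],
                         capture_output=True, text=True, check=True).stdout
    return parse_perl_version(out) >= minimum
```

Comparing integer tuples rather than floats avoids any rounding surprises and orders versions correctly (for instance, 5.8.8 encodes as `5.008008`, which parses to `(5, 8, 8)` and compares below `(5, 12, 0)`).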

These have been pushed to master. Let me know how it goes.

matt

Lewis John Mcgibbney

Feb 24, 2016, 8:19:53 PM
to joshua_d...@googlegroups.com
Hi Matt,
This thread can be closed.
The problem was the Perl >= 5.12 prerequisite.
Thanks very much for the suggestion... this one was a PITA for sure :)
Thanks

--
Lewis