aligner fails


dnm

Jul 6, 2012, 12:19:15 PM
to cdec-...@googlegroups.com
Hello all,

I have been trying to get the aligner to work, and it has been acting strangely (then failing). 

First I ran:

$ $CDEC/word-aligner/aligner.pl corpus.source-target
$ cd talign
$ make

as directed on the wiki.  The cluster-ptrain.pl part ran OK (at least I think it was that part), but
then the next step failed. So I typed 'make' again, and the next step began (mkcls, I believe).
Weird.

When this finishes (successfully, the results of mkcls are there in ./grammars), I get:

Making model-f-e ...
make[1]: Entering directory `/home/dnm/cdec/talign/model-f-e'
make[1]: *** No targets specified and no makefile found.  Stop.
make[1]: Leaving directory `/home/dnm/cdec/talign/model-f-e'
make: *** [all] Error 1

I'm a bit confused, because I thought the makefiles would have already been created before this
step was attempted.

If it helps, I'm using a pull from the cdec git repository from yesterday (05 July 2012), and I'm running on an Ubuntu
Linux box where the install (of cdec) went smoothly and all tests passed.  Any tips, tricks or advice are greatly
appreciated.

Best,
Dennis

dnm

Jul 6, 2012, 3:02:58 PM
to cdec-...@googlegroups.com
Sorry, the first step that runs is 'model1' (as I should have suspected).  Here's a trace of the whole thing (so you can see where it fails, then succeeds upon re-invocation of 'make').  Also, I forgot to mention that I have straight-up Urdu characters in my Urdu text, i.e., no transliteration to ASCII. Could this be a problem? And as a subquestion: how do you transliterate your Urdu texts (e.g., the stemmer and orthographic normalization stuff seems to expect ASCII)?

-------------------------------------------------------------------------------------------------------
$ ./word-aligner/aligner.pl ~/moses-wksp/par-corpora/corp.ur-en --mkcls=/home/dnm/moses-wksp/bin/mkcls
 Using mkcls in: /home/dnm/moses-wksp/bin/mkcls

Source language: ur
Target language: en
Created alignment task. chdir to talign/ then type make.

$ cd talign
$ ls
grammars  Makefile  model-f-e
$ make
Making grammars ...
make[1]: Entering directory `/home/dnm/cdec/talign/grammars'
/home/dnm/cdec/word-aligner/support/merge_corpus.pl corpus.f corpus.e > corpus.f-e
/home/dnm/cdec/word-aligner/../training/model1 -v corpus.f-e > corpus.f-e.model1
ITERATION 1
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
expected target length = source length * 1.01541
  log_e likelihood: -3.5274e+07
  log_2 likelihood: -5.08896e+07
   cross entropy: 29.8974
      perplexity: 1e+09
ITERATION 2
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -1.05303e+07
  log_2 likelihood: -1.5192e+07
   cross entropy: 8.92519
      perplexity: 486.127
ITERATION 3
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -8.27877e+06
  log_2 likelihood: -1.19437e+07
   cross entropy: 7.01688
      perplexity: 129.506
ITERATION 4
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -7.92864e+06
  log_2 likelihood: -1.14386e+07
   cross entropy: 6.72012
      perplexity: 105.428
ITERATION 5 (FINAL)
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -7.81923e+06
  log_2 likelihood: -1.12808e+07
   cross entropy: 6.62738
      perplexity: 98.8643
/home/dnm/cdec/word-aligner/support/merge_corpus.pl corpus.e corpus.f > corpus.e-f
/home/dnm/cdec/word-aligner/../training/model1 -v -V corpus.e-f > corpus.e-f.model1
ITERATION 1
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
expected target length = source length * 1.0475
  log_e likelihood: -3.69182e+07
  log_2 likelihood: -5.32617e+07
   cross entropy: 29.8974
      perplexity: 1e+09
ITERATION 2
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -1.1369e+07
  log_2 likelihood: -1.6402e+07
   cross entropy: 9.20693
      perplexity: 590.967
ITERATION 3
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -8.98422e+06
  log_2 likelihood: -1.29615e+07
   cross entropy: 7.27567
      perplexity: 154.952
ITERATION 4
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -8.60816e+06
  log_2 likelihood: -1.2419e+07
   cross entropy: 6.97113
      perplexity: 125.464
ITERATION 5 (FINAL)
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -8.49289e+06
  log_2 likelihood: -1.22526e+07
   cross entropy: 6.87777
      perplexity: 117.602
/home/dnm/cdec/word-aligner/support/make_lex_grammar.pl corpus.f-e corpus.f-e.model1 corpus.e-f.model1 | /usr/bin/gzip -9 > corpus.f-e.lex-grammar.gz
/bin/sh: /usr/bin/gzip: not found
Reading model1...
Reading inverse model1...
Added 78294 from inverse model1
Generating grammars...
make[1]: *** [corpus.f-e.lex-grammar.gz] Error 127
make[1]: Leaving directory `/home/dnm/cdec/talign/grammars'

make: *** [all] Error 1
$ make
Making grammars ...
make[1]: Entering directory `/home/dnm/cdec/talign/grammars'
/home/dnm/cdec/word-aligner/../training/model1 -t -999999 -v -V corpus.f-e > corpus.f-e.full-model1
ITERATION 1
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
expected target length = source length * 1.01541
  log_e likelihood: -3.5274e+07
  log_2 likelihood: -5.08896e+07
   cross entropy: 29.8974
      perplexity: 1e+09
ITERATION 2
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -1.05303e+07
  log_2 likelihood: -1.5192e+07
   cross entropy: 8.92519
      perplexity: 486.127
ITERATION 3
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -8.27877e+06
  log_2 likelihood: -1.19437e+07
   cross entropy: 7.01688
      perplexity: 129.506
ITERATION 4
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -7.92864e+06
  log_2 likelihood: -1.14386e+07
   cross entropy: 6.72012
      perplexity: 105.428
ITERATION 5 (FINAL)
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -7.81923e+06
  log_2 likelihood: -1.12808e+07
   cross entropy: 6.62738
      perplexity: 98.8643
/home/dnm/cdec/word-aligner/../training/model1 -t -999999 -v -V corpus.e-f > corpus.e-f.full-model1
ITERATION 1
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
expected target length = source length * 1.0475
  log_e likelihood: -3.69182e+07
  log_2 likelihood: -5.32617e+07
   cross entropy: 29.8974
      perplexity: 1e+09
ITERATION 2
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -1.1369e+07
  log_2 likelihood: -1.6402e+07
   cross entropy: 9.20693
      perplexity: 590.967
ITERATION 3
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -8.98422e+06
  log_2 likelihood: -1.29615e+07
   cross entropy: 7.27567
      perplexity: 154.952
ITERATION 4
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -8.60816e+06
  log_2 likelihood: -1.2419e+07
   cross entropy: 6.97113
      perplexity: 125.464
ITERATION 5 (FINAL)
.................................................. [50000]
.................................................. [100000]
.................................................. [150000]
.................................................. [200000]
..
  log_e likelihood: -8.49289e+06
  log_2 likelihood: -1.22526e+07
   cross entropy: 6.87777
      perplexity: 117.602
/home/dnm/cdec/word-aligner/support/extract_vocab.pl < corpus.f > f.voc
Extracting vocabulary...
55615 types / 1781484 tokens
/home/dnm/cdec/word-aligner/ortho-norm/ur.pl < f.voc > f.ortho-voc
/home/dnm/cdec/word-aligner/support/merge_corpus.pl f.voc f.ortho-voc > orthonorm-dict.f
/home/dnm/cdec/word-aligner/support/extract_vocab.pl < corpus.e > e.voc
Extracting vocabulary...
50313 types / 1702145 tokens
/home/dnm/cdec/word-aligner/ortho-norm/en.pl < e.voc > e.ortho-voc
/home/dnm/cdec/word-aligner/support/merge_corpus.pl e.voc e.ortho-voc > orthonorm-dict.e
/home/dnm/moses-wksp/bin/mkcls -c50 -n10 -pcorpus.e -Vvoc2class.e opt

***** 10 runs. (algorithm:TA)*****
;KategProblem:cats: 50   words: 50314

start-costs: MEAN: 2.7247e+07 (2.72237e+07-2.72806e+07)  SIGMA:21015.1  
  end-costs: MEAN: 2.549e+07 (2.54763e+07-2.55024e+07)  SIGMA:8884.53  
   start-pp: MEAN: 764.968 (755.625-778.521)  SIGMA:8.4718  
     end-pp: MEAN: 304.006 (301.83-305.993)  SIGMA:1.41786  
 iterations: MEAN: 1.69769e+06 (1.46804e+06-1.93743e+06)  SIGMA:159139  
       time: MEAN: 179.641 (159.26-204.94)  SIGMA:15.4582  
/home/dnm/moses-wksp/bin/mkcls -c50 -n10 -pcorpus.f -Vvoc2class.f opt

***** 10 runs. (algorithm:TA)*****
;KategProblem:cats: 50   words: 55616

start-costs: MEAN: 2.8468e+07 (2.84367e+07-2.85185e+07)  SIGMA:24161.7  
  end-costs: MEAN: 2.64529e+07 (2.64329e+07-2.64902e+07)  SIGMA:16049.8  
   start-pp: MEAN: 921.688 (907.187-945.386)  SIGMA:11.2709  
     end-pp: MEAN: 333.694 (330.328-340.019)  SIGMA:2.71236  
 iterations: MEAN: 1.63988e+06 (1.51519e+06-1.98025e+06)  SIGMA:161266  
       time: MEAN: 175.962 (162.58-211.73)  SIGMA:17.5707  
/home/dnm/cdec/word-aligner/support/generate_word_pair_features.pl corpus.f-e corpus.f-e.full-model1 corpus.e-f.full-model1 orthonorm-dict.f orthonorm-dict.e voc2class.e voc2class.f corpus.f-e.model1 | /usr/bin/gzip -9 > wordpairs.f-e.features.gz
/bin/sh: /usr/bin/gzip: not found
Reading classes from voc2class.e...
Reading classes from voc2class.f...
Reading model1...
Reading inverse model1...
Reading sparse model 1 from corpus.f-e.model1...
Extracting word pair features...
make[1]: *** [wordpairs.f-e.features.gz] Error 127
make[1]: Leaving directory `/home/dnm/cdec/talign/grammars'

make: *** [all] Error 1
$ make
Making grammars ...
make[1]: Entering directory `/home/dnm/cdec/talign/grammars'
/home/dnm/cdec/word-aligner/support/classify.pl voc2class.e corpus.e > corpus.class.e
Loaded classes for 50313 words
/home/dnm/cdec/word-aligner/support/classify.pl voc2class.f corpus.f > corpus.class.f
Loaded classes for 55615 words
/home/dnm/cdec/word-aligner/stemmers/ur.pl < corpus.f > corpus.stemmed.f
/home/dnm/cdec/word-aligner/stemmers/ur.pl --vocab < f.voc > fstem.map
/home/dnm/cdec/word-aligner/stemmers/en.pl < corpus.e > corpus.stemmed.e
/home/dnm/cdec/word-aligner/stemmers/en.pl --vocab < e.voc > estem.map
make[1]: Leaving directory `/home/dnm/cdec/talign/grammars'

Making model-f-e ...
make[1]: Entering directory `/home/dnm/cdec/talign/model-f-e'
make[1]: *** No targets specified and no makefile found.  Stop.
make[1]: Leaving directory `/home/dnm/cdec/talign/model-f-e'
make: *** [all] Error 1
$ make
Making grammars ...
make[1]: Entering directory `/home/dnm/cdec/talign/grammars'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/dnm/cdec/talign/grammars'

Making model-f-e ...
make[1]: Entering directory `/home/dnm/cdec/talign/model-f-e'
make[1]: *** No targets specified and no makefile found.  Stop.
make[1]: Leaving directory `/home/dnm/cdec/talign/model-f-e'
make: *** [all] Error 1
-------------------------------------------------------------------------------------------------------

Strange how it succeeds (or seems to) when I re-run make. (Until the end, that is.)
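
A likely explanation, sketched here under stated assumptions: in a recipe of the form `producer | /usr/bin/gzip -9 > target.gz`, the shell opens the `>` redirection before it discovers that the compressor is missing, so a zero-byte target is left behind even though the recipe exits with status 127. GNU make does not delete a target when its recipe fails unless `.DELETE_ON_ERROR` is set, so the next run can treat the empty file as up to date and move on to the following step. In the demo below, `/nonexistent/gzip` and `out.gz` are placeholders, not part of the aligner.

```shell
#!/bin/sh
# Demo only: /nonexistent/gzip stands in for the missing /usr/bin/gzip,
# and out.gz stands in for a target such as corpus.f-e.lex-grammar.gz.
workdir=$(mktemp -d)

# Same shape as the failing recipe: producer | compressor > target
( cd "$workdir" && sh -c 'echo data | /nonexistent/gzip -9 > out.gz' ) 2>/dev/null
status=$?

# A pipeline's exit status is that of its last command (127 = not found).
echo "pipeline exit status: $status"

# The redirection still created the target, but it is empty.
if [ -f "$workdir/out.gz" ] && [ ! -s "$workdir/out.gz" ]; then
    echo "out.gz exists and is empty"
fi
```

If this is what happened, removing any zero-byte `.gz` files under talign/grammars (for instance those listed by `find . -name '*.gz' -size 0`) before re-running make should force those steps to be rebuilt once gzip can be found.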

--D.N.

Jonathan Clark

Jul 6, 2012, 3:23:29 PM
to cdec-...@googlegroups.com
Hi Dennis,

Looks like you're missing gzip: "/bin/sh: /usr/bin/gzip: not found"

You'll need to install that using whatever package manager you have on your system.
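
Before installing anything, it may be worth checking whether gzip is simply somewhere other than /usr/bin. A minimal sketch; `locate_tool` is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Report where (or whether) a program is found on the current PATH.
# `locate_tool` is a hypothetical helper name used only for this sketch.
locate_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        echo "$1 found at: $tool_path"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

locate_tool gzip

# Compare with the absolute path that the failing recipe used.
ls -l /usr/bin/gzip 2>/dev/null || echo "/usr/bin/gzip does not exist"
```

If the first check finds gzip but the second does not, the program is installed and only the hard-coded path is wrong.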

Cheers,
Jon

Dennis Mehay

Jul 6, 2012, 3:29:17 PM
to cdec-...@googlegroups.com
Yes. I just noticed that after I posted the error trace (sorry but my eyes are nearly bleeding from all of the screen gazing I've been doing lately). 

I do have gzip, but it's at /bin/gzip.  Would it make sense to replace that hard-coded reference with a lookup of where it is? (For future users, that is.)

Thanks,
D.N.

Jonathan Clark

Jul 6, 2012, 3:37:13 PM
to cdec-...@googlegroups.com
Replacing /usr/bin/gzip with gzip is probably a reasonable solution, unless Chris has any objections.
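
A minimal sketch of that substitution, assuming the generated makefile invokes the compressor by its literal path, as the trace suggests. It runs on a throwaway file; the name `sample.mk` and its single rule are made up for the demo, and the real generated file may look different.

```shell
#!/bin/sh
# Demo on a throwaway file; sample.mk and its rule are illustrative only.
demo=$(mktemp -d)
printf 'out.gz: in.txt\n\tcat in.txt | /usr/bin/gzip -9 > out.gz\n' > "$demo/sample.mk"

# Swap the absolute path for a bare name so the shell resolves it via PATH.
# Writing to a new file and renaming avoids relying on non-portable `sed -i`.
sed 's#/usr/bin/gzip#gzip#g' "$demo/sample.mk" > "$demo/sample.mk.new" &&
    mv "$demo/sample.mk.new" "$demo/sample.mk"

cat "$demo/sample.mk"
```

A longer-term fix would be to change whatever generates the makefile, since hand edits to a generated file are presumably lost when the task directory is recreated.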

Pull requests are usually well-received. ;-)

Chris Dyer

Jul 6, 2012, 5:06:35 PM
to cdec-...@googlegroups.com
Hi Dennis,
Sorry you're having trouble. The aligner has been somewhat fallow lately, and it was never widely tested on different platforms. I'm happy to work with you to get things running.

For now, replacing the hard-coded /usr/bin/gzip with a plain gzip (looked up on the PATH) makes a lot of sense.

-Chris

Dennis Mehay

Jul 6, 2012, 5:28:57 PM
to cdec-...@googlegroups.com
Hi Chris,

Thanks for the offer.  I manually changed the makefile to point to /bin/gzip and re-ran the whole alignment pipeline.  It ran fine up until the point where I was getting stuck before, and then stopped:

-----------------------------------------------------------------------------------
[truncated for your reading pleasure]
...


***** 10 runs. (algorithm:TA)*****
;KategProblem:cats: 50   words: 55616

start-costs: MEAN: 2.8468e+07 (2.84367e+07-2.85185e+07)  SIGMA:24161.7  
  end-costs: MEAN: 2.64529e+07 (2.64329e+07-2.64902e+07)  SIGMA:16049.8  
   start-pp: MEAN: 921.688 (907.187-945.386)  SIGMA:11.2709  
     end-pp: MEAN: 333.694 (330.328-340.019)  SIGMA:2.71236  
 iterations: MEAN: 1.63988e+06 (1.51519e+06-1.98025e+06)  SIGMA:161266  
       time: MEAN: 161.81 (139.11-218.31)  SIGMA:29.4287  
/home/dnm/cdec/word-aligner/support/generate_word_pair_features.pl corpus.f-e corpus.f-e.full-model1 corpus.e-f.full-model1 orthonorm-dict.f orthonorm-dict.e voc2class.e voc2class.f corpus.f-e.model1 | /bin/gzip -9 > wordpairs.f-e.features.gz

Reading classes from voc2class.e...
Reading classes from voc2class.f...
Reading model1...
Reading inverse model1...
Reading sparse model 1 from corpus.f-e.model1...
Extracting word pair features...
/home/dnm/cdec/word-aligner/support/classify.pl voc2class.e corpus.e > corpus.class.e
Loaded classes for 50313 words
/home/dnm/cdec/word-aligner/support/classify.pl voc2class.f corpus.f > corpus.class.f
Loaded classes for 55615 words
/home/dnm/cdec/word-aligner/stemmers/ur.pl < corpus.f > corpus.stemmed.f
/home/dnm/cdec/word-aligner/stemmers/ur.pl --vocab < f.voc > fstem.map
/home/dnm/cdec/word-aligner/stemmers/en.pl < corpus.e > corpus.stemmed.e
/home/dnm/cdec/word-aligner/stemmers/en.pl --vocab < e.voc > estem.map
make[1]: Leaving directory `/home/dnm/cdec/talign/grammars'
Making model-f-e ...
make[1]: Entering directory `/home/dnm/cdec/talign/model-f-e'
make[1]: *** No targets specified and no makefile found.  Stop.
make[1]: Leaving directory `/home/dnm/cdec/talign/model-f-e'
make: *** [all] Error 1

-----------------------------------------------------------------------------------

Any tips on what might be going wrong?  Was it supposed to write a makefile to that directory, as it did for the 'grammars' directory?
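
One quick way to confirm which subdirectories of the task directory were set up with a makefile is sketched below. `dirs_without_makefile` is a hypothetical helper, not part of cdec, and it only checks for the three default file names that GNU make looks for.

```shell
#!/bin/sh
# List immediate subdirectories of a task directory that have no makefile.
# `dirs_without_makefile` is a hypothetical helper, not part of cdec.
dirs_without_makefile() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        if [ ! -f "${d}Makefile" ] && [ ! -f "${d}makefile" ] && [ ! -f "${d}GNUmakefile" ]; then
            echo "${d%/}"
        fi
    done
}

# Run from the directory that contains talign/ (defaults to ./talign).
dirs_without_makefile "${1:-talign}"
```

For the layout shown in the trace, this would be expected to print the model-f-e directory and not grammars.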

--D.N.

dnm

Jul 6, 2012, 8:27:53 PM
to cdec-...@googlegroups.com
OK. Sorry for the confusion, but apparently you need the word-alignment branch of the cdec repository, which has the cluster-ptrain.pl file (the latest master branch does not have it).  That is why 'make' was failing at that point (see my last post).  I'm still having issues, but they have to do with other things that I'm trying to pin down.
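
For anyone hitting the same thing, a rough check of whether a checkout contains the script at all. `has_script` is a hypothetical helper, and `CDEC` is assumed to point at the root of the checkout (it falls back to the current directory).

```shell
#!/bin/sh
# Check whether a checkout contains a given script anywhere in its tree.
# `has_script` is a hypothetical helper; CDEC is assumed to be the checkout root.
has_script() {
    [ -n "$(find "$1" -type f -name "$2" 2>/dev/null | head -n 1)" ]
}

if has_script "${CDEC:-.}" cluster-ptrain.pl; then
    echo "cluster-ptrain.pl is present"
else
    echo "cluster-ptrain.pl is missing from this checkout"
fi
```

If it is missing, `git branch -a` lists the branches available in the clone; the exact name of the word-alignment branch is not given here, so look it up there.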

--D.N.

dnm

Jul 6, 2012, 8:30:37 PM
to cdec-...@googlegroups.com
Somehow, this thing let me delete the entire post history. That's no good for those who want to learn from my meanderings.  I'll see if I can get the posts that led up to this point. (Or maybe the admin of this group has the ability to do that?)

--D.N.