Joshua pipeline error


Mohamed EL MAROUANI

Dec 20, 2015, 3:16:16 PM
to joshua_...@googlegroups.com
Hi,
I'm a beginner with Joshua, and I'm trying to run its pipeline for the first time.
I'm following the quick-start guide at http://joshua-decoder.org/6.0/quick-start.html, but after downloading the models and running the pipeline command, I get the following error:

elma@VBUbuntu:~/models/bn-en$ $JOSHUA/bin/pipeline.pl --source bn --target en \
>     --type hiero \
>     --no-prepare --aligner berkeley \
>     --corpus input/bn-en/tok/training.bn-en \
>     --tune input/bn-en/tok/dev.bn-en \
>     --test input/bn-en/tok/devtest.bn-en
[source-numlines] cached, skipping...
[source-numlines] retrieved cached result => 20788
[berkeley-aligner-chunk-0] cached, skipping...
[aligner-combine] cached, skipping...
[thrax-input-file] cached, skipping...
[thrax-prep] rebuilding...
  dep=/home/elma/models/bn-en/data/train/thrax-input-file 
  dep=grammar.gz [NOT FOUND]
  cmd=hadoop/bin/hadoop fs -rm -r pipeline-bn-en-hiero-_home_elma_models_bn-en; hadoop/bin/hadoop fs -mkdir pipeline-bn-en-hiero-_home_elma_models_bn-en; hadoop/bin/hadoop fs -put /home/elma/models/bn-en/data/train/thrax-input-file pipeline-bn-en-hiero-_home_elma_models_bn-en/input-file
  took 14 seconds (14s)
[thrax-run] rebuilding...
  dep=/home/elma/models/bn-en/data/train/thrax-input-file [CHANGED]
  dep=thrax-hiero.conf [CHANGED]
  dep=grammar.gz [NOT FOUND]
  cmd=hadoop/bin/hadoop jar /home/elma/ws/joshua-6.0.5/thrax/bin/thrax.jar -D mapred.child.java.opts='-Xmx2g' -D hadoop.tmp.dir=/tmp thrax-hiero.conf pipeline-bn-en-hiero-_home_elma_models_bn-en > thrax.log 2>&1; rm -f grammar grammar.gz; hadoop/bin/hadoop fs -getmerge pipeline-bn-en-hiero-_home_elma_models_bn-en/final/ grammar.gz
  took 140 seconds (2m20s)
[lm-sort-uniq] rebuilding...
  dep=/home/elma/models/bn-en/indian-parallel-corpora-1.0/bn-en/tok/training.bn-en.en [CHANGED]
  dep=/home/elma/models/bn-en/indian-parallel-corpora-1.0/bn-en/tok/training.bn-en.en.uniq [NOT FOUND]
  cmd=/home/elma/ws/joshua-6.0.5/scripts/training/scat /home/elma/models/bn-en/indian-parallel-corpora-1.0/bn-en/tok/training.bn-en.en | sort -u -T /tmp -S 2G | gzip -9n > /home/elma/models/bn-en/indian-parallel-corpora-1.0/bn-en/tok/training.bn-en.en.uniq
  took 0 seconds (0s)
* FATAL: /home/elma/ws/joshua-6.0.5/bin/lmplz (for building LMs) does not exist.
  This is often a problem with the boost libraries (particularly threaded
  versus unthreaded).

What does this have to do with Boost?

I also notice that lmplz doesn't exist in the Joshua bin folder:

elma@VBUbuntu:~/ws/joshua-6.0.5/bin$ ls
bleu  decoder  extract-1best  GIZA++  joshua-decoder  meteor  mkcls  pipeline.pl  snt2cooc.out

Any idea how to resolve this problem?

Thank you in advance.

Best regards,

--
Mohamed EL MAROUANI
PhD Student
Ibn Tofail University, Kenitra, Morocco

Matt Post

Dec 20, 2015, 3:18:06 PM
to joshua_...@googlegroups.com
Hello,

It looks like KenLM didn't build. What system are you working on? 

You can try to build it directly by typing

cd $JOSHUA
ant kenlm

matt



Mohamed EL MAROUANI

Dec 20, 2015, 3:45:25 PM
to joshua_...@googlegroups.com
Thank you Matt.

My system is: Ubuntu 14.04 (64bits)

I ran ant kenlm. It finished with BUILD SUCCESSFUL, but the output contains linker errors. The pipeline error persists, and lmplz was still not generated in joshua-6.0.5/bin.

elma@VBUbuntu:~/ws/joshua-6.0.5$ ant kenlm
Buildfile: /home/elma/ws/joshua-6.0.5/build.xml

check-joshua-home:
     [echo] JOSHUA = /home/elma/ws/joshua-6.0.5 basedir = /home/elma/ws/joshua-6.0.5

kenlm:
     [exec] -- Boost version: 1.54.0
     [exec] -- Found the following Boost libraries:
     [exec] --   program_options
     [exec] --   system
     [exec] --   thread
     [exec] --   unit_test_framework
     [exec] -- Configuring done
     [exec] -- Generating done
     [exec] -- Build files have been written to: /home/elma/ws/joshua-6.0.5/src/kenlm/build
     [exec] [ 35%] Built target kenlm_util
     [exec] Linking CXX executable ../bin/bit_packing_test
     [exec] CMakeFiles/kenlm_util.dir/read_compressed.cc.o: In function `util::(anonymous namespace)::StreamCompressed<util::(anonymous namespace)::GZip>::~StreamCompressed()':
     [exec] read_compressed.cc:(.text+0x219): undefined reference to `inflateEnd'
     [exec] CMakeFiles/kenlm_util.dir/read_compressed.cc.o: In function `util::(anonymous namespace)::StreamCompressed<util::(anonymous namespace)::GZip>::~StreamCompressed()':
     [exec] read_compressed.cc:(.text+0x439): undefined reference to `inflateEnd'
     [exec] CMakeFiles/kenlm_util.dir/read_compressed.cc.o: In function `util::(anonymous namespace)::ReadFactory(int, unsigned long&, void const*, unsigned long, bool)':
     [exec] read_compressed.cc:(.text+0xa49): undefined reference to `inflateInit2_'
     [exec] CMakeFiles/kenlm_util.dir/read_compressed.cc.o: In function `util::(anonymous namespace)::StreamCompressed<util::(anonymous namespace)::GZip>::Read(void*, unsigned long, util::ReadCompressed&)':
     [exec] read_compressed.cc:(.text+0xe8f): undefined reference to `inflate'
     [exec] collect2: error: ld returned 1 exit status
     [exec] make[2]: *** [bin/bit_packing_test] Error 1
     [exec] make[1]: *** [util/CMakeFiles/bit_packing_test.dir/all] Error 2
     [exec] make: *** [all] Error 2
     [exec] cp: cannot stat ‘bin/query’: No such file or directory
     [exec] cp: cannot stat ‘bin/lmplz’: No such file or directory
     [exec] cp: cannot stat ‘bin/build_binary’: No such file or directory
     [exec] g++: error: lm/CMakeFiles/kenlm.dir/*.o: No such file or directory
     [exec] Result: 1

BUILD SUCCESSFUL
Total time: 1 second
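
Note that ant prints BUILD SUCCESSFUL even though the make step failed, so the final message alone does not show whether the binaries were produced. A quick check along the following lines would confirm it. This is only a minimal sketch: the function name is made up for illustration, and it assumes that all three binaries named in the cp errors above are meant to end up in $JOSHUA/bin.

```shell
# Report which of the KenLM binaries named in the build log are missing.
# Argument: the directory to inspect (assumed here to be "$JOSHUA/bin").
# Returns non-zero if at least one binary is absent or not executable.
missing_kenlm_binaries() {
    dir=$1
    status=0
    for name in lmplz query build_binary; do
        if [ ! -x "$dir/$name" ]; then
            echo "missing: $dir/$name"
            status=1
        fi
    done
    return $status
}

# Example call (not executed here):
#   missing_kenlm_binaries "$JOSHUA/bin" || echo "KenLM build did not complete"
```

With the bin listing shown in the first post, this would report all three binaries as missing.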


elma@VBUbuntu:~/models/bn-en$ $JOSHUA/bin/pipeline.pl --source bn --target en \
>     --type hiero \
>     --no-prepare --aligner berkeley \
>     --corpus input/bn-en/tok/training.bn-en \
>     --tune input/bn-en/tok/dev.bn-en \
>     --test input/bn-en/tok/devtest.bn-en
[source-numlines] cached, skipping...
[source-numlines] retrieved cached result => 20788
[berkeley-aligner-chunk-0] cached, skipping...
[aligner-combine] cached, skipping...
[lm-sort-uniq] cached, skipping...
* FATAL: /home/elma/ws/joshua-6.0.5/bin/lmplz (for building LMs) does not exist.
  This is often a problem with the boost libraries (particularly threaded
  versus unthreaded).


Best regards,

Mohamed




Matt Post

Dec 20, 2015, 3:58:36 PM
to joshua_...@googlegroups.com
Unfortunately, Joshua's version of KenLM has a number of problems building on Ubuntu, for reasons we have not been able to track down (I don't have access to an Ubuntu system).

I will look into this and see if I can figure something out....

matt

Mohamed EL MAROUANI

Dec 20, 2015, 4:29:15 PM
to joshua_...@googlegroups.com
Can I use another LM on Ubuntu, such as BerkeleyLM, or others (SRILM, IRSTLM)?
Is adding the --lm-gen argument sufficient to run the pipeline?
Thanks.
 

Hieu Hoang

Dec 20, 2015, 4:42:31 PM
to joshua_...@googlegroups.com

At a guess, you don't have the zlib (gzip) development library installed:
  sudo apt-get install zlib1g-dev

Mohamed EL MAROUANI

Dec 20, 2015, 4:49:33 PM
to joshua_...@googlegroups.com
Hi Hieu,
That library is already installed:

zlib1g-dev is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 242 not upgraded.

Best,

Matt Post

Dec 21, 2015, 3:36:41 PM
to joshua_...@googlegroups.com
Yes, this isn't the problem. See this thread:


There seems to be something troubling about Ubuntu 14.

If you have a way of giving me access to an Ubuntu machine, I'll take a look at this again. It's annoying and prevalent enough that it would be nice to have fixed.

Another option is to avoid using KenLM. You can do this with the following args to the pipeline script:

--lm-gen berkeleylm --lm-type berkeleylm

It will then use Berkeley LM to generate language models and to load them when decoding.

matt

Mohamed EL MAROUANI

Dec 21, 2015, 8:05:47 PM
to joshua_...@googlegroups.com
Hi,

I read the previous thread, and I understand that building Joshua with KenLM on Ubuntu is not yet possible.

Unfortunately, I don't currently work on a server that allows remote access. I build only in a VM on my laptop.

I'm trying to run the pipeline with Berkeley LM using the arguments you gave, but the second argument is not recognized:

~/models/bn-en$ $JOSHUA/bin/pipeline.pl --source bn --target en \
>     --type hiero \
>     --no-prepare --aligner berkeley \
>     --lm-gen berkeleylm \
>     --lm-type berkeleylm \
>     --corpus input/bn-en/tok/training.bn-en \
>     --tune input/bn-en/tok/dev.bn-en \
>     --test input/bn-en/tok/devtest.bn-en
Unknown option: lm-type
Invalid usage, quitting

One question about an approach that may not be very clean: can I copy the KenLM binaries from another place (a Moses installation, for example)?

Thank you for your support!  
 


Matt Post

Dec 22, 2015, 12:57:42 PM
to joshua_...@googlegroups.com
--lm, not --lm-type
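
With that correction applied, the earlier invocation would presumably look like the sketch below. Every flag is copied from the commands quoted in this thread. The wrapper function and the DRY_RUN switch are made up for this example, so that the command is only printed unless you ask for it to run.

```shell
# Sketch: print (or run) the pipeline command from this thread with
# Berkeley LM in place of KenLM. Flags are copied from the posts above.
# DRY_RUN is a hypothetical switch for this example; the default only prints.
berkeleylm_pipeline() {
    set -- "$JOSHUA/bin/pipeline.pl" \
        --source bn --target en \
        --type hiero \
        --no-prepare --aligner berkeley \
        --lm-gen berkeleylm \
        --lm berkeleylm \
        --corpus input/bn-en/tok/training.bn-en \
        --tune input/bn-en/tok/dev.bn-en \
        --test input/bn-en/tok/devtest.bn-en
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # show the assembled command line
    else
        "$@"           # actually launch the pipeline
    fi
}
```

Calling berkeleylm_pipeline with DRY_RUN=0 from ~/models/bn-en would launch the run; the relative input/ paths assume that working directory, as in the original command.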

matt

Mohamed EL MAROUANI

Dec 22, 2015, 5:51:29 PM
to joshua_...@googlegroups.com
The pipeline command now runs successfully.
Thank you Matt.

--
Mohamed