Quick start problems with Joshua 6.0.2 - where is thrax.jar?

Sean Flanigan

May 12, 2015, 1:22:28 AM
to joshua_...@googlegroups.com
Hi,

I am trying to follow the quick start guide, but I think it might need an update.  I am using Joshua 6.0.2, compiled from source on Fedora 20.

The quickstart guide says the "quickest way to use Joshua is to download a pre-built model and use them to start translating data", but it doesn't actually say how to do that, so it would be great to have that documented.

I kept reading ("building your own models") so that I could follow the quickstart example commands. 

(By the way, I've tried to correct a couple of minor things as I go along: https://github.com/joshua-decoder/joshua-decoder.github.com/pull/1)

With my adaptations, this was working, up until I got this error message from `pipeline.pl`:

[glue-tune] rebuilding...
 dep=/home/sflaniga/NotBackedUp/src/joshua-v6.0.2/data/tune/grammar.filtered.gz [CHANGED]
 dep=/home/sflaniga/NotBackedUp/src/joshua-v6.0.2/data/tune/grammar.glue [NOT FOUND]
 cmd=java -Xmx2g -cp /home/sflaniga/src/joshua-v6.0.2/lib/*:/home/sflaniga/src/joshua-v6.0.2/thrax/bin/thrax.jar edu.jhu.thrax.util.CreateGlueGrammar /home/sflaniga/NotBackedUp/src/joshua-v6.0.2/data/tune/grammar.packed > /home/sflaniga/NotBackedUp/src/joshua-v6.0.2/data/tune/grammar.glue
 JOB FAILED (return code 1)
Error: Could not find or load main class edu.jhu.thrax.util.CreateGlueGrammar

Following the instructions, I don't end up with a thrax directory, let alone thrax/bin/thrax.jar.  How should I get it?

Regards,

Sean.

Matt Post

May 12, 2015, 9:51:32 AM
to joshua_...@googlegroups.com
Hi, 

I pulled in your changes, thanks.

Can you try running the command separately? I checked with a fresh 6.0.2 install and this works. I suspect there is a problem with the path to thrax.jar. Does the command

java -Xmx2g -cp /home/sflaniga/src/joshua-v6.0.2/lib/*:/home/sflaniga/src/joshua-v6.0.2/thrax/bin/thrax.jar edu.jhu.thrax.util.CreateGlueGrammar /home/sflaniga/NotBackedUp/src/joshua-v6.0.2/data/tune/grammar.packed > /home/sflaniga/NotBackedUp/src/joshua-v6.0.2/data/tune/grammar.glue

work? If not, does the path

/home/sflaniga/src/joshua-v6.0.2/thrax/bin/thrax.jar

exist? It is included with the official releases. Perhaps it got deleted? I suggest redownloading, unpacking, and building with "ant".
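The existence check can be sketched as a small shell function. This is only an illustration: the function name `check_thrax_jar` is hypothetical, and it assumes the jar lives at `thrax/bin/thrax.jar` under the Joshua directory, as in the classpath of the failing command.

```shell
# Sketch: report whether thrax.jar is present under a Joshua install.
# The argument is assumed to be the unpacked joshua-v6.0.2 directory.
check_thrax_jar() {
  jar_path="$1/thrax/bin/thrax.jar"
  if [ -f "$jar_path" ]; then
    echo "found: $jar_path"
  else
    echo "missing: $jar_path" >&2
    return 1
  fi
}

# Usage (the path is an example taken from the error message above):
#   check_thrax_jar /home/sflaniga/src/joshua-v6.0.2
```

A non-zero return means the jar is absent, in which case re-extracting the thrax directory from the release tarball is the likely remedy.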

matt



Sean Flanigan

May 13, 2015, 2:20:04 AM
to joshua_...@googlegroups.com
That's it, the whole thrax directory was somehow missing, but I grabbed it from the tarball.  Now pipeline.pl has successfully run thrax and mert is working away, thanks!

Is there any documentation on how to use the pre-built models?

Sean Flanigan

May 13, 2015, 2:33:33 AM
to joshua_support
I discovered the README inside language-pack-es-en-phrase-2015-03-06.tgz (I had been going by the web pages only), so I tried this command:

    echo "Hello world." | ./prepare.sh | ./run-joshua.sh

But got this error:

FATAL: Using a packed grammar for a phrase table backend requires that you
       packed the grammar with Joshua 6.0.2 or greater

Are there any pre-built language packs for Joshua 6.0.2?

Sean Flanigan

May 13, 2015, 3:27:42 AM
to joshua_support
I tried downgrading to Joshua 6.0.1 so that I could use the prebuilt language pack, and the build seems to be successful, including kenlm, but when I run joshua:

    echo "Hello world." | ./prepare.sh | ./run-joshua.sh

I get this error (not "no ken in java.library.path"):

util/file.cc:68 in int util::OpenReadOrThrow(const char*) threw ErrnoException because `-1 == (ret = open(name, 00))'.
No such file or directory while opening lm.kenlm2

The file util/file.cc seems to be part of kenlm, so I think kenlm must have loaded.

Can you suggest anything?

Matt Post

May 14, 2015, 10:21:15 AM
to joshua_...@googlegroups.com
Hi, 

You want to use Joshua 6.0.2. The quick fix for this problem is to create a file

language-pack-es-en-phrase-2015-03-06/phrase-table.packed/config

with the contents:

max-source-len = 5

I have updated the language pack to have this file. Sorry for the trouble!
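For anyone still holding the older tarball, creating that file from the shell might look like the sketch below. The function name `write_packed_config` is hypothetical, and the sketch assumes the pack was unpacked so that `phrase-table.packed` sits directly inside the directory passed in.

```shell
# Sketch: write the one-line config into an unpacked language pack.
# Refuses to run if phrase-table.packed is not where it is expected.
write_packed_config() {
  packed_dir="$1/phrase-table.packed"
  if [ ! -d "$packed_dir" ]; then
    echo "no such directory: $packed_dir" >&2
    return 1
  fi
  printf 'max-source-len = 5\n' > "$packed_dir/config"
}

# Usage (run from the directory where the tarball was unpacked):
#   write_packed_config language-pack-es-en-phrase-2015-03-06
```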

matt

Matt Post

May 14, 2015, 10:23:06 AM
to joshua_...@googlegroups.com
(You should use 6.0.2, but this happened because KenLM didn't get recompiled for some reason; "ant kenlm" should fix it. Also, it looks like your config file points to the file "lm.kenlm2", but the language pack includes "lm.kenlm", so it seems that you have perhaps accidentally inserted a stray "2". But use 6.0.2.)
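If the config does reference "lm.kenlm2" while the pack ships "lm.kenlm", one way to correct the reference is sketched below. The function name `fix_lm_file` is hypothetical, and the sketch assumes the `-lm_file` argument is the last item on its line in the config file.

```shell
# Sketch: point -lm_file at lm.kenlm instead of lm.kenlm2 in a Joshua config.
# The original file is kept alongside as <config>.bak.
fix_lm_file() {
  config="$1"
  cp "$config" "$config.bak" || return 1
  # Drop the trailing "2" only when it follows "-lm_file lm.kenlm" at end of line.
  sed 's/\(-lm_file[[:space:]][[:space:]]*lm\.kenlm\)2\([[:space:]]*\)$/\1\2/' \
    "$config.bak" > "$config"
}

# Usage, from inside the unpacked language pack:
#   fix_lm_file joshua.config
```

Lines that do not match are copied through unchanged, so running it on an already-correct config has no effect beyond creating the backup.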

matt

Sean Flanigan

May 14, 2015, 10:05:53 PM
to joshua_support
Thanks, I found:

feature-function = StateMinimizingLanguageModel -lm_type kenlm -lm_order 5 -lm_file lm.kenlm2

in joshua.config. But the 2 actually came from the
language-pack-es-en-phrase-2015-03-06.tgz I downloaded a few days ago.
I haven't downloaded the revised tarball, but your two suggested fixes
did the trick.

And so I got this to work:

$ echo "Hola Mundo." | ./prepare.sh | ./run-joshua.sh
hello world .

Thanks!

I assume the pre-built model can't be used to translate the other way
(en to es)?


Back on the actual Quick Start guide, it seems to stop short of
actually using the trained and tested model to translate arbitrary
text. Once I have built the bn-en model (using pipeline.pl), how do I
use it with Joshua like the es-en model (eg run-joshua.sh)?

One more suggestion: I think the Quick Start should follow the
guideline in Pipeline's documentation, which says "Don’t run the
pipeline directly from $JOSHUA." By the way, that paragraph in
Pipeline's documentation stops mid-sentence on a bit of a cliffhanger:
"support scripts... that only work if you -".
http://joshua-decoder.org/6.0/pipeline.html

Cheers,

Matt Post

May 15, 2015, 9:11:06 AM
to joshua_...@googlegroups.com

> On May 14, 2015, at 10:05 PM, Sean Flanigan <sean.f...@gmail.com> wrote:
>
> Thanks, I found:
>
> feature-function = StateMinimizingLanguageModel -lm_type kenlm -lm_order 5 -lm_file lm.kenlm2
>
> in joshua.config. But the 2 actually came from the
> language-pack-es-en-phrase-2015-03-06.tgz I downloaded a few days ago.
> I haven't downloaded the revised tarball, but your two suggested fixes
> did the trick.

Hmm, you're right. This was my mistake. I've corrected it in the language pack's config file.


> And so I got this to work:
>
> $ echo "Hola Mundo." | ./prepare.sh | ./run-joshua.sh
> hello world .
>
> Thanks!
>
> I assume the pre-built model can't be used to translate the other way
> (en to es)?

Correct.


> Back on the actual Quick Start guide, it seems to stop short of
> actually using the trained and tested model to translate arbitrary
> text. Once I have built the bn-en model (using pipeline.pl), how do I
> use it with Joshua like the es-en model (eg run-joshua.sh)?

The best way is to use the run_bundler.py script in $JOSHUA/scripts/support. Run it for example usage. This takes the output of the pipeline, along with the unfiltered grammar, and bundles it up into something that can be used to decode anything. In very short order (Joshua 6.0.3), there will be many improvements to the bundler. I will also add this to the documentation.


> One more suggestion: I think the Quick Start should follow the
> guideline in Pipeline's documentation, which says "Don’t run the
> pipeline directly from $JOSHUA." By the way, that paragraph in
> Pipeline's documentation stops mid-sentence on a bit of a cliffhanger:
> "support scripts... that only work if you -".
> http://joshua-decoder.org/6.0/pipeline.html

Noted, I'll take a look at this. Thanks for your feedback!

matt

Sean Flanigan

May 18, 2015, 3:51:23 AM
to joshua_support
On 15 May 2015 at 23:11, Matt Post <po...@cs.jhu.edu> wrote:
>
>> On May 14, 2015, at 10:05 PM, Sean Flanigan <sean.f...@gmail.com> wrote:

>> Back on the actual Quick Start guide, it seems to stop short of
>> actually using the trained and tested model to translate arbitrary
>> text. Once I have built the bn-en model (using pipeline.pl), how do I
>> use it with Joshua like the es-en model (eg run-joshua.sh)?
>
> The best way is to use the run_bundler.py script in $JOSHUA/scripts/support. Run it for example usage. This takes the output of the pipeline, along with the unfiltered grammar, and bundles it up into something that can be used to decode anything. In very short order (Joshua 6.0.3), there will be many improvements to the bundler. I will also add this to the documentation.

Unfortunately, the example usage doesn't explain enough if you're not
quite sure what the Quick Start steps are doing (what's the origin
directory? what COPY_CONFIG_OPTIONS should I use?). I used the only
"joshua.config.final" file I could find, blindly copied the example
copy-config-options and tried all the possible origin directories I
could think of, but kept getting errors like "No such file or
directory: '<GRAMMAR_ARGS>'".

Given my current lack of knowledge, I should probably just wait for
6.0.3 and the revised Quick Start (I hope it will cover
run_bundler.py). My installation is probably a mess from running
pipeline in $JOSHUA anyway.

Regards,

Sean.

Matt Post

May 18, 2015, 1:13:29 PM
to joshua_...@googlegroups.com
Hi Sean,

I started a long response and then just went and updated the documentation for the run bundler. I hope that answers some of your questions. Joshua 6.0.3 will be out this week.

http://joshua-decoder.org/6.0/bundle.html

matt

Sean Flanigan

May 18, 2015, 9:02:26 PM
to joshua_support
Thanks Matt, that does make things clearer.

(From the sentence at the top, this doesn't apply to Joshua 6.0.2, so
I haven't tried it yet.)

That page mentions "The --tm line... takes two arguments" but that is
the only instance of "--tm" on the page. If it's a reference to the
--pack-tm line, it only seems to have one argument in the example.
Also, what is meant by "the TM's owner"? Is that the parent directory
of something?

According to this page, a language pack is a trained+tuned model, and
the translation model is sometimes called the grammar, which implies
that "language pack" == (tuned) "translation model" == "grammar". I
think there must really be some distinction, but could you please add
something to explain what it is? It seems more like the grammar is (a
large) part of the TM, and perhaps a translation model is (a large)
part of the language pack (but using a packed grammar).

Matt Post

May 19, 2015, 11:18:11 AM
to joshua_...@googlegroups.com
Hi Sean,

I fixed up the language and pushed the changes. The "owner" information is no longer needed and shouldn't have been in there (for your information, it is used to link a TM to its set of dense weights in the config file, which are named according to the pattern tm_OWNER_INDEX).

The translation model (or phrase table) is the set of phrase mappings that tell you, for example, that "quiero" might translate as "want" or "I want" (it contains millions of such entries). A type of translation called hierarchical translation uses a grammar instead of a phrase table, which encodes the same information. So, for all practical purposes,

translation model (TM) = dictionary of phrase translations
phrase table = TM for phrase-based decoding
grammar = TM for hierarchical / syntax-based decoding

All of the language packs I've published so far are phrase-based models, but Joshua does both (and I will publish a hierarchical Chinese model sometime soon).

That terminology aside, the TM is only one part of a tuned language pack. Tuning involves setting the weights of the linear model, which tell the decoder how much to trust each of the ten or so component submodels. These usually include four or five weights for the TM, a weight for the language model, and a bunch of other weights.
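As a rough illustration of what a linear model means here, the score of a candidate translation is the sum of each weight multiplied by the matching feature value. The sketch below is not Joshua code: the helper `linear_score`, the feature names, the weights, and the values are all made up for the example.

```shell
# Illustrative only: feature names, weights, and values are invented.
# Each input line is "<feature> <weight> <value>"; the score is the
# sum over all lines of weight * value.
linear_score() {
  awk '{ score += $2 * $3 } END { printf "%.2f\n", score }'
}

printf '%s\n' \
  'tm_pt_0 0.5 -2.0' \
  'tm_pt_1 0.25 -4.0' \
  'lm_0 1.0 -3.0' | linear_score
```

With these numbers the result is -5.00 (that is, -1 + -1 + -3). Tuning changes only the weight column; the feature values come from the submodels themselves.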

I hope that is all helpful.

// matt

Matt Post

May 19, 2015, 1:02:01 PM
to joshua_...@googlegroups.com
> Back on the actual Quick Start guide, it seems to stop short of
> actually using the trained and tested model to translate arbitrary
> text. Once I have built the bn-en model (using pipeline.pl), how do I
> use it with Joshua like the es-en model (eg run-joshua.sh)?

Yes, you're right, the language-pack bundling software isn't really used by the pipeline. It would be more efficient if it did use it, so I'm working on integrating them. In the meantime, I just updated the Quick Start guide to indicate how to bundle the text.

Thanks a ton for all your detailed feedback.

matt