But you don't need to do this if you use the pipeline script.
For the sentence compression data, you will want to email Juri (and perhaps his coauthors, though he may respond here) for a pointer to the data. Then you can use Joshua like this:
--source en1 --target en --type samt \
--first-step tune --last-step tune \
--grammar /path/to/PPDB --lmfile /path/to/your/LM \
--tune /path/to/tuning/set/prefix \
--no-filter-tm --joshua-mem 16GB
I think that's it, but there might be more; try that and see how it goes. Here, the source language is set to en1 because the language codes are appended to your tuning set prefix to form the file names, so you can't use en for both sides.
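To make the file naming concrete, here is a rough sketch of which two files the tuning step would presumably look for. It assumes the language codes are joined to the prefix with a dot (the separator is an assumption, not stated above) and reuses the placeholder path from the command; the variable names are only illustrative.

```shell
# Placeholder prefix from the command above; substitute your own.
prefix=/path/to/tuning/set/prefix
source_ext=en1   # value passed to --source
target_ext=en    # value passed to --target

# Assumption: the code is appended to the prefix with a dot.
src_file="${prefix}.${source_ext}"
tgt_file="${prefix}.${target_ext}"

# Report whether each expected file is present.
for f in "$src_file" "$tgt_file"; do
  if [ -f "$f" ]; then
    echo "found:   $f"
  else
    echo "missing: $f"
  fi
done
```

If both sides used en, the two names would collide, which is why the source side gets en1.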
When tuning is done, you'll have a model file which you can pack. That lets you then easily use Joshua as a black box. I can help you when you get there.
matt