Infinite run in phrase training

praveen dakwale

Jun 16, 2015, 10:05:25 AM
to jane-...@googlegroups.com
Hi,

I am new to Jane, so apologies if this question is repetitive or basic. I am trying to run phrase training with Jane using the following call:

$bin/trainHierarchical.sh --config extract.config --phraseTraining --janeConfig jane.config

I am trying with a sample bitext of just 100 sentences. After the initial extraction, it keeps printing 'Calling startPhraseTraining' indefinitely without any visible progress. I would expect a corpus of 100 sentences to finish quickly.

I get the same result if I drop jane.config (and thus, presumably, use the default parameters).

Can anyone tell me whether I am calling it incorrectly? What is the correct procedure/call for phrase training?

Thanks and Regards
Praveen

Jörn Wübker

Jun 16, 2015, 10:13:33 AM
to jane-...@googlegroups.com
Hi Praveen,

trainHierarchical.sh should first perform standard phrase extraction and generate a phrase table file. Has this part succeeded?
If you send me the contents of your extract.config and jane.config files and the output of an ls -l on your training directory after running the script, I can try to help you debug.

Are you using a Sun Grid Engine (SGE)?

Cheers,
Joern



On 16.06.15 at 16:05, praveen dakwale wrote:

praveen dakwale

Jun 17, 2015, 5:35:04 AM
to jane-...@googlegroups.com
Hi Joern,

I think the phrase extraction part succeeded, but after that it goes into an infinite loop. The files are attached; out.log is the output of ls -l.

I am not using Sun Grid Engine, just a single Linux server.


Regards
Praveen
jane.config
out.log

praveen dakwale

Jun 17, 2015, 7:54:22 AM
to jane-...@googlegroups.com
Apologies for leaving extract.config out of my previous mail. Here it is attached.
extract.config

Joern Wuebker

Jun 17, 2015, 10:55:56 AM
to jane-...@googlegroups.com, dakwale...@gmail.com
Dear Praveen,

It turns out we never implemented the automation for phrase training on a single machine.
For a quick fix, please use the two scripts I attached to replace the ones in your Jane bin directory.
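A minimal sketch of the replacement step, assuming a POSIX shell. The helper name install_replacements and the JANE_BIN variable are hypothetical placeholders for your own Jane bin directory, not part of Jane itself. It keeps a .orig backup of each shipped script so the quick fix can be reverted:

```shell
# Hypothetical helper (not part of Jane): back up each shipped script in the
# Jane bin directory once, then install the replacement and keep it executable.
install_replacements() {
  bin_dir=$1
  shift
  for src in "$@"; do
    name=$(basename "$src")
    # Back up the original only once, so a second run never overwrites it
    if [ -f "$bin_dir/$name" ] && [ ! -f "$bin_dir/$name.orig" ]; then
      cp -p "$bin_dir/$name" "$bin_dir/$name.orig"
    fi
    cp "$src" "$bin_dir/$name"
    chmod +x "$bin_dir/$name"
  done
}

# Example (JANE_BIN is an assumed variable pointing at your Jane bin directory):
# install_replacements "$JANE_BIN" queuePhraseTraining.sh trainHierarchical.sh
```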

I also corrected some formatting errors and naming inconsistencies in your jane.config file and re-attached it.
Finally, please train a 2-gram language model on the target side of your data and update the filename in the jane.config.

This should allow your toy setup to run through. 
If you have an SGE at your disposal, you will want to use it for larger data. Otherwise training will be very slow.

If you have further questions, please don't hesitate to ask :-)

Best,
Joern
queuePhraseTraining.sh
trainHierarchical.sh
jane.config

praveen dakwale

Jun 18, 2015, 8:58:38 AM
to jane-...@googlegroups.com, dakwale...@gmail.com
Hi Joern,

Thanks for the changes. I replaced and updated the files as you described, but it now breaks at another point, in 'forcedAlignmentCollectRules.sh'. I am attaching the error log; at the bottom it says
'arabic.100.filter.gz not found', which is the name of my filter source text as provided in extract.config.

Moreover, if I run without the filter option (which is optional in the config), it breaks at a different point, looking for arabic.100.gz.noParens.gz. I am also attaching the log for this as noFilter.err.log.

Let me know if you need any other details.

Thanks and Regards
Praveen
err.log
noFilter.err.log

Jörn Wübker

Jun 19, 2015, 4:59:06 AM
to jane-...@googlegroups.com
It looks like the Google Groups web interface was wreaking havoc with the file format.
You must already have removed the carriage return characters from the two scripts. If you do the same with jane.config, it should work.
I have attached the three files again, hoping that my mail client will not interfere with the file format in the same way.
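A minimal sketch of stripping the carriage returns by hand, assuming a POSIX shell and that the only damage is Windows-style CRLF line endings. The function name strip_cr is hypothetical, not part of Jane. It rewrites each file in place via a temporary copy, so the scripts keep their executable bit:

```shell
# Hypothetical helper (not part of Jane): delete every carriage return (\r)
# from the given files in place.
strip_cr() {
  for f in "$@"; do
    tmp="$f.tmp.$$"
    # tr -d removes all \r bytes; copying back with cat (instead of mv)
    # keeps the original file's permissions, e.g. the executable bit
    tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f" && rm -f "$tmp"
  done
}

# Example: strip_cr jane.config queuePhraseTraining.sh trainHierarchical.sh
```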

Best,
Joern

On 18.06.15 at 14:58, praveen dakwale wrote:

jane.config
queuePhraseTraining.sh
trainHierarchical.sh