Re: [joshua-support] Re: Getting error in tuning

Matt Post

Feb 22, 2013, 9:28:17 AM
to joshua_...@googlegroups.com, bibek....@gmail.com
Great. I'll add a note to add filtering of empty lines from the training data.
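
For anyone who wants to do this by hand in the meantime, here is a minimal sketch of that kind of filtering. It assumes the training data is a pair of line-aligned lists (source and target), and the function name drop_empty_pairs is made up for illustration; it is not part of Joshua. The important point is that a pair is dropped when either side is blank, so the two sides stay aligned.

```python
def drop_empty_pairs(source_lines, target_lines):
    """Keep only sentence pairs where both sides contain non-whitespace text.

    Hypothetical helper, not part of Joshua. Assumes the two lists are
    line-aligned (line i of the source translates line i of the target).
    """
    kept = [
        (src, tgt)
        for src, tgt in zip(source_lines, target_lines)
        if src.strip() and tgt.strip()
    ]
    # Unzip back into two parallel lists (empty input yields two empty lists).
    sources = [src for src, _ in kept]
    targets = [tgt for _, tgt in kept]
    return sources, targets


if __name__ == "__main__":
    src = ["a b c\n", "\n", "d e\n"]
    tgt = ["x y z\n", "w\n", "   \n"]
    print(drop_empty_pairs(src, tgt))
```

Filtering each side separately would shift the alignment, which is why the sketch works on pairs.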


On Feb 22, 2013, at 2:49 AM, bibek....@gmail.com wrote:

I solved it; there was a blank line in the training corpus.

On Friday, 22 February 2013 11:31:25 UTC+5:30, bibek....@gmail.com wrote:
Using grammar read from file data/tune/grammar.filtered.gz
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:694)
at joshua.util.FormatUtils.isNonterminal(FormatUtils.java:39)
at joshua.corpus.Vocabulary.nt(Vocabulary.java:236)
at joshua.corpus.Vocabulary.id(Vocabulary.java:163)
at joshua.decoder.ff.tm.format.HieroFormatReader.parseLine(HieroFormatReader.java:56)
at joshua.decoder.ff.tm.format.HieroFormatReader.parseLine(HieroFormatReader.java:10)
at joshua.decoder.ff.tm.GrammarReader.next(GrammarReader.java:110)
at joshua.decoder.ff.tm.GrammarReader.next(GrammarReader.java:17)
at joshua.decoder.ff.tm.hash_based.MemoryBasedBatchGrammar.<init>(MemoryBasedBatchGrammar.java:112)
at joshua.decoder.JoshuaDecoder.initializeMainTranslationGrammar(JoshuaDecoder.java:379)
at joshua.decoder.JoshuaDecoder.initialize(JoshuaDecoder.java:267)
at joshua.decoder.JoshuaDecoder.<init>(JoshuaDecoder.java:90)
at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:571)

Matt Post

Mar 5, 2013, 2:33:36 PM
to joshua_...@googlegroups.com, bibek....@gmail.com
I'm really not sure why this is happening. A laborious solution would be to do a binary search on the corpus you're learning the grammar on, to try to find the sentence causing the problem.
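
A rough sketch of what that binary search could look like, in Python. Everything here is an assumption for illustration: fails is a hypothetical callback you would write yourself (for example, one that extracts a grammar from the given subset and checks whether an empty rule shows up), and the search only works if a single sentence pair is responsible and the problem reproduces on any subset containing it.

```python
def find_bad_sentence(pairs, fails):
    """Return the index of the sentence pair that triggers the failure.

    `fails(subset)` is a hypothetical, user-supplied callback that returns
    True when processing `subset` reproduces the problem. Assumes `pairs`
    is non-empty and contains exactly one offending entry.
    """
    lo, hi = 0, len(pairs)  # invariant: the culprit lies in pairs[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(pairs[lo:mid]):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return lo
```

Each round halves the candidate range, so the number of grammar-extraction runs grows with the logarithm of the corpus size, but each run is still slow, hence "laborious".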

A hack that might work would be to run grep -v "|||\s\+|||" on the resulting grammar.
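
If grep is awkward on the gzipped grammar, the same idea can be sketched in Python. This assumes each rule line has the layout LHS ||| source ||| target ||| features, and it drops any line whose first three fields are not all non-empty. The names has_empty_field and filter_grammar are made up for this example; they are not Joshua functions.

```python
import gzip


def has_empty_field(rule_line):
    """True if the LHS, source, or target field is missing or blank.

    Assumes the layout `LHS ||| source ||| target ||| features`.
    """
    fields = rule_line.rstrip("\n").split("|||")
    if len(fields) < 3:
        return True  # blank or truncated line
    return any(not field.strip() for field in fields[:3])


def filter_grammar(in_path, out_path):
    """Copy a gzipped grammar, skipping malformed rules; return the skip count."""
    dropped = 0
    with gzip.open(in_path, "rt", encoding="utf-8") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            if has_empty_field(line):
                dropped += 1
            else:
                dst.write(line)
    return dropped
```

For example, filter_grammar("data/tune/grammar.filtered.gz", "grammar.clean.gz") would write a cleaned copy (the output name is arbitrary) and report how many rules were dropped.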

matt


On Mar 2, 2013, at 2:38 PM, bibek....@gmail.com wrote:

I am still getting the error despite removing blank lines.
I am getting an empty rule like |||   |||    |||      ..

