phrase-based translation training error


陈徐希

Aug 1, 2012, 12:04:24 AM
to cdec-...@googlegroups.com
Hi,
 
I tried to use cdec for MERT training of a phrase-based translation system, but got the following error:
 
 
Inside translator process hints
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
workgRSP.sh: line 2:  6952 Aborted
 
The website doesn't say much about phrase-based translation. I wonder whether this is caused by a configuration error. My configuration files are as follows:
 
------cdec.ini----
cubepruning_pop_limit=30
feature_function=KLanguageModel /home/wangpd/machine-translation/NIST/data/language-models/xin_eng_fbis.tok.lower.order5.srilm.kenlm
feature_function=WordPenalty
add_pass_through_rules=true
grammar=/home/chenxx/cdec/workingspace/pb/phrase-table.cleaned.gz
formalism=pb
----weights------
WordPenalty 0.1
Number 0.1
PassThrough -0.2
LanguageModel 0.2
LanguageModel_OOV 0.2
PhraseModel_0 0.2
PhraseModel_1 0.2
PhraseModel_2 0.2
 
 
 

陈徐希

Aug 1, 2012, 3:29:59 AM
to cdec-...@googlegroups.com
I found that the problem is the cdec program itself, which uses up all the memory.
 
 

On Wednesday, August 1, 2012 at 12:04:24 PM UTC+8, 陈徐希 wrote:

chen

Aug 1, 2012, 4:19:28 AM
to cdec-...@googlegroups.com
I wonder whether it is because I didn't set any stack pruning parameters.
 

On Wednesday, August 1, 2012 at 3:29:59 PM UTC+8, 陈徐希 wrote:

Hieu Hoang

Aug 1, 2012, 8:07:42 AM
to cdec-...@googlegroups.com, cdec-...@googlegroups.com
I don't know the details of cdec, but it is likely that you need to filter your grammar and binarize your LM.

Hieu
Sent from my flying horse

chen

Aug 1, 2012, 10:44:52 PM
to cdec-...@googlegroups.com
Hi,
I have already done both of these.

On Wednesday, August 1, 2012 at 8:07:42 PM UTC+8, Hieu Hoang wrote:

chen

Aug 2, 2012, 8:02:52 AM
to cdec-...@googlegroups.com
Hi,
 
By "filter grammar", do you mean deleting inferior grammar rules (translation options, in phrase-based translation)? I looked into the code. It seems that cdec uses all of the translation options, whereas Moses has a configuration setting that limits the number of translation options used.
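
A minimal sketch of what limiting translation options per source phrase could look like as a preprocessing step, similar in spirit to the limit Moses applies at load time. It assumes Moses-style lines (`source ||| target ||| scores`) with plain numeric scores; `limit_options` and `first_feature` are hypothetical names for illustration, not part of cdec or Moses, and ranking by the first score is only a placeholder for a proper weighted model score.

```python
import heapq
from collections import defaultdict
from typing import Callable, Iterable, List


def first_feature(entry: str) -> float:
    """Hypothetical ranking rule: the first numeric score of an entry."""
    return float(entry.split("|||")[2].split()[0])


def limit_options(
    entries: Iterable[str],
    limit: int = 20,
    score: Callable[[str], float] = first_feature,
) -> List[str]:
    """Keep at most `limit` best-scoring entries for each source phrase."""
    by_source = defaultdict(list)
    for entry in entries:
        # The source side is everything before the first "|||" separator.
        source = entry.split("|||", 1)[0].strip()
        by_source[source].append(entry)

    kept: List[str] = []
    for options in by_source.values():
        kept.extend(heapq.nlargest(limit, options, key=score))
    return kept
```

This only bounds the number of options per source phrase; it does not reduce the number of distinct source phrases in the table.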

On Wednesday, August 1, 2012 at 8:07:42 PM UTC+8, Hieu Hoang wrote:

Chris Dyer

Aug 2, 2012, 1:08:08 PM
to cdec-...@googlegroups.com
Hey guys, sorry for the trouble. A couple of things:
1) cdec's phrase-based MT implementation is fairly experimental. The
Moses implementation will be much, much faster and more feature-rich
(for example, cdec has no lexicalized reordering model).
2) cdec does not have a binary representation of the phrase table.
Typically, "per-sentence grammars" or phrase tables are used, produced
by filtering a full phrase table so that it contains only the rules
that match the input. Trying to load the full phrase table into memory
will not typically be feasible.
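
The filtering step described in point 2 can be sketched roughly as follows. This is an illustration under assumptions, not cdec's own tooling: it assumes a Moses-style phrase table with `source ||| target ||| scores` lines, and `source_ngrams`, `filter_phrase_table`, and the `max_len` cutoff are hypothetical names chosen for the example.

```python
from typing import Iterable, Iterator, Set


def source_ngrams(sentences: Iterable[str], max_len: int = 7) -> Set[str]:
    """Collect every n-gram of up to max_len tokens found in the input."""
    ngrams: Set[str] = set()
    for sentence in sentences:
        tokens = sentence.split()
        for start in range(len(tokens)):
            last = min(start + max_len, len(tokens))
            for end in range(start + 1, last + 1):
                ngrams.add(" ".join(tokens[start:end]))
    return ngrams


def filter_phrase_table(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield only the entries whose source side occurs in the input."""
    for line in lines:
        # The source side is everything before the first "|||" separator.
        source = line.split("|||", 1)[0].strip()
        if source in keep:
            yield line
```

Because `filter_phrase_table` streams over its input, the full table never has to sit in memory: the lines can come from `gzip.open(path, "rt")` and be written straight to a much smaller filtered file. Calling it with the n-grams of a single sentence gives a per-sentence table.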