Thanks for your reply. I have another question about the SCFG rule format. The rules I got from Thrax look like the following:
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] , scientists [X,2] ||| Lex(e|f)=1.41634 Lex(f|e)=13.76438 PhrasePenalty=2.71800 p(e|f)=5.10958 p(f|e)=2.96527
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] dissident scientist [X,2] ||| Lex(e|f)=2.06032 Lex(f|e)=2.40940 PhrasePenalty=2.71800 p(e|f)=6.02587 p(f|e)=-0.00000
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] [X,2] of scientists ||| Lex(e|f)=0.73386 Lex(f|e)=1.32741 PhrasePenalty=2.71800 p(e|f)=5.62040 p(f|e)=1.29928
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] science [X,2] ||| Lex(e|f)=4.57600 Lex(f|e)=6.91200 PhrasePenalty=2.71800 p(e|f)=5.33272 p(f|e)=7.26525
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] scientist [X,2] ||| Lex(e|f)=2.06032 Lex(f|e)=0.72300 PhrasePenalty=2.71800 p(e|f)=2.03688 p(f|e)=0.40547
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] scientists to [X,2] ||| Lex(e|f)=0.73386 Lex(f|e)=2.20514 PhrasePenalty=2.71800 p(e|f)=3.13549 p(f|e)=0.7472
I noticed that in the cdec sample, the rule format is:
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] of researchers , [X,2] ||| -2.47712135315 -1.83505606651 -4.26377058029 2.71828182845905
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] [X,2] by a scientist ||| -2.47712135315 -0.178537979722 -4.84308099747 2.71828182845905
[X] ||| [X,1] 科学家 [X,2] ||| [X,1] [X,2] of scientists ||| -2.47712135315 -0.294755220413 -1.10121524334 2.71828182845905
I simply deleted the feature names ('Lex(e|f)=', 'Lex(f|e)=', and so on), moved the PhrasePenalty column to the end, and then fed the rule file to dpmert with the following configuration:
--------------------cdec.ini-------------------------------
cubepruning_pop_limit=30
density_prune=100
scfg_max_span_limit=15
feature_function=KLanguageModel xin_eng_fbis.tok.lower.order5.srilm.kenlm
feature_function=WordPenalty
add_pass_through_rules=true
grammar=final_rules
formalism=scfg
---------------the initial weights file----------------
WordPenalty 0.2
PassThrough 0.2
LanguageModel 0.2
LanguageModel_OOV 0.2
PhraseModel_0 0.1
PhraseModel_1 0.1
PhraseModel_2 0.1
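The manual conversion described above (stripping the `name=` prefixes and moving the PhrasePenalty value to the last column) can be sketched in Python as follows. This is a minimal sketch, assuming every rule has exactly the four `|||`-separated fields shown in the Thrax output; the function names and the input file name `thrax_rules` are hypothetical. It only reproduces the reordering; it does not check how cdec names or weights the resulting columns.

```python
def convert_rule(line: str, move_to_end: str = "PhrasePenalty") -> str:
    """Turn one Thrax rule with name=value features into positional features.

    Feature names are dropped, the relative order of the remaining values is
    kept, and the feature named `move_to_end` (if present) goes last.
    """
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|||'-separated fields, got {len(fields)}: {line!r}")
    lhs, src, tgt, feats = fields
    kept, moved = [], []
    for token in feats.split():
        # "Lex(e|f)=1.41634" -> name "Lex(e|f)", value "1.41634"
        name, _, value = token.partition("=")
        # Values stay strings so the numbers are copied verbatim.
        (moved if name == move_to_end else kept).append(value)
    return " ||| ".join([lhs, src, tgt, " ".join(kept + moved)])


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert every non-empty rule line in src_path and write it to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(convert_rule(line) + "\n")
```

For example, `convert_file("thrax_rules", "final_rules")` would write the converted grammar to the file named by `grammar=final_rules`; applied to the `scientist` rule above, the feature columns come out as `2.06032 0.72300 2.03688 0.40547 2.71800` (Lex(e|f), Lex(f|e), p(e|f), p(f|e), PhrasePenalty).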
MERT tuning ran for 9 iterations. However, the translation results are very poor. Do you have any idea what the problem could be?
On Tuesday, August 7, 2012 at 12:38:03 PM UTC+8, Chris Dyer wrote: