Zeros versus scripts/mosesRules2Jane.py

Kenneth Heafield

Jun 28, 2013, 9:03:23 AM
to jane-...@googlegroups.com
Apparently Moses phrase table extraction can sometimes produce zero probabilities due to numerical precision issues. 

% im [X][X] auf [X] ||| % in [X][X] to [X] ||| 0 0.00521404 0 0.0844765 2.718 ||| 2-2 ||| 12.1811 12.4007
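To see which scores are affected, here is a minimal Python sketch that splits the rule above on the `|||` separator and reports the positions of the zero-valued scores. It assumes the third field holds the scores, as in the line shown. The variable names are illustrative and are not taken from mosesRules2Jane.py.

```python
# Illustrative only: the names below are hypothetical, not from mosesRules2Jane.py.
line = ("% im [X][X] auf [X] ||| % in [X][X] to [X] ||| "
        "0 0.00521404 0 0.0844765 2.718 ||| 2-2 ||| 12.1811 12.4007")

# Fields are separated by "|||"; in this example the third field holds the scores.
fields = [field.strip() for field in line.split("|||")]
scores = [float(s) for s in fields[2].split()]

# Positions of scores that are exactly zero and would therefore break -math.log().
zero_positions = [i for i, p in enumerate(scores) if p == 0.0]
print(zero_positions)
```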

This causes Jane's conversion script to fail:

Traceback (most recent call last):
  File "scripts/mosesRules2Jane.py", line 36, in <module>
    janeScores.append(str(-math.log(float(s))))

Note that Moses internally takes the log of probabilities and then floors them at -100. So I suggest this modification:

  for s in scores:
    f = float(s)
    janeScores.append("-100" if f == 0.0 else str(-math.log(f)))

Jan

Jul 1, 2013, 3:41:29 AM
to jane-...@googlegroups.com
Hi Ken,
0 should map to 100 rather than -100, since we are using -log(x), and we should then cap all the scores at 100:

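for s in scores:
    f = float(s)
    # A zero probability has no finite log, so map it straight to the maximum cost.
    f = 100.0 if f == 0.0 else -math.log(f)
    # Cap every cost at 100, mirroring Moses' floor of -100 on log probabilities.
    f = min(100.0, f)
    janeScores.append(str(f))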

It's now fixed in our code base, thanks for pointing that out.
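
For anyone who wants to try the behaviour outside the script, the fix discussed in this thread can be sketched as a small standalone function. This is only a sketch under the assumptions stated here: the helper name `to_jane_cost` and the constant `MAX_COST` are made up for illustration, and the actual code in the Jane code base may look different.

```python
import math

MAX_COST = 100.0  # cap discussed in the thread; the constant name is illustrative


def to_jane_cost(prob_str):
    """Convert one probability string to a -log cost, capped at MAX_COST."""
    p = float(prob_str)
    if p == 0.0:
        # math.log(0.0) raises ValueError, so handle zero explicitly.
        return MAX_COST
    # Very small probabilities give costs above the cap; clamp those as well.
    return min(MAX_COST, -math.log(p))


# Scores taken from the example rule earlier in the thread.
scores = "0 0.00521404 0 0.0844765 2.718".split()
jane_scores = [str(to_jane_cost(s)) for s in scores]
print(" ".join(jane_scores))
```

Both zeros come out as the capped cost instead of raising an exception. Scores greater than 1, such as the 2.718 in the example, simply yield a negative cost and are left untouched.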