Zeros versus scripts/mosesRules2Jane.py

Kenneth Heafield

Jun 28, 2013, 9:03:23 AM

to jane-...@googlegroups.com

Apparently Moses phrase table extraction can sometimes produce zero probabilities due to numerical precision issues.

% im [X][X] auf [X] ||| % in [X][X] to [X] ||| 0 0.00521404 0 0.0844765 2.718 ||| 2-2 ||| 12.1811 12.4007

This causes Jane's conversion script to fail:

Traceback (most recent call last):
File "scripts/mosesRules2Jane.py", line 36, in <module>
    janeScores.append(str(-math.log(float(s))))
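
For reference, the failure can be reproduced outside the script, since Python's math.log raises ValueError for an input of 0.0. A minimal sketch, with the score string copied from the rule above and purely illustrative variable names:

```python
import math

# Score field copied from the example rule above.
scores = "0 0.00521404 0 0.0844765 2.718".split()

failed = []
for s in scores:
    try:
        -math.log(float(s))
    except ValueError:
        # math.log(0.0) is undefined, so Python raises ValueError.
        failed.append(s)

# Lists the score strings that could not be converted.
print(failed)
```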

Note that Moses internally takes the log of the probabilities and then floors them at -100. So I suggest this modification:

for s in scores:
    f = float(s)
    janeScores.append("-100" if f == 0.0 else str(-math.log(f)))

Jan

Jul 1, 2013, 3:41:29 AM

to jane-...@googlegroups.com

Hi Ken,

0 should map to 100 rather than -100, since we are using -log(x). We should then cap all scores at 100:

for s in scores:
    f = float(s)
    f = 100.0 if f == 0.0 else -math.log(f)
    f = min(100, f)
    janeScores.append(str(f))

It's now fixed in our code base, thanks for pointing that out.
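
For anyone who wants to try the capped conversion in isolation, here is a minimal standalone sketch. It is not the actual contents of scripts/mosesRules2Jane.py: the names MAX_COST and to_jane_scores are hypothetical, and it assumes the scores are the third `|||`-separated field, as in the example rule earlier in the thread.

```python
import math

MAX_COST = 100.0  # cap taken from this thread; the constant name is hypothetical


def to_jane_scores(moses_scores):
    """Convert Moses probability strings to -log costs capped at MAX_COST."""
    jane_scores = []
    for s in moses_scores:
        p = float(s)
        # A zero probability has no finite -log, so map it straight to the cap.
        cost = MAX_COST if p == 0.0 else -math.log(p)
        # Cap very small but nonzero probabilities as well.
        jane_scores.append(str(min(MAX_COST, cost)))
    return jane_scores


# Example rule from the first message; assumed layout: fields separated by " ||| ".
rule = ("% im [X][X] auf [X] ||| % in [X][X] to [X] ||| "
        "0 0.00521404 0 0.0844765 2.718 ||| 2-2 ||| 12.1811 12.4007")
fields = rule.split(" ||| ")
converted = to_jane_scores(fields[2].split())
print(converted)
```

Note that values above 1, such as the 2.718 in the example, come out as negative costs. The cap only bounds the scores from above.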
