I'm trying to use NLTK's IBM models to build a translation system, which I'm applying to "translate" English orthography into ARPABET (and ultimately IPA).
Everything seems to train just fine, but when I try to extract the alignments I get an error:
```
C:\Users\Zelda>python ibm.py alexandrascmu.txt
['a', 'b', 'a', 'c', 'u', 's']
['AE', 'B', 'AH0', 'K', 'AH0', 'S']
Traceback (most recent call last):
  File "ibm.py", line 47, in <module>
    print test_sentence.alignment
  File "C:\Users\Zelda\Anaconda2\lib\site-packages\nltk\compat.py", line 671, in wrapper
    return method(self).encode('ascii', 'backslashreplace')
  File "C:\Users\Zelda\Anaconda2\lib\site-packages\nltk\compat.py", line 659, in wrapper
    return transliterate(method(self))
  File "C:\Users\Zelda\Anaconda2\lib\site-packages\nltk\translate\api.py", line 239, in __str__
    return " ".join("%d-%d" % p[:2] for p in sorted(self))
  File "C:\Users\Zelda\Anaconda2\lib\site-packages\nltk\translate\api.py", line 239, in <genexpr>
    return " ".join("%d-%d" % p[:2] for p in sorted(self))
TypeError: %d format: a number is required, not NoneType
```
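Reading the last frame, the alignment object is not empty: `__str__` is iterating over pairs, and one of the first two items of some pair is `None`, which `"%d"` refuses to format. My guess (unconfirmed) is that `None` marks a symbol aligned to the NULL token. Printing `list(test_sentence.alignment)` should show the raw pairs. Below is a minimal sketch of a None-tolerant formatter; `format_alignment` and `_index` are hypothetical helpers, not part of NLTK, and they assume the alignment iterates as tuples, which the `p[:2]` in the traceback suggests.

```python
def _index(x):
    # Render one alignment index; None (presumably the NULL token) becomes "NULL".
    return "NULL" if x is None else "%d" % x


def format_alignment(pairs):
    """None-tolerant version of the "%d-%d" % p[:2] formatting from the traceback.

    `pairs` is any iterable of tuples whose first two items are int or None;
    any extra items are ignored, as with p[:2].
    """
    def sort_key(p):
        # Map None to -1 so sorting never compares None with an int
        # (that comparison raises TypeError on Python 3).
        return tuple(-1 if x is None else x for x in p[:2])

    return " ".join("%s-%s" % (_index(p[0]), _index(p[1]))
                    for p in sorted(pairs, key=sort_key))
```

Usage would be `print(format_alignment(test_sentence.alignment))` in place of `print test_sentence.alignment`; the single-argument `print(...)` form works on both Python 2 and 3.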
When I inspect the trained model, the translation probabilities are there, but for some reason I can't get the alignments out.
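One way populated probabilities and a `None` in the alignment could coexist: as I understand IBM Model 1, the best (Viterbi) alignment picks, independently for each target symbol, the source position with the highest translation probability t(trg | src), and an extra NULL source token competes too. Whenever NULL wins, there is no source index to report. The sketch below is a simplified re-implementation written to illustrate that rule, not NLTK's code; `best_alignment` and the toy probability table are hypothetical, with the key `None` standing in for the NULL token.

```python
def best_alignment(trg_words, src_words, t):
    """Viterbi alignment under IBM Model 1 (simplified sketch).

    t[trg][src] is the translation probability t(trg | src); the key None
    stands for the NULL source token. Returns (j, i) pairs, where i is
    None when the NULL token scores highest for target position j.
    """
    pairs = []
    for j, trg in enumerate(trg_words):
        row = t.get(trg, {})
        # Start from the NULL token; a real source word must beat it strictly.
        best_i, best_p = None, row.get(None, 0.0)
        for i, src in enumerate(src_words):
            p = row.get(src, 0.0)
            if p > best_p:
                best_i, best_p = i, p
        pairs.append((j, best_i))
    return pairs
```

With a made-up table such as `{'AE': {'a': 0.6, None: 0.1}, 'B': {'b': 0.2, None: 0.5}}`, aligning `['AE', 'B']` against `['a', 'b']` gives `[(0, 0), (1, None)]`: both symbols have probabilities, yet `B` ends up on NULL, producing exactly the kind of pair that `"%d-%d"` cannot format.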
Any ideas how to fix this?