Speaker: Christer Samuelsson
Title: Parametric Distributions for Statistical Machine Translation: What do Alignment Probabilities Actually Look Like?
Abstract:
The alignment and sentence-length probabilities used in statistical machine translation have numerical domains and could therefore be modeled by parametric distributions. It turns out that sentence-length probabilities
are well described by automatically fitted Gaussian distributions, and alignment probabilities by reflected Cauchy distributions, that is, Cauchy distributions whose tails are folded back into the finite span of the sentence.
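To make the two distribution families concrete, here is a minimal Python sketch. It assumes, purely for illustration, that lengths and positions are integers discretized into unit-width bins, that the Gaussian's `mu` and `sigma` and the Cauchy's `center` and `scale` are already fitted, and that "reflected" means folding mass that falls outside the sentence back in at both ends by the method of images. The function names `gaussian_length_probs` and `reflected_cauchy_probs` and the `images` truncation parameter are hypothetical, not taken from the talk.

```python
import math


def gaussian_length_probs(mu, sigma, max_len):
    """Gaussian over lengths 1..max_len, discretized into unit bins and renormalized."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    masses = [cdf(m + 0.5) - cdf(m - 0.5) for m in range(1, max_len + 1)]
    total = sum(masses)
    return [p / total for p in masses]


def reflected_cauchy_probs(center, scale, n, images=50):
    """Cauchy(center, scale) folded into positions 1..n.

    The sentence is treated as the interval [0.5, n + 0.5]; mass outside it is
    reflected back at both ends by summing direct and mirror images, truncated
    to |k| <= images and then renormalized.
    """
    a, b = 0.5, n + 0.5
    period = 2.0 * (b - a)

    def cdf(x):
        return 0.5 + math.atan((x - center) / scale) / math.pi

    probs = []
    for j in range(1, n + 1):
        u, v = j - 0.5, j + 0.5  # bin for position j
        mass = 0.0
        for k in range(-images, images + 1):
            shift = k * period
            # Direct image: the bin translated by a whole period.
            mass += cdf(v + shift) - cdf(u + shift)
            # Mirror image: the bin reflected about the left boundary a.
            mass += cdf(2 * a - u + shift) - cdf(2 * a - v + shift)
        probs.append(mass)
    total = sum(probs)
    return [p / total for p in probs]
```

For example, `reflected_cauchy_probs(1.0, 1.0, 5)` puts its largest probability on position 1, and the mass that a plain Cauchy would place outside the sentence is returned to it rather than discarded. Truncating the image sum leaves a small tail, which the final renormalization absorbs.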
This indicates that these probabilities carry less signal than commonly thought. From a practical perspective, parametric distributions are much more compact and robust than the usual tabulated estimates. The extracted distribution parameters lend themselves
very well to linear regression, which compresses the entire set of distributions to just two regression parameters (an intercept and a slope) per distribution parameter. This is a much more effective form of data pooling than smoothing over neighboring contexts.
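The regression step can be sketched as follows, assuming (hypothetically) that each fitted distribution parameter, such as the Gaussian mean of the sentence-length model for a given source length, is regressed on a single numerical context variable, and that contexts may be weighted by how much training data they have. `fit_line` is a hypothetical helper implementing ordinary, optionally weighted, least squares; the talk does not specify the predictor or the weighting.

```python
def fit_line(xs, ys, weights=None):
    """Weighted least-squares fit y ~ intercept + slope * x; returns (intercept, slope)."""
    if weights is None:
        weights = [1.0] * len(xs)
    if not (len(xs) == len(ys) == len(weights)) or len(xs) < 2:
        raise ValueError("need at least two paired, weighted points")
    wsum = sum(weights)
    # Weighted means of the context variable and the fitted parameter.
    mean_x = sum(w * x for w, x in zip(weights, xs)) / wsum
    mean_y = sum(w * y for w, y in zip(weights, ys)) / wsum
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("context values must not all be equal")
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

As a toy check with made-up numbers, `fit_line([1, 2, 3, 4], [2.5, 3.6, 4.7, 5.8])` returns approximately `(1.4, 1.1)`, so a whole column of per-context parameter values is replaced by one intercept and one slope. Weighting by per-context counts lets well-observed contexts dominate the fit; that is a design choice of this sketch, not necessarily of the talk.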