Hello everyone,
I wrote a short tutorial on how to build a word alignment with GIZA or
MGIZA:
http://fabioticconi.wordpress.com/2011/01/17/how-to-do-a-word-alignment-with-giza-or-mgiza-from-parallel-corpus/
The tutorial includes a link to Els' raw Europarl corpora for the 6
languages of this task, a set of nonbreaking prefixes to use with the
Europarl tools (the Europarl maintainers don't provide prefixes for
every language) so that only "bad" punctuation is separated from
words, and a small bash script that runs the alignment with source =
it, de, nl, fr or es and target = en, using MGIZA and the config file
I wrote for each language. The script can easily be adapted to your
needs.
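
To give a rough idea of the loop over language pairs, here is a minimal
dry-run sketch. It is not the script from the tutorial: the config file
names (mgiza.<src>-en.cfg) are hypothetical, and calling mgiza with the
config file as its only argument is an assumption. It only prints the
commands instead of running them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one MGIZA command per source language, target = en.
# ASSUMPTIONS: the binary name "mgiza" on PATH and the config file naming
# scheme "mgiza.<src>-en.cfg" are hypothetical placeholders.

TARGET="en"
SOURCES="it de nl fr es"

# Print the command that would align <src> against the target language.
alignment_command() {
    local src="$1"
    echo "mgiza mgiza.${src}-${TARGET}.cfg"
}

# One command per source language; nothing is executed.
for src in $SOURCES; do
    alignment_command "$src"
done
```

To actually run the alignments you would execute each printed command
rather than echo it, after checking the paths inside each config file.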
I hope it'll be useful to someone.
All the best,
Fabio Ticconi