Word alignment with GIZA or MGIZA

Fabio Ticconi

Jan 20, 2011, 9:24:20 PM
to SemEval2010_Cross-Lingual Word Sense Disambiguation
Hello to everyone,

I wrote a short tutorial on how to build a word alignment from a
parallel corpus using GIZA or MGIZA.

http://fabioticconi.wordpress.com/2011/01/17/how-to-do-a-word-alignment-with-giza-or-mgiza-from-parallel-corpus/

It includes a link to Els's raw Europarl corpora for the 6 languages
of this task, a set of nonbreaking prefixes to use with the Europarl
tools (the Europarl maintainers don't provide prefixes for every
language) so that only "bad" punctuation is separated from words, and
a small bash script that runs the alignment with it, de, nl, fr, or es
as the source and en as the target, using MGIZA and the config file I
wrote for each language. The script can easily be adapted to your
needs.
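
To make the nonbreaking-prefix idea concrete, here is a minimal Python
sketch. It is not the Europarl tokenizer and not the script from the
tutorial: the function name, the tiny prefix sets, and the splitting
rules are assumptions chosen only for illustration. A trailing period
is split off a word unless the word is a known abbreviation, so "Mr."
stays whole while a sentence-final "finally." becomes "finally" and ".".

```python
import re

# Hypothetical, deliberately tiny prefix sets; real lists are much longer
# and are kept in one file per language.
NONBREAKING_PREFIXES = {
    "en": {"Mr", "Mrs", "Dr", "Prof"},
    "it": {"sig", "dott", "prof"},
}


def tokenize(line, lang):
    """Split punctuation from words, keeping known abbreviations intact."""
    prefixes = NONBREAKING_PREFIXES.get(lang, set())

    # Punctuation that is always separated from the neighbouring word.
    padded = re.sub(r'([,;:!?()"])', r" \1 ", line)

    tokens = []
    for token in padded.split():
        if token.endswith(".") and len(token) > 1:
            stem = token[:-1]
            # Keep the period attached for listed abbreviations and for
            # tokens with an inner period (e.g. "e.g." or "...").
            if stem in prefixes or "." in stem:
                tokens.append(token)
            else:
                tokens.extend([stem, "."])
        else:
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    print(" ".join(tokenize("Mr. Smith arrived, finally.", "en")))
```

Running the same sentence with a language whose prefix set lacks "Mr"
would split "Mr." into two tokens, which is why a per-language list
matters before alignment.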

I hope it'll be useful to someone.

All the best,
Fabio Ticconi

els lefever

Jan 21, 2011, 2:58:09 AM
to cross-li...@googlegroups.com
Looks really useful, Fabio!
Thanks a lot.

Best,
Els.

> Date: Thu, 20 Jan 2011 18:24:20 -0800
> Subject: Word alignment with GIZA or MGIZA
> From: fabio....@gmail.com
> To: cross-li...@googlegroups.com