Word alignment with GIZA or MGIZA

Fabio Ticconi

Jan 20, 2011, 9:24:20 PM

to SemEval2010_Cross-Lingual Word Sense Disambiguation

Hello to everyone,

I wrote a simple tutorial on how to build a word alignment from a
parallel corpus using GIZA or MGIZA.

http://fabioticconi.wordpress.com/2011/01/17/how-to-do-a-word-alignment-with-giza-or-mgiza-from-parallel-corpus/

The post links to Els' raw Europarl corpora for the 6 languages of
this task and to a set of nonbreaking prefixes to use with the Europarl
tools (the Europarl maintainers don't provide prefixes for every
language), so that only "bad" punctuation is separated from words.
There is also a small bash script that does the alignment with
source = it, de, nl, fr or es and target = en, using MGIZA and the
config file I wrote for each language. It can easily be adapted to
your needs.
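
To give a rough idea of the shape of such a script (this is only a sketch, not the script from the tutorial), the loop over language pairs could look something like the following. The config file names (mgiza.<src>-en.cfg) and the DRY_RUN switch are made up for illustration, and it assumes the binary is on your PATH and is called as `mgiza <config file>`:

```shell
#!/usr/bin/env bash
# Sketch only: align each source language against English with MGIZA.
# Config file names and the DRY_RUN switch are hypothetical placeholders.

TARGET=en
SOURCES="it de nl fr es"

# Print (DRY_RUN=1, the default here) or run the MGIZA call for one pair.
align_pair() {
    local src="$1"
    local cfg="mgiza.${src}-${TARGET}.cfg"   # hypothetical file name
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "mgiza ${cfg}"
    else
        mgiza "${cfg}"   # assumes mgiza takes its settings from the config file
    fi
}

for src in $SOURCES; do
    align_pair "$src"
done
```

With DRY_RUN left at its default the script only prints the five commands it would run, so you can check the config names before setting DRY_RUN=0.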

I hope it'll be useful to someone.

All the best,
Fabio Ticconi

els lefever

Jan 21, 2011, 2:58:09 AM

to cross-li...@googlegroups.com

Looks really useful, Fabio!

Thanks a lot.

Best,

Els.

> Date: Thu, 20 Jan 2011 18:24:20 -0800
> Subject: Word alignment with GIZA or MGIZA
> From: fabio....@gmail.com
> To: cross-li...@googlegroups.com
