CUNY-NLP Talk by Nitin Madnani (Maryland), March 26

Joseph Turian

Mar 18, 2010, 12:19:13 AM
to ny-n...@googlegroups.com
We are pleased to host a practice thesis presentation by a rising MT star,
Nitin Madnani. If you are interested in meeting with the speaker, please
email Heng.

Time: 12:45pm-1:45pm, Friday, March 26
Place: Room 6496, CUNY Graduate Center, 365 Fifth Ave (between 34th and 35th Streets).
Speaker: Nitin Madnani (Maryland)
Title: The Circle of Meaning: From Translation to Paraphrasing and Back

Abstract:
The preservation of meaning between their inputs and outputs is perhaps
the most ambitious and, often, the most elusive goal of systems that
attempt to process natural language. Nowhere is this goal of more obvious
importance than for the tasks of machine translation and paraphrase
generation. Preserving meaning between the input and the output is
paramount for both, the monolingual vs bilingual distinction
notwithstanding. In this talk, I propose a novel, symbiotic connection
between these two tasks.

Today's SMT systems require high quality human translations, in addition
to large bitexts, for parameter tuning. For such tuning, it is generally
considered wise to have multiple (usually 4) reference translations to
avoid unfair penalization of translation hypotheses. However, this
reliance on multiple reference translations creates a problem, because
reference translations are labor intensive and expensive to obtain.
Therefore, most current MT datasets only contain a single reference. This
leads to the problem of reference sparsity, the primary open problem
that I attempt to address in this talk and one that has a serious effect on
the SMT parameter tuning process.
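
As a rough illustration of why extra references matter, here is a minimal
sketch of clipped n-gram precision, the matching step used in BLEU-style
metrics. The function names and the toy sentences are assumptions made up
for this example; they are not taken from the talk or from any particular
tuning toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, references, n=1):
    """Clipped n-gram precision of a hypothesis against one or more references.

    Each hypothesis n-gram is credited at most as many times as it occurs
    in the single reference that contains it most often.
    """
    hyp_counts = Counter(ngrams(hypothesis.split(), n))
    if not hyp_counts:
        return 0.0

    # Maximum count of each n-gram over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    matched = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())


# Toy data (hypothetical): the hypothesis is reasonable, but the first
# reference happens to use "is" where the hypothesis uses "sat".
hyp = "the cat sat on the mat"
ref_a = "the cat is on the mat"
ref_b = "a cat sat on a rug"

single = clipped_precision(hyp, [ref_a])
multi = clipped_precision(hyp, [ref_a, ref_b])
```

With only `ref_a`, the word "sat" goes unmatched and the hypothesis is
penalized; adding `ref_b` removes that penalty. A sentential paraphraser
could, in principle, supply references like `ref_b` automatically.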

Bannard & Callison-Burch (2005) were the first to provide a practical
connection between phrase-based statistical machine translation techniques
and paraphrase generation. However, their technique is restricted to
generating phrasal paraphrases. We build upon their approach and augment a
phrasal paraphrase extractor into a sentential paraphraser with extremely broad
coverage. The novelty in this augmentation lies in the further
strengthening of the connection between statistical machine translation
and paraphrase generation; whereas Bannard and Callison-Burch only rely
on SMT machinery to extract phrasal paraphrase rules and stop there,
we take it a few steps further and build a full English-to-English SMT
system. This system can, as expected, "translate" any English input
sentence into a new English sentence with the same degree of meaning
preservation that exists in a bilingual SMT system. In fact, being a
state-of-the-art SMT system, it is able to generate n-best "translations"
for any given input sentence. This sentential paraphraser, built almost
entirely from SMT machinery, represents the first 180 degrees of the
proposed circle of meaning.
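
For readers new to the phrasal paraphrase extraction that this work builds
on, the following is a minimal sketch of the pivoting idea from Bannard &
Callison-Burch (2005): two English phrases are treated as paraphrases when
they translate to the same foreign phrase, with
p(e2|e1) = sum over f of p(f|e1) * p(e2|f). The function name, the table
layout, and the probabilities are all assumptions for illustration; this is
not the speaker's system.

```python
from collections import defaultdict


def pivot_paraphrases(e2f, f2e):
    """Score phrasal paraphrases by pivoting through foreign phrases.

    e2f[e][f] holds p(f | e) and f2e[f][e] holds p(e | f), as would be
    read off a phrase table. Returns para[e1][e2] = p(e2 | e1).
    """
    para = defaultdict(lambda: defaultdict(float))
    for e1, foreign_phrases in e2f.items():
        for f, p_f_given_e1 in foreign_phrases.items():
            for e2, p_e2_given_f in f2e.get(f, {}).items():
                if e2 != e1:  # skip the trivial identity paraphrase
                    para[e1][e2] += p_f_given_e1 * p_e2_given_f
    return {e1: dict(candidates) for e1, candidates in para.items()}


# Toy phrase tables with made-up probabilities (hypothetical values).
e2f = {"under control": {"unter kontrolle": 1.0}}
f2e = {"unter kontrolle": {"under control": 0.6, "in check": 0.4}}

paraphrases = pivot_paraphrases(e2f, f2e)
```

Here "in check" is proposed as a paraphrase of "under control" because both
align to the same pivot phrase. The sentential paraphraser described above
goes further by using such rules inside a full English-to-English decoder.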

To complete the circle, we propose a novel connection in the other
direction. We claim that the sentential paraphraser, once built in this
fashion, can provide a solution to the reference sparsity problem and,
hence, be used to improve the performance of a bilingual SMT system. We posit
two different instantiations of the sentential paraphraser and show
results that provide empirical validation for this proposed connection.

Speaker Bio:
Nitin Madnani is a final year PhD student at the University of Maryland,
College Park. He works as a research assistant in the Laboratory for
Computational Linguistics and Information Processing with his advisors
Bonnie Dorr and Philip Resnik. Besides exploring the intersection of
and interaction between machine translation and paraphrasing as part of
his thesis, he has also worked on multi-document summarization and
information retrieval. He is planning to graduate in May 2010.
