suggestion: specify the content word pair


A

Mar 10, 2010, 11:30:13 PM
to semeval-pete
Hello Deniz,

You have proposed an interesting task.
I have a suggestion that may help with some of the awkward <h>
examples you are discussing here.
The <h> examples in your proposal paper were clearer than the examples
in your trial data set because they provided two entailments for each
input sentence. This effectively indicated which two content words
were relevant in the entailment decisions.

Why not specify overtly which two content words are relevant to the
entailment test in your trial and test sets? Then you can use nominals
that are more natural than dummies (with reference, number, and
determiners that match the <t>). You can also include other
complements that may be non-optional. You don't need to insert any
dummy nouns. This creates entailments with much clearer judgements for
me. I give 5 examples to illustrate and encode the proposed content
word information, where possible, in <pair ...>.

For example 1, you can use a plural subject, which allows the verb to
keep its form. You can retain a reduced prepositional phrase to keep the
"bear resemblance to" idiom intact.

replace:
<pair id="2007.N" entailment="YES">
<t>And many in the young cast bear striking resemblances to
American TV and movie personalities known for light roles.</t>
<h>Somebody bears the resemblances.</h>
</pair>
with:
<pair id="2007.N" entailment="YES" e1="bear" e2="resemblances">
<t>And many in the young cast bear striking resemblances to
American TV and movie personalities known for light roles.</t>
<h>Many bear resemblances to movie personalities.</h>
</pair>
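
As a rough sketch of how a consumer of the data might read the proposed annotation, the following Python reads one pair and checks that both content words actually occur in the <h>. It assumes the e1/e2 attribute values are quoted, as well-formed XML requires; the helper names read_pair and words_present are hypothetical and not part of any task tooling.

```python
import xml.etree.ElementTree as ET

# One <pair> in the proposed format, with e1/e2 naming the content word pair.
PAIR_XML = """
<pair id="2007.N" entailment="YES" e1="bear" e2="resemblances">
<t>And many in the young cast bear striking resemblances to American TV and movie personalities known for light roles.</t>
<h>Many bear resemblances to movie personalities.</h>
</pair>
"""

def read_pair(xml_text):
    """Parse a single <pair> element into a plain dictionary (hypothetical helper)."""
    pair = ET.fromstring(xml_text.strip())
    return {
        "id": pair.get("id"),
        "entailment": pair.get("entailment"),
        "content_words": (pair.get("e1"), pair.get("e2")),
        "t": pair.findtext("t").strip(),
        "h": pair.findtext("h").strip(),
    }

def words_present(record):
    """Check that both annotated content words occur as tokens in the <h>."""
    tokens = {tok.strip(".,;").lower() for tok in record["h"].split()}
    return all(w is not None and w.lower() in tokens
               for w in record["content_words"])

record = read_pair(PAIR_XML)
print(record["content_words"], words_present(record))
```

A check like words_present only catches an annotation that names a word missing from the <h>; it says nothing about whether the entailment judgement itself is right.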

For example 2, there are several content words, even with the use of
"Somebody" as subject in the <h>, so it is not clear which pair of
content words is relevant in the entailment. You are probably not
interested in whether a system succeeds at company-knows or knows-best. I
can't tell what the intended pair is here.

<pair id="1032.N" entailment="NO">
<t>Mr. Hahn attributes the gains to the philosophy of concentrating
on what a company knows best .</t>
<h>Somebody philosophizes what a company knows best.</h>
</pair>

For example 3, some may judge <h> as a "NO" or "NOT SURE" entailment
because "somebody" implies an individual person, but the subject in
<t> is a country. You can leave the country in if you specify the
content words of interest.

replace:
<pair id="4116" entailment="YES">
<t>The U.S. wants the removal of what it perceives as barriers to
investment ; Japan denies there are real barriers .</t>
<h>Somebody denies there are barriers.</h>
</pair>
with:
<pair id="4116" entailment="YES" e1="denies" e2="barriers">
<t>The U.S. wants the removal of what it perceives as barriers to
investment ; Japan denies there are real barriers .</t>
<h>Japan denies there are barriers.</h>
</pair>

For example 4, retain the "some" determiner, which is not equivalent to
"the", and use a grammatical "that" clause.
replace:
<pair id="4026.N" entailment="YES">
<t>After the first set of meetings two months ago , some U.S.
officials complained that Japan had n't come up with specific changes
it was prepared to make .</t>
<h>The officials complained something.</h>
</pair>
with:
<pair id="4026.N" entailment="YES" e1="officials" e2="complained">
<t>After the first set of meetings two months ago , some U.S.
officials complained that Japan had n't come up with specific changes
it was prepared to make .</t>
<h>Some officials complained that Japan hadn't come up with specific
changes.</h>
</pair>

For example 5, I am not sure which content words are the relevant
pair. The <h> could be: <h>It is high up on my dress.</h>
<pair id="3096" entailment="YES">
<t>`` I reached into that funny little pocket that is high up on my
dress .</t>
<h>Something is high up on something.</h>
</pair>

If you clarify the <h> candidates this way, I expect you would get
fewer "NOT SURE" answers. I think you could then reject these examples
instead of counting them as "NO" examples.
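
To make the difference concrete, here is a minimal sketch of the two policies. It assumes each pair comes with a list of annotator votes drawn from "YES", "NO" and "NOT SURE" and that the gold label is the most frequent vote; the function name gold_label and that aggregation rule are my assumptions, not a description of how the trial data was actually built.

```python
from collections import Counter

def gold_label(votes, reject_not_sure=True):
    """Return the most frequent vote, or None when the pair should be rejected.

    Assumption: ties are resolved in favor of the vote that appears first.
    """
    label, _count = Counter(votes).most_common(1)[0]
    if label == "NOT SURE":
        # Proposed policy: drop the pair. Alternative policy: count it as "NO".
        return None if reject_not_sure else "NO"
    return label

print(gold_label(["YES", "YES", "NOT SURE"]))
print(gold_label(["NOT SURE", "NOT SURE", "NO"]))
print(gold_label(["NOT SURE", "NOT SURE", "NO"], reject_not_sure=False))
```

With reject_not_sure=True, a pair that mostly confuses annotators is left out of the data set rather than being scored as a negative example.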

Best regards, AM

Deniz Yuret

Mar 12, 2010, 5:38:13 AM
to semeval-pete, A
Hello Anna,

Thank you for your careful analysis and suggestion. I think the
content word pair should be recorded during entailment generation and
used for analysis. I am not sure it is a good idea to make it
available during testing. The simple instructions given to untrained
annotators did not include the content word pair, so I think programs
which try to replicate their competence should do so without this
extra information as well. I believe some of the awkward entailments
will get fixed in version 3 and the intended word pair will be more
clear in most cases.
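
One way this split could work in practice, sketched under assumptions: the generation step records the pair in e1/e2 attributes (quoted, so the XML is well-formed), and those attributes are stripped before the file is released for testing. The function name strip_content_words is hypothetical, and the sketch removes only e1/e2; it leaves every other attribute untouched.

```python
import xml.etree.ElementTree as ET

# An annotated pair as it might be stored for later analysis.
ANNOTATED = """
<pair id="4116" entailment="YES" e1="denies" e2="barriers">
<t>The U.S. wants the removal of what it perceives as barriers to investment ; Japan denies there are real barriers .</t>
<h>Japan denies there are barriers.</h>
</pair>
"""

def strip_content_words(xml_text):
    """Return the pair element with the e1/e2 attributes removed."""
    pair = ET.fromstring(xml_text.strip())
    for name in ("e1", "e2"):
        pair.attrib.pop(name, None)  # no error if the attribute is absent
    return pair

released = strip_content_words(ANNOTATED)
print(ET.tostring(released, encoding="unicode"))
```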

deniz
