Re: PETE trial data


Deniz Yuret

Mar 5, 2010, 5:17:36 AM
to AYDIN HAN
Dear Stephan,

Thank you for your interest and careful analysis of the PETE task. I
am cc'ing my response to the Google group, as I think these issues will
be of interest to other potential participants as well. My answers are
inline below.

On Fri, Mar 5, 2010 at 12:14 AM, Stephan Oepen <> wrote:
> dear deniz (if i may),
> thank you for setting up the PETE shared task as part of SemEval 2010;
> i was co-organizer of a COLING workshop on cross-framework and cross-
> domain parser evaluation in 2008, and i believe what you are proposing
> with PETE has the potential to address a number of issues discussed at
> this event (see `' for details
> on the workshop goals and programme).

I am adding the workshop address to the pete website. Are the papers
available online as well?

> jointly with colleagues from the
> DELPH-IN network, we are currently trying to decide whether to prepare
> a submission to the PETE task (based on the English Resource Grammar).

Let me know if I can be of help in any way.

> from a first inspection of the trial data, my impression is that there
> are quite a number of ungrammatical utterances, for example:
>  The pretended did not exist.
>  Something would go he would hang.
>  The officials complained something.
>  Something had the be.
>  Something was he was tied up on something.
>  The constantly said something.
>  Something was the pegboard would go.
>  Somebody denies there are something.
> i am not a native speaker of the language, and some of the above may be
> subject to arguments about grammaticality.  however, both for the task
> of parser evaluation as well as for textual entailment, i believe there
> would be a point in focusing on data that is either attested in corpora
> (as are your texts, i reckon) or constructed data for which there is no
> uncertainty about grammaticality.  what is your take on this?

There are two possible reasons for ungrammatical hypothesis sentences:

1. The hypothesis sentences were generated to reflect parser
decisions. In cases where a state-of-the-art parser makes a
decision that differs from the gold set, I instructed the entailment
generators to try to generate two entailments - one for the gold
decision, and one for the parser decision. Sometimes this was not
possible. Sometimes it was, but it led to ungrammatical entailments.
2. The entailment generators made a mistake.

I can assure you that, except for cases of human error, the
ungrammatical entailments you quote above were actually the semantic
consequence of following an actual parser decision, i.e. a
participant who builds a system with this parser would most likely
output "YES" for these hypotheses. However, most human judges mark
such cases as "NO" or "Unsure", both of which I mapped to "NO" as the
official answer. Were there any "YES" gold answers for the
ungrammatical entailments you mentioned above?

To make a long story short, if your system finds an ungrammatical
hypothesis, I would advise outputting "NO".
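This advice amounts to a simple conservative fallback for a PETE system. A minimal sketch, assuming hypothetical `parses` (grammaticality check, e.g. via a precision grammar) and `entails` (entailment decision) callables supplied by the participant's system; both names are illustrative, not part of the task:

```python
def entailment_answer(text, hypothesis, parses, entails):
    """Answer "YES" only when the hypothesis both parses and is entailed."""
    if not parses(hypothesis):
        # Ungrammatical hypotheses are almost always gold-labeled "NO".
        return "NO"
    return "YES" if entails(text, hypothesis) else "NO"
```

A system built on a precision grammar such as the ERG can plug its parser's accept/reject decision directly into the `parses` slot.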

> from our
> point of view, the examples above are problems: the ERG is designed to
> reject clearly ungrammatical inputs.  so in parsing with the ERG, it is
> a desirable property to _not_ accept such examples; a parser evaluation
> setup that might penalize parsers using precision grammars would appear
> unfortunate from our point of view.

In view of the above analysis, do you still think precision grammars
will be penalized if they output a "NO" answer for ungrammatical
hypotheses?
> second, and somewhat less fundamentally, the trial data is formatted in
> a way that appears potentially inconsistent.  texts are pre-tokenized, and in
> some cases contain PTB idiosyncrasies (parentheses showing as -LRB- and
> -RRB-), whereas hypotheses occur in conventional typography, i.e. with
> punctuation marks and contractions not separated from adjacent tokens.
> in the extreme, this asymmetry could require one to use distinct parser
> configurations, one with a standard tokenizer, another looking for the
> kind of partially pre-processed input that resembles PTB conventions.

Yes, you are correct. The "text" part usually comes from a carefully
tokenized treebank, whereas the "hypothesis" sentences were typed by
humans who were not as careful. For the test set I will try to
standardize the input one way or the other.

> in my view, parser inputs should always be presented in `natural form',
> i.e. not pre-processed and reflecting standard typographic conventions.
> after all, this is the format a parser will need to handle in an actual
> application context.  furthermore, existing parsers actually differ in
> approaches to tokenization, and following the conventions of one camp,
> say the PTB, may disfavour parsers following another school of thought.
> would you consider preparing an updated version of the trial data that
> addressed these concerns?

I agree. I will work on cleaning up the data as you have
suggested and post an updated copy of the development set by next
week.
> --- in conclusion, let me just repeat that i am grateful you are doing
> this shared task.  so please view my comments as mere suggestions for
> making the specifics of this task more broadly applicable, rather than
> as challenging the overall design of the shared task!
>                                                  best wishes  -  oe

Thank you very much for your input. Please do not hesitate to contact
me with further suggestions.


> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Deniz Yuret

Mar 9, 2010, 7:23:32 AM
to Laura Rimell
Hi Stephan,

Based on the feedback we received, I am planning to do two more releases of the data:

1. We will normalize the punctuation and spacing to follow regular typographical conventions rather than Penn Treebank format for both the text and the hypothesis.

2. We will review the grammaticality of the hypothesis sentences and make necessary corrections.  My goal is to ensure that all correct entailments are grammatical sentences.  For incorrect entailments we will also try to make them grammatical when possible, but sometimes (e.g. an entailment based on a bad parser decision which mistakes a noun for a verb) it may not be possible.  In such cases you can be confident that there is a parser out there which produces that wrong interpretation.

For #1, one question I would like to ask is what to do with quotes.  We have double quotes in " " or `` '' format, single quotes in ` ' format, etc.  What is the "standard typographical convention" in this case?

For #2 it may help if we can test grammaticality using parsers from the community.  Can you recommend some tools?


On Sat, Mar 6, 2010 at 4:12 PM, Stephan Oepen <> wrote:
hi again, deniz,

many thanks for the immediate and constructive reply!  also thanks for
forwarding communication with others to the list; lots of very useful
information there, i would say.

regarding ungrammatical hypotheses, i can confirm that almost all the
examples i had noticed are negative.  thus, a system that would assume
no entailment when it cannot parse the hypothesis may indeed do fairly
well on these cases.

the two positive hypotheses we fail to parse are:

 The officials complained something.

 Somebody denies there are something.

i would argue (much like laura, it appears) these are ungrammatical, as
`complain' cannot take NP complements, and `something' forces singular
agreement.  so, these two examples remain problematic, in my view.

regarding our earlier COLING workshop, thanks for pointing out that the
workshop proceedings, it seems, did not make it into the ACL Anthology.
i have now added a link to the proceedings to the workshop web site.

                                                  best wishes  -  oe

Deniz Yuret

Mar 9, 2010, 11:12:21 AM
---------- Forwarded message ----------
From: Stephan Oepen <>
Date: Tue, Mar 9, 2010 at 6:03 PM
Subject: Re: PETE trial data

hi again, deniz, many thanks for the careful follow-up!

i happen to have some opinions about quotes, indeed :-); we distinguish
the following types of quotes in our tokenizer:

 - straight or typewriter quotes: |"| and |'|
 - directional or Unicode quotes: |“|, |”|, |‘|, and |’|
 - LaTeX- and PTB-style quotes:   |``|, |''|, |`|, and |'|

in my view, the PTB developers were wise to preserve the distinction
between opening vs. closing quotes.  today, however, more and more
texts use `proper' Unicode directional quotes, which look nicer and
are single-character entities.
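Converting PTB/LaTeX-style quote tokens to Unicode directional quotes, as discussed above, is mechanical because the PTB already encodes directionality. A minimal sketch (the mapping table and function are illustrative, not part of any release):

```python
import re

# PTB/LaTeX quote tokens mapped to their Unicode directional equivalents.
PTB_TO_UNICODE = {
    "``": "\u201c",  # left double quotation mark
    "''": "\u201d",  # right double quotation mark
    "`":  "\u2018",  # left single quotation mark
    "'":  "\u2019",  # right single quotation mark (also the apostrophe)
}

# Longest alternatives first, so `` is not read as two single backquotes.
_QUOTE_RE = re.compile(r"``|''|`|'")

def normalize_quotes(text):
    """Rewrite PTB-style quotes as Unicode directional quotes."""
    return _QUOTE_RE.sub(lambda m: PTB_TO_UNICODE[m.group(0)], text)
```

Note that a straight double quote |"| is left untouched here, since its direction cannot be recovered without context; the apostrophe-cluster problem raised below (D''', B'') is exactly where a blind mapping like this would go wrong.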

my personal preference would be to use Unicode quotes, which preserve
the available information and resemble what happens in typesetting.  i
wonder, however, if developers of parsers trained on the PTB might view
such a `modern' representation of quotes as a bit of an obstacle; it
could be argued that text extracted from HTML or PDF, for example, will
most likely use Unicode quotes.  in pure ASCII text, however, one would
usually see straight quotes only, or (in some circles) the LaTeX-style
conventions adopted in the PTB.  on the other hand, these conventions
clash with clusterings of apostrophes, as found in bio-medical texts,
for example:

 (B–D''') D-mib (green in B, B', C, C', D, and D') co-localized with
   Ser (red in [B and B'']), Dl (red in [C and C'']), N (red in [D and
   D'']), and E-Cadherin (E-Cad; blue in [D and D''']) and was found
   apical to Discs-large (Dlg; blue in [B, B''', C, and C''']) in
   notum cells located at the edges of the wing discs.

as with other aspects of the shared task, i guess you need to weigh the
advantages and disadvantages of either encoding, and make an executive
decision :-).

as for testing grammaticality, the ERG and associated parsing software
are open-source.  there is a web interface too, that could be scripted:

however, if we go ahead and participate in the task, i am not sure i
can encourage you to use our parser for testing grammaticality; there
is a bit of a circular dependency there, no?

finally, allow me to make a comment on your exchange with laura.  i too
think the use of passivized hypotheses, in general, makes sense for the
goals of this task (as i understand it).  i am of course biased in this
regard, as passives are among the things we believe we do well.  but in
terms of parser evaluation, if one were to construe the parsing task as
making available to interpretation information that is grammaticalized,
in a sufficiently abstract and normalized form, then passives seem like
a relevant layer of complexity to me.
