Version 2 of the development data has been released and is available for
download at: http://sites.google.com/a/yuret.com/pete/docs/PETE_dev_v2.zip.
This version attempts to normalize punctuation and spacing for both
the text and the hypothesis sentences. We tried to format all
sentences consistently as they would appear in the source, rather than
follow the Penn Treebank tokenization rules.
After some deliberation we decided to stick with ASCII instead of
introducing Unicode characters. All quotes were converted to the
ASCII double quote ["] character. This loses the information about
which is the opening and which is the closing double quote. However,
we made sure that each sentence has an even number of quotes with no
embedding, so it should be fairly easy to write a script that
automatically converts them into your favorite type of quote.
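
As a rough illustration of such a script, here is a minimal Python sketch.
It relies only on the two properties stated above (an even number of ASCII
double quotes per sentence, no embedding), so the quotes can simply be
treated as alternating open/close. The function name convert_quotes and
its parameters are our own choices for this example and are not part of
the dataset; the defaults use the Penn Treebank style `` and '' pair.

```python
def convert_quotes(sentence, open_quote="``", close_quote="''"):
    """Replace ASCII double quotes with alternating opening/closing quotes.

    Assumes an even number of quotes and no embedding, so the 1st, 3rd, ...
    quote in a sentence opens and the 2nd, 4th, ... quote closes.
    """
    parts = sentence.split('"')
    # n quotes split the sentence into n + 1 parts, so an even number of
    # parts means an odd (unbalanced) number of quotes.
    if len(parts) % 2 == 0:
        raise ValueError("odd number of double quotes: %r" % sentence)

    out = [parts[0]]
    for i, part in enumerate(parts[1:], start=1):
        out.append(open_quote if i % 2 == 1 else close_quote)
        out.append(part)
    return "".join(out)


if __name__ == "__main__":
    example = 'He said "yes" and then "no".'
    # Penn Treebank style quotes (the defaults)
    print(convert_quotes(example))
    # Unicode curly quotes, written as escapes to keep this file ASCII
    print(convert_quotes(example, "\u201c", "\u201d"))
```

Sentences without any quotes pass through unchanged, and a sentence with an
odd number of quotes raises an error instead of being silently mis-paired.
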
We are working on Version 3, which will improve the grammatical
consistency of the entailment sentences. We appreciate all your
feedback, which we are sure will result in a much improved
dataset for this task.