Working with the Ontonotes corpus

Ross Hendrickson

Dec 8, 2011, 9:10:05 AM

to cleart...@googlegroups.com

Hey all,

I'm trying to do some work with the Ontonotes corpus, but I'm failing at getting a Collection Reader up and running. I tried working off the PropBankGoldReader, but that was a little too complex, so I backed off and tried to just get the TreebankParsingExample to work with a single file. That is also failing. The stack trace for the TreebankParsingExample is below. Any pointers?

Also, how do you create a new type system within uimaFIT? I believe type systems are created automatically from the type XML files, but are you supposed to create the XML files using the normal UIMA plugin and then do something in code? A pointer to a good example project to look at would be helpful there as well.

What I'm trying to do is as follows:

1. Get a simple pipeline up and running that does the following:

- reads in PTB parse files from Ontonotes

- sets up a treebank view

- runs through an analysis engine that does nothing

- writes the CAS to an XMI file so I can look at it via CVD

2. Add a simple tree-based annotator to the AE

3. Add a datawriter to the pipeline to format the data for the Mallet maxent package (ClearTK internal)

4. Create a collection reader for my verb sense data

5. Create a new view for the verb sense data

6. Add a simple annotator that uses the sense view to the AE

7. Expand the AEs/annotators to include a dozen or so other things.

Along the way I'm proposing to write several tutorials on getting up and running with the ClearTK framework and uimaFIT. Any suggestions would be welcome. I believe I have wiki access, so I will start stubbing these out. What else am I going to do over Christmas break?

1. Writing a new Collection reader

2. Using UIMA's tools to troubleshoot

3. Setting up Type System

4. Working with Custom Views

5. Creating an Analysis Engine

6. Creating a new Feature Extractor

7. Creating a new Datawriter for a new ML/Analysis package

8. Aggregating AEs

9. Building a Simple Pipeline

10. Creating an Experiment Pipeline

Thanks for all the help, guys.

Ross

Stack Trace

Dec 8, 2011 6:53:20 AM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405)

SEVERE: Exception occurred

org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.

at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)

at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)

at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)

at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)

at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)

at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)

at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)

at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:209)

at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:145)

at org.cleartk.examples.treebank.TreebankParsingExample.main(TreebankParsingExample.java:128)

Caused by: java.lang.IllegalArgumentException: Parentheses counts do not match for treebank sentence: (TOP (S (NP-SBJ (NP (NNP Pierre)

at org.cleartk.syntax.constituent.util.TreebankFormatParser.splitSentences(TreebankFormatParser.java:504)

at org.cleartk.syntax.constituent.util.TreebankFormatParser.inferPlainText(TreebankFormatParser.java:204)

at org.cleartk.syntax.constituent.TreebankGoldAnnotator.process(TreebankGoldAnnotator.java:109)

at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)

at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)

... 9 more

Dec 8, 2011 6:53:20 AM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(275)
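The root cause in the trace is the "Parentheses counts do not match" check, and the sentence it quotes is only the first line of a tree. One way to narrow this down is to check whether the input file as a whole is balanced; if it is, the problem is more likely in how the text is being split into sentences than in the file itself. Below is a minimal sketch of such a check in plain Java. It does not call ClearTK or UIMA, and the class and method names (TreebankParenCheck, parenBalance, splitTopLevelTrees) are made up for illustration. It assumes literal parentheses in the text are escaped as -LRB-/-RRB-, as in PTB-style files, so every raw '(' or ')' is a bracket.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of ClearTK): bracket sanity checks for PTB-style text. */
class TreebankParenCheck {

    /** Returns the number of '(' minus the number of ')' in the text; 0 means balanced counts. */
    static int parenBalance(String text) {
        int balance = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                balance++;
            } else if (c == ')') {
                balance--;
            }
        }
        return balance;
    }

    /**
     * Splits the text into top-level trees by tracking bracket depth rather than
     * line breaks, so a tree that spans several lines stays in one piece.
     */
    static List<String> splitTopLevelTrees(String text) {
        List<String> trees = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                if (depth == 0) {
                    start = i; // a new top-level tree begins here
                }
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Unmatched ')' at offset " + i);
                }
                if (depth == 0) {
                    trees.add(text.substring(start, i + 1));
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unclosed '(' in tree starting at offset " + start);
        }
        return trees;
    }

    /** Usage: java TreebankParenCheck path/to/file.parse */
    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("balance = " + parenBalance(text));
        System.out.println("top-level trees = " + splitTopLevelTrees(text).size());
    }
}
```

Run on the fragment quoted in the exception, parenBalance returns 4 (five opening brackets, one closing), which is what you would expect if only the first line of a multi-line tree had been handed to the parser as a complete sentence.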