The exception is "Parentheses counts do not match for treebank
sentence: (TOP (S (NP-SBJ (NP (NNP Pierre)". My guess is that the
TreebankParsingExample expects trees one per line, and your trees span
several lines. TreebankFormatParser is really a pretty poorly
implemented class. I have it on my TODO list to implement a better
version using http://jparsec.codehaus.org/. But don't hold your
breath - it might be quite a while before I get around to that...
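If the one-tree-per-line guess is right, a quick workaround is to preprocess your file so that each tree sits on a single line before handing it to the parser. Here's a minimal sketch of that in plain Java. The class and method names (TreePerLine, joinTrees) are made up for illustration and are not part of ClearTK. It assumes that literal parentheses in the text are already escaped as -LRB- and -RRB-, as is usual in PTB-style files, so simply counting "(" and ")" is enough to find tree boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a ClearTK class): collapses trees that span
// several lines into one tree per line by tracking parenthesis depth.
class TreePerLine {

  /** Joins consecutive lines until the parentheses balance, one tree per entry. */
  static List<String> joinTrees(List<String> lines) {
    List<String> trees = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    int depth = 0;
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // blank lines between trees carry no information
      }
      if (current.length() > 0) {
        current.append(' '); // replace the original line break with one space
      }
      current.append(trimmed);
      for (int i = 0; i < trimmed.length(); i++) {
        char c = trimmed.charAt(i);
        if (c == '(') {
          depth++;
        } else if (c == ')') {
          depth--;
        }
      }
      if (depth < 0) {
        throw new IllegalArgumentException("Too many ')' in: " + current);
      }
      if (depth == 0) {
        // Parentheses balance, so the tree is complete.
        trees.add(current.toString());
        current.setLength(0);
      }
    }
    if (depth != 0) {
      throw new IllegalArgumentException("Unclosed tree at end of input: " + current);
    }
    return trees;
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList(
        "(TOP (S (NP-SBJ (NP (NNP Pierre)",
        "    (NNP Vinken)))",
        "  (VP (VBZ joins))))");
    for (String tree : joinTrees(input)) {
      System.out.println(tree);
    }
  }
}
```

You would read your treebank file into a list of lines, run it through joinTrees, and write the result back out one tree per line. Note that a non-blank line with no parentheses at all (a comment, say) would be emitted as its own "tree", so strip those first if your files have them.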
> pointers? Also, how do you create a new type system within UIMA fit? I
> believe they are automatically created from the type xml files, but are you
> supposed to create the XML files using the normal UIMA plugin and then do
> something in code? A pointer to a good example project to look at would be
> helpful there as well.
I'm not 100% sure which part you're having trouble with, so here's the
whole process:
(1) Write an XML file defining the type system you want to use.
Typically you would put this XML file under src/main/resources/
somewhere. For actually creating and editing the XML file, you can
follow the UIMA tutorial here:
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.aae.defining_types
(2) Use JCasGen to compile your XML into Java files. In traditional
UIMA, you would use the JCasGen button (as described in the UIMA
tutorial) and be careful to always click that button every time you
modify your type system. In ClearTK, the parent pom provides a way of
having JCasGen invoked via Maven. Add a snippet like the following to
your pom.xml (assuming you've already added cleartk-release as a
<parent> as instructed in the User Setup page):
  <properties>
    <!-- Generate type system files using ${jcasgen.typesystem} -->
    <jcasgen.typesystem>path/to/your/TypeSystem.xml</jcasgen.typesystem>
  </properties>
  <build>
    <plugins>
      <!-- Generate type system files using ${jcasgen.typesystem} -->
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
If you want to see how exec-maven-plugin and build-helper-maven-plugin
have been configured to make this work, you can take a look at the
cleartk-parent pom - they invoke org.uimafit.util.JCasGenPomFriendly
and add ${project.build.directory}/generated-sources/jcasgen to your
build path:
http://cleartk.googlecode.com/svn/repo/org/cleartk/cleartk-parent/1.0.0/cleartk-parent-1.0.0.pom
(3) Once you have compiled your XML type system into Java classes, you
can reference it directly in your Java code. You can start writing
your code now.
(4) When you go to run your CollectionReader, you'll want to make sure
UIMA loads the type system along with your code. In traditional UIMA,
you would reference your XML type system in the XML descriptor for
your CollectionReader. In UimaFIT, you pass in an appropriate
TypeSystemDescription when you create your CollectionReader with
CollectionReaderFactory. You can either do this explicitly, using
TypeSystemDescriptionFactory to create a TypeSystemDescription, or you
can ask UimaFIT to automatically load your type system for you, by
adding a src/main/resources/META-INF/org.uimafit/types.txt file that
contains something like "classpath*:path/to/your/TypeSystem.xml". The
UimaFIT documentation for this approach is here:
http://code.google.com/p/uimafit/wiki/TypeDescriptorDetection
(5) You have now created a type system, compiled it to Java classes,
written a CollectionReader using those Java classes, and loaded the
type system alongside your code when you run your CollectionReader.
> What I'm trying to do is as follows
>
> 1. Get a simple pipeline up and running that does the following
> - read in PTB parse files from Ontonotes
> - sets up a treebank view
> - runs through an analysis engine that does nothing
> - writes the cas to an xmi so I can look at it via CVD
> 2. Add into the AE a simple tree based annotator
> 3. Add into the pipeline a datawriter to format the data for the mallet
> maxent package (cleartk internal)
> 4. Create a collection reader for my verb sense data
> 5. Create a new view for the verb sense data
> 6. Add into the AE a simple annotator that uses the sense view
> 7. Expand the AE's/Annotators to include a dozen or so other things.
You'll probably have more questions as you work through these, but
let's sort out the type system and CollectionReader issues first. Let
me know how far you get after following my steps above.
> Along the way I'm proposing to write several tutorials around getting up and
> running writing code within the cleartk framework and working with UIMAfit.
> Any suggestions would be welcome. I believe I have wiki access so I will
> start stubbing these out. What else am I going to do over Christmas break?
>
> 1. Writing a new Collection reader
> 2. Using UIMA's tools to trouble shoot.
> 3. Setting up Type System
> 4. Working with Custom Views
> 5. Creating an Analysis Engine
> 6. Creating a new Feature Extractor
> 7. Creating a new Datawriter for a new ML/Analysis package
> 8. Aggregating AE's
> 9. Building a Simple Pipeline
> 10. Creating an Experiment Pipeline
This would be awesome. Feel free to steal any of my text above that is
useful for your purposes. I've tried to point out which things are
UIMA, which things are UimaFIT and which things are ClearTK - we
definitely need to get better at that...
Steve
--
Where did you get that preposterous hypothesis?
Did Steve tell you that?
--- The Hiphopopotamus
--
You received this message because you are subscribed to the Google Groups "cleartk-users" group.
To post to this group, send email to cleart...@googlegroups.com.
To unsubscribe from this group, send email to cleartk-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cleartk-users?hl=en.