String In, String Out?


Ivan Alencar

Jul 13, 2015, 12:25:33 PM
to cleart...@googlegroups.com
Hello,

I'm new to ClearTK and UIMA. So far I haven't found any examples of a pipeline that involves no files, which gives me the impression that I'm using the framework the wrong way.

I'm trying to process a small text stored in a Java String variable using cleartk and UIMA, and get an XML String back (outcome of the cleartk TimeML annotators). 

I was able to provide a String as input (see the excerpt below), but the code is far from elegant (I had to set an empty URI on the CAS). Also, the output is saved to a file, but I want to get a String back; it makes no sense to write the output to a file and then read that file back into memory.

    String documentText = "First make sure that you are using eggs that are several days old...";

    JCas sourceCas = createJCas();
    sourceCas.setDocumentText(documentText);
    ViewUriUtil.setURI(sourceCas, new URI(""));

    SimplePipeline.runPipeline(
            sourceCas,
            org.cleartk.opennlp.tools.SentenceAnnotator.getDescription(),
            TokenAnnotator.getDescription(),
            PosTaggerAnnotator.getDescription(),
            DefaultSnowballStemmer.getDescription("English"),
            org.cleartk.opennlp.tools.ParserAnnotator.getDescription(),
            org.cleartk.timeml.time.TimeAnnotator.FACTORY.getAnnotatorDescription(),
            TimeTypeAnnotator.FACTORY.getAnnotatorDescription(),
            EventAnnotator.FACTORY.getAnnotatorDescription(),
            EventTenseAnnotator.FACTORY.getAnnotatorDescription(),
            EventAspectAnnotator.FACTORY.getAnnotatorDescription(),
            EventClassAnnotator.FACTORY.getAnnotatorDescription(),
            EventPolarityAnnotator.FACTORY.getAnnotatorDescription(),
            EventModalityAnnotator.FACTORY.getAnnotatorDescription(),
            AnalysisEngineFactory.createEngineDescription(AddEmptyDCT.class),
            TemporalLinkEventToDocumentCreationTimeAnnotator.FACTORY.getAnnotatorDescription(),
            TemporalLinkEventToSameSentenceTimeAnnotator.FACTORY.getAnnotatorDescription(),
            TemporalLinkEventToSubordinatedEventAnnotator.FACTORY.getAnnotatorDescription(),
            TempEval2007Writer.getDescription("file:///tmp/out.tml"));


What would be the best way to have the pipeline take a String as input and produce another String as the result?

Thanks,
Ivan

Lee Becker

Jul 14, 2015, 12:34:24 AM
to cleart...@googlegroups.com
If you look under the hood at SimplePipeline.runPipeline, most of its work consists of calling the collection reader to get all CASes, combining the list of analysis engine descriptors into an aggregate analysis engine, and calling that aggregate's process() method on each CAS.
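As a rough illustration of that pattern (individual steps combined into one aggregate, which is then applied to an in-memory document), here is a plain-Java analogy. It deliberately uses no UIMA or ClearTK classes: Doc, StringPipelineSketch, aggregate and run are hypothetical names invented for this sketch, with Doc standing in for a CAS and each Consumer standing in for an analysis engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a CAS: the text plus whatever each step adds.
final class Doc {
    final String text;
    final List<String> annotations = new ArrayList<>();

    Doc(String text) {
        this.text = text;
    }
}

final class StringPipelineSketch {

    // Combine individual steps into one "aggregate" that runs them in order.
    @SafeVarargs
    static Consumer<Doc> aggregate(Consumer<Doc>... steps) {
        return doc -> {
            for (Consumer<Doc> step : steps) {
                step.accept(doc);
            }
        };
    }

    // String in, String out: build the document in memory, process it,
    // and serialize the result without touching the file system.
    static String run(String text) {
        // Toy "annotator" that records one entry per whitespace-separated token.
        Consumer<Doc> tokenizer = doc -> {
            for (String token : doc.text.trim().split("\\s+")) {
                doc.annotations.add("token=" + token);
            }
        };
        // Toy "annotator" that depends on the output of the previous step.
        Consumer<Doc> counter =
                doc -> doc.annotations.add("count=" + doc.annotations.size());

        Doc doc = new Doc(text);
        aggregate(tokenizer, counter).accept(doc);
        return String.join(";", doc.annotations);
    }

    public static void main(String[] args) {
        // prints "token=old;token=eggs;count=2"
        System.out.println(run("old eggs"));
    }
}
```

The point of the analogy is only that once the steps are held as a single object, the caller owns both the input document and the output serialization, so neither has to go through a file.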

For your purposes you might want to create your Aggregate and manipulate the CAS directly, like so:

AggregateBuilder builder = new AggregateBuilder();
// Call add() for each descriptor in turn, except the TempEval2007Writer.
builder.add(org.cleartk.opennlp.tools.SentenceAnnotator.getDescription());
// ..

// Instantiate the aggregate and run it on the CAS
AnalysisEngine aggregateEngine = builder.createAggregate();
aggregateEngine.process(sourceCas);

// Instead of instantiating and running the TempEval2007Writer analysis engine,
// use its static toTimeML() method to get a String output.
String output = TempEval2007Writer.toTimeML(sourceCas);

I hope this points you in the right direction.  Steve Bethard can better answer specifics about the TimeML code.

Cheers,
Lee

Ivan Alencar

Jul 15, 2015, 11:49:24 AM
to cleart...@googlegroups.com
Thanks, Lee. This worked beautifully!

Best,
Ivan