Summarization: SumBasic Pipeline & Undeclared type SummarySentence?

igor.b...@ucdconnect.ie

Oct 10, 2014, 2:48:37 PM
to cleart...@googlegroups.com
Hi,

Just wondering if there's an example pipeline for running SumBasic.

I'm having some issues with an "Undeclared type [org.cleartk.summarization.type.SummarySentence]" error.

Also, I can't seem to find the right way to specify a "--max-num-sentences" parameter.

In my pom I have:

<dependency>
  <groupId>org.cleartk</groupId>
  <artifactId>cleartk-summarization</artifactId>
  <version>2.0.0</version>
</dependency>

And I'm using this to run SumBasic:

    CollectionReader reader = UriCollectionReader.getCollectionReaderFromDirectory(documentsDirectory);

    AggregateBuilder builder = new AggregateBuilder();
    builder.add(UriToDocumentTextAnnotator.getDescription());

    builder.add(SentenceAnnotator.getDescription());
    builder.add(TokenAnnotator.getDescription());
    builder.add(PosTaggerAnnotator.getDescription());
    builder.add(DefaultSnowballStemmer.getDescription("English"));

    builder.add(AnalysisEngineFactory.createEngineDescription(
        SumBasicAnnotator.class,
        DefaultDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME,
        SumBasicDataWriter.class.getName(),
        DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY,
        modelDirectory.getPath(),
        SumBasicAnnotator.PARAM_TOKEN_FIELD,
        SumBasicAnnotator.TokenField.COVERED_TEXT,
        SumBasicAnnotator.PARAM_STOPWORDS_URI,
        stopwordsFile.toURI()));

    builder.add(AnalysisEngineFactory.createEngineDescription(
        XmiWriter.class,
        XmiWriter.PARAM_OUTPUT_DIRECTORY,
        xmiDirectory.getPath()));

    builder.add(SummarySentenceWriterAnnotator.getDescription(sentencesOutFile, true));

    System.out.println("Summarize:");
    SimplePipeline.runPipeline(reader, builder.createAggregateDescription());

I get as far as a "training-data.instances" file in my modelDirectory and a null.xmi in xmiDirectory.

Any pointers to what I'm missing?

Thanks!

Steven Bethard

Oct 10, 2014, 4:55:10 PM
to cleart...@googlegroups.com
On Fri, Oct 10, 2014 at 2:48 PM, <igor.b...@ucdconnect.ie> wrote:
> Just wondering if there's an example pipeline for running SumBasic.

This is still @Beta code as you've probably noticed, so we don't have
an official example of it, but you can see a bit what the pipeline
looks like here:

https://code.google.com/p/cleartk/source/browse/cleartk-summarization/src/main/java/org/cleartk/summarization/SumBasic.java#128

> Having some issues with an "Undeclared type
> [org.cleartk.summarization.type.SummarySentence]",

This class should be included in the .jar on Maven Central. If you're
working directly from the repository, make sure that the
jcasgen-maven-plugin has been executed on
src/main/resources/org/cleartk/summarization/TypeSystem.xml. This
should have happened automatically if the project is being built by
Maven.
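
One quick way to tell a missing jar apart from a type-system problem is to ask the JVM where it loads the class from. The following is only a minimal sketch: the class name SummarySentenceClassCheck and the helper locate are made up for illustration and are not part of ClearTK. It uses nothing but the JDK.

```java
import java.security.CodeSource;

public class SummarySentenceClassCheck {

  // Returns the location (jar or directory) the class was loaded from,
  // or null if the class cannot be loaded from the current classpath.
  static String locate(String className) {
    try {
      // initialize=false: load the class without running its static initializers
      Class<?> type = Class.forName(
          className, false, SummarySentenceClassCheck.class.getClassLoader());
      CodeSource source = type.getProtectionDomain().getCodeSource();
      return source == null
          ? "(bootstrap or unknown location)"
          : source.getLocation().toString();
    } catch (ClassNotFoundException | LinkageError e) {
      // Not on the classpath, or one of its dependencies is missing
      return null;
    }
  }

  public static void main(String[] args) {
    String name = "org.cleartk.summarization.type.SummarySentence";
    String location = locate(name);
    System.out.println(location == null
        ? name + " is NOT on the classpath"
        : name + " loaded from " + location);
  }
}
```

If the class is found but the "Undeclared type" error persists, the compiled class is present and the problem is more likely in how the type system descriptor is discovered at runtime.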

> Also, I can't seem to find the right way to specify a "--max-num-sentences"
> parameter.

I think the example I linked above shows how to specify that.

Steve

igor.b...@ucdconnect.ie

Oct 13, 2014, 9:07:25 AM
to cleart...@googlegroups.com
Thanks,

I was still getting "JCas type used in Java code, but was not declared in the XML type" errors. I managed to fix it in the end by recompiling cleartk-summarization:

I changed:
cleartk-summarization/src/main/resources/META-INF/org.uimafit/types.txt
to 
cleartk-summarization/src/main/resources/META-INF/org.apache.uima.fit/types.txt

Seems to work now.
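
For anyone hitting the same thing: as far as I understand, uimaFIT 2.x finds type descriptors through META-INF/org.apache.uima.fit/types.txt files on the classpath, so a jar that only ships the old META-INF/org.uimafit/types.txt would not have its types picked up. A rough sketch to list which of the two files are visible at runtime follows. The class name TypesTxtCheck and the method find are hypothetical names for illustration; only JDK calls are used.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class TypesTxtCheck {

  // The two locations mentioned in this thread
  static final String UIMAFIT_2_LOCATION = "META-INF/org.apache.uima.fit/types.txt";
  static final String LEGACY_LOCATION = "META-INF/org.uimafit/types.txt";

  // Lists every copy of the given resource visible to the class loader
  static List<URL> find(String resourcePath) throws IOException {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = TypesTxtCheck.class.getClassLoader();
    }
    return Collections.list(loader.getResources(resourcePath));
  }

  public static void main(String[] args) throws IOException {
    for (String path : new String[] {UIMAFIT_2_LOCATION, LEGACY_LOCATION}) {
      List<URL> hits = find(path);
      System.out.println(path + ": " + hits.size() + " found");
      for (URL url : hits) {
        System.out.println("  " + url);
      }
    }
  }
}
```

Run it with the same classpath as the pipeline. If the cleartk-summarization jar shows up only under the legacy path, that matches the behaviour described above.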

Steven Bethard

Oct 14, 2014, 10:39:10 PM
to cleart...@googlegroups.com
Yes, that does look like a bug. If you have a chance, could you file a
bug at https://code.google.com/p/cleartk/issues/list? That way we'll
be sure to fix it before the next release.

Steve