Struggling to integrate uimaFIT and OpenNLP in a Maven project...

dvisser

May 10, 2013, 11:11:48 AM
to uimafi...@googlegroups.com
I have the following dependencies defined:

    <dependency>
      <groupId>org.uimafit</groupId>
      <artifactId>uimafit</artifactId>
      <version>1.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.opennlp</groupId>
      <artifactId>opennlp-uima</artifactId>
      <version>1.5.3</version>
    </dependency>

AFAIK, they both depend on UIMA v2.3.1.

This bit of code won't compile:

import opennlp.tools.sentdetect.SentenceDetector;
import org.apache.uima.analysis_component.AnalysisComponent;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.resource.ResourceInitializationException;
import org.uimafit.factory.AnalysisEngineFactory;

public class SentenceDetect {

    public SentenceDetect() throws ResourceInitializationException{
        AnalysisEngine analysisEngine = AnalysisEngineFactory.createPrimitive(
                SentenceDetector.class,
                new Object[] {"opennlp.uima.ModelName", "en-sent.bin"});
    }
}

The compiler complains:

[ERROR] /home/dale/Documents/git/3153/nlp/UIMAfit/src/main/java/org/ida/uimafit/SentenceDetect.java:[14,86] inconvertible types
  required: java.lang.Class<? extends org.apache.uima.analysis_component.AnalysisComponent>
  found:    java.lang.Class<opennlp.tools.sentdetect.SentenceDetector>

But SentenceDetector is descended from AnalysisComponent_ImplBase which implements AnalysisComponent?

Does anybody know where I'm going wrong here?

Best regards,
Dale Visser

Richard Eckart de Castilho

May 10, 2013, 11:28:28 AM
to uimafi...@googlegroups.com
Hi,

I think you are mixing up things. See comments inline below.

On 10.05.2013 at 17:11, dvisser <dale....@gmail.com> wrote:

> I have the following dependencies defined:
>
> <dependency>
> <groupId>org.uimafit</groupId>
> <artifactId>uimafit</artifactId>
> <version>1.2.0</version>
> </dependency>

You should upgrade to 1.4.0.

> <dependency>
> <groupId>org.apache.opennlp</groupId>
> <artifactId>opennlp-uima</artifactId>
> <version>1.5.3</version>
> </dependency>
>
> This bit of code won't compile:
>
> import opennlp.tools.sentdetect.SentenceDetector;
> import org.apache.uima.analysis_component.AnalysisComponent;
> import org.apache.uima.analysis_engine.AnalysisEngine;
> import org.apache.uima.resource.ResourceInitializationException;
> import org.uimafit.factory.AnalysisEngineFactory;

> The compiler complains:
>
> [ERROR] /home/dale/Documents/git/3153/nlp/UIMAfit/src/main/java/org/ida/uimafit/SentenceDetect.java:[14,86] inconvertible types
> required: java.lang.Class<? extends org.apache.uima.analysis_component.AnalysisComponent>
> found: java.lang.Class<opennlp.tools.sentdetect.SentenceDetector>
>
> But SentenceDetector is descended from AnalysisComponent_ImplBase which implements AnalysisComponent?

opennlp.tools.sentdetect.SentenceDetector is an interface! It is not a UIMA component.

opennlp.uima.sentdetect.SentenceDetector is the one you are looking for.
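
The compiler error follows from the generic bound on the factory method's first argument. A minimal sketch of that bound, using hypothetical stand-in types (Component, ToolsSentenceDetector and UimaSentenceDetector are illustrative names, not the real UIMA or OpenNLP classes):

```java
// Stand-in types only: none of these are the real UIMA or OpenNLP classes.
class BoundDemo {
    // Plays the role of org.apache.uima.analysis_component.AnalysisComponent.
    interface Component {}

    // Plays the role of opennlp.tools.sentdetect.SentenceDetector: a plain
    // library interface that is unrelated to Component.
    interface ToolsSentenceDetector {}

    // Plays the role of opennlp.uima.sentdetect.SentenceDetector: a class
    // that does implement Component.
    static class UimaSentenceDetector implements Component {}

    // Same shape of bound as in the compiler message above.
    static String create(Class<? extends Component> type) {
        return "engine for " + type.getSimpleName();
    }

    // Runtime equivalent of the compile-time bound check.
    static boolean satisfiesBound(Class<?> type) {
        return Component.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        System.out.println(create(UimaSentenceDetector.class));
        // create(ToolsSentenceDetector.class); // rejected by the compiler
        System.out.println(satisfiesBound(ToolsSentenceDetector.class));
    }
}
```

Two classes that share a simple name but live in different packages are unrelated types, so the import line decides which one SentenceDetector.class refers to.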

> public class SentenceDetect {
>
> public SentenceDetect() throws ResourceInitializationException{
> AnalysisEngine analysisEngine = AnalysisEngineFactory.createPrimitive(
> SentenceDetector.class,
> new Object[] {"opennlp.uima.ModelName", "en-sent.bin"});
> }
> }

That is an invalid call to createPrimitive. It should be:

AnalysisEngine analysisEngine = AnalysisEngineFactory.createPrimitive(
    SentenceDetector.class,
    "opennlp.uima.ModelName", "en-sent.bin",
    "opennlp.uima.SentenceType", "YourSentenceTypeName"
);

I am not sure, though, if that will work. You may also have to specify the "optional" parameters, in particular "opennlp.uima.ContainerType".
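
The call above passes the configuration as alternating parameter names and values. A rough sketch of how such a list can be read as pairs, assuming a hypothetical helper named toParameters (a stand-in for illustration, not uimaFIT's actual implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a factory that takes configuration data as
// alternating parameter names and values; not uimaFIT's implementation.
class ConfigPairsDemo {
    static Map<String, Object> toParameters(Object... configurationData) {
        // An odd number of elements means a name without a value.
        if (configurationData.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs, got "
                    + configurationData.length + " elements");
        }
        Map<String, Object> parameters = new LinkedHashMap<String, Object>();
        for (int i = 0; i < configurationData.length; i += 2) {
            // Every even position must hold a parameter name.
            if (!(configurationData[i] instanceof String)) {
                throw new IllegalArgumentException(
                        "parameter name at index " + i + " is not a String");
            }
            parameters.put((String) configurationData[i], configurationData[i + 1]);
        }
        return parameters;
    }

    public static void main(String[] args) {
        System.out.println(toParameters(
                "opennlp.uima.ModelName", "en-sent.bin",
                "opennlp.uima.SentenceType", "YourSentenceTypeName"));
    }
}
```

Any further parameter, such as "opennlp.uima.ContainerType", would be appended as one more name/value pair in the same list.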

Once you get beyond the sentence splitter (or possibly even before!), you'll find the information on using OpenNLP components with uimaFIT on this wiki page helpful (search for OpenNLP):

https://code.google.com/p/uimafit/wiki/ExternalResources

-- Richard

Dale Visser

May 13, 2013, 2:01:52 PM
to uimafi...@googlegroups.com
Thank you for your help. I have made the change to my POM file, and now I am stuck at a different point. Here is my source code:

package org.ida.uimafit;

import static org.uimafit.factory.AnalysisEngineFactory.createAggregate;
import static org.uimafit.factory.AnalysisEngineFactory.createAggregateDescription;
import static org.uimafit.factory.AnalysisEngineFactory.createPrimitiveDescription;
import static org.uimafit.factory.ExternalResourceFactory.createDependencyAndBind;

import java.io.File;
import java.io.IOException;
import java.net.URL;

import opennlp.tools.formats.ad.ADSentenceStream.Sentence;
import opennlp.uima.sentdetect.SentenceDetector;
import opennlp.uima.sentdetect.SentenceModelResourceImpl;
import opennlp.uima.tokenize.Tokenizer;
import opennlp.uima.tokenize.TokenizerModelResourceImpl;
import opennlp.uima.util.UimaUtil;

import org.apache.commons.io.FileUtils;
import org.apache.uima.analysis_component.AnalysisComponent;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.SharedResourceObject;
import org.apache.uima.util.InvalidXMLException;

public class SentenceDetect {

    File file;
    AnalysisEngineDescription aggregate;

    public SentenceDetect() throws ResourceInitializationException,
            InvalidXMLException {
        // Create descriptors
        AnalysisEngineDescription sentenceDetector = createDescriptionAndBindModel(
                SentenceDetector.class,
                "http://opennlp.sourceforge.net/models-1.5/en-sent.bin",
                SentenceModelResourceImpl.class);
        AnalysisEngineDescription tokenizer = createDescriptionAndBindModel(
                Tokenizer.class,
                "http://opennlp.sourceforge.net/models-1.5/en-token.bin",
                TokenizerModelResourceImpl.class);
        aggregate = createAggregateDescription(sentenceDetector, tokenizer);
        URL resourceURL = getClass().getResource("/test.txt");
        file = FileUtils.toFile(resourceURL);
    }

    protected void execute() throws IOException,
            AnalysisEngineProcessException, ResourceInitializationException {
        AnalysisEngine engine = createAggregate(aggregate);
        JCas jCas = engine.newJCas();
        jCas.setDocumentText(FileUtils.readFileToString(file));
        engine.process(jCas);
    }

    private AnalysisEngineDescription createDescriptionAndBindModel(
            Class<? extends AnalysisComponent> aeClass, String modelURL,
            Class<? extends SharedResourceObject> srClass)
            throws ResourceInitializationException, InvalidXMLException {
        AnalysisEngineDescription description = createPrimitiveDescription(
                aeClass, UimaUtil.TOKEN_TYPE_PARAMETER,
                Tokenizer.class.getName(), UimaUtil.SENTENCE_TYPE_PARAMETER,
                Sentence.class.getName());
        createDependencyAndBind(description, UimaUtil.MODEL_PARAMETER, srClass,
                modelURL);
        return description;
    }
}

Running it fails with the following stack trace:

opennlp.uima.util.OpenNlpAnnotatorProcessException: "Could not find opennlp.tools.formats.ad.ADSentenceStream$Sentence type!"
    at opennlp.uima.util.AnnotatorUtil.getType(AnnotatorUtil.java:60)
    at opennlp.uima.util.AnnotatorUtil.getRequiredTypeParameter(AnnotatorUtil.java:163)
    at opennlp.uima.sentdetect.AbstractSentenceDetector.typeSystemInit(AbstractSentenceDetector.java:84)
    at opennlp.uima.sentdetect.SentenceDetector.typeSystemInit(SentenceDetector.java:102)
    at org.apache.uima.analysis_component.CasAnnotator_ImplBase.checkTypeSystemChange(CasAnnotator_ImplBase.java:100)
    at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:55)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
    at org.ida.uimafit.SentenceDetect.execute(SentenceDetect.java:55)
    ...

AFAIK, the class mentioned in the error text is in my classpath. Help?

Best regards,
Dale

Richard Eckart de Castilho

May 13, 2013, 2:05:22 PM
to uimafi...@googlegroups.com
uimaFIT doesn't know about your types. Just having the JCas wrappers in the classpath isn't sufficient. UIMA requires the XML descriptor as well.

Check out the uimaFIT type descriptor detection documentation on how to make your type known to uimaFIT, so that it can provide UIMA with the descriptors.

https://code.google.com/p/uimafit/wiki/TypeDescriptorDetection

-- Richard
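
The exception message quotes the Java class name that was passed as the sentence type parameter. Judging from the stack trace, the annotator looks that string up by name in the type system. A minimal sketch of why a class being on the classpath does not make its name resolvable, using stand-in classes and made-up type names (everything under example.types is hypothetical):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Stand-in classes only; the nesting mirrors ADSentenceStream.Sentence.
class TypeNameDemo {
    static class SentenceStream {
        static class Sentence {}
    }

    // Hypothetical names that a type system descriptor might declare.
    static final Set<String> DECLARED_TYPES = new HashSet<String>(
            Arrays.asList("example.types.Sentence", "example.types.Token"));

    // Stand-in for a lookup by name in the type system.
    static boolean isDeclared(String typeName) {
        return DECLARED_TYPES.contains(typeName);
    }

    public static void main(String[] args) {
        // getName() on a nested class yields Outer$Inner, the same form
        // that appears in the exception message.
        String fromClass = SentenceStream.Sentence.class.getName();
        System.out.println(fromClass + " declared: " + isDeclared(fromClass));
        System.out.println("example.types.Sentence declared: "
                + isDeclared("example.types.Sentence"));
    }
}
```

The value passed for the sentence and token type parameters therefore has to be a name that some type descriptor declares, and that descriptor has to be visible to uimaFIT.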