JCas type "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence" used in Java code, but was not declared in the XML type descriptor


Assassin Gaming

Dec 30, 2019, 2:49:00 AM
to dkpro-core-developers
Hey all,
I tried using DKPro Core with UIMA, but the following exception occurred:

Caused by: org.apache.uima.cas.CASRuntimeException: JCas type "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence" used in Java code,  but was not declared in the XML type descriptor.

I'm using very simple code:

package Test.DkProTest;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;

import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence;

public class Testdk extends JCasAnnotator_ImplBase {

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        // Treat every comma-separated chunk of the document text as one Sentence.
        String text = jcas.getDocumentText();
        int begin = 0;
        for (String part : text.split(",")) {
            int end = begin + part.length();
            Sentence sentence = new Sentence(jcas, begin, end);
            sentence.addToIndexes();
            // Skip the comma so that the next span starts at the right offset.
            begin = end + 1;
        }
    }
}

POM.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>Test</groupId>
  <artifactId>DkProTest</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>DkProTest</name>
  <!-- FIXME change it to the project's website -->
  <url>http://www.example.com</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.opennlp-asl</artifactId>
      <version>1.9.0</version>
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.languagetool-asl</artifactId>
      <version>1.9.0</version>
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.maltparser-asl</artifactId>
      <version>1.9.0</version>
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.io.text-asl</artifactId>
      <version>1.9.0</version>
    </dependency>
    <dependency>
      <groupId>de.tudarmstadt.ukp.dkpro.core</groupId>
      <artifactId>de.tudarmstadt.ukp.dkpro.core.io.conll-asl</artifactId>
      <version>1.9.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.uima/uimaj-core -->
    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimaj-core</artifactId>
      <version>2.10.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimaj-document-annotation</artifactId>
      <version>2.10.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimaj-tools</artifactId>
      <version>2.10.2</version>
    </dependency>
  </dependencies>
  <build>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>

I work with UIMA regularly, so I know that such exceptions occur when a type system (in our case, DKPro's Sentence type system) is not set in the analysis engine XML.
The problem is that I don't have DKPro's type system descriptors to add to the analysis engine; I only have the corresponding Maven dependencies.

Feel free to write an obsessed reply :)

Richard Eckart de Castilho

Dec 30, 2019, 3:41:50 AM
to dkpro-core-developers
On 30. Dec 2019, at 08:49, Assassin Gaming <rohi...@gmail.com> wrote:
>
> I work with UIMA regularly, so I know that such exceptions occur when a type system (in our case, DKPro's Sentence type system) is not set in the analysis engine XML.
> The problem is that I don't have DKPro's type system descriptors to add to the analysis engine; I only have the corresponding Maven dependencies.

DKPro Core is typically used in conjunction with uimaFIT and uimaFIT's type-system detection mechanism. This mechanism kicks in e.g. when constructing readers/engines, running pipelines, or constructing CASes:


CollectionReaderDescription textReader = createReaderDescription(
        TextReader.class,
        TextReader.PARAM_LANGUAGE, "en",
        TextReader.PARAM_SOURCE_LOCATION, "src/test/resources/texts/*.txt");

AnalysisEngineDescription segmenter = createEngineDescription(OpenNlpSegmenter.class);
AnalysisEngineDescription posTagger = createEngineDescription(OpenNlpPosTagger.class);
AnalysisEngineDescription parser = createEngineDescription(OpenNlpParser.class);
AnalysisEngineDescription ner = createEngineDescription(OpenNlpNamedEntityRecognizer.class);
AnalysisEngineDescription dump = createEngineDescription(CasDumpWriter.class);
AnalysisEngineDescription teiWriter = createEngineDescription(
        TeiWriter.class,
        TeiWriter.PARAM_TARGET_LOCATION, targetFolder,
        TeiWriter.PARAM_WRITE_CONSTITUENT, true);

SimplePipeline.runPipeline(textReader, segmenter, posTagger, parser, ner, dump, teiWriter);

You see above that the type system is never injected because it is automatically handled through classpath scanning by uimaFIT [1].

In your example, you didn't say how you constructed and ran the pipeline...
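
For illustration, here is a minimal sketch of how an annotator like the one in the first post could be run through uimaFIT so that the DKPro Core type system is detected automatically. The class and method names (`RunWithUimaFit`, `CommaSentenceAnnotator`, `countSentences`) are made up for this example, and it assumes uimaFIT and the DKPro Core segmentation API are on the classpath, as they would be via the DKPro Core Maven dependencies.

```java
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;

import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence;

// Hypothetical example class, not part of DKPro Core or uimaFIT.
class RunWithUimaFit {

    /** Marks each comma-separated chunk as a Sentence (same idea as Testdk). */
    public static class CommaSentenceAnnotator extends JCasAnnotator_ImplBase {
        @Override
        public void process(JCas jcas) {
            int begin = 0;
            for (String part : jcas.getDocumentText().split(",")) {
                int end = begin + part.length();
                new Sentence(jcas, begin, end).addToIndexes();
                begin = end + 1; // skip the comma
            }
        }
    }

    static int countSentences(String text) throws Exception {
        // No type system is passed in: createJCas() without arguments relies
        // on uimaFIT's automatic type system detection.
        JCas jcas = JCasFactory.createJCas();
        jcas.setDocumentText(text);

        // The engine description is built from the class, without an XML descriptor.
        AnalysisEngineDescription annotator =
                AnalysisEngineFactory.createEngineDescription(CommaSentenceAnnotator.class);
        SimplePipeline.runPipeline(jcas, annotator);

        return JCasUtil.select(jcas, Sentence.class).size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countSentences("First part,second part,third part"));
    }
}
```

If the pipeline is instead built from hand-written XML descriptors, the type system has to be declared there explicitly, which is the situation that produces the exception above.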

If you want to access the DKPro Core type systems directly: they are included in the JARs - that means they are also on the classpath and you can import them into your own type system descriptors, e.g. using lines such as

<imports>
  <import name="desc.type.LexicalUnits"/>
  <import name="desc.type.LexicalUnits_customized"/>
</imports>

There are various type system descriptors in the various DKPro Core modules.
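
To check which JAR actually provides a given descriptor, a small plain-Java sketch like the following may help. It relies on UIMA's import-by-name convention (dots become slashes and `.xml` is appended) and on a standard classloader lookup; the class and method names (`DescriptorLocator`, `toResourcePath`, `locate`) are invented for this example.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of UIMA or DKPro Core.
class DescriptorLocator {

    /** Converts an import-by-name value into the classpath resource it refers to. */
    static String toResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    /** Returns every classpath location (e.g. inside a JAR) that provides the descriptor. */
    static List<URL> locate(String importName) throws IOException {
        ClassLoader classLoader = DescriptorLocator.class.getClassLoader();
        return Collections.list(classLoader.getResources(toResourcePath(importName)));
    }

    public static void main(String[] args) throws IOException {
        // Prints one URL per JAR or directory that contains the descriptor;
        // prints nothing if the descriptor is not on the classpath.
        for (URL url : locate("desc.type.LexicalUnits")) {
            System.out.println(url);
        }
    }
}
```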

One easy way to aggregate all type system descriptors available on the classpath via uimaFIT would be this:

TypeSystemDescription tsd = TypeSystemDescriptionFactory.createTypeSystemDescription();

You can then write the tsd as XML to a file if you want.
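
A minimal sketch of writing the aggregated description to a file could look like this. The class name `DumpTypeSystem` and the output file name are placeholders; the code assumes uimaFIT is on the classpath and uses the `toXML(OutputStream)` method that UIMA metadata objects provide.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;
import org.apache.uima.resource.metadata.TypeSystemDescription;

// Hypothetical example class, not part of uimaFIT.
class DumpTypeSystem {

    static void write(Path target) throws Exception {
        // Aggregates the type system descriptors that uimaFIT detects on the classpath.
        TypeSystemDescription tsd = TypeSystemDescriptionFactory.createTypeSystemDescription();
        try (OutputStream out = Files.newOutputStream(target)) {
            tsd.toXML(out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file name; pick whatever location suits the project.
        write(Paths.get("aggregated-typesystem.xml"));
    }
}
```

A file written this way is a snapshot, so it would need to be regenerated whenever the DKPro Core dependencies change.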

You may want to have a closer look at uimaFIT and e.g. how it is used in DKPro Core to inject parameter values [2].
Note in particular how the code extends the uimaFIT base classes instead of the UIMA base classes of the same name,
e.g. `org.apache.uima.fit.component.JCasAnnotator_ImplBase`.

Cheers,

-- Richard

[1] https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem
[2] https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#d5e120

Assassin Gaming

Dec 30, 2019, 4:19:36 AM
to dkpro-core-developers
Hey Richard,

I understand your point, but picking the type systems from the classpath and adding them to my flow can be cumbersome, as DKPro has many type systems. Also, if the type systems are updated in the future, I'll have to do the same work again.

Let me give you a brief overview of what I'm trying to achieve:
  • I already have a UIMA pipeline for English NLP which uses my custom type systems. Now I have a requirement to convert some of my custom types (Sentence, Token, NED, Phrases) into Stanford's CoreMap, so that I can then run the pikes code on it to extract SRL.
  • DKPro Core already has code to convert its type system into CoreMap (Dkpro2CoreMap) and vice versa. So I thought I should map my custom type system to DKPro's type system, which would then let me convert to CoreMap.
Do you think this is achievable?
Any other suggestions are welcome.

Best,
Rohit

Richard Eckart de Castilho

Dec 30, 2019, 5:01:09 AM
to dkpro-core-developers
Hi,

> On 30. Dec 2019, at 10:19, Assassin Gaming <rohi...@gmail.com> wrote:
>
> I understand your point, but picking the type systems from the classpath and adding them to my flow can be cumbersome, as DKPro has many type systems. Also, if the type systems are updated in the future, I'll have to do the same work again.

The way uimaFIT's type system detection works makes sure that if you update the Maven dependencies that include the type system descriptors, they will be used. So if you'd use that facility, the only effort you'd have is to update the dependencies. You can also register your custom types to be picked up by uimaFIT's mechanism as described in the uimaFIT documentation. The nice thing about uimaFIT is that it reduces the need to wrangle UIMA XML descriptor files to an absolute minimum.

> Let me give you a brief overview of what I'm trying to achieve:
> • I already have a UIMA pipeline for English NLP which uses my custom type systems. Now I have a requirement to convert some of my custom types (Sentence, Token, NED, Phrases) into Stanford's CoreMap, so that I can then run the pikes code on it to extract SRL.
> • DKPro Core already has code to convert its type system into CoreMap (Dkpro2CoreMap) and vice versa. So I thought I should map my custom type system to DKPro's type system, which would then let me convert to CoreMap.
> Do you think this is achievable?

Converting between type systems is a pain in the lower backside.

I see two options:

1) You switch from your current type system to the DKPro Core type system since DKPro Core would seem to support already what you need in your pre-processing so far in terms of types as well as in terms of a range of components being able to work with these types. There are also SRL types in DKPro Core which you might be able to make use of (SemPred and SemArg). Or you could build your own custom SRL types on top of the DKPro Core type system.

2) You stay with your current type system, copy the parts of DKPro2CoreNlp and CoreNlp2DKPro that you need into your own project and adjust/extend them as you need. That's what open source does for you ;) Please attribute the origin of the copied code by not removing existing license headers and including an appropriate statement in your code repository (e.g. in a NOTICE.txt file).
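
For comparison, the mapping idea from the question (copying custom annotations into DKPro Core types) could be sketched roughly as below for a single type. This is only an illustration under assumptions: `CustomToDkproSentences`, `Mapper`, and `PARAM_SOURCE_TYPE` are hypothetical names, the custom type is looked up by its fully qualified name so that no custom JCas class is needed, and only offsets are copied, no features.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.util.CasUtil;
import org.apache.uima.jcas.JCas;

import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence;

// Hypothetical example class, not part of DKPro Core.
class CustomToDkproSentences {

    /** Creates a DKPro Core Sentence for every annotation of the configured source type. */
    public static class Mapper extends JCasAnnotator_ImplBase {

        public static final String PARAM_SOURCE_TYPE = "sourceType";

        // Fully qualified name of the custom sentence type; the value is injected by uimaFIT.
        @ConfigurationParameter(name = PARAM_SOURCE_TYPE, mandatory = true)
        private String sourceType;

        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
            Type type = jcas.getTypeSystem().getType(sourceType);
            if (type == null) {
                throw new AnalysisEngineProcessException(
                        new IllegalStateException("Unknown type: " + sourceType));
            }
            // Copy the selection first so the index is not modified while iterating over it.
            List<AnnotationFS> sources =
                    new ArrayList<AnnotationFS>(CasUtil.select(jcas.getCas(), type));
            for (AnnotationFS source : sources) {
                new Sentence(jcas, source.getBegin(), source.getEnd()).addToIndexes();
            }
        }
    }
}
```

Every further type (tokens, named entities, phrases) and every feature would need its own mapping, which is where the effort mentioned above comes from.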

Cheers,

-- Richard