Cannot get DocumentMetaData in a StatusCallbackListener

Benjamin Heinzerling

Aug 5, 2016, 11:52:13 AM
to dkpro-core-user
Hi,

This implementation of a StatusCallbackListener fails:

import org.apache.uima.cas.CAS;
import org.apache.uima.collection.EntityProcessStatus;
import org.apache.uima.collection.StatusCallbackListener;
import org.apache.uima.jcas.JCas;

import de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData;

public class SimpleStatusCallbackListener implements StatusCallbackListener {

    public void entityProcessComplete(CAS cas, EntityProcessStatus status) {
        DocumentMetaData.get(cas);
    }
}

The code I use for running a CpePipeline with this custom StatusCallbackListener is basically the same as in CpePipeline.runPipeline(...):

[...]
SimpleStatusCallbackListener status = new SimpleStatusCallbackListener();
CollectionProcessingEngine engine = builder.createCpe(status);
engine.process();
[...]

Running the pipeline, I get this error message:

java.lang.ClassCastException: org.apache.uima.cas.impl.AnnotationImpl cannot be cast to de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData

It looks like DocumentMetaData.get() finds something it takes for a DocumentMetaData annotation, but it isn't one.

The DKPro DocumentMetaData annotation definitely exists in the corresponding JCas object in all stages of the pipeline up until the entityProcessComplete callback is invoked.

I've had some issues with vanishing DocumentMetaData in the past when copying CAS objects with org.apache.uima.util.CasCopier.copy(). I worked around those by copying via an XmiSerialize -> XmiDeserialize round trip, but that's not possible here.

Is there a way to access DocumentMetaData in entityProcessComplete?

- Benjamin

Richard Eckart de Castilho

Aug 5, 2016, 1:39:34 PM
to dkpro-c...@googlegroups.com
Hi,

please try this and let me know if it works for you:

DocumentMetaData.get(cas.getJCas());

-- Richard

Benjamin Heinzerling

Aug 8, 2016, 5:19:36 AM
to dkpro-core-user
Hi Richard,

Thanks for your reply.

I tried DocumentMetaData.get(cas.getJCas()), but it results in the same error.

- Benjamin

Richard Eckart de Castilho

Aug 8, 2016, 8:24:47 AM
to dkpro-c...@googlegroups.com
Normally, this problem occurs only if the JCas subsystem of the CAS has not been
initialized. If it is not initialized, the (J)Cas returns AnnotationImpl instances
instead of the proper JCas cover classes. Calling cas.getJCas() normally fixes that.
I have no idea why it does not help in your case.
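
To illustrate the failure mode described above, here is a plain-Java analogy. It is not UIMA code: ToyCas, GenericAnnotation, MetaDataCover and initCoverClasses are hypothetical stand-ins, and the sketch only assumes that the real mechanism behaves similarly. A getter that casts to the cover class fails for as long as the store hands out generic instances.

```java
public class CoverClassAnalogy {

    /** Hypothetical stand-in for a generic feature structure (the role AnnotationImpl plays). */
    static class GenericAnnotation {
        final String typeName;

        GenericAnnotation(String typeName) {
            this.typeName = typeName;
        }
    }

    /** Hypothetical stand-in for a generated cover class (the role DocumentMetaData plays). */
    static class MetaDataCover extends GenericAnnotation {
        MetaDataCover() {
            super("DocumentMetaData");
        }
    }

    /** Toy store that returns cover-class instances only after initCoverClasses() was called. */
    static class ToyCas {
        private boolean coverClassesEnabled = false;

        void initCoverClasses() {
            coverClassesEnabled = true;
        }

        GenericAnnotation selectSingle() {
            return coverClassesEnabled
                    ? new MetaDataCover()
                    : new GenericAnnotation("DocumentMetaData");
        }
    }

    /** Mirrors the failing call: casts whatever the store returns to the cover class. */
    static MetaDataCover get(ToyCas cas) {
        return (MetaDataCover) cas.selectSingle();
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        try {
            get(cas); // the store still returns the generic class, so the cast fails
        } catch (ClassCastException e) {
            System.out.println("Cast failed before initialization");
        }
        cas.initCoverClasses();
        System.out.println(get(cas).typeName); // the cast succeeds after initialization
    }
}
```

In this toy model the data is the same in both cases; only the Java class used to represent it differs, which is why reading the features through the generic interface can still work when the cast does not.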

Alternatively, you can use the CAS interface and FSUtil from uimaFIT 2.2.0, e.g.:

FeatureStructure dmd = CasUtil.selectSingle(cas, CasUtil.getType(cas, DocumentAnnotation.class));
String uri = FSUtil.getFeature(dmd, "documentUri", String.class);

Cheers,

-- Richard

Benjamin Heinzerling

Aug 8, 2016, 9:51:29 AM
to dkpro-core-user
Thanks, accessing the FeatureStructure works.

Your comment about the JCas subsystem initialization made me try to select other annotations. This failed with the same ClassCastException, so this issue is not specific to DocumentMetaData.

Curiously, no ClassCastException occurs if I copy the CAS object via a cas.getJCas() -> XmiSerialize -> XmiDeserialize round trip and then access the DocumentMetaData (or any other annotation).

Something like this:

public void entityProcessComplete(CAS cas, EntityProcessStatus status) {

    try {
        JCas jcas = cas.getJCas();
        JCas copy = copy(jcas);
        DocumentMetaData.get(copy); // works!
    } catch (CASException e1) {
        ...
    }
}

Where copy is roughly:

public static JCas copy(JCas source) {
    JCas target = JCasFactory.createJCas();
    String xmi = serializeToXmi(source); // calls XmiCasSerializer.serialize(...)
    deserializeXmiIntoJCas(xmi, target); // calls XmiCasSerializer.deserialize(...)
    return target;
}

Is it possible that XmiCasSerializer only relies on FeatureStructure data and reconstructs the original (J)Cas object properly, while the CpePipeline at some point loses the information required to initialize the JCas subsystem?

- Benjamin

Richard Eckart de Castilho

Aug 8, 2016, 9:56:38 AM
to dkpro-c...@googlegroups.com
I have observed the problem mainly in conjunction with binary deserialization. That is
why the BinaryCasReader contains these lines:

// Initialize the JCas sub-system which is the most often used API in DKPro Core components
try {
    aCAS.getJCas();
}
catch (CASException e) {
    throw new CollectionException(e);
}

My understanding is that whenever getJCas() is called on a CAS object, the JCas subsystem should
be initialized and subsequent accesses to the CAS should produce JCas cover classes. But I don't
claim to have a full understanding of how this actually works under the hood.

Doing an XMI round trip should definitely not be necessary.

Can you produce some minimal code to reproduce the problem?

Cheers,

-- Richard

Benjamin Heinzerling

Aug 10, 2016, 9:11:23 AM
to dkpro-core-user
I've attached a minimal example that results in a ClassCastException when invoking DocumentMetaData.get(...) in StatusCallbackListener.entityProcessComplete(...)
MinimalExample.java

Richard Eckart de Castilho

Aug 11, 2016, 4:30:45 PM
to dkpro-c...@googlegroups.com
> On 10.08.2016, at 15:11, Benjamin Heinzerling <benjamin.h...@h-its.org> wrote:
>
> I've attached a minimal example that results in a ClassCastException when invoking DocumentMetaData.get(...) in StatusCallbackListener.entityProcessComplete(...)

Good catch!

Thanks for providing the example! It helped me diagnose the problem.

It seems to me to be a bug in the UIMA CPE. I have opened an issue with UIMA-SDK
to discuss whether that behavior is intentional or a bug:

https://issues.apache.org/jira/browse/UIMA-5054

Cheers,

-- Richard

