This implementation of a StatusCallbackListener fails:
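    import org.apache.uima.cas.CAS;
    import org.apache.uima.collection.EntityProcessStatus;
    import org.apache.uima.collection.StatusCallbackListener;
    import org.apache.uima.jcas.JCas;
    import de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData;

    public class SimpleStatusCallbackListener implements StatusCallbackListener {
        public void entityProcessComplete(CAS cas, EntityProcessStatus status) {
            DocumentMetaData.get(cas); // throws the ClassCastException shown below
        }
        // Empty stubs for the other callbacks required by the interface.
        public void initializationComplete() {}
        public void batchProcessComplete() {}
        public void collectionProcessComplete() {}
        public void paused() {}
        public void resumed() {}
        public void aborted() {}
    }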
The code I use for running a CpePipeline with this custom StatusCallbackListener is basically the same as in CpePipeline.runPipeline(...):
    [...]
    SimpleStatusCallbackListener status = new SimpleStatusCallbackListener();
    CollectionProcessingEngine engine = builder.createCpe(status);
    engine.process();
    [...]
Running the pipeline, I get this error message:
java.lang.ClassCastException: org.apache.uima.cas.impl.AnnotationImpl cannot be cast to de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData
It looks like DocumentMetaData.get() finds something it takes for a DocumentMetaData annotation, but the object it gets back is a generic AnnotationImpl rather than an instance of the DocumentMetaData class.
The DKPro DocumentMetaData annotation definitely exists in the corresponding JCas object in all stages of the pipeline up until the entityProcessComplete callback is invoked.
I've had some issues with vanishing DocumentMetaData in the past when copying CAS objects with org.apache.uima.util.CasCopier.copy(). I worked around those by copying via an XMI serialize -> deserialize round trip, but that's not possible here.
Is there a way to access DocumentMetaData in entityProcessComplete?
- Benjamin
Thanks for your reply.
I tried DocumentMetaData.get(cas.getJCas()), but it results in the same error.
- Benjamin
Your comment about the JCas subsystem initialization made me try to select other annotations. This failed with the same ClassCastException, so this issue is not specific to DocumentMetaData.
Curiously, no ClassCastException occurs if I copy the CAS object via a cas.getJCas() -> XMI serialize -> XMI deserialize round trip and then access the DocumentMetaData (or any other annotation) in the copy.
Something like this:
    public void entityProcessComplete(CAS cas, EntityProcessStatus status) {
        try {
            JCas jcas = cas.getJCas();
            JCas copy = copy(jcas);
            DocumentMetaData.get(copy); // works!
        } catch (CASException e1) {
            ...
        }
    }
Where copy is roughly:
    public static JCas copy(JCas source) {
        JCas target = JCasFactory.createJCas();
        String xmi = serializeToXmi(source); // calls XmiCasSerializer.serialize(...)
        deserializeXmiIntoJCas(xmi, target); // calls XmiCasDeserializer.deserialize(...)
        return target;
    }
Is it possible that XmiCasSerializer only relies on FeatureStructure data and reconstructs the original (J)Cas object properly, while the CpePipeline at some point loses the information required to initialize the JCas subsystem?
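To make that hypothesis concrete, here is a toy model in plain Java. It is not UIMA code: ToyCas, GenericAnnotation, DocMeta and registerCoverClass are hypothetical names I made up for illustration. The sketch only assumes that a CAS hands out generic objects for a type unless something has registered the matching cover class, which would explain why the cast fails in the callback but works on a freshly created JCas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a generic feature structure (the role AnnotationImpl plays).
class GenericAnnotation {
    final String typeName;
    GenericAnnotation(String typeName) { this.typeName = typeName; }
}

// Hypothetical stand-in for a JCas cover class such as DocumentMetaData.
class DocMeta extends GenericAnnotation {
    DocMeta() { super("DocMeta"); }
}

// Toy CAS: returns a cover-class instance only if a generator was registered for the type.
class ToyCas {
    private final Map<String, Supplier<GenericAnnotation>> generators = new HashMap<>();

    void registerCoverClass(String typeName, Supplier<GenericAnnotation> generator) {
        generators.put(typeName, generator);
    }

    GenericAnnotation create(String typeName) {
        Supplier<GenericAnnotation> generator = generators.get(typeName);
        // Without a registered generator, fall back to the generic representation.
        return generator != null ? generator.get() : new GenericAnnotation(typeName);
    }
}

class CoverClassDemo {
    // Returns true if the object handed out by the toy CAS can be cast to the cover class.
    static boolean castSucceeds(ToyCas cas) {
        try {
            DocMeta meta = (DocMeta) cas.create("DocMeta");
            return meta != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ToyCas uninitialized = new ToyCas();
        ToyCas initialized = new ToyCas();
        initialized.registerCoverClass("DocMeta", DocMeta::new);

        System.out.println(castSucceeds(uninitialized)); // prints false
        System.out.println(castSucceeds(initialized));   // prints true
    }
}
```

In both toy CAS objects the type name is the same and the data is there. Only the Java class of the returned object differs, which matches what the ClassCastException message reports.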
- Benjamin