Fail processing multiple nodes from the same RDF


Artjom Klein

Mar 27, 2012, 6:07:32 PM
to Luke McCarthy, sadi...@googlegroups.com
Hi Luke,

I noticed a problem when trying to process several documents in the same RDF input with the Drug Extraction Service (a kind of batch processing).

The problem is general and not related to a particular service: if a SADI service throws an exception while processing one of the input nodes in the RDF, the service returns only the exception message and none of the other nodes, even those that were processed successfully.
Thus, batch processing is not possible.

Below is an example input (from the attached file). There are two nodes that are instances of the input class of the Drug Extraction Service: http://unbsj.biordf.net/ie-sadi/extractDrugNamesFromTextV4
If no text is attached to a node via the 'sourceString' property, the service throws a java.lang.IllegalArgumentException with the message "Input node has no text associated with it by rdf:value, bibo:content or rss:link.".
For testing purposes, I removed the 'sourceString' property from the second node. The service processes the first node, fails on the second node, and returns the exception message.
What I expected was the first node with the extracted drugs attached, plus the second node and/or the error message.

Please let me know if I am doing something wrong.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://nlp2rdf.lod2.eu/schema/string/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >
 <rdf:Description rdf:about="http://example.com/document1.html#offset_0_165_%22St.%20John%27s%20Wort%20-%20A">
    <rdf:type rdf:resource="http://unbsj.biordf.net/information-extraction/ie-sadi-service-ontology.owl#extractDrugNamesFromText_Input"/>
    <j.0:sourceString>a protease inhibitor used to treat HIVc. Digoxin (Lanoxicaps, Lanoxin), a drug used to increase the force of contraction of heart muscle and to regulate heartbeatsd.</j.0:sourceString>
    <rdf:type rdf:resource="http://nlp2rdf.lod2.eu/schema/string/Document"/>
    <rdf:type rdf:resource="http://nlp2rdf.lod2.eu/schema/string/OffsetBasedString"/>
</rdf:Description>
 <rdf:Description rdf:about="http://example.com/document2.html#offset_0_172_%22St.%20John%27s%20Wort%20-%20A">
    <rdf:type rdf:resource="http://unbsj.biordf.net/information-extraction/ie-sadi-service-ontology.owl#extractDrugNamesFromText_Input"/>
    <rdf:type rdf:resource="http://nlp2rdf.lod2.eu/schema/string/Document"/>
    <rdf:type rdf:resource="http://nlp2rdf.lod2.eu/schema/string/OffsetBasedString"/>
  </rdf:Description>
</rdf:RDF>



This is the part of the code that throws the exception:

public void processInput(Resource input, Resource output) {
    log.info("New service invocation.");

    Model outputModel = output.getModel();
    // This will hold the text of the input document.
    String text = null;

    // Look for the text under nlp2rdf:sourceString first, then fall back to bibo:content.
    Statement textPropValue = input.getProperty(Vocab.sourceString);
    if (textPropValue == null) {
        textPropValue = input.getProperty(Vocab.content);
    }

    if (textPropValue != null) {
        text = textPropValue.getString();
        log.info("I have read input text as a value specified with nlp2rdf:sourceString or bibo:content.");
    } else {
        // No text found: throwing here makes the whole invocation fail.
        log.fatal("Input node has no text associated with it by rdf:value, bibo:content or rss:link.");
        throw new IllegalArgumentException("Input node has no text associated with it by rdf:value, bibo:content or rss:link.");
    }

...

-Artjom


sample_input_ednft2.rdf

Luke McCarthy

Mar 27, 2012, 6:17:13 PM
to sadi...@googlegroups.com
Hi Artjom,

That's by design (invalid input == error message, partially-invalid input == invalid input). The only thing the SADI spec has to say about errors is that you have to return an HTTP error code, but that's enough to make it difficult to return a partial response (the closest thing is maybe HTTP 207, which is not strictly part of HTTP but rather of the WebDAV extension…)

When we get around to implementing multi-part requests and responses (there are some earlier messages in this group about that), the correct approach will possibly be clearer, though we can still only have one HTTP response code. I am open to suggestions as to how to deal with that and still remain true to the published SADI spec.

Cheers,

Luke

> <sample_input_ednft2.rdf>

Luke McCarthy

Mar 27, 2012, 6:18:43 PM
to sadi...@googlegroups.com
I should also say, I suppose, that if you don't consider a particular case to be an error, just don't throw an exception. You as the service provider can just ignore that particular input if you prefer.
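
A minimal sketch of the "just don't throw" approach described above, assuming plain Java collections stand in for the RDF input nodes. The class `SkipInvalidInputSketch`, its methods, and the `extract` placeholder are hypothetical and not part of the SADI API; in a real service the same check would sit in processInput and simply return without attaching any output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a SADI service; a real one receives Jena Resources.
final class SkipInvalidInputSketch {

    /** Outputs for the nodes that had text, plus the URIs of the nodes that were skipped. */
    static final class BatchResult {
        final Map<String, String> outputs = new LinkedHashMap<>();
        final List<String> skipped = new ArrayList<>();
    }

    // Placeholder for the real extraction logic (an assumption for illustration).
    static String extract(String text) {
        return "processed " + text.length() + " chars";
    }

    // Processes every node; a node without text is skipped instead of raising an exception.
    static BatchResult processAll(Map<String, String> textByNodeUri) {
        BatchResult result = new BatchResult();
        for (Map.Entry<String, String> entry : textByNodeUri.entrySet()) {
            String text = entry.getValue();
            if (text == null || text.isEmpty()) {
                // Not treated as an error: remember the node and move on.
                result.skipped.add(entry.getKey());
                continue;
            }
            result.outputs.put(entry.getKey(), extract(text));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("http://example.com/document1.html", "Digoxin (Lanoxicaps, Lanoxin)");
        inputs.put("http://example.com/document2.html", null); // no sourceString
        BatchResult result = processAll(inputs);
        System.out.println("outputs: " + result.outputs.keySet());
        System.out.println("skipped: " + result.skipped);
    }
}
```

The trade-off is the one discussed in this thread: the client gets the successful nodes, but nothing in the response tells it why the other node has no output.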

Jim McCusker

Mar 27, 2012, 6:32:11 PM
to sadi...@googlegroups.com
One possible strategy (again, for the developer) is to provide an alternative class for a given node that defines a link to the error (in Manchester Notation):

Class: NPEFailedNode
EquivalentTo: FailedNode and hasException exactly 1 NullPointerException

Class: Exception
EquivalentTo: hasMessage exactly 1 xsd:string and hasStackTrace exactly 1 xsd:string

Class: NullPointerException
SubClassOf: Exception

Or something similar.

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.m...@yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mcc...@cs.rpi.edu
http://tw.rpi.edu

Alexandre Riazanov

Mar 27, 2012, 8:38:13 PM
to sadi...@googlegroups.com
But this would be useless unless it's standardised and supported by client software, right?
--
======================================
Alexandre Riazanov (Alexander Ryazanov), PhD
Saint John, New Brunswick, Canada
Skype: alexandre.riazanov
http://www.freewebs.com/riazanov/
http://www.linkedin.com/in/riazanov
http://www.unbsj.ca/sase/csas/faculty.php
======================================

Luke McCarthy

Mar 27, 2012, 8:54:43 PM
to sadi...@googlegroups.com
Right, but no more or less useful than anything proposed here (beyond the current pass-or-fail dictated by the spec, I guess). I don't think that means it's useless to be having the discussion, though. If consensus emerges here, we can build support into the clients/API/spec.

Also, more information is probably never bad. Jim's solution is behaviourally equivalent to just ignoring the error and not sending any output (from the client API, unexpected data is the same as no data at all unless the vocabularies overlap, which will probably not happen by accident), but has more information if anyone happens to be looking for it. If there's some standard vocabulary for describing errors in RDF, so much the better.

Alexandre Riazanov

Mar 27, 2012, 11:21:38 PM
to sadi...@googlegroups.com
On Tue, Mar 27, 2012 at 9:54 PM, Luke McCarthy <lu...@elmonline.ca> wrote:
> Right, but no more or less useful than anything proposed here (beyond the current pass-or-fail dictated by the spec, I guess). I don't think that means it's useless to be having the discussion, though. If consensus emerges here, we can build support into the clients/API/spec.
>
> Also, more information is probably never bad. Jim's solution is behaviourally equivalent to just ignoring the error and not sending any output (from the client API, unexpected data is the same as no data at all

Yes, unless you postulate in the spec that error message data is always expected, as if the corresponding error description class were attached with a disjunction to the input class definition. Then some client programs can process the error messages properly.

> unless the vocabularies overlap, which will probably not happen by accident), but has more information if anyone happens to be looking for it. If there's some standard vocabulary for describing errors in RDF, so much the better.

I don't see much value in using an existing vocabulary for this purpose. It would be OK to define something SADI-specific -- SADI client developers are unlikely to have trouble implementing something like this from scratch.

 

Jim McCusker

Mar 27, 2012, 11:35:51 PM
to sadi...@googlegroups.com
On Tue, Mar 27, 2012 at 11:21 PM, Alexandre Riazanov <alexandre...@gmail.com> wrote:
> Yes, unless you postulate in the spec that error message data is always expected as if the corresponding error description class is attached with a disjunction to the input class definition. Then some client programs can process the error messages properly.

We probably don't need to go any further than state that a node with an error has a class that is disjoint from the output class (not input, the node is already supposed to belong to the input class), and that the output instance has an rdfs:label with a specific short error message, a dc:description with a longer error message, and an rdfs:comment with technical details.

The service description can then be extended to include potential error classes. Any instances that belong to that error class have not been processed successfully. This won't introduce any new semantics, the service can use whatever error hierarchy it likes, and the client can figure out what went wrong pretty easily and know what to show the user, if that's needed. If the service wants to provide additional details, it can do so as part of the exception class definition.

Jim
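
A minimal sketch of how a failed node could be annotated along the lines proposed above, emitting N-Triples lines with plain Java rather than an RDF library. The class `ErrorNodeSketch`, the method names, and the `http://example.com/errors#MissingTextError` class URI are hypothetical, and mapping dc:description to the Dublin Core elements 1.1 namespace is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SADI: describes a failed node as N-Triples lines.
final class ErrorNodeSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
    static final String RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment";
    // Assumption: dc: refers to the Dublin Core elements 1.1 namespace.
    static final String DC_DESCRIPTION = "http://purl.org/dc/elements/1.1/description";

    // Escapes a string as a plain N-Triples literal.
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\")
                           .replace("\"", "\\\"")
                           .replace("\n", "\\n")
                           .replace("\r", "\\r") + "\"";
    }

    // Types the node with the error class and attaches short, long and technical messages.
    static List<String> describeFailure(String nodeUri, String errorClassUri,
                                        String shortMessage, String longMessage,
                                        Throwable cause) {
        String subject = "<" + nodeUri + ">";
        List<String> triples = new ArrayList<>();
        triples.add(subject + " <" + RDF_TYPE + "> <" + errorClassUri + "> .");
        triples.add(subject + " <" + RDFS_LABEL + "> " + literal(shortMessage) + " .");
        triples.add(subject + " <" + DC_DESCRIPTION + "> " + literal(longMessage) + " .");
        triples.add(subject + " <" + RDFS_COMMENT + "> " + literal(cause.toString()) + " .");
        return triples;
    }

    public static void main(String[] args) {
        List<String> triples = describeFailure(
                "http://example.com/document2.html",
                "http://example.com/errors#MissingTextError", // hypothetical error class
                "No input text",
                "Input node has no text associated with it.",
                new IllegalArgumentException("missing sourceString"));
        triples.forEach(System.out::println);
    }
}
```

A client that knows the error class from the service description could then treat any node carrying that type as unprocessed and show the label to the user.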

Michel Dumontier

Mar 28, 2012, 12:28:25 AM
to sadi...@googlegroups.com
I can add some classes in SIO, if that's convenient.

m.
--
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Visiting Associate Professor, Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

Alexandre Riazanov

Mar 28, 2012, 11:01:28 AM
to sadi...@googlegroups.com
On Wed, Mar 28, 2012 at 12:35 AM, Jim McCusker <james.m...@yale.edu> wrote:
> On Tue, Mar 27, 2012 at 11:21 PM, Alexandre Riazanov <alexandre...@gmail.com> wrote:
>> Yes, unless you postulate in the spec that error message data is always expected as if the corresponding error description class is attached with a disjunction to the input class definition. Then some client programs can process the error messages properly.
>
> We probably don't need to go any further than state that a node with an error has a class that is disjoint from the output class (

How is this better than just having a fixed class for error descriptions?
Where do you state the disjointness of the classes? Are clients required, in general, to infer the disjointness?

> not input, the node is already supposed to belong to the input class), and that the output instance has an rdfs:label with a specific short error message, a dc:description with a longer error message, and an rdfs:comment with technical details.
>
> The service description can then be extended to include potential error classes. Any instances that belong to that error class have not been processed successfully. This won't introduce any new semantics, the service can use whatever error hierarchy it likes, and the client can figure out what went wrong pretty easily and know what to show the user, if that's needed. If the service wants to provide additional details, it can do so as part of the exception class definition.
>
> Jim

Jim McCusker

Mar 28, 2012, 12:29:50 PM
to sadi...@googlegroups.com
On Wed, Mar 28, 2012 at 11:01 AM, Alexandre Riazanov <alexandre...@gmail.com> wrote:
> How is this better than just having a fixed class for error descriptions?

There are no fixed classes in SADI (outside of the service description). We should avoid having them if we can, because any fixed class will end up not meeting the needs of someone. The client is already expected to infer whatever it needs in order to produce a valid input; it should also be checking types on output.

For instance, I'm working on some generic services that manage linked data, so I'm doing a lot of stuff with HTTP in RDF, which has classes and response codes and so on. If a dereference fails, I will be able to use the 4xx and 5xx error codes as exceptions directly if there isn't a fixed class for errors.

I believe Michel would refer to this as minimal ontological commitment. :-)
 
> Where do you state the disjointness of the classes? Are clients required, in general, to infer the disjointness?

It could be stated in the output response, or in the representation that comes from dereferencing the input class and/or error class URIs.
 
Jim

Artjom Klein

Jul 18, 2013, 4:39:56 PM
to sadi...@googlegroups.com, Luke McCarthy
Hi Luke,

As I remember, if an exception is thrown by a SADI service, the service returns RDF containing that exception. Now I see different behavior: the exception is written to the logs and an empty RDF is returned (just the output node with rdf:type OutputClass). Is this the intended behavior, or am I doing something wrong?

-Artjom

Luke McCarthy

Jul 18, 2013, 4:50:25 PM
to sadi...@googlegroups.com
No, if you throw an Exception from processInput, the service should return an exception with an HTTP error code. This code hasn't changed, even in the latest version of sadi-service.jar (you can follow the method hierarchy up to ServiceServlet.doPost and check for yourself; or AsynchronousServiceServlet.doGet, if your service is asynchronous and one of the InputProcessingTasks fails…)

Is there code in your service that overrides the default behaviour?

Artjom Klein

Jul 19, 2013, 1:05:19 PM
to sadi...@googlegroups.com
Hi Luke,

Sorry, it was my fault. I overlooked that the 'throw' statement was inside a 'try' block. That is why the exception was written to the logs and an empty RDF was returned by the service.
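
A minimal sketch of the mistake described above, assuming plain Java with no SADI classes. `SwallowedExceptionSketch` and its methods are hypothetical; the point is only that a throw inside a try whose catch merely logs never reaches the caller, so the framework cannot turn it into an HTTP error.

```java
// Hypothetical illustration, not SADI code.
final class SwallowedExceptionSketch {
    static final String MESSAGE = "Input node has no text associated with it.";

    // The throw sits inside the try, so the catch below logs it and the caller sees no error.
    static String swallowing(String text) {
        try {
            if (text == null) {
                throw new IllegalArgumentException(MESSAGE);
            }
            return "ok";
        } catch (Exception e) {
            System.err.println("logged: " + e.getMessage());
            return "empty output";
        }
    }

    // The check is outside any try block, so the exception propagates to the caller.
    static String propagating(String text) {
        if (text == null) {
            throw new IllegalArgumentException(MESSAGE);
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(swallowing(null));
        try {
            propagating(null);
        } catch (IllegalArgumentException e) {
            System.out.println("caller saw: " + e.getMessage());
        }
    }
}
```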

