Linking to DBpedia


Roland Wingerter

Apr 6, 2018, 5:59:20 AM
to vocbench-user
Dear all,

in the thread "New vocabulary external resource" Armando Stellato explained how to link from VB3 to an external resource. I was astonished to see that the description of the linked resource will be fetched from the web and displayed in VocBench, which is a great feature.

When I played around with a small example project, linking concepts to DBpedia, I noticed that the URIs taken from this DBpedia page work in VocBench only when the *domain* name is in lowercase:
http://dbpedia.org/resource/Berlin // works
http://DBpedia.org/resource/Berlin // does not work in VB3

http://dbpedia.org/resource/Middle_East // works
http://DBpedia.org/resource/Middle_East // doesn't work in VB3

I wonder if this is a bug? When I enter these URLs into the browser the domain name works regardless of case (whereas case is important in the resource name).
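
For reference, RFC 3986 treats the scheme and host of a URI as case-insensitive, while the path is case-sensitive, which matches the browser behaviour described above. A minimal sketch of lowercasing only those two parts before a lookup follows. The class and method names are made up for illustration, and nothing here is taken from the VB3 code base:

```java
import java.net.URI;
import java.util.Locale;

public class HostCaseNormalizer {

    // Lowercases the scheme and host only. Path, query and fragment are copied
    // verbatim (raw form) because they are case-sensitive.
    public static String normalize(String uri) {
        URI parsed = URI.create(uri);
        if (parsed.getScheme() == null || parsed.getHost() == null) {
            return uri; // not an absolute URI with a host: leave it alone
        }
        StringBuilder out = new StringBuilder();
        out.append(parsed.getScheme().toLowerCase(Locale.ROOT)).append("://");
        if (parsed.getRawUserInfo() != null) {
            out.append(parsed.getRawUserInfo()).append('@');
        }
        out.append(parsed.getHost().toLowerCase(Locale.ROOT));
        if (parsed.getPort() != -1) {
            out.append(':').append(parsed.getPort());
        }
        if (parsed.getRawPath() != null) {
            out.append(parsed.getRawPath());
        }
        if (parsed.getRawQuery() != null) {
            out.append('?').append(parsed.getRawQuery());
        }
        if (parsed.getRawFragment() != null) {
            out.append('#').append(parsed.getRawFragment());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints http://dbpedia.org/resource/Middle_East
        System.out.println(normalize("http://DBpedia.org/resource/Middle_East"));
    }
}
```

Only the host is touched, so a resource name such as Middle_East keeps its capitalisation.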

Since release 3.7, DBpedia also provides "localized datasets that contain IRIs like http://xx.dbpedia.org/resource/Name, where xx is a Wikipedia language code and Name is taken from the source URL, http://xx.wikipedia.org/wiki/Name" (Cf. http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets#h434-3).

As an example, the Wikipedia article https://de.wikipedia.org/wiki/Frankfurt_am_Main is http://de.dbpedia.org/resource/Frankfurt_am_Main in DBpedia. However, clicking on this URI in VB3 leads to an error message "datatype rdf:langString requires a language tag [line 9]".

As an aside, http://it.dbpedia.org/resource/Roma works just fine ;-)

Kind regards
Roland

Roland Wingerter

Apr 6, 2018, 6:01:55 AM
to vocbench-user

Roland Wingerter

Apr 6, 2018, 10:11:48 AM
to vocbench-user
Here are two more examples. Both work in a browser and from a text editor, but only the second one works in VocBench:
http://dbpedia.org/resource/Semantic_Web
http://live.dbpedia.org/resource/Semantic_Web

Clicking on the first URI in VB3 leads to an error message "datatype rdf:langString requires a language tag". This one really puzzles me.

Kind regards
Roland

Manuel Fiorelli

Apr 6, 2018, 11:01:02 AM
to Roland Wingerter, vocbench-user
Dear Roland, All

I recently encountered the error "datatype rdf:langString requires a language tag" and noticed that it doesn't always occur: some DBpedia resources cause this exception, while others are displayed without problems. Looking at the exception more closely, I observed that it is thrown by Rio, the subsystem of RDF4J that deals with parsing and serialization (input/output) of RDF data. My preliminary hypothesis is that, when the exception is thrown, the serialization of the resource description contains some constructs that Rio considers illegal. However, I haven't yet investigated further in order to i) identify such problematic constructs and ii) verify whether it is possible to make Rio accept them.

Best Regards
Manuel

--
You received this message because you are subscribed to the Google Groups "vocbench-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vocbench-user+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/vocbench-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/vocbench-user/582eba8b-dc15-4a3a-af2a-60e583a26804%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Manuel Fiorelli

Roland Wingerter

Apr 6, 2018, 11:57:43 AM
to vocbench-user
Dear Manuel,

thank you very much for your reply. I hope you can identify the problem and solve it.

Kind regards
Roland



Armando Stellato

Apr 6, 2018, 4:43:42 PM
to Roland Wingerter, vocbench-user

Dear Roland,

 

It seems to be an issue in the DBpedia source. The RIO component mentioned by Manuel is not developed by us but is part of the RDF4J suite, and it is reporting a syntax error because the data contains an illegal literal value. If this were a problem in RIO, we would in any case file a bug or provide a fix to the RDF4J team, but RIO seems to be doing its job properly.

 

I give you some more background to clarify what you experienced:

 

You said you had no problems in browsers. However, if you access the DBpedia page from a web browser, which is normally configured to request HTML, you will just get what the browser asked for: a web page.

 

If you configure the browser to get RDF (in any serialization format), the browser won’t complain about the content, as it will just render it in the best way it knows (which might not be very “RDF-aware”).

For instance, if the browser is configured to prefer RDF/XML over HTML, you will get an RDF/XML file, but the browser will just parse it as an XML file (that is, the serialization format carrying the content) in order to render it, and will mostly not care whether its content is legal RDF.
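
The same content negotiation can be reproduced outside a browser. The sketch below builds a request that asks for RDF/XML through the Accept header, using the JDK's own HTTP client (Java 11 or later). The class and method names are hypothetical, and the body that comes back is plain text: nothing in this snippet checks whether it is legal RDF.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RdfContentNegotiation {

    // Builds a GET request that asks the server for RDF/XML instead of HTML.
    public static HttpRequest buildRequest(String resourceUri) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    // Sends the request, following redirects, and returns the raw response body.
    // The body is returned as-is; it is not parsed or validated as RDF.
    public static String fetch(String resourceUri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(resourceUri), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Requires network access.
        System.out.println(fetch("http://dbpedia.org/resource/Semantic_Web"));
    }
}
```

Changing the Accept value (for example to text/turtle) would request a different serialization, provided the server offers it.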

 

 

Now, some background on RDF:

In RDF 1.0 there were so-called “plain literals”, as opposed to “typed literals”. They were literals without a datatype. They could be simple strings (e.g. “person”) or carry an optional language tag (e.g. “person”@en).

 

In RDF1.1 all literals are assumed to have a datatype, and a literal simply serialized as

“a literal”

Would be assumed to be:

“a literal”^^xsd:string

 

A literal with a language tag, by contrast, is assumed to have the newly introduced rdf:langString as its datatype.

rdf:langString-typed literals must have a language tag.
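
The rules above can be condensed into a small literal class. This is only an illustrative sketch written against the JDK, not RDF4J's implementation; the class name SketchLiteral and its factory methods are invented for the example.

```java
import java.util.Objects;

public final class SketchLiteral {

    public static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
    public static final String RDF_LANGSTRING =
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    private final String label;
    private final String datatype;
    private final String language; // null unless the datatype is rdf:langString

    private SketchLiteral(String label, String datatype, String language) {
        this.label = Objects.requireNonNull(label);
        this.datatype = Objects.requireNonNull(datatype);
        this.language = language;
    }

    // "a literal" is treated as "a literal"^^xsd:string
    public static SketchLiteral plain(String label) {
        return new SketchLiteral(label, XSD_STRING, null);
    }

    // "person"@en gets rdf:langString as its datatype; the tag is mandatory
    public static SketchLiteral tagged(String label, String language) {
        if (language == null || language.isEmpty()) {
            throw new IllegalArgumentException("datatype rdf:langString requires a language tag");
        }
        return new SketchLiteral(label, RDF_LANGSTRING, language);
    }

    // Explicitly typed literal; rdf:langString is rejected because no tag is given
    public static SketchLiteral typed(String label, String datatype) {
        if (RDF_LANGSTRING.equals(datatype)) {
            throw new IllegalArgumentException("datatype rdf:langString requires a language tag");
        }
        return new SketchLiteral(label, datatype, null);
    }

    public String getLabel() { return label; }
    public String getDatatype() { return datatype; }
    public String getLanguage() { return language; }
}
```

With this sketch, SketchLiteral.typed("no", SketchLiteral.RDF_LANGSTRING) throws an IllegalArgumentException, which is the same kind of rejection reported in this thread.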

 

 

Let’s move now to your examples: by resolving the first URI you reported (http://dbpedia.org/resource/Semantic_Web) in a browser configured to retrieve RDF/XML, I found this entry:

 

<dbp:b rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#langString">no</dbp:b>

 

That is: an rdf:langString typed literal without a language tag.

 

This is the data that RIO didn’t digest that well ;-)
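
To illustrate the difference between "well-formed XML" and "legal RDF", the sketch below parses such a document with the JDK's plain XML parser, which accepts it without complaint, and then lists the elements that declare rdf:langString as their datatype but carry no xml:lang attribute. The class name is hypothetical, the dbp namespace in the sample is an assumption (it is not shown in the thread), and the check only looks at the element itself; it does not implement the language-scoping rules of RDF/XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LangStringChecker {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RDF_LANGSTRING = RDF_NS + "langString";

    // Returns the qualified names of elements typed as rdf:langString
    // that have no xml:lang attribute of their own.
    public static List<String> findUntagged(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            List<String> offenders = new ArrayList<>();
            NodeList all = doc.getElementsByTagNameNS("*", "*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                String datatype = element.getAttributeNS(RDF_NS, "datatype");
                String lang = element.getAttributeNS(XMLConstants.XML_NS_URI, "lang");
                if (RDF_LANGSTRING.equals(datatype) && lang.isEmpty()) {
                    offenders.add(element.getTagName());
                }
            }
            return offenders;
        } catch (Exception e) {
            throw new IllegalStateException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String sample =
                "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\""
                + " xmlns:dbp=\"http://dbpedia.org/property/\">" // assumed namespace
                + "<rdf:Description rdf:about=\"http://dbpedia.org/resource/Semantic_Web\">"
                + "<dbp:b rdf:datatype=\"" + RDF_LANGSTRING + "\">no</dbp:b>"
                + "</rdf:Description></rdf:RDF>";
        System.out.println(findUntagged(sample)); // prints [dbp:b]
    }
}
```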

 

 

Maybe we can do something about that: as far as I remember, RIO can be configured to be less picky about the RDF format when parsing, and to just return plain triples without validating literals against known datatypes.

I see http://docs.rdf4j.org/javadoc/latest/?org/eclipse/rdf4j/rio/ParseErrorListener.html provides these configuration options.

 

We’ll look into it, we can probably instruct the RIO parser to just get all triples. How VB should deal with this inconsistent content is then another matter…

 

Kind Regards,

 

Armando

 

 

 

 

 

Roland Wingerter

Apr 9, 2018, 6:38:16 AM
to vocbench-user
Dear Armando,

thank you very much for looking into the matter and for your explanations. I see now that VocBench is not the cause of the problem. And come to think of it, it is completely understandable that DBpedia is not free of errors and probably never will be (cf. http://vladimiralexiev.github.io/pres/20150209-dbpedia/dbpedia-problems-long.html). I hope you can find a way to deal with such errors.

Kind regards
Roland

Manuel Fiorelli

Apr 27, 2018, 1:41:13 PM
to Roland Wingerter, vocbench-user
Dear Roland and Armando,

Let me add some further background. In VocBench3, HTTP lookup of Linked Data resources is implemented using the RDFLoader component provided by RDF4J. VocBench3 uses this component with its default configuration, in a manner similar to the following:

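// Classes come from org.eclipse.rdf4j.model(.impl), org.eclipse.rdf4j.rio(.helpers),
// org.eclipse.rdf4j.repository.util and java.net.
IRI resource = SimpleValueFactory.getInstance().createIRI("http://dbpedia.org/resource/Semantic_Web");
Model retrievedStatements = new LinkedHashModel(); // receives the parsed triples
// Default parser configuration and default value factory
RDFLoader rdfLoader = new RDFLoader(new ParserConfig(), SimpleValueFactory.getInstance());
StatementCollector statementCollector = new StatementCollector(retrievedStatements);
// Dereference the IRI; base URI and data format are left null
rdfLoader.load(new URL(resource.stringValue()), null, null, statementCollector);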

The piece of code above (executed against RDF4J 2.2) throws the following exception:

Exception in thread "main" org.eclipse.rdf4j.rio.RDFParseException: datatype rdf:langString requires a language tag
    at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportFatalError(RDFParserHelper.java:442)
    at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.createLiteral(RDFParserHelper.java:242)
    at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.createLiteral(RDFParserHelper.java:70)
    at org.eclipse.rdf4j.rio.jsonld.JSONLDInternalTripleCallback.triple(JSONLDInternalTripleCallback.java:123)
    at org.eclipse.rdf4j.rio.jsonld.JSONLDInternalTripleCallback.call(JSONLDInternalTripleCallback.java:217)
    at com.github.jsonldjava.core.JsonLdProcessor.toRDF(JsonLdProcessor.java:504)
    at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:68)
    at org.eclipse.rdf4j.repository.util.RDFLoader.loadInputStreamOrReader(RDFLoader.java:286)
    at org.eclipse.rdf4j.repository.util.RDFLoader.load(RDFLoader.java:197)
    at org.eclipse.rdf4j.repository.util.RDFLoader.load(RDFLoader.java:161)
    at org.example.DBpediaRDFLoaderTest1.main(DBpediaRDFLoaderTest1.java:24)
Caused by: java.lang.IllegalArgumentException: datatype rdf:langString requires a language tag
    at org.eclipse.rdf4j.model.impl.SimpleLiteral.<init>(SimpleLiteral.java:99)
    at org.eclipse.rdf4j.model.impl.AbstractValueFactory.createLiteral(AbstractValueFactory.java:118)
    at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.createLiteral(RDFParserHelper.java:235)
    ... 9 more

In the end, the exception is thrown by the constructor of the simple implementation of RDF literals, which complains about an rdf:langString without a language tag. It seems to me that it is doing the right thing according to the standard (https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal), because a null language tag surely does not qualify as a non-empty language tag.

While this behavior is standard-compliant, it is unfortunate that the Resource View crashes on the examples you provided. Therefore, I worked on a "solution" to this problem. Essentially, I replaced the default ValueFactory with an implementation that uses the language tag "und" (undetermined) when an rdf:langString has no explicit language tag. This solution has a downside: what is shown by the Resource View is slightly different from the content returned by the original data provider. However, I think that the scope of this downside is limited by the fact that in case of "remote datasets", the resource view is read-only and its content is not meant (at least for now) to be processed further.
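
Stripped of all RDF4J types, the fallback rule described above might look like the following sketch. It is not the code that went into VocBench3: there the logic lives in a replacement ValueFactory, whereas the class and method names here are invented for illustration.

```java
public class LangTagFallback {

    static final String RDF_LANGSTRING =
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";
    static final String UNDETERMINED = "und"; // language tag for "undetermined"

    // Returns the language tag to use for a literal, or null when none applies.
    // An rdf:langString literal without a tag is repaired with "und" instead of
    // being rejected.
    public static String effectiveLanguage(String datatype, String language) {
        boolean missing = language == null || language.isEmpty();
        if (RDF_LANGSTRING.equals(datatype) && missing) {
            return UNDETERMINED;
        }
        return missing ? null : language;
    }

    public static void main(String[] args) {
        // The problematic DBpedia literal: rdf:langString without a tag
        System.out.println(effectiveLanguage(RDF_LANGSTRING, null)); // prints und
        // A well-formed literal keeps its own tag
        System.out.println(effectiveLanguage(RDF_LANGSTRING, "en")); // prints en
    }
}
```

Literals with any other datatype pass through unchanged, so the repair only affects the case that previously caused the exception.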

A better solution would be one that informs the users about the error (or its automatic recovery).

Best Regards

Manuel







--
Manuel Fiorelli

Roland Wingerter

Apr 27, 2018, 4:35:16 PM
to vocbench-user
Dear Manuel,

thank you very much for finding a solution. It is very much appreciated!

Kind regards
Roland

