Possible errors in RDFAlerts online tool? Advocacy to get standard Semantic Web MIME types included in web frameworks and servers


Peter DeVries

Apr 21, 2012, 6:18:09 PM
to pedant...@googlegroups.com
Hi Pedantists,

I discovered this tool and used it to fix a number of errors in my live data; those fixes are not in the current RDF dump yet.

http://swse.deri.org/RDFAlerts/


Here are some of my RDF documents that make useful tests:

http://lod.taxonconcept.org/ses/v6n7p.rdf

http://lod.taxonconcept.org/ses/mCcSp.rdf

Although I agree with some of the errors this tool exposed, there are other messages that make me think that it is showing some false errors.

For instance:

error retrieving http://lod.taxonconcept.org/ontology/txn.owl - The host did not accept the connection within timeout of 3000 ms

I am able to get this ontology via curl from Woods Hole, MA, while the server itself is in Madison, WI:

curl -v http://lod.taxonconcept.org/ontology/txn.owl
* About to connect() to lod.taxonconcept.org port 80 (#0)
*   Trying 144.92.198.22... connected
* Connected to lod.taxonconcept.org (144.92.198.22) port 80 (#0)
> GET /ontology/txn.owl HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: lod.taxonconcept.org
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 21 Apr 2012 21:57:35 GMT
< Server: nginx/1.0.6
< Content-Type: application/rdf+xml
< Content-Length: 170229
< Last-Modified: Thu, 15 Mar 2012 00:03:27 GMT
< Accept-Ranges: bytes
< MS-Author-Via: DAV
-- 

Most of my problems were caused by my Rails/nginx stack not setting the content type correctly. I think this might have happened when I moved from Thin to Passenger/nginx.
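A regression like this can be caught automatically by fetching each document and comparing the Content-Type the server sends with the one expected, much as the curl check above does by hand. A minimal Python sketch using only the standard library; the `check_content_type` name and the extension table are illustrative assumptions, not part of any existing tool:

```python
import posixpath
import urllib.request
from urllib.parse import urlsplit

# Assumed mapping from file extension to the media type we expect to be served.
EXPECTED = {
    ".rdf": "application/rdf+xml",
    ".owl": "application/rdf+xml",  # OWL serialized as RDF/XML
}

def check_content_type(url, timeout=3.0):
    """Return (ok, served_type); ok is None when the extension is not in EXPECTED."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    # A HEAD request is enough: only the response headers are inspected.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Drop parameters such as "; charset=utf-8" before comparing.
        served = response.headers.get("Content-Type", "").split(";")[0].strip().lower()
    expected = EXPECTED.get(ext)
    return (None if expected is None else served == expected), served
```

Run against the server shown above, `check_content_type("http://lod.taxonconcept.org/ontology/txn.owl")` should return `(True, "application/rdf+xml")`, provided the server answers HEAD with the same headers as GET.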

In any case, I am finding that the standard MIME types for RDF and OWL are not included in a number of web frameworks, and I am wondering whether some advocacy is needed to get them included.
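As a stop-gap, an application can usually register the missing mappings itself. For illustration in Python rather than the Rails/nginx stack discussed here, the standard library's `mimetypes` module provides `add_type` for this; the particular set of extensions below is an assumption of the sketch:

```python
import mimetypes

# Extension -> media type mappings that some frameworks lack. OWL documents
# in RDF/XML syntax are served as application/rdf+xml.
RDF_TYPES = {
    ".rdf": "application/rdf+xml",
    ".owl": "application/rdf+xml",
    ".ttl": "text/turtle",
}

def register_rdf_types():
    """Add (or override) the RDF mappings in the process-wide mimetypes table."""
    for ext, media_type in RDF_TYPES.items():
        mimetypes.add_type(media_type, ext)

register_rdf_types()
print(mimetypes.guess_type("ontology/txn.owl"))  # ('application/rdf+xml', None)
```

Registering explicitly also overrides whatever the host's system MIME tables happen to say for these extensions, which makes the behaviour independent of the deployment machine.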

Respectfully,

- Pete
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdev...@wisc.edu
TaxonConcept  &  GeoSpecies Knowledge Bases
A Semantic Web, Linked Open Data  Project
--------------------------------------------------------------------------------------

Aidan Hogan

Apr 21, 2012, 11:54:42 PM
to pedant...@googlegroups.com
Hi Pete,

> I discovered this tool and used it to fix a number of errors in my live
> data but are not in the current RDF dump yet.
>
> http://swse.deri.org/RDFAlerts/

I cobbled that tool together quite quickly a while back with help from a
few other fellow pedants. I'm glad to know it's still of use, but I fear
it is quite limited. In general, I would say that it is descriptive
rather than prescriptive. Oftentimes what it reports may not actually be
an error, and warnings in particular can often be ignored. Please take it as informative.

As I don't have the "bandwidth" to support or maintain the system
myself, the Sindice guys were kind enough to integrate support
into their engine. Their interface is available at:

http://inspector.sindice.com/

(Look for a validate feature, and a "pedantic" feature there.) As a
bonus, unlike the RDFAlerts tool, they have the means to do more
advanced conneg and support RDF syntaxes other than RDF/XML. As a trade
off, one or two minor checks could not be translated over, but all the
important stuff should be there. (I don't maintain this service; rather
the guys over at Sindice do.)

> Here are some of my RDF that are useful tests
>
> http://lod.taxonconcept.org/ses/v6n7p.rdf
>
> http://lod.taxonconcept.org/ses/mCcSp.rdf
>
> Although I agree with some of the errors this tool exposed, there are
> other messages that make me think that it is showing some false errors.
>
> For instance:
>
> error retrieving http://lod.taxonconcept.org/ontology/txn.owl - The host
> did not accept the connection within timeout of 3000 ms

Typically I would say that timeouts are a remote problem, not a problem
with our system. For example, I often notice timeouts for vocabularies
behind the purl.org domain (e.g., DC), but these documents often are
actually unresponsive.

However, I notice that the documents you link encounter a *lot* of
timeouts when checked, even for typically responsive vocabularies like
FOAF. I can't give you an answer now, but I will have a look. But yes,
it seems that the problem is on our end. In the meantime, maybe try the
more stable Sindice inspector above?

> I am able to get this ontology via curl from Woods Hole MA while the
> server itself is in Madison, WI
>
> curl -v http://lod.taxonconcept.org/ontology/txn.owl
> * About to connect() to lod.taxonconcept.org port 80 (#0)
> * Trying 144.92.198.22... connected
> * Connected to lod.taxonconcept.org (144.92.198.22) port 80 (#0)
> > GET /ontology/txn.owl HTTP/1.1
> > User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> > Host: lod.taxonconcept.org
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < Date: Sat, 21 Apr 2012 21:57:35 GMT
> < Server: nginx/1.0.6
> < Content-Type: application/rdf+xml
> < Content-Length: 170229
> < Last-Modified: Thu, 15 Mar 2012 00:03:27 GMT
> < Accept-Ranges: bytes
> < MS-Author-Via: DAV
> --

Yes, I would say the problem is on our end.

> Most of my problems were because of my rails / nginx stack not correctly
> setting the content type. I think this might have happened when I moved
> from thin to passenger/nginx.
>
> In either case, I am finding that the standard mime types for rdf, owl
> are not included in a number of web frameworks and I am wondering if
> some advocacy is needed to get them included?

I would guess that application/rdf+xml should be quite widely supported?
This would also cover the most common OWL syntax. Not sure about other
content types. Still I agree that there's lots more advocacy left to do,
in this aspect and various others.

Cheers,
Aidan

Aidan Hogan

Apr 22, 2012, 12:30:04 AM
to pedant...@googlegroups.com
Hi Pete,

Automated tools aside, I should add that nothing beats the true pedantic
litmus test. Manual validation... which is why we have the list :)

I have to admit, I could only find minor nitpicky comments.

> Here are some of my RDF that are useful tests
>
> http://lod.taxonconcept.org/ses/v6n7p.rdf

Have you considered more use of datatypes and language tags? For example,
* dcterms:modified could perhaps use an xsd:dateTime (or xsd:dateTimeStamp
for OWL support).
* dcterms:identifier could use an xsd:anyURI perhaps?
* dcterms:description could use a lang tag; not sure if applicable to
other text values
* Without knowing the details, maybe owl:sameAs instead of
skos:closeMatch? I fear people may have been scared off owl:sameAs a
little prematurely in our community. (But great to see the links there!)
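To show what the suggested datatypes and language tags look like in the data itself, here is a small sketch that serializes literals in N-Triples syntax. The helpers `nt_literal` and `nt_triple` and the example values are hypothetical; the XSD namespace and the escaping of backslash, double quote, newline and carriage return follow the N-Triples syntax:

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
DCT = "http://purl.org/dc/terms/"

def nt_literal(value, datatype=None, lang=None):
    """Serialize a literal in N-Triples syntax, optionally typed or language-tagged."""
    if datatype and lang:
        raise ValueError("a literal takes a datatype or a language tag, not both")
    # Escape backslash first, then the other characters N-Triples escapes.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    if lang:
        return f'"{escaped}"@{lang}'
    return f'"{escaped}"'

def nt_triple(subject, predicate, obj):
    """Build one N-Triples statement; obj must already be serialized."""
    return f"<{subject}> <{predicate}> {obj} ."

doc = "http://lod.taxonconcept.org/ses/v6n7p.rdf"
# Hypothetical values, for illustration only.
print(nt_triple(doc, DCT + "modified", nt_literal("2012-04-22T23:57:08Z", XSD + "dateTime")))
print(nt_triple(doc, DCT + "description", nt_literal("An example description", lang="en")))
```

An xsd:anyURI identifier would be written the same way, passing `XSD + "anyURI"` as the datatype.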

> http://lod.taxonconcept.org/ontology/txn.owl

* I generally prefer to have prose rdfs:labels rather than repeat the
URI's local name. But that's a personal preference.
* You seem to have lots of "key" values available to you, like

<txn:hasGBIF>2435099</txn:hasGBIF>
<txn:hasITIS>552479</txn:hasITIS>
<txn:hasEOL>311910</txn:hasEOL>
<txn:hasNCBI>9696</txn:hasNCBI>
<uniprot:scientificName>Puma concolor</uniprot:scientificName>
<txn:hasBOLD>12521</txn:hasBOLD>

which are great for interlinking datasets. Unfortunately, OWL (Direct
Semantics) is a little allergic to inverse-functional datatype
properties for some extremely esoteric reasons. If I were in your shoes,
I'd still go ahead and make them inverse-functional, as per FOAF. But if
anyone asks, I didn't say that.
* Again, great to see links to other vocabularies like wo:, etc.!
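The practical payoff of treating such keys as inverse-functional is that a consumer can "smush" resources: any two subjects with the same value for an inverse-functional property denote the same thing, so they can be linked with owl:sameAs. A minimal sketch over plain (subject, predicate, object) tuples; which properties count as keys is an assumption, and values are compared as exact strings without any normalization:

```python
from collections import defaultdict
from itertools import combinations

TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
# Assumption: these key properties are treated as inverse-functional.
KEY_PROPERTIES = {TXN + "hasGBIF", TXN + "hasITIS", TXN + "hasEOL",
                  TXN + "hasNCBI", TXN + "hasBOLD"}

def same_as_pairs(triples, keys=KEY_PROPERTIES):
    """Return (a, b) pairs of subjects sharing a value for some key property."""
    subjects_by_key = defaultdict(set)
    for s, p, o in triples:
        if p in keys:
            subjects_by_key[(p, o)].add(s)
    pairs = set()
    for subjects in subjects_by_key.values():
        # Every two subjects with the same key value denote the same thing.
        pairs.update(combinations(sorted(subjects), 2))
    return pairs
```

For instance, if a hypothetical second dataset gave its own puma resource `txn:hasNCBI "9696"`, the function would pair it with the TaxonConcept resource carrying the same value.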

In general, with a quick scan, I think the data look good!

...

I just noticed one other error in the RDFAlerts tool that raised an eyebrow:

"instance of owl:ObjectProperty http://purl.org/dc/terms/format used
with literal value application/pdf"

I checked the DC Terms schemata and they define dct:format as a plain
rdf:Property. However, there seem to be quite a few ontologies out
there declaring dct:format to be an ObjectProperty, the most prominent
of which is BIBO [1] (which you use).

This is interesting. Many folks want to use standard OWL tools to
produce compliant vocabularies that, e.g., are a subset of OWL (2) DL.
To do this, they need to declare all properties as object or datatype
properties. DC does not make that distinction. So to include some DC
properties, one needs to say whether they will be using them as object
or datatype properties. If you select the datatype view and another
party selects the object view, you create incompatibilities.

I would not be so concerned (not the end of the world), but worth
knowing that BIBO takes a different view on dct:format to you.
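The warning quoted above comes down to a simple check: a property declared as an owl:ObjectProperty in some schema should not take literal values. A rough Python sketch of that check, illustrating the idea rather than reproducing the RDFAlerts code; literals are wrapped in a hypothetical `Literal` class so they can be told apart from URIs, which are plain strings:

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"

@dataclass(frozen=True)
class Literal:
    lexical: str

def object_property_literal_warnings(data, schema):
    """Yield a warning for each literal used as the value of a declared owl:ObjectProperty."""
    object_props = {s for s, p, o in schema
                    if p == RDF_TYPE and o == OWL_OBJECT_PROPERTY}
    for s, p, o in data:
        if p in object_props and isinstance(o, Literal):
            yield f"instance of owl:ObjectProperty {p} used with literal value {o.lexical}"
```

Which warnings fire therefore depends on which schemata are loaded: with BIBO's declaration included, a literal `application/pdf` is flagged, while DC Terms alone (a plain rdf:Property) raises nothing.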

Cheers,
Aidan

[1]
http://lod.openlinksw.com/sparql?default-graph-uri=&query=SELECT+*+WHERE+%7BGRAPH+%3Fg+%7B+dcterms%3Aformat+a+owl%3AObjectProperty+%7D%7D&should-sponge=&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=15000&debug=on

Peter DeVries

Apr 22, 2012, 11:57:08 PM
to pedant...@googlegroups.com
Hi Aidan,

Thanks very much for the suggestions. I will work on implementing them.

One question: it seems that there are no URIs for stating that something is of a particular MIME type?

To work around this I am considering the following solution.

Add something like this to the TaxonConcept vocabulary:

  <skos:Concept rdf:about="http://lod.taxonconcept.org/ontology/txn.owl#FormatPDF">
     <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
     <rdf:type rdf:resource="http://purl.org/dc/terms/MediaType"/>
     <rdf:value>application/pdf</rdf:value>
     <rdfs:label>application/pdf</rdfs:label>
     <vs:term_status>testing</vs:term_status>
     <rdfs:isDefinedBy rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl"/>
  </skos:Concept>

And change http://lod.taxonconcept.org/ses/v6n7p.rdf in the following way (PDF URL abbreviated to avoid wrapping):

    <rdfs:label>PDF of the Original Description of Puma concolor (Linnaeus 1771)</rdfs:label>
    <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
    <bibo:format rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#FormatPDF"/>
    <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
  </bibo:Document>

I suspect that the list might have a better idea?

Thanks,

- Pete
--

Andreas Harth

Apr 23, 2012, 6:00:37 AM
to pedant...@googlegroups.com
Hi,

On 23/04/12 05:57, Peter DeVries wrote:
> One question is that it seems that there is no URI's for stating
> something is of a particular mime type?
>
> To work around this I am considering the following solution.

[...]

> I suspect that the list might have a better idea?

I'm also interested in best practices for modelling content types.

For GADM-RDF we currently use dct:format [1]:

<http://gadm.geovocab.org/id/0_10_geometry.rdf> dct:format
"application/rdf+xml" .

Cheers,
Andreas.

[1] http://gadm.geovocab.org/id/0_10_geometry

William Waites

Apr 23, 2012, 8:45:41 AM
to pedant...@googlegroups.com, ha...@kit.edu
On Mon, 23 Apr 2012 12:00:37 +0200, Andreas Harth <ha...@kit.edu> said:

> <http://gadm.geovocab.org/id/0_10_geometry.rdf> dct:format
> "application/rdf+xml" .

I think the "proper" though somewhat cumbersome way of doing this is:

:foo dct:format [
a dct:IMT;
rdf:value "application/rdf+xml"
].

Cheers,
-w

Keith Alexander

Apr 23, 2012, 8:49:41 AM
to pedant...@googlegroups.com, ha...@kit.edu
There are also the URIs for formats defined here:

http://www.w3.org/ns/formats/

Alexander Dutton

Apr 23, 2012, 8:53:57 AM
to pedant...@googlegroups.com
On 23/04/12 13:49, Keith Alexander wrote:
> There are also the URIs for formats defined here:
>
> http://www.w3.org/ns/formats/

http://www.w3.org/2001/tag/2002/01-uriMediaType-9 is also relevant, but
as far as I can tell the IETF haven't acted upon that recommendation.

Yours,

Alexander

Andreas Harth

Apr 24, 2012, 5:05:28 AM
to William Waites, pedant...@googlegroups.com
Hi William,
Thanks! We've changed that (e.g., see [1]).

Cheers,
Andreas.

[1] http://gadm.geovocab.org/id/1_1032_geometry.rdf

Aidan Hogan

Apr 25, 2012, 7:11:16 AM
to pedant...@googlegroups.com
Hi,

On 24/04/2012 10:05, Andreas Harth wrote:
> Hi William,
>
> On 23/04/12 14:45, William Waites wrote:
>> On Mon, 23 Apr 2012 12:00:37 +0200, Andreas Harth<ha...@kit.edu> said:
>>
>> > <http://gadm.geovocab.org/id/0_10_geometry.rdf> dct:format
>> > "application/rdf+xml" .
>>
>> I think the "proper" though somewhat cumbersome way of doing this is:
>>
>> :foo dct:format [
>> a dct:IMT;
>> rdf:value "application/rdf+xml"
>> ].

I think that since such patterns are very likely to be heavily re-used,
using a URI instead of a blank node would be a win for all involved.

Using Keith's suggestion, how about:

:foo dct:format <http://www.w3.org/ns/formats/RDF_XML> .

Those "formats:" URIs dereference to data including:

<rdf:Description rdf:about="http://www.w3.org/ns/formats/RDF_XML">
<dc:description>Unique identifier for the RDF serialization in XML
(RDF/XML)</dc:description>
<rdfs:comment>RDF/XML is defined by the RDF/XML Syntax
Specification</rdfs:comment>
<rdf:type rdf:resource="http://www.w3.org/ns/formats/Format"/>
<dc:creator rdf:resource="http://www.ivan-herman.net/foaf#me"/>
<formats:preferred_suffix>.rdf</formats:preferred_suffix>
<formats:media_type>application/rdf+xml</formats:media_type>
<dc:date>2010-05-04</dc:date>
<rdfs:isDefinedBy
rdf:resource="http://www.w3.org/TR/rdf-syntax-grammar/"/>
<rdfs:seeAlso
rdf:resource="http://www.w3.org/TR/rdf-syntax-grammar/#section-MIME-Type"/>
</rdf:Description>

Unfortunately there seem to be some encoding issues in these documents,
but I'm sure we can get them resolved...
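Going the other way, from a media type string to one of these URIs, amounts to inverting the formats:media_type values. A minimal sketch with a hand-built table; only the RDF_XML entry comes from the data quoted above, the Turtle entry is assumed to follow the same W3C namespace, and a fuller version would read the mappings from the dereferenced formats: documents instead:

```python
from typing import Optional

FORMATS = "http://www.w3.org/ns/formats/"
DCT_FORMAT = "http://purl.org/dc/terms/format"
# Hand-copied formats:media_type values. RDF_XML is from the data above;
# the Turtle entry is an assumption.
MEDIA_TYPE_TO_FORMAT = {
    "application/rdf+xml": FORMATS + "RDF_XML",
    "text/turtle": FORMATS + "Turtle",
}

def format_uri(media_type: str) -> Optional[str]:
    """Map a media type (parameters and case ignored) to a formats: URI, if known."""
    base = media_type.split(";", 1)[0].strip().lower()
    return MEDIA_TYPE_TO_FORMAT.get(base)

def dct_format_triple(resource: str, media_type: str) -> Optional[str]:
    """Build the N-Triples statement `resource dct:format <formats URI>`, if one exists."""
    uri = format_uri(media_type)
    if uri is None:
        return None
    return f"<{resource}> <{DCT_FORMAT}> <{uri}> ."
```

Unknown types such as application/pdf return None, which mirrors the coverage gap for non-Semantic Web media types.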

Cheers,
Aidan




Andreas Harth

Apr 25, 2012, 8:57:44 AM
to pedant...@googlegroups.com
Hi,

On 25/04/12 13:11, Aidan Hogan wrote:
> I think that since such patterns are very likely to be heavily re-used,
> using a URI instead of a blank node would be a win for all involved.
>
> Using Keith's suggestion, how about:
>
> :foo dct:format <http://www.w3.org/ns/formats/RDF_XML> .
>
> Those "formats:" URIs dereference to data including:
>
> ...

Sounds good.

However, Ivan's file only coins URIs for common Semantic Web
media types.

I'd need a URI for each of the media types defined (well, for our NeoGeo
stuff it would suffice to have URIs for the geodata-related
media types - GML, KML, KMZ, WKT).

Any idea how to get URIs for all media types?

Also, we'd ideally like to enable an inference of the form

?s dct:format concat("http://example.org/mt/", ?mt) <-
?s dct:format [ a dct:IMT;
rdf:value ?mt ] .
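The rewrite this rule describes can be prototyped in a few lines: find each node typed dct:IMT, read its rdf:value, and point the resource's dct:format straight at a URI minted by prefixing the media type (taking the subject in the rule's head to be the resource ?s matched in its body). The sketch works on (subject, predicate, object) tuples with blank nodes written as "_:"-prefixed strings and literals as plain strings, a simplification; http://example.org/mt/ is the placeholder prefix from the rule, not a real registry:

```python
DCT_FORMAT = "http://purl.org/dc/terms/format"
DCT_IMT = "http://purl.org/dc/terms/IMT"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
PREFIX = "http://example.org/mt/"  # placeholder prefix from the rule

def infer_format_uris(triples, prefix=PREFIX):
    """Return new (s, dct:format, prefix + mt) triples for each dct:IMT node."""
    triples = set(triples)
    # Nodes typed dct:IMT (the sketch does not insist that they are blank).
    imt_nodes = {s for s, p, o in triples if p == RDF_TYPE and o == DCT_IMT}
    values = {s: o for s, p, o in triples if p == RDF_VALUE and s in imt_nodes}
    inferred = set()
    for s, p, o in triples:
        if p == DCT_FORMAT and o in values:
            inferred.add((s, DCT_FORMAT, prefix + values[o]))
    return inferred
```

Because the minted URI is a pure function of the media type string, two publishers using the blank-node pattern for the same type end up pointing at the same URI.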

Best regards,
Andreas.

Dan Brickley

Apr 25, 2012, 9:01:39 AM
to pedant...@googlegroups.com, ha...@kit.edu
There's also an old W3C TAG note on this,
http://www.w3.org/2001/tag/2002/01-uriMediaType-9 ... perhaps worth
checking in there for progress?

Dan

Richard Cyganiak

Apr 25, 2012, 9:29:43 AM
to pedant...@googlegroups.com, ha...@kit.edu

The question of URIs for media types is one that comes up *all the time*.

This TAG Finding from 2002 recommends that IANA mint URIs for media types.

As far as I know, no progress has been made.

Does anyone have a contact in IANA who might give some insights into the obstacles or suggest how progress could be made?

Should we approach the TAG and ask them again to lobby IANA?

Best,
Richard

Dan Brickley

Apr 25, 2012, 9:41:51 AM
to pedant...@googlegroups.com, Roessler Thomas, ha...@kit.edu
I think going via the TAG makes sense. However according to
http://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority#Oversight
this is managed thru ICANN so I've +cc:'d Thomas Roessler here, who
might be able to offer some insight. Thomas, do you know of any
initiatives in the IANA/ICANN world w.r.t. URIs for media types?

Dan

Simon Spero

Apr 25, 2012, 11:50:13 AM
to pedant...@googlegroups.com
I would suggest taking a look at the Unified Digital Format Registry (UDFR). It's the result of the merger of two existing registries, designed to carry more information than just MIME types.


The UDFR is a reliable, publicly accessible, and sustainable knowledge base of file format representation information for use by the digital preservation community.
A format is a set of semantic and syntactic rules governing the mapping between abstract information and its representation in digital form. While many worthwhile and necessary preservation activities can be performed on a digital asset without knowledge of its format, that is, merely as a sequence of bits, any higher-level preservation of the underlying information content must be performed in the context of the asset's format.
The UDFR seeks to "unify" the function and holdings of two existing registries, PRONOM and GDFR (the Global Digital Format Registry), in an open source, semantically enabled, and community supported platform.
The UDFR was developed by the University of California Curation Center (UC3) at the California Digital Library (CDL), funded by the Library of Congress as part of its National Digital Information Infrastructure Preservation Program (NDIIPP). The service is implemented on top of the OntoWiki semantic wiki and Virtuoso triple store.


application/rdf+xml is  http://udfr.org/udfr/u1r130 

Simon

Richard Cyganiak

Apr 25, 2012, 12:57:23 PM
to pedant...@googlegroups.com
Hi Simon,

On 25 Apr 2012, at 16:50, Simon Spero wrote:
> I would suggest taking a look at the Unified Digital Format Registry (UDFR).

> application/rdf+xml is http://udfr.org/udfr/u1r130

How would I find out about this URI when starting from the string "application/rdf+xml"?

Is UDFR updated with new IANA registrations?

How long will UDFR exist?

UDFR's coverage of media types is somewhat incomplete, e.g., MathML and TEI are missing.

For web engineering purposes, I'd be more confident if IANA would assign URIs to the IANA-registered media types.

Best,
Richard

Simon Spero

Apr 25, 2012, 1:16:38 PM
to pedant...@googlegroups.com, slabr...@gmail.com, Lisa....@ucop.edu

UDFR is designed to be a long-term preservation tool, so the URIs should be around for a while :)

I believe there is a SPARQL endpoint (it's on top of Virtuoso).

There's an open meeting presenting the work on May 4th at the Library of Congress; for details see below:

http://www.netpreserve.org/events/dc_ga/UDFR-community-meeting-2012-05-04.pdf

William Waites

Apr 25, 2012, 1:31:28 PM
to pedant...@googlegroups.com
What do we do when we want a parameterized dct:format like "application/rdf+xml; charset=utf-16"? Do we do

[ dct:format [ rdf:value "..."; foo:charset "..." ] ]

or mint IRIs for every such combination?

Andreas Harth

Apr 25, 2012, 12:20:38 PM
to pedant...@googlegroups.com
Hi,

On 25/04/12 17:50, Simon Spero wrote:
> application/rdf+xml is http://udfr.org/udfr/u1r130

Sounds good.

However,

$ rapper "http://udfr.org/udfr/u1r130"
rapper: Parsing URI http://udfr.org/udfr/u1r130 with parser rdfxml
...
rapper: Failed to parse URI http://udfr.org/udfr/u1r130 rdfxml content
$

Poking around a bit at [1], I've stumbled across [2].

Looks like that'd do the trick for now.

+1 for getting official URIs via ICANN/IANA though.

Best regards,
Andreas.

[1] http://udfr.org/onto/onto.rdf
[2] http://purl.org/NET/mediatypes/

Aidan Hogan

Apr 25, 2012, 2:13:53 PM
to pedant...@googlegroups.com
Could always have a more direct...

:resource dct:format x:application/rdf+xml .
:resource x:charset x:utf-16 .

or similar. (Specific terms are flexible.)
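One way to handle parameters under this pattern is to split the header value into its base type and its parameters and emit one triple for each. A sketch that uses the standard library's `email.message.Message` to parse the value; the `x:` namespace (expanded here to http://example.org/x/) and the charset property name are hypothetical, in keeping with the terms being flexible:

```python
from email.message import Message

X = "http://example.org/x/"  # hypothetical namespace standing in for x:
DCT_FORMAT = "http://purl.org/dc/terms/format"

def format_triples(resource, content_type):
    """Split e.g. 'application/rdf+xml; charset=utf-16' into dct:format and x:charset triples."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() returns the lower-cased type/subtype without parameters.
    triples = [(resource, DCT_FORMAT, X + msg.get_content_type())]
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset is not None:
        triples.append((resource, X + "charset", X + charset))
    return triples
```

This keeps one URI per media type and one per charset, rather than one per combination.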

Cheers,
Aidan

lisa

Apr 26, 2012, 7:14:54 PM4/26/12
to Pedantic Web Group
Thanks, Simon!

We didn't have resources to lobby IANA, so decided to use what we
could find publicly to link to the newly minted UDFR identifiers.

In the UDFR registry we have 1,127 MIME types as defined from Appspot
(as of 2/22/12).
Appspot routinely scrapes from IANA using code from the mediatypes
Google Code project.

In addition, we added 71 MIME types as defined by PRONOM (not found in
Appspot).

UDFR was built as an open-source registry for file format information.
Please feel free to browse what is available at http://udfr.org/ontowiki/.
If you'd like to contribute to the registry, please register and
contribute! The registry maintains provenance at the triple level so
that people can understand the source of each assertion.

Regards,
Lisa




Aidan Hogan

Apr 27, 2012, 11:50:08 AM
to pedant...@googlegroups.com
Hi Lisa,

> We didn't have resources to lobby IANA, so decided to use what we
> could find publicly to link to the newly minted UDFR identifiers.
>
> In the UDFR registry we have 1,127 MIME types as defined from Appspot
> (as of 2/22/12).
> Appspot routinely scrapes from IANA using code from the mediatypes
> Google Code project.
>
> In addition, we added 71 MIME types as defined by PRONOM (not found in
> Appspot).
>
> UDFR was built as an open-source registry for file format information.
> Please feel free to browse what is available at http://udfr.org/ontowiki/
> . If you'd like to contribute to the registry, please register and
> contribute! The registry maintains provenance at the triple level so
> that people can understand the source of each assertion.

Would it be possible to make the URIs return RDF/XML (or RDF in a
suitable syntax) when dereferenced? Unless I'm mistaken, they currently
do not dereference to RDF [1].

This would make the URIs much more usable for Linked Data applications.
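On the server side, serving RDF to Linked Data clients while keeping HTML for browsers usually comes down to content negotiation on the Accept header. A simplified sketch of choosing a representation; the list of available types and their order (HTML first, so ordinary browsers still get a page) are assumptions, and media type parameters other than q are ignored:

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue  # skip empty entries such as a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    return ranges

def negotiate(accept, available=("text/html", "application/rdf+xml", "text/turtle")):
    """Pick the available type the client rates highest; earlier entries win ties."""
    ranges = parse_accept(accept or "*/*")
    best, best_q = None, 0.0
    for media_type in available:
        main_range = media_type.split("/")[0] + "/*"
        q, specificity = 0.0, 0
        for media_range, range_q in ranges:
            # The most specific matching range decides the q-value.
            level = (3 if media_range == media_type else
                     2 if media_range == main_range else
                     1 if media_range == "*/*" else 0)
            if level > specificity:
                q, specificity = range_q, level
        if q > best_q:
            best, best_q = media_type, q
    return best
```

A client sending `Accept: application/rdf+xml` would then receive RDF/XML, while a typical browser header still selects HTML.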

Regards,
Aidan

[1]
http://idi.fundacionctic.org/vapour?uri=http%3A%2F%2Fudfr.org%2Fudfr%2Fu1r109&defaultResponse=dontmind&userAgent=vapour.sourceforge.net