> I discovered this tool and used it to fix a number of errors in my live
> data, though the fixes are not in the current RDF dump yet.
I cobbled that tool together quite quickly a while back with help from a
few other fellow pedants. I'm glad to know it's still of use, but I fear
it is quite limited. In general, I would say that it is descriptive
rather than prescriptive. Oftentimes what it reports may not actually
be an error; warnings, especially, can often be ignored. Please take
its output as informative.
Not having the "bandwidth" to support the system or to maintain it
directly myself, the Sindice guys were kind enough to integrate support
into their engine. Their interface is available at:
(Look for a validate feature, and a "pedantic" feature there.) As a
bonus, unlike the RDFAlerts tool, they have the means to do more
advanced content negotiation (conneg) and support RDF syntaxes other
than RDF/XML. As a trade-off, one or two minor checks could not be
carried over, but all the important stuff should be there. (I don't
maintain this service; rather
the guys over at Sindice do.)
> Here are some of my RDF documents that are useful tests
> Although I agree with some of the errors this tool exposed, there are
> other messages that make me think that it is showing some false errors.
> For instance:
> error retrieving http://lod.taxonconcept.org/ontology/txn.owl - The host
> did not accept the connection within timeout of 3000 ms
Typically I would say that timeouts are a remote problem, not a problem
with our system. For example, I often notice timeouts for vocabularies
behind the purl.org domain (e.g., DC), but these documents are often
otherwise reachable.
However, I notice that the documents you link encounter a *lot* of
timeouts when checked, even for typically responsive vocabularies like
FOAF. I can't give you an answer now, but I will have a look. But yes,
it seems that the problem is on our end. In the meantime, maybe try the
more stable Sindice inspector above?
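For anyone who wants to reproduce the timeout behaviour from the client
side, here is a minimal sketch in Python. The Accept header and the
3-second timeout are illustrative assumptions modelled on the checker's
reported "3000 ms" limit, not the tool's actual configuration:

```python
import urllib.request

def rdf_request(url):
    """Build a request that asks for RDF/XML, falling back to Turtle
    (the conneg preferences here are an illustrative assumption)."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/rdf+xml, text/turtle;q=0.9"},
    )

def fetch_rdf(url, timeout=3.0):
    """Fetch the document, giving up after `timeout` seconds, roughly
    mirroring the checker's 3000 ms connect timeout."""
    with urllib.request.urlopen(rdf_request(url), timeout=timeout) as resp:
        return resp.headers.get("Content-Type"), resp.read()
```

A `socket.timeout` raised here would correspond to the "did not accept
the connection within timeout" error the checker reports.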
> I am able to get this ontology via curl from Woods Hole MA while the
> server itself is in Madison, WI
> curl -v http://lod.taxonconcept.org/ontology/txn.owl
> * About to connect() to lod.taxonconcept.org port 80 (#0)
> *   Trying 184.108.40.206... connected
> * Connected to lod.taxonconcept.org (220.127.116.11) port 80 (#0)
>> GET /ontology/txn.owl HTTP/1.1
>> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
>> Host: lod.taxonconcept.org
>> Accept: */*
> < HTTP/1.1 200 OK
> < Date: Sat, 21 Apr 2012 21:57:35 GMT
> < Server: nginx/1.0.6
> < Content-Type: application/rdf+xml
> < Content-Length: 170229
> < Last-Modified: Thu, 15 Mar 2012 00:03:27 GMT
> < Accept-Ranges: bytes
> < MS-Author-Via: DAV
Yes, I would say the problem is on our end.
> Most of my problems were because of my rails / nginx stack not correctly
> setting the content type. I think this might have happened when I moved
> from thin to passenger/nginx.
> In any case, I am finding that the standard MIME types for RDF and
> OWL are not included in a number of web frameworks, and I am wondering
> whether some advocacy is needed to get them included?
I would guess that application/rdf+xml should be quite widely
supported? This would also cover the most common OWL syntax. I'm not
sure about other content types. Still, I agree that there's a lot more
advocacy left to do, in this respect and various others.
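As a small illustration of the gap: even where a framework's default
media-type table omits these mappings, registering them is usually a
one-liner. A minimal sketch using Python's standard `mimetypes` module
(the Turtle mapping is my addition; Rails or nginx need the equivalent
configuration in their own formats):

```python
import mimetypes

# Register RDF-related media types that default tables commonly omit.
mimetypes.add_type("application/rdf+xml", ".rdf")
mimetypes.add_type("application/rdf+xml", ".owl")
mimetypes.add_type("text/turtle", ".ttl")

print(mimetypes.guess_type("ontology.owl")[0])  # application/rdf+xml
```

The nginx equivalent would be an extra line in `mime.types`; in Rails,
`Mime::Type.register` serves the same purpose.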
On 23/04/12 05:57, Peter DeVries wrote:
> One question is that it seems there are no URIs for stating that
> something is of a particular MIME type?
> To work around this I am considering the following solution.
> I suspect that the list might have a better idea?
I'm also interested in best practices for modelling content types.
For GADM-RDF we currently use dct:format:
> <http://gadm.geovocab.org/id/0_10_geometry.rdf> dct:format
> "application/rdf+xml" .
I think the "proper" though somewhat cumbersome way of doing this is to
use the DCMI-defined dct:IMT class, e.g.:

:foo dct:format [
    a dct:IMT ;
    rdf:value "application/rdf+xml" ;
    rdfs:label "RDF/XML"
] .
http://www.w3.org/2001/tag/2002/01-uriMediaType-9 is also relevant, but
as far as I can tell the IETF haven't acted upon that recommendation.
The question of URIs for media types is one that comes up *all the time*.
This TAG Finding from 2002 recommends that IANA mint URIs for media types.
As far as I know, no progress has been made.
Does anyone have a contact in IANA who might give some insights into the obstacles or suggest how progress could be made?
Should we approach the TAG and ask them again to lobby IANA?
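Pending any official scheme, one stopgap is to mint URIs yourself by
percent-encoding the media type into a namespace of your own. A minimal
sketch; the namespace below is a made-up placeholder, not an
IANA-sanctioned one:

```python
from urllib.parse import quote

# Hypothetical local namespace; IANA mints no URIs for media types today.
MEDIA_TYPE_NS = "http://example.org/mediatypes/"

def media_type_uri(media_type):
    # Percent-encode '/' and '+' so the whole type fits one path segment.
    return MEDIA_TYPE_NS + quote(media_type, safe="")

print(media_type_uri("application/rdf+xml"))
# http://example.org/mediatypes/application%2Frdf%2Bxml
```

The obvious downside is that everyone who does this mints different,
non-interoperable URIs for the same media type, which is exactly why an
IANA-run scheme keeps being asked for.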
The UDFR is a reliable, publicly accessible, and sustainable knowledge base of file format representation information for use by the digital preservation community.
A format is a set of semantic and syntactic rules governing the mapping between abstract information and its representation in digital form. While many worthwhile and necessary preservation activities can be performed on a digital asset without knowledge of its format, that is, merely as a sequence of bits, any higher-level preservation of the underlying information content must be performed in the context of the asset's format.
The UDFR seeks to "unify" the function and holdings of two existing registries, PRONOM and GDFR (the Global Digital Format Registry), in an open source, semantically enabled, and community supported platform.
The UDFR was developed by the University of California Curation Center (UC3) at the California Digital Library (CDL), funded by the Library of Congress as part of its National Digital Information Infrastructure Preservation Program (NDIIPP). The service is implemented on top of the OntoWiki semantic wiki and Virtuoso triple store.
UDFR is designed to be a long-term preservation tool, so the URIs should be around for a while :)
I believe there is a SPARQL endpoint (it's on top of Virtuoso).
There's an open meeting presenting the work on May 4th at the Library of Congress; for details see below: