RDF GeoSpatial Ontologies


Juan Salas

May 9, 2011, 11:28:59 AM
to pedantic-web
Hi everyone,

My name is Juan Salas (from the "Universidad Tecnológica Nacional" in Argentina) and I have recently been working on a vocabulary for representing GeoData in RDF along with Andreas Harth (Karlsruhe Institut für Technologie), Claus Stadler (LinkedGeoData.org, Universität Leipzig), Luis Vilches and Alexander De Leon (GeoLinkedData.es).

We have finished a preliminary specification of a vocabulary for representing geometries [1] and spatial relations [2]. There are also examples and explanations in this document [3] and at the main site [4]. We would really appreciate any feedback you could give us (we will, of course, acknowledge it in further publications). Also, if you are interested in contributing to the project, help is always welcome.

Best wishes and thank you in advance,
Juan

William Waites

May 9, 2011, 11:57:27 AM
to pedant...@googlegroups.com
Holà Juan,

This is very nice and I've been looking for a way to do something like
this. An important use case is missing from the spec as far as I can
tell. I want to be able to query a triplestore and get back, along
with other information that I might like, some WKT or GML or something
that OpenLayers understands. You can see an example of what I mean
here,

http://semantic.ckan.net/record/af0755a1-2841-4d51-8b16-c95fd921908b

this uses the OGC's transliteration to RDF that they write about in
their GeoSPARQL member submission (PDF).

You seem to be assuming that one would want to make a request for a
particular feature, whereas I think one would like to make a request
for a feature and related things in one go.

Is there much value in actually breaking out the internal structure of
the geometry into RDF? Are there graph-traversal type queries that one
would like to do that would be made easier by this? Is it likely to be
useful to annotate, for example, one vertex inside a linear
ring? Or is it more likely that one wants to do the standard types of
spatial queries, in which case it might be better to have an opaque
blob, since we already have a lot of tools that understand
these flavours of opaque blob?

There may be a reason to do this but I've thought about it and haven't
been able to come up with a use case other than "it's cool to
translate things into RDF". If there are good answers to this question
I'd love to hear them and think they should go near the top of the
specification to provide some context -- I agree that we need a
vocabulary for talking about geospatial things, I'm just unclear on
why you would want to take such a granular approach.

Anyhow, I think this could easily be solved in a compatible way by
borrowing something like asWKT from the GeoSPARQL along with the
corresponding datatype, WKTLiteral.

Cheers,
-w

* [2011-05-09 12:28:59 -0300] Juan Salas <jms...@gmail.com> wrote:

] Hi everyone,
]
] My name is Juan Salas (from the "Universidad Tecnológica Nacional" in
] Argentina) and I have recently been working on a vocabulary for representing
] GeoData in RDF along with Andreas Harth (Karlsruhe Institut für
] Technologie), Claus Stadler (LinkedGeoData.org, Universität Leipzig), Luis
] Vilches and Alexander De Leon (GeoLinkedData.es).
]
] We have finished a preliminary specification of a vocabulary for
] representing geometries [1] and spatial relations [2], there are also
] examples and explanations in this document [3] and at the main site [4]. We
] would really appreciate any kind of feedback you could provide us (we will
] provide corresponding acknowledgements in further publications, of course).
] Also if you are interested in contributing to the project, help is always
] welcome.
]
] Best wishes and thank you in advance,
] Juan
]
] [1] http://geovocab.org/geometry
] [2] http://geovocab.org/spatial
] [3] http://geovocab.org/doc/neogeo.html
] [4] http://geovocab.org/

--
William Waites <mailto:w...@styx.org>
http://river.styx.org/ww/ <sip:w...@styx.org>
F4B3 39BF E775 CF42 0BAB 3DF0 BE40 A6DF B06F FD45

Sean Gillies

May 9, 2011, 11:59:30 AM
to pedant...@googlegroups.com

Hi Juan,

Should the Spatial ontology description read "A vocabulary for
specifying relations between features"? It currently reads "...
between geometries".

Regards,

--
Sean Gillies
Programmer
Institute for the Study of the Ancient World
New York University

Andreas Harth

May 10, 2011, 7:26:27 AM
to pedant...@googlegroups.com
Hi William,

thanks for your comments!

On 05/09/2011 05:57 PM, William Waites wrote:
> This is very nice and I've been looking for a way to do something like
> this. An important use case is missing from the spec as far as I can
> tell. I want to be able to query a triplestore and get back, along
> with other information that I might like, some WKT or GML or something
> that OpenLayers understands. You can see an example of what I mean

You can do a SPARQL query, get the points and then convert them to the
format your visualisation requires.
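
A minimal sketch of the conversion step described here, assuming the query result has already been reduced to an ordered list of (lat, long) pairs. The function name and the sample coordinates are made up for illustration, and the long/lat output order is an assumption about what the consuming tool expects.

```python
from typing import Iterable, Tuple


def points_to_wkt_linestring(points: Iterable[Tuple[float, float]]) -> str:
    """Serialise (lat, long) pairs as a WKT LINESTRING in x y (long lat) order."""
    coords = ", ".join(f"{lon} {lat}" for lat, lon in points)
    return f"LINESTRING ({coords})"


# Hypothetical query result: one (lat, long) row per point, already ordered.
rows = [(48.8567, 2.3508), (47.2181, -1.5528)]
wkt = points_to_wkt_linestring(rows)
print(wkt)
```

The conversion itself is short; the disagreement in this thread is about whether every consumer should have to write it.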

Our vocabulary is tailored towards Linked Data, i.e., we want to make
use of web architecture principles, rather than requiring a SPARQL endpoint.

> Is there much value in actually breaking out the internal structure of
> the geometry into RDF? Are there graph-traversal type queries that one
> would like to do that would be made easier by this, is it likely to be
> useful to want to annotate, for example, one vertex inside a linear
> ring? Or is it more likely to want to do the standard types of
> spatial queries in which case it might be better to have an opaque
> blob in which case we already have a lot of tools that understand
> these flavours of opaque blob?

Using web architecture, we can use any representation for geometries, that
is, a geometry is just a URI, and you can put at that URI any format you
want (rather than specifying just WKT and GML as serialisation formats in
RDF literals). You can, for example, have a geometry URI that returns
a KML file (or a file in Gauss-Krüger or whatever arcane format you have).
Having discussed with GIS people at the Southampton VoCamp I got the
impression there were more syntaxes for geometries than attendees.

To give RDF natives also a way to write geometries we have an RDF variant
of geometries with RDF lists.
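
To illustrate what "geometries with RDF lists" means for a consumer, here is a rough sketch that walks an rdf:first/rdf:rest chain of geo:Point resources. The graph is a plain dictionary rather than a real RDF library, and the node names (`_:l1`, `ex:p1`, ...) are hypothetical; only the rdf: and W3C Basic Geo property URIs are real.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Toy graph keyed by (subject, predicate). All node names are made up.
graph = {
    ("_:l1", RDF + "first"): "ex:p1",
    ("_:l1", RDF + "rest"): "_:l2",
    ("_:l2", RDF + "first"): "ex:p2",
    ("_:l2", RDF + "rest"): RDF + "nil",
    ("ex:p1", GEO + "lat"): "48.8567",
    ("ex:p1", GEO + "long"): "2.3508",
    ("ex:p2", GEO + "lat"): "47.2181",
    ("ex:p2", GEO + "long"): "-1.5528",
}


def walk_list(graph, head):
    """Yield the members of an RDF list, following rdf:rest until rdf:nil."""
    node = head
    while node != RDF + "nil":
        yield graph[(node, RDF + "first")]
        node = graph[(node, RDF + "rest")]


def list_to_coords(graph, head):
    """Return the (lat, long) pair of every point in the list, in order."""
    return [
        (float(graph[(p, GEO + "lat")]), float(graph[(p, GEO + "long")]))
        for p in walk_list(graph, head)
    ]


print(list_to_coords(graph, "_:l1"))
```

Each vertex costs four statements here (two list links, two coordinates), which is the granularity being questioned in the quoted message.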

> There may be a reason to do this but I've thought about it and haven't
> been able to come up with a use case other than "it's cool to
> translate things into RDF". If there are good answers to this question
> I'd love to hear them and think they should go near the top of the
> specification to provide some context -- I agree that we need a
> vocabulary for talking about geospatial things, I'm just unclear on
> why you would want to take such a granular approach.

We wanted to be compatible with geo:Point (which is currently heavily
used). You could for example model the route of the Tour de France
via a list of DBpedia URIs of the towns the tour goes through.

We're still trying to work out the exact way of integration, though,
as we have a Feature/Geometry distinction which the W3C Geo Vocabulary
does not have.

> Anyhow, I think this could easily be solved in a compatible way by
> borrowing something like asWKT from the GeoSPARQL along with the
> corresponding datatype, WKTLiteral.

Having looked at the GeoSPARQL spec (which seems to be now online [1]),
I think we could provide some mappings once there's a vocabulary
definition at their namespace.

Best regards,
Andreas.

[1] http://www.w3.org/2011/02/GeoSPARQL.pdf

Peter DeVries

May 10, 2011, 8:57:42 PM
to pedant...@googlegroups.com
Hi Juan,

I am currently marking up example species occurrence records using GeoNames.

There is one issue with geo:Point for biodiversity studies.

It is best practice to include some measure of radius. This would cover the distance from the GPS reading at which the observation was made, as well as the GPS error.

This is often described as PointRadiusSpatialFit; see http://wwold.gbif.org/prog/digit/Georeferencing

I extended geo:Point in this vocabulary, which also includes the IETF proposal for a "geo" URI scheme http://tools.ietf.org/html/rfc5870
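
As a rough sketch of the point-plus-radius idea, the class below carries both contributions to the radius and serialises the result as an RFC 5870 "geo" URI, whose `u` parameter gives an uncertainty in metres. The class, its field names, the choice to simply add the two distances, and the sample coordinates are all assumptions for illustration, not part of the vocabulary mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointRadius:
    lat: float
    long: float
    extent_m: float      # how far from the GPS reading the observation may lie
    gps_error_m: float   # reported accuracy of the GPS fix

    @property
    def radius_m(self) -> float:
        # Assumption: the two contributions are simply added together.
        return self.extent_m + self.gps_error_m

    def to_geo_uri(self) -> str:
        # RFC 5870 form: geo:<lat>,<long>;u=<uncertainty in metres>
        return f"geo:{self.lat},{self.long};u={self.radius_m:g}"


# Made-up observation near Woods Hole.
obs = PointRadius(lat=41.5265, long=-70.6731, extent_m=20, gps_error_m=10)
print(obs.to_geo_uri())
```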



You can see it in action in these examples http://ocs.taxonconcept.org/ocs/index.html

For example here is one record


I use these geoAreas for features that fall within a given Geonames feature.

In practice this produces data that looks like this in Sig.ma http://sig.ma/search?pid=e15ef704529f423326f09a106862f978

It also allows you to query for organisms expected in a given feature based on observation records.

I am not wedded to the geoAreas but I needed something that was smaller than a geonames feature so that observations associated in a small area (usually a single GPS reading) are linked.

An alternative to the geoAreas is to create a set of URIs for each 1 or 10 meter area of the Earth's surface.
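
One way such per-cell URIs could be minted is to snap coordinates to a fixed decimal grid, sketched below. The base URI is hypothetical, and a 0.0001 degree cell is only roughly 11 m north to south (and narrower east to west away from the equator), so this approximates rather than implements a true 10 meter grid.

```python
from decimal import Decimal, ROUND_FLOOR

BASE = "http://example.org/cell/"   # hypothetical namespace
CELL = Decimal("0.0001")            # cell size in degrees (an assumption)


def cell_uri(lat: str, long: str) -> str:
    """Return the URI of the grid cell that contains the given coordinate.

    Coordinates are passed as strings so that Decimal sees the exact digits.
    """
    lat_cell = Decimal(lat).quantize(CELL, rounding=ROUND_FLOOR)
    long_cell = Decimal(long).quantize(CELL, rounding=ROUND_FLOOR)
    return f"{BASE}{lat_cell}_{long_cell}"


# Two made-up readings a few metres apart fall into the same cell.
print(cell_uri("41.52651", "-70.67312"))
print(cell_uri("41.52659", "-70.67318"))
```

Observations that share a cell URI are linked without any spatial query, at the cost of an arbitrary cell boundary between close neighbours.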

I would be interested in what you think of this approach or if you have any ideas for something better.

My data set is documented in CKAN http://ckan.net/package/taxonconcept

I have recently updated the data set and related vocabularies.

Respectfully,

- Pete
--
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdev...@wisc.edu
TaxonConcept  &  GeoSpecies Knowledge Bases
A Semantic Web, Linked Open Data  Project
--------------------------------------------------------------------------------------

William Waites

May 11, 2011, 4:22:44 AM
to pedant...@googlegroups.com
* [2011-05-10 13:26:27 +0200] Andreas Harth <ha...@kit.edu> wrote:

] You can do a SPARQL query, get the points and then convert them to the
] format your visualisation requires.

Andreas, of course you can, but this is extra work and raises the bar
for including geo things in RDF descriptions. What I'm trying to get
at is whether the extra work is worth it. Generally we should be
trying to lower the bar for doing things with RDF not raise it.

] Our vocabulary is tailored towards Linked Data, i.e., we want to make
] use of web architecture principles, rather than requiring a SPARQL endpoint.

Right, so the example link that I gave doesn't actually use SPARQL; it
just operates on a description of a resource (in this case a graph
containing a catalogue record and a dataset).

] Using web architecture, we can use any representation for geometries, that
] is, a geometry is just a URI, and you can put at that URI any format you
] want (rather than specifying just WKT and GML as serialisation formats in
] RDF literals). You can, for example, have a geometry URI that returns
] a KML file (or a file in Gauss-Krüger or whatever arcane format you have).

Maybe it is useful to give the Geometry its own distinct URI so it can
be separately requested. Nothing that I'm doing prevents that, that's
why I did,

:foo dc:spatial [ a :Geometry; asWKT "POLYGON (...)" ].

instead of

:foo dc:spatial "POLYGON (...)".

] Having discussed with GIS people at the Southampton VoCamp I got the
] impression there were more syntaxes for geometries than attendees.

Sure, but there are only a handful that are commonly used, WKT, GML
and KML.

] We wanted to be compatible with geo:Point (which is currently heavily
] used). You could for example model the route of the Tour de France
] via a list of DBpedia URIs of the towns the tour goes through.

That's an interesting use case that I hadn't thought of. Obviously a
point is easy to model and having to deal with two statements to make
it into some sort of usable representation isn't a burden.

But actually, I would expect that sooner or later the towns will cease
to be infinitesimally small and will grow enclosing polygons or at
least bounding boxes. Using a point for the centroid is one thing,
but adding dozens or hundreds of statements describing the towns'
borders will mean many statements that are never considered
individually except to translate to another representation that our
geo tools (indexes, visualisations) can actually handle.

So I'd say that the route is a series of geometries, and as such it
makes sense to handle geometry collections explicitly because they're
useful this way, and points because they're simple and already in the
wild, but I would still say that the more complex shapes in between are
best not materialised explicitly.

So we might write,

:TourDeFranceRoute a OrderedGeometryCollection (
    [ a Geometry;
      label "Paris";
      centroid [ a Point; lat, long ];
      asWKT "Something complicated that needn't be materialised"
    ],
    [ a Geometry;
      label "Nantes";
      centroid [ a Point; lat, long ];
      asWKT "Something else complicated"
    ],
    ...
)

But I say "needn't be materialised", not "mustn't", because I don't think
what you're suggesting should be prevented, just that provision should
be made somehow for what I think is the far more common case.

] We're still trying to work out the exact way of integration, though,
] as we have a Feature/Geometry distinction which the W3C Geo Vocabulary
] does not have.

I think this is exactly right, and we do have this distinction but
don't need it in the/a geo vocabulary for RDF. The feature is
dbpedia:Paris, the geometry is the object of its dc:spatial. Done. If
something has a dc:spatial (or whatever other predicate) then it is a
"feature" in geographer's terms but that isn't really saying much.

] >Anyhow, I think this could easily be solved in a compatible way by
] >borrowing something like asWKT from the GeoSPARQL along with the
] >corresponding datatype, WKTLiteral.
]
] Having looked at the GeoSPARQL spec (which seems to be now online [1]),
] I think we could provide some mappings once there's a vocabulary
] definition at their namespace.

Yeah, a PDF with cut-and-pasted RDF/XML in it and no actual RDF
description in the namespace. Gah. But the representation they
advocate is sane and pretty much a direct transliteration of standard
practice in the GIS (neo and paleo) world. It is also structured in
such a way that you could make some sort of a D2R mapping on top of an
existing spatial database like PostGIS or Oracle and (1) be able to
get sensible RDF out of it and (2) be able to use its spatial
predicates for complicated queries like contains, overlaps, etc.

Cheers,
-w

Juan Salas

May 11, 2011, 11:19:21 AM
to Pedantic Web Group
Hi Sean,

Thanks for your email, you are right about the typo. However, this is
also quite a tricky subject, so it's good that you point it out. We
currently define spatial relations between features, but whether the
spatial relations should be defined at the feature or geometry level
is open to debate.

The problem is that sometimes a feature may have many geometries, and
a given spatial relation may not apply to all of them. By many
geometries I don't mean a composite geometry (e.g. a MultiPolygon), but
different geometries such as different resolutions of the same
polygon. In this case a point may or may not be within a geometry
depending on which one you are looking at.
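
A small sketch of the resolution problem, using a standard ray-casting point-in-polygon test on two made-up geometries for the same feature. The coordinates are invented, and the test ignores edge cases such as points lying exactly on the boundary.

```python
def point_in_polygon(point, ring):
    """Ray-casting test. `ring` is a list of (x, y) vertices; the closing
    edge from the last vertex back to the first is implied."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Two hypothetical geometries of the same feature.
detailed = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]  # has a deep notch
coarse = [(0, 0), (4, 0), (4, 4), (0, 4)]            # simplified outline

p = (2, 3)  # lies inside the notch
print(point_in_polygon(p, detailed), point_in_polygon(p, coarse))
```

The same point is outside the detailed geometry and inside the coarse one, so a "within" statement attached to the feature is only true relative to one of its geometries.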

However, if you decide to represent only spatial relations and not
geometries, you would have to define empty geometry resources for the
features, just to define the spatial relations between them, which is
one of the reasons we currently define spatial relations between
features.

I think that this is an interesting topic, so it's good that you point
it out.

Best Regards,
Juan

On May 9, 12:59, Sean Gillies <sean.gill...@gmail.com> wrote:

Juan Salas

May 11, 2011, 11:43:44 AM
to Pedantic Web Group
Hi Peter,

We had originally based the vocabulary on the GML Simple Features
Profile as it covered most use cases and mapped directly to KML and
WKT. However, the use case you mention is very interesting and we will
consider including a measure of radius in the vocabulary.

As for your question regarding the linked observations, hopefully
this will be queryable in the future. For now, in order to materialize
this relation (if I understood your problem correctly), I could think
of a property such as 'near' or 'within10m' that links the
observations to each other, as an alternative to the ones you
proposed.
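
Materialising such a relation could look roughly like the sketch below, which compares every pair of observations with the haversine formula and emits a triple for each pair within the threshold. The property name `ex:within10m`, the observation identifiers and the coordinates are hypothetical, and the all-pairs loop is only reasonable for small data sets.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, long) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def within_10m_triples(observations, threshold_m=10.0):
    """Yield (subject, predicate, object) for each pair closer than the threshold."""
    pairs = combinations(sorted(observations.items()), 2)
    for (id_a, pos_a), (id_b, pos_b) in pairs:
        if haversine_m(pos_a, pos_b) <= threshold_m:
            yield (id_a, "ex:within10m", id_b)


# Made-up observations: obs2 is a few metres north of obs1, obs3 is far away.
observations = {
    "ex:obs1": (41.52650, -70.67310),
    "ex:obs2": (41.52655, -70.67310),
    "ex:obs3": (41.53000, -70.67310),
}
print(list(within_10m_triples(observations)))
```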

Best Regards,
Juan


On May 10, 21:57, Peter DeVries <pete.devr...@gmail.com> wrote:
> Hi Juan,
>
> I have currently been marking up example species occurrence records using
> GeoNames.
>
> There is one issue with the geo:Point for biodiversity studies.
>
> It is best practice to include some measure of radius. This would include
> the extent from the GPS reading that the observation was made as well as the
> GPS error.
>
> this is often described as PointRadiusSpatialFit see http://wwold.gbif.org/prog/digit/Georeferencing
>
> I extended the geo:Point in this vocabulary, it also includes the IETF
> proposal for a geo:urn  http://tools.ietf.org/html/rfc5870
>
> HTML http://lod.taxonconcept.org/ontology/dwc_area.owl
>
> Doc    http://lod.taxonconcept.org/ontology/dwc_area_doc/index.html
>
> You can see it in action in these examples http://ocs.taxonconcept.org/ocs/index.html
>
> For example here is one record
>
> HTML http://ocs.taxonconcept.org/ocs/1de0579b-086f-456a-8dbe-89f32dfbee68....
> RDF http://ocs.taxonconcept.org/ocs/1de0579b-086f-456a-8dbe-89f32dfbee68.rdf
>
> I use these geoAreas for features that fall within a given Geonames feature.
>
> In practice this produces data that looks like this in Sig.ma http://sig.ma/search?pid=e15ef704529f423326f09a106862f978
>
> In also allows you to query for organism expected in a given feature based
> on observation records.
>
> I am not wedded to the geoAreas but I needed something that was smaller than
> a geonames feature so that observations associated in a small area (usually
> a single GPS reading) are linked.
>
> An alternative to the geoArea's is to create a set of URI's for each 1 or 10
> meter area of the Earth's surface.
>
> I would be interested in what you think of this approach or if you have any
> ideas for something better.
>
> My data set is documented in CKAN http://ckan.net/package/taxonconcept
> Email: pdevr...@wisc.edu
> TaxonConcept <http://www.taxonconcept.org/>  &
> GeoSpecies<http://about.geospecies.org/> Knowledge
> Bases
> A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
> --------------------------------------------------------------------------- -----------

Juan Salas

May 11, 2011, 12:18:54 PM
to Pedantic Web Group
Hi William,

> You seem to be assuming that one would want to make a request for a
> particular feature, whereas I think one would like to make a request
> for a feature and related things in one go.

Sorry, I'm not sure what you mean by this, but if you could explain
this part to me I would be happy to comment on it.

Regarding the level of detail of the vocabulary, you are right, it is
arguable whether a Point granularity is necessary. Sometimes it is, as
for explicitly defining shared borders between regions. But whether
it is worth the trouble or not is open to debate and it would be
interesting to get some feedback from the community on this matter.

As for the GeoSPARQL predicates ("asWKT" and "asGML"), one of the
problems is that a geometry is not "uniquely" represented by a black
box document contained in an RDF literal. For example, a KML document
may contain multiple styling attributes, or a GML document may be
extended with additional information and still be a valid GML document.
This makes it harder to query them, in my opinion.

Instead of using such predicates we use standard HTTP content
negotiation in order to get the different versions of the geometry.
This is also highly standardized and well supported, and by doing so we
keep the semantics of the geometry while supporting the serializations
understood by GIS. Besides, our specification supports other
potentially interesting formats (e.g. SVG, KML, GeoJSON, etc.) and is not
limited to just WKT or GML.

Best Regards,
Juan

Peter DeVries

May 13, 2011, 5:20:18 PM
to pedant...@googlegroups.com
Hi Juan,

I made up an example that I think demonstrates the utility of this method.

Below is a query for those observations from the TDWG 2010 BioBlitz in Woods Hole, MA, USA.

The viewer facet is set to sort by Observer (the person who made the observation).

On the side I was able to select a subset of the geoAreas to display.

Screenshots, the SPARQL text and a link that runs the query on URIburner are available at this bit.ly URL http://bit.ly/m6c5hI

The link to the URIburner query uses MS Pivot, which needs Silverlight. For those who don't have this or don't want it, I include screenshots of the PivotView.

My example observation records are included in my sitemap and VoID, but I have included the direct link to the occurrence record RDF dump itself below.


I would like to use these examples to work out the best way to represent these kinds of records in RDF. If you are interested in trying them, I would welcome any suggestions.

Thanks,

- Pete
--
------------------------------------------------------------------------------------

Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdev...@wisc.edu
TaxonConcept  &  GeoSpecies Knowledge Bases
A Semantic Web, Linked Open Data  Project
--------------------------------------------------------------------------------------

Juan Salas

May 16, 2011, 1:00:19 PM
to pedant...@googlegroups.com
Hi Pete,

We'll take this use case into consideration when adding the measure of radius to the vocabulary, so those examples may prove useful to us. Our vocabulary could then be used to define a Circumference geometry to which an observation would be related, giving a measure of proximity.

By the way, to everyone interested in our vocabulary, there is a demo representation of the NUTS classification [1]. There we use our vocabulary to represent the geometries and HTTP Content Negotiation to provide alternative representations of them, such as GML and KML.

Some sample features are:


Features and geometries are represented separately in RDF but are shown together in HTML in order to make the representation more user-friendly.

You can access the alternative representations of a geometry by asking for the corresponding MIME type. For example, if you want to get the KML representation using cURL from the command line you could do:

curl -L http://nuts.geovocab.org/id/DE_geometry -H "Accept: application/vnd.google-earth.kml+xml" -o DE_geometry.kml

Or if you want the GML file:

curl -L http://nuts.geovocab.org/id/DE_geometry -H "Accept: application/vnd.ogc.gml" -o DE_geometry.gml
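
The same requests can be built from Python's standard library, sketched here as an equivalent of the cURL commands above. The helper name and the format keys are made up; the URI and MIME types are the ones given in this message. Nothing is fetched when the block runs, since the demo server may no longer be online.

```python
import urllib.request

# MIME types taken from the cURL examples in this message.
MIME_TYPES = {
    "kml": "application/vnd.google-earth.kml+xml",
    "gml": "application/vnd.ogc.gml",
}


def geometry_request(uri: str, fmt: str) -> urllib.request.Request:
    """Build a GET request asking for one representation of a geometry URI."""
    return urllib.request.Request(uri, headers={"Accept": MIME_TYPES[fmt]})


req = geometry_request("http://nuts.geovocab.org/id/DE_geometry", "kml")
print(req.full_url, req.get_header("Accept"))

# To actually fetch the document (network access required; urlopen follows
# redirects by default, like `curl -L`):
# with urllib.request.urlopen(req) as resp:
#     data = resp.read()
```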

So any suggestions about it or the vocabulary are very welcome and if you have any questions please don't hesitate to ask.

Best regards,
Juan


2011/5/13 Peter DeVries <pete.d...@gmail.com>

John Goodwin

May 17, 2011, 11:05:00 AM
to pedant...@googlegroups.com
Hi Juan,

Good work. One thing - I wouldn't make 'equals' a subproperty of 'owl:sameAs'. Being spatially co-located does not imply equivalence. For example there are two administrative units in the UK 'The Greater London Authority' and 'London' which are distinct, but spatially co-located.
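
A toy illustration of why this matters: if the spatial 'equals' relation entailed owl:sameAs, a reasoner would treat both resources as one and copy statements between them. The function below is a deliberately simplified stand-in for that inference (it only substitutes subjects, in one step), and the resource names and types are hypothetical.

```python
def same_as_closure(triples, same_as_pairs):
    """Copy every statement about a resource to the resources declared
    identical to it (subject position only, a single step)."""
    alias = {}
    for a, b in same_as_pairs:
        alias.setdefault(a, set()).add(b)
        alias.setdefault(b, set()).add(a)
    inferred = set(triples)
    for s, p, o in triples:
        for other in alias.get(s, ()):
            inferred.add((other, p, o))
    return inferred


# Hypothetical descriptions of two distinct but co-located units.
triples = {
    ("ex:GreaterLondonAuthority", "rdf:type", "ex:AdministrativeBody"),
    ("ex:London", "rdf:type", "ex:City"),
}

# With 'equals' as a subproperty of owl:sameAs, this pair would be entailed.
inferred = same_as_closure(triples, [("ex:GreaterLondonAuthority", "ex:London")])
print(sorted(inferred - triples))
```

Both resources end up carrying both types, which is exactly the conflation that keeping 'equals' separate from owl:sameAs avoids.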

I've been working on extending the OS spatial relations ontology to include some more axioms to allow for more extensive spatial reasoning...


John
 
--

Homepage: http://www.johngoodwin.me.uk
Blog: http://johngoodwin225.wordpress.com
Personal URI: http://www.johngoodwin.me.uk/me

Isn't it enough to see that a garden is beautiful without having to believe that there are fairies at the bottom of it too? - Douglas Adams

From: Juan Salas <jms...@gmail.com>
To: pedantic-web <pedant...@googlegroups.com>
Sent: Monday, 9 May 2011, 16:28
Subject: [pedantic-web] RDF GeoSpatial Ontologies

Juan Salas

May 18, 2011, 11:11:02 AM
to pedant...@googlegroups.com
Hi John,

Thank you for your comments. You are right about the 'equals' subproperty. Also, nice work with the reasoning over the OS spatial relations ontology.

Currently we have been working mostly on the geometry vocabulary but we plan to improve the spatial relations ontology shortly. For example, one of the open questions we had is how to map (if possible) the RCC8 relations' logic into OWL (e.g. the TPP relation is not straightforward) while keeping decidability.
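
One possible reading of why TPP is awkward (an interpretation, not something spelled out in this thread) is that, unlike NTPP, it does not compose with itself to a single relation, so it cannot simply be declared transitive. The sketch below encodes only the four proper-part entries of the RCC8 composition table; the helper names are made up.

```python
# Excerpt of the RCC8 composition table: (r1, r2) -> possible relations
# between a and c, given a r1 b and b r2 c.
COMPOSE = {
    ("TPP", "TPP"): {"TPP", "NTPP"},
    ("TPP", "NTPP"): {"NTPP"},
    ("NTPP", "TPP"): {"NTPP"},
    ("NTPP", "NTPP"): {"NTPP"},
}


def compose(r1, r2):
    """Look up the possible outcomes of chaining two relations."""
    return COMPOSE[(r1, r2)]


def is_transitive(r):
    """A relation is transitive if chaining it with itself yields only itself."""
    return compose(r, r) == {r}


print(is_transitive("NTPP"), is_transitive("TPP"))
```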

Best regards,
Juan

2011/5/17 John Goodwin <got...@btopenworld.com>

William Waites

Jun 1, 2011, 9:36:11 AM
to pedant...@googlegroups.com
Hello Juan,

I seem to have misplaced your reply to my message where you were
asking me what I was getting at when I said I thought generally that
the most common use case was to get a description of the thing rather
than the geometry as such.

What I meant was the item of interest is generally the *feature* not
the geometry. The feature is the subject of a dc:spatial link, and the
object is the geometry. Most of the description will likely be about
the feature and typically the geometry will want to be opaque, more
or less. So something like,

:Paris dc:spatial [
    a Geometry;
    asWKT "Polygon(...)"^^wkt;
    asGML "<gml>..."^^gml;
].

There is another reason for making the geometry opaque like this. Consider what
you would have to do to implement geo indexing so people could do queries like,

SELECT ?place WHERE {
    ?place dc:spatial [ contains "Point(...)"^^wkt ]
}

If you build a geo index, doing a contains or similar query is easy
enough, there are libraries for this. You just need an index. To build
the index it would be easy enough to recognise either the asWKT
predicate or the wkt datatype and add it to the index on insert. Now
consider what you would have to do if the opaque geometry were
exploded into a potentially large set of triples. To add anything to
the index you would have to process a lot of them, and because we have
no particular ordering guarantees for e.g. loading a document into a
store you would have to keep track of all partial geometries until the
entire document was processed. Much messier and more resource
intensive...
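
A rough sketch of the "index on insert" idea: because the whole geometry arrives in a single asWKT statement, the index can be updated as soon as that one triple is seen. The class and method names are hypothetical, the WKT handling only covers simple coordinate lists, and a bounding box gives candidates rather than an exact containment answer.

```python
import re


def wkt_bbox(wkt: str):
    """Bounding box (min_x, min_y, max_x, max_y) of the coordinates in a WKT string."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    xs = [float(x) for x, _ in pairs]
    ys = [float(y) for _, y in pairs]
    return (min(xs), min(ys), max(xs), max(ys))


class GeoIndex:
    """Toy spatial index: one bounding box per geometry node."""

    def __init__(self):
        self.boxes = {}

    def on_insert(self, s, p, o):
        # A single triple is enough; no partial geometry has to be buffered.
        if p == "asWKT":
            self.boxes[s] = wkt_bbox(o)

    def candidates_containing(self, x, y):
        return sorted(
            s for s, (x0, y0, x1, y1) in self.boxes.items()
            if x0 <= x <= x1 and y0 <= y <= y1
        )


index = GeoIndex()
index.on_insert("_:g1", "asWKT", "POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))")
index.on_insert("_:g1", "label", "Paris")  # ignored by the index
print(index.candidates_containing(2, 2))
```

With an exploded geometry, `on_insert` would instead have to collect list nodes and coordinates arriving in any order and could only compute the box once the last one had been seen.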

Jo Walsh

Jun 1, 2011, 9:57:41 AM
to pedant...@googlegroups.com
I understood a recommendation was to use content negotiation to return KML etc...


phone: +441316502973

William Waites

Jun 1, 2011, 11:24:47 AM
to pedant...@googlegroups.com
Right, but as I understand it KML represents a (set of) features, which contain geometries. I'm just arguing that the main thing of interest is the feature. And we already have these in RDF; we just call them resources.

You don't have to break out the geometry into many triples to make the geometry part of KML. You can, but if that's done to the exclusion of the more traditional "opaque" representation it makes things like building indices harder.


Jo Walsh <jo.w...@ed.ac.uk> wrote:

Juan Salas

Jun 6, 2011, 10:26:59 AM
to pedant...@googlegroups.com
Hi William,

First of all, thank you very much for the feedback, I really appreciate it. I think the key to the RDF representation is semantics. So we argue that we should keep the vocabulary as open (i.e. not opaque) as possible, and then how it is processed is a matter of implementation. For example, if you define a WKT datatype for representing the geometry, it would first have to be supported by the triple store, and then it would probably be stored as WKB, so a conversion is still necessary when querying or updating the index.

Also, GML and WKT are not fully compatible, and by making them opaque to the RDF representation you can neither state their equivalences nor link their content to other resources, as is typical in Linked Data. For example, in the case of a country with an offshore island, you may want to link both geometries by defining a multipolygon for the country, where one of its polygon members is the polygon of the island.

So I think the key is to abstract the meaning of the content being represented in other formats, and share it in a "neutral" way, which can be interpreted in a way that suits the application (e.g. GML, WKT, SVG, HTML, etc.). For convenience, we recommend using HTTP content negotiation, as Jo pointed out, in order to provide these alternative representations. Otherwise, the expressivity of the vocabulary may fall short in current or future applications.

Best regards,
Juan


2011/6/1 William Waites <w...@styx.org>

Frans Knibbe

Jun 16, 2011, 7:21:14 AM
to Pedantic Web Group
Hello,

I am glad I found NeoGeo and this discussion thread. Both seem to be
just what I was looking for. I am about to publish a data set
consisting of millions of triples. But I am still looking for the
right way to code/serialise geometry.

Well, here is my feedback:

Like William, I have my doubts about exploding geometry in sequences
of basic geo points. I do understand it is nice to extend the existing
basic geo vocabulary, but I don’t think it is very practical. It is my
impression that in almost all real life cases we treat geometries as
atomic entities. Especially when geometry is in transit, i.e. being
exchanged between data stores. But I realise that RDF can be a storage
format as well as an exchange format. At this time, the majority of
geographical data are probably stored in relational databases that
support spatial data types. In that case geometries are transformed
from RDF to something else. But it could very well be that in the
future geospatial data will more and more be stored in triple stores.
In that case the exploded geometries could serve as a storage
format too (and be used directly by spatial functions).

I see one other benefit of exploding the geometry: for each coordinate
pair it is always clear which value is latitude and which is
longitude. This can prevent a lot of confusion, because not everybody
is aware of the y,x axis order in WGS84.
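
The ambiguity can be seen in a few lines: a point with named fields has only one reading, while a bare coordinate pair in a text serialisation depends on which axis order writer and reader assume. The type, the function names and the sample coordinates below are made up for illustration.

```python
from typing import NamedTuple


class GeoPoint(NamedTuple):
    lat: float
    long: float


def to_wkt_long_lat(p: GeoPoint) -> str:
    """Serialise with x = longitude first."""
    return f"POINT ({p.long} {p.lat})"


def to_wkt_lat_long(p: GeoPoint) -> str:
    """Serialise with latitude first."""
    return f"POINT ({p.lat} {p.long})"


paris = GeoPoint(lat=48.8567, long=2.3508)
print(to_wkt_long_lat(paris))
print(to_wkt_lat_long(paris))
```

Both strings are syntactically valid WKT, and nothing in the text itself says which convention was used.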

I do like the idea from the GeoSPARQL proposal of having the WKT data
type (thanks for posting the link to the PDF, by the way). It seems to
me WKT is the most useful serialisation of geometry. Others might
prefer GML or KML (but those two are more than geometry serialisation
formats). Why not have the vocabulary support multiple expressions of
geometry (of which series of basic geo point is one)?

Are the authors of NeoGeo in direct correspondence with the GeoSPARQL
people? In the end the world needs one vocabulary for spatial data,
right?

I don’t think I understand the concept of using content negotiation
for requesting geometry as WKT or GML. When I do a HTTP request I will
typically be returned a data set, not a single geometry. Why use the
HTTP headers to specify the format of only a subset of the data?

One thing I miss in the vocabulary is the notion of level of detail. A
spatial thing could have many geometries, each associated with a
certain level of detail (or generalisation levels). I think an
indication of level of detail could be an optional but integral part
of a geometry. It tells us something about how the geometry should be
interpreted. This is all the more important because I have yet to see
an example of coordinate values really having the number of
significant digits that is right for their precision.

Regards,
Frans

Andreas Harth

Jun 20, 2011, 7:41:21 AM
to pedant...@googlegroups.com
Dear Frans,

many thanks for your comments!

On 06/16/2011 01:21 PM, Frans Knibbe wrote:
> Well, here is my feedback:
>
> Like William, I have my doubts about exploding geometry in sequences
> of basic geo points. I do understand it is nice to extend the existing
> basic geo vocabulary, but I don’t think it is very practical. It is my
> impression that in almost all real life cases we treat geometries as
> atomic entities. Especially when geometry is in transit, i.e. being
> exchanged between data stores. But I realise that RDF can be a storage
> format as well as an exchange format. At this time, the majority of
> geographical data are probably stored in relational databases that
> support spatial data types. In that case geometries are transformed
> from RDF to something else. But it could very well be that in the
> future geospatial data will more and more be stored in triple stores.
> In that case the the exploded geometries could serve as a storage
> format too (and be used directly by spatial functions).
>
> I see one other benefit of exploding the geometry: for each coordinate
> pair it is always clear which value is latitude and which is
> longitude. This can prevent a lot of confusion, because not everybody
> is aware of the y,x axis order in WGS84.

you've mentioned the main point: RDF serves as exchange format, and as
such the exchanged data should be self-describing. Thus, we aim at a
generic way to encode geometries, independent of the storage model used.
We also just assume a Linked Data scenario without SPARQL (which could
be layered on top).

> I do like the idea from the GeoSPARQL proposal of having the WKT data
> type (thanks for posting the link to the PDF, by the way). It seems to
> me WKT is the most useful serialisation of geometry. Others might
> prefer GML or KML (but those two are more than geometry serialisation
> formats). Why not have the vocabulary support multiple expressions of
> geometry (of which series of basic geo point is one)?

That's exactly what we've proposed with the content negotiation scheme.
Rather than having a dedicated predicate for each possible geometry
serialisation format, we just treat the geometry as a URI. Upon lookup
on that URI, the client and server can negotiate which geometry format
to return.
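
What the server side of such a negotiation might look like can be sketched in a few lines of Python. This is not the NeoGeo implementation. The media types for RDF/XML and GML are the ones used in the wget example in this message, "text/plain" for WKT follows the plain-text suggestion made elsewhere in the thread, and the function names are hypothetical. The sketch handles q-values and "*/*" only; it ignores partial wildcards such as "text/*" and the specificity rules of full HTTP negotiation.

```python
from __future__ import annotations

# Representations the (hypothetical) geometry resource can be served in.
AVAILABLE = ["application/rdf+xml", "application/vnd.ogc.gml", "text/plain"]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media type without an explicit q-value counts as q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_format(accept: str, available: list[str]) -> str | None:
    """Return the best media type the server can offer, or None (HTTP 406)."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return None
```

A client sending "application/vnd.ogc.gml;q=0.8, text/plain" would get the plain-text (WKT) representation here, because it carries the higher q-value.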

> Are the authors of NeoGeo in direct correspondence with the GeoSPARQL
> people? In the end the world needs one vocabulary for spatial data,
> right?

Some GeoSPARQL people hang out on a mailing list [1] which was
set up quite a while ago. I don't know details about the GeoSPARQL
standardisation process, but I am glad that the spec is finally public.

Having only two vocabularies for spatial data would be an excellent outcome,
as there are currently over a dozen.

> I don't think I understand the concept of using content negotiation
> for requesting geometry as WKT or GML. When I do a HTTP request I will
> typically be returned a data set, not a single geometry. Why use the
> HTTP headers to specify the format of only a subset of the data?

Consider the URI representing the NUTS geometry of Iceland [2].

$ wget --header "Accept: application/rdf+xml" \
    "http://nuts.geovocab.org/id/IS_geometry"

returns the geometry in RDF/XML, while

$ wget --header "Accept: application/vnd.ogc.gml" \
    "http://nuts.geovocab.org/id/IS_geometry"

returns the geometry in GML. There is an issue with formats where files
can contain descriptions other than geometries, such as KML and GML, but
for now we just assume that the URI referenced returns one geometry. The
same issue applies to the geosparql:asGML predicate.
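
For clients that are not shell scripts, the same two lookups can be sketched with Python's standard library. The sketch only builds the requests and does not send them, so it makes no assumption about whether the URI above still resolves; the helper name is hypothetical.

```python
from urllib.request import Request

GEOMETRY_URI = "http://nuts.geovocab.org/id/IS_geometry"


def geometry_request(uri: str, media_type: str) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": media_type})


# Same URI, two different representations, selected only by the Accept header.
rdf_request = geometry_request(GEOMETRY_URI, "application/rdf+xml")
gml_request = geometry_request(GEOMETRY_URI, "application/vnd.ogc.gml")

# To actually perform a lookup, pass a request to urllib.request.urlopen().
```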

> One thing I miss in the vocabulary is the notion of level of detail. A
> spatial thing could have many geometries, each associated with a
> certain level of detail (or generalisation levels). I think an
> indication of level of detail could be an optional but integral part
> of a geometry. It tells us something about how the geometry should be
> interpreted. This is all the more important because I have yet to see
> an example of coordinate values really having the number of
> significant numbers that is right for their precision.

This being pedantic-web, do you have an example dataset online where you
need the precision? Actually, minting URIs for the geometries has the
added benefit that people can add descriptions (such as
:geometry ex:levelOfDetail "medium" .), which is tricky to accomplish
when using RDF Literals for geometries.

Best regards,
Andreas.

[1] http://groups.google.com/group/neogeo-semantic-web-vocabs
[2] http://nuts.geovocab.org/id/IS_geometry

William Waites

Jun 20, 2011, 8:50:29 AM
to pedant...@googlegroups.com
* [2011-06-20 13:41:21 +0200] Andreas Harth <ha...@kit.edu> écrit:

] you've mentioned the main point: RDF serves as exchange format, and as
] such the exchanged data should be self-describing. Thus, we aim at a
] generic way to encode geometries, independent of the storage model used.
] We also just assume a Linked Data scenario without SPARQL (which could
] be layered on top).

Not really saying anything new here, apart from noting that what you
seem to want to do is define a less convenient encoding for geometries
than already exists (WKT, parts of GML, KML). Less convenient for
display, less convenient for indexing, and with little benefit other
than the "everything must be RDF" mantra.

Ad absurdum, we could do the same thing with integers. Instead of
using xsd:int, we could just define three numbers (minusone, zero,
one) and an addition operator and instead of writing,

:foo :bar 42.

we could write,

:foo :bar [ plus one, [ plus one, [ ... ]]].

it's more general because then we can do other number systems, don't
have to worry about decimal encoding vs. hex or binary, etc. We could
even annotate individual steps in the sequence, tagging prime numbers
or even better writing out all of their prime factors.

And as I've mentioned before, it's all well and good to hand-wave
about "SPARQL being layered on top" but seriously, try indexing
something like what you are suggesting. Spatial indexing is complex
enough without also throwing in some graph traversal (and
interpretation) to find out what you're indexing.
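
To make the traversal cost concrete, here is a rough Python sketch of what an indexer has to do before it can hand an exploded geometry to a spatial index. Triples are plain tuples rather than a real RDF library; rdf:first, rdf:rest, rdf:nil, geo:lat and geo:long are the standard terms, while ex:points and the coordinate values are invented for the example and are not taken from the NeoGeo spec.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# A two-point line, exploded into an RDF list of basic geo points.
TRIPLES = [
    ("ex:line", "ex:points", "_:n1"),  # ex:points is a hypothetical property
    ("_:n1", RDF_FIRST, "ex:p1"), ("_:n1", RDF_REST, "_:n2"),
    ("_:n2", RDF_FIRST, "ex:p2"), ("_:n2", RDF_REST, RDF_NIL),
    ("ex:p1", "geo:lat", "49.0"), ("ex:p1", "geo:long", "8.4"),
    ("ex:p2", "geo:lat", "49.1"), ("ex:p2", "geo:long", "8.5"),
]


def value(triples, subject, predicate):
    """Return the first object found for (subject, predicate), or raise KeyError."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    raise KeyError((subject, predicate))


def linestring_wkt(triples, geometry):
    """Walk the RDF list behind `geometry` and serialise it as a WKT LINESTRING."""
    node = value(triples, geometry, "ex:points")
    coords = []
    while node != RDF_NIL:
        point = value(triples, node, RDF_FIRST)
        lat = value(triples, point, "geo:lat")
        lon = value(triples, point, "geo:long")
        coords.append(f"{lon} {lat}")  # assuming the common x y (lon lat) order in WKT
        node = value(triples, node, RDF_REST)
    return "LINESTRING (" + ", ".join(coords) + ")"
```

Nine triples and a list walk produce what a WKT literal would carry in a single string, which is roughly the trade-off being argued about.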

To be absolutely clear, what you are suggesting is a bad idea.

Frans Knibbe

Jun 21, 2011, 5:52:22 AM
to Pedantic Web Group
Hello Andreas,

Thanks for your reply. Below are my responses ...

Regards,
Frans

On 20 jun, 13:41, Andreas Harth <ha...@kit.edu> wrote:

> you've mentioned the main point: RDF serves as exchange format, and as
> such the exchanged data should be self-describing.  Thus, we aim at a
> generic way to encode geometries, independent of the storage model used.
> We also just assume a Linked Data scenario without SPARQL (which could
> be layered on top).

I thought that WKT is also a general way of encoding geometry. WKT is
independent of the storage model, as far as I know. I also get the
impression that it is the most widely supported format for encoding
geometry. Even if software does not directly support it, it is very
easy to transform.

This may be comparable to the combination of Arabic numerals and the
decimal system being the most common way of encoding numbers.

If you also think that RDF is mostly an exchange format, doesn't that
mean that there is no need to be able to reference individual points
that constitute a geometry?

>
> > I do like the idea from the GeoSPARQL proposal of having the WKT data
> > type (thanks for posting the link to the PDF, by the way). It seems to
> > me WKT is the most useful serialisation of geometry. Others might
> > prefer GML or KML (but those two are more than geometry serialisation
> > formats). Why not have the vocabulary support multiple expressions of
> > geometry (of which series of basic geo point is one)?
>
> That's exactly what we've proposed with the content negotiation scheme.
> Rather than having a dedicated predicate for each possible geometry
> serialisation format, we just treat the geometry as a URI.  Upon lookup
> on that URI, the client and server can negotiate which geometry format
> to return.

I wonder if it is common to make separate requests for all feature
attributes. I would rather think one makes a request that returns a
collection of features.

> > Are the authors of NeoGeo in direct correspondence with the GeoSPARQL
> > people? In the end the world needs one vocabulary for spatial data,
> > right?
>
> Some GeoSPARQL people hang out at a mailing list [1] which has been
> set up quite a while ago.  I don't know details about the GeoSPARQL
> standardisation process, but I am glad that the spec is finally public.
>
> Having only two vocabularies for spatial data would be an excellent outcome,
> as there are currently over a dozen.

So I have noticed. It is rather difficult to find the 'right' way to
encode geometry as RDF at the moment. So your initiative is
applaudable. Still, having only one standard would be even better. Or
do you think the two approaches have different use cases? I did notice
GeoSPARQL also supports using different coordinate reference systems.

> > I don't think I understand the concept of using content negotiation
> > for requesting geometry as WKT or GML. When I do a HTTP request I will
> > typically be returned a data set, not a single geometry. Why use the
> > HTTP headers to specify the format of only a subset of the data?
>
> Consider the URI representing the NUTS geometry of Iceland [2].
>
> $ wget --header "Accept: application/rdf+xml"
> "http://nuts.geovocab.org/id/IS_geometry"
>
> returns the geometry in RDF/XML, while
>
> $ wget --header "Accept: application/vnd.ogc.gml"
> "http://nuts.geovocab.org/id/IS_geometry"
>
> returns the geometry in GML.  There is an issue with formats where files
> can contain other descriptions except geometries, such as KML and GML, but
> for now we just assume that the URI referenced returns one geometry.  The
> same issue applies to the geosparql:asGML predicate.

As I wrote above, I wonder if making separate requests for all
geometry attributes is a common way of doing things. I don't have much
experience in the field of Linked Data, but I imagine that a typical
request is for a feature, or for a collection of features. In that
case, a geometry would only be one of the many items in the result
set. Would it make sense to use this kind of content negotiation in
such a case?

>
> > One thing I miss in the vocabulary is the notion of level of detail. A
> > spatial thing could have many geometries, each associated with a
> > certain level of detail (or generalisation levels). I think an
> > indication of level of detail could be an optional but integral part
> > of a geometry. It tells us something about how the geometry should be
> > interpreted. This is all the more important because I have yet to see
> > an example of coordinate values really having the number of
> > significant numbers that is right for their precision.
>
> This being pedantic-web, do you have an example dataset online where you
> need the precision?  Actually, minting URIs for the geometries has the
> added benefit that people can add descriptions (such as
> :geometry ex:levelOfDetail "medium" .), which is tricky to accomplish
> when using RDF Literals for geometries.

No, sorry, I don't have an example dataset online (yet). But the need
for such a thing rather comes from the demand side, I think. A
scenario could be a web mapping application that allows a user to add
RDF data to a map. The application is only interested in those
coordinates that it can display. Having too many coordinates would
only slow things down. In that case the application could request
features having geometries of just the right generalisation level.
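
A sketch of that selection step, in Python, might look as follows. The property ex:levelOfDetail is the illustrative one from Andreas's reply; the three level names, their ordering, the example URIs and the function name are all assumptions made for this example, not part of any vocabulary.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}  # hypothetical ordering of levels

# One feature with two geometries, each tagged with ex:levelOfDetail.
GEOMETRIES = {
    "ex:iceland": [
        {"uri": "ex:iceland_geo_low", "levelOfDetail": "low"},
        {"uri": "ex:iceland_geo_high", "levelOfDetail": "high"},
    ]
}


def pick_geometry(feature: str, wanted: str) -> str:
    """Return the most detailed geometry no finer than `wanted`.

    Falls back to the coarsest geometry if none is coarse enough.
    """
    candidates = sorted(GEOMETRIES[feature], key=lambda g: LEVELS[g["levelOfDetail"]])
    chosen = candidates[0]
    for geometry in candidates:
        if LEVELS[geometry["levelOfDetail"]] <= LEVELS[wanted]:
            chosen = geometry
    return chosen["uri"]
```

A mapping client zoomed far out would ask for "low" and never download the detailed coastline.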

There are many ways to express something like the level of detail of a
geometry, so I think it would be nice if it is an optional but
standardised property of a geometry.

Perhaps this issue is very similar to the need to specify a radius
with each point, which is something discussed elsewhere in this
thread. In a sense, the issue in both cases is coupling some kind of
specification of accuracy to the coordinates.

Andreas Harth

Jun 21, 2011, 11:30:09 AM
to pedant...@googlegroups.com
Hi Frans,

On 06/21/2011 11:52 AM, Frans Knibbe wrote:
> If you also think that RDF is mostly an exchange format, doesn't that
> mean that there is no need to be able to reference individual points
> that constitute a geometry?

you can use both; if you want easy publishing of existing data, serve
WKT with content negotiation. If you want to integrate into geo:Point,
serve data in the ngeo: RDF format with content negotiation.

The referencing comes into play when you consider linking. Naming
individual points with URIs gives the ability to link and integrate data
more easily.

> I wonder if it is common to make separate requests for all feature
> attributes. I would rather think one makes a request that returns a
> collection of features.

Deciding where a description of a URI stops is a general issue on the
Linked Data web (the so-called "Decker problem" [1]).

In NeoGeo we took the decision to cleanly separate Feature and Geometry
and give both URIs.

>> Having only two vocabularies for spatial data would be an excellent outcome,
>> as there are currently over a dozen.
>
> So I have noticed. It is rather difficult to find the 'right' way to
> encode geometry as RDF at the moment. So your initiative is
> applaudable. Still, having only one standard would be even better. Or
> do you think the two approaches have different use cases? I did notice
> GeoSPARQL also supports using different coordinate reference systems.

I guess the GeoSPARQL proposal tackles geodata from a database-y perspective
(and the name seems to imply that SPARQL is required), whereas the NeoGeo
effort is geared towards publishing geo data as Linked Data. We see the
problem of querying via SPARQL as a problem that should be solved on another
layer.

> As I wrote above, I wonder if making separate requests for all
> geometry attributes is a common way of doing things. I don't have much
> experience in the field of Linked Data, but I imagine that a typical
> request is for a feature, or for a collection of features. In that
> case, a geometry would only be one of the many items in the result
> set. Would it make sense to use this kind of content negotiation in
> such a case?

Depends. If you have the URI lookup infrastructure as in Linked Data,
I think it's ok to traverse the data graph and perform multiple lookups
to get the data you want. The benefit of using URIs for things is that
you can identify and reference things from anywhere on the web. If you
want to have both Feature and Geometries in a single file, you can
use # URIs (which would, however, require using the RDF version of the
geometry in NeoGeo, unless we allowed geosparql:asWKT; I don't know
if that would make sense from the point of view of the vocabulary,
as a lookup on the OGC namespace document [2] results in a "Not found").

If you just want to have the ability to send around RDF files, then
putting the geometries into Literals is ok. Please note, however, that
I don't see how you would have multiple geometries for the same feature,
or how you would add additional triples describing the geometries.

> No, sorry, I don't have an example dataset online (yet). But the need
> for such a thing rather comes from the demand side, I think. A
> scenario could be a web mapping application that allows a user to add
> RDF data to a map. The application is only interested in those
> coordinates that it can display. Having too many coordinates would
> only slow things down. In that case the application could request
> features having geometries of just the right generalisation level.
>
> There are many ways to express something like the level of detail of a
> geometry, so I think it would be nice if it is an optional but
> standardised property of a geometry.

Right now we'd like to keep the vocabulary rather compact. We still need
to clean up the notion of spatial relations (should they hold between
Features? between Geometries? between both?) and I'd like to get that one
right before adding too many additional constructs.

Best regards,
Andreas.

[1] http://lists.w3.org/Archives/Public/public-sws-ig/2004Feb/0037.html
[2] http://www.opengis.net/rdf#

Andreas Harth

Jun 21, 2011, 9:59:48 AM
to pedant...@googlegroups.com
Hi William,

On 06/20/2011 02:50 PM, William Waites wrote:
> it's more general because then we can do other number systems, don't
> have to worry about decimal encoding vs. hex or binary, etc. We could
> even annotate individual steps in the sequence, tagging prime numbers
> or even better writing out all of their prime factors.

going into the other direction of the spectrum, let's use Oracle's
binary index format to exchange geometries. Much more efficient,
as the index build phase is not necessary any more.

As so often, I think you have to make a trade-off decision. GeoSPARQL
is easy to use for people with a traditional database perspective,
whereas the NeoGeo vocabulary is more geared towards the Linked Data web.

> And as I've mentioned before, it's all well and good to hand-wave
> about "SPARQL being layered on top" but seriously, try indexing

Run a crawler, collect the data, index locally. Works for search engines
and data warehouses. And data publishers don't have to arse around
with a SPARQL endpoint.

> something like what you are suggesting. Spatial indexing is complex
> enough without also throwing in some graph traversal (and
> interpretation) to find out what you're indexing.

You are free to use WKT with NeoGeo (the point being that everybody
can use their favourite format).

:karlsruhe a spatial:Feature .
:karlsruhe ngeo:geometry :kageo .

and upon lookup on :kageo you return WKT as plain text (if you use
the right content type in the Accept header).

* uses WKT - check
* adheres to Linked Data principles - check
* supports multiple geometries for the same Feature - check
* possible to attach more triples (copyright, level of detail) - check

I don't see what's wrong.
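
As a rough illustration of the crawl-and-index route with this layout, here is a Python sketch that runs against an in-memory stand-in for the web, so nothing is fetched. The class, the ex: URIs and the WKT value are invented for the example; only the ngeo:geometry link and the plain-text WKT lookup come from the description above.

```python
# In-memory stand-ins for what two URI lookups would return.
FEATURE_DOCS = {
    "ex:karlsruhe": [
        ("ex:karlsruhe", "rdf:type", "spatial:Feature"),
        ("ex:karlsruhe", "ngeo:geometry", "ex:kageo"),
    ],
}
GEOMETRY_DOCS = {("ex:kageo", "text/plain"): "POINT (8.4 49.0)"}  # illustrative WKT


class FakeWeb:
    """Hypothetical lookup layer that counts how many requests were made."""

    def __init__(self):
        self.requests = 0

    def get_feature(self, uri):
        self.requests += 1
        return FEATURE_DOCS[uri]

    def get_geometry(self, uri, accept):
        self.requests += 1
        return GEOMETRY_DOCS[(uri, accept)]


def crawl(web, feature_uris):
    """Follow ngeo:geometry links and collect WKT strings per feature."""
    index = {}
    for feature in feature_uris:
        for _subject, predicate, obj in web.get_feature(feature):
            if predicate == "ngeo:geometry":
                wkt = web.get_geometry(obj, "text/plain")
                index.setdefault(feature, []).append(wkt)
    return index
```

Each feature costs one lookup for its description plus one per geometry, which is the price of giving geometries their own URIs.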

> To be absolutely clear, what you are suggesting is a bad idea.

I was always wondering why none of the W3C geo working groups was
able to provide a spec...

Best regards,
Andreas.

William Waites

Jun 22, 2011, 9:26:08 AM
to pedant...@googlegroups.com
* [2011-06-21 15:59:48 +0200] Andreas Harth <ha...@kit.edu> écrit:

] going into the other direction of the spectrum, let's use Oracle's
] binary index format to exchange geometries. Much more efficient,
] as the index build phase is not necessary any more.
]
] As often I think you have to make a trade-off decision.

Indeed. That's why we don't use BCD for integers.

] GeoSPARQL
] is easy to use for people with a traditional database perspective,
] whereas the NeoGeo vocabulary is more geared towards the Linked Data web.

Vague hand-waving.

] Run a crawler, collect the data, index locally. Works for search engines
] and data warehouses. And data publishers don't have to arse around
] with a SPARQL endpoint.

I never said publishers have to arse around with a SPARQL endpoint.
Just that whatever they publish should be convenient for reading and
indexing - indexing can be for a triplestore or anything else; the
problem is the same.

One use case I expect to be very common is, download a big dump of
data and do something with it.

] You are free to use WKT with NeoGeo (the point being that everybody
] can use their favourite format).
]
] :karlsruhe a spatial:Feature .
] :karlsruhe ngeo:geometry :kageo .

I expect very often we will actually have,

:karlsruhe ngeo:geometry [ ... something short ].

I know I do, in real, live, published data.

Arbitrarily saying "cannot use blank node, must use URI" is not, I
suspect, going to fly. And it means 2x HTTP requests. If the client
wants to pull together a lot of such things that adds up. I'm not
saying mustn't use URIs, I'm saying the spec should be silent on this.

] and upon lookup on :kageo you return WKT as plain text (if you use
] the right content type in the Accept header).
]
] * uses WKT - check
] * adheres to Linked Data principles - check
] * supports multiple geometries for the same Feature - check
] * possible to attach more triples (copyright, level of detail) - check
]
] I don't see what's wrong.

1. you require using URIs where otherwise blank nodes would be perfectly
fine and natural to use
2. clients of the service have to arse around figuring out what sorts
of geometry representation it supports
3. bulk indexing of the data is inconvenient
4. publishers have no clear guidance on how to do things, "any way you
want" (leads back to 2)

So it would help if you had language that said something like,

"In order to be compliant with this spec, publishers SHOULD make
geometries available using the literal technique (pick one, WKT
makes sense but is not the only reasonable choice). In addition
publishers MAY make geometries available in other formats, including
'deep linked data' via content-negotiation or otherwise."

] > To be absolutely clear, what you are suggesting is a bad idea.
]
] I was always wondering why none of the W3C geo working groups was
] able to provide a spec...

That's a very strange thing to say. Whatever the reasons I'm pretty
certain they had nothing to do with me saying that I thought your
proposal as it stands is problematic.


Cheers,

Andreas Harth

Jun 22, 2011, 11:10:06 AM
to pedant...@googlegroups.com
Hi William,

On 06/22/2011 03:26 PM, William Waites wrote:
> ] GeoSPARQL
> ] is easy to use for people with a traditional database perspective,
> ] whereas the NeoGeo vocabulary is more geared towards the Linked Data web.
>
> Vague hand-waving.

precise statement: GeoSPARQL does not adhere to Linked Data principles, as
I cannot look up the namespace [1].

> I never said publishers have to arse around with a SPARQL endpoint.
> Just that whatever they publish should be convenient for reading and
> indexing - indexing can be for a triplestore or anything else the
> problem is the same.

You should think about renaming your vocabulary then, GeoSPARQL as a
name is slightly misleading.

> I know I do, in real, live, published data.

Same, see [2].

> Arbitrarily saying "cannot use blank node, must use URI" is not, I
> suspect, going to fly. And it means 2x HTTP requests. If the client
> wants to pull together a lot of such things that adds up. I'm not
> saying mustn't use URIs, I'm saying the spec should be silent on this.

If you use RDF lists for encoding the geometries only, using just a blank
node is fine.

> ] and upon lookup on :kageo you return WKT as plain text (if you use
> ] the right content type in the Accept header).
> ]
> ] * uses WKT - check
> ] * adheres to Linked Data principles - check
> ] * supports multiple geometries for the same Feature - check
> ] * possible to attach more triples (copyright, level of detail) - check
> ]
> ] I don't see what's wrong.
>
> 1. you require using URIs where otherwise blank nodes would be perfectly
> fine and natural to use
> 2. clients of the service have to arse around figuring out what sorts
> of geometry representation it supports
> 3. bulk indexing of the data is inconvenient
> 4. publishers have no clear guidance on how to do things, "any way you
> want" (leads back to 2)

There is the encoding of geometries using RDF lists of geo:Points for
RDF-native representation.

We probably won't come to an agreement here. You seem to focus on ease of use,
while I'd like to make data integration and linking (yes, of individual points)
possible and be flexible with the allowed encoding formats for geometries.

In any case, I think the discussion has been fruitful in that we were able
to tease out the differences.

Best regards,
Andreas.

[1] http://www.opengis.net/rdf#
[2] http://nuts.geovocab.org/

John Goodwin

Jun 27, 2011, 6:06:24 AM
to pedant...@googlegroups.com

This was discussed a lot at the NeoGeoVocamp. I have to be honest and say I'm closer to William's way of thinking (and this is reflected in the way I encoded geometries in the Ordnance Survey linked data). For me geometries are just datatypes/literals like dates, strings and integers, and it would be nice to think that one day they will be treated as such by all triplestores. Hopefully GeoSPARQL will help with this. I can honestly think of no use cases for wanting to traverse the geometry stored in the graph... and I have thought about this a lot.
