Partitioning datasets 'arbitrarily'

Rainer Simon

Jan 19, 2012, 5:37:23 AM
to void-discussion
Dear list,

I'm a member of the Pelagios Project
(http://pelagios-project.blogspot.com), which aims to interlink resources
in Ancient World Research by means of aligning place references to a
common Gazetteer.

We intend to use VoID to express metadata about our individual
datasets. The most important aspect for us is being able to divide
sets into a hierarchy of subsets. void:subset seems to be the right
way to do this. However I'm having problems expressing how the subsets
then map to the triples in our datasets.

To give a concrete example: One of our datasets is a list of place
references in a digitized book collection. In slightly simplified
terms, this dataset is one large RDF dump containing a flat list of
statements expressing something like this:

<Book-Fragment-URI> <is a Reference To the Place> <Place URI> .

Naturally, what we want to do with VoID is assign place references to
a subset hierarchy such as

- Book Collection A
  - Book A1
    - Volume A1-1
      - Chapter A1-1-1
      - Chapter A1-1-2
      etc.
    - Volume A1-2
      etc.
  - Book A2

and so on.

But since the data is just a flat list of place references, I can't
partition based on class (void:classPartition) or predicate
(void:propertyPartition). In fact the only way to tell in which subset
a particular reference belongs is implicit in the book fragment URI,
which may look something like this

<base URI>/text:1999.02.0025:book=3:poem=9

or

<base URI>?id=-C0BAAAAQAAJ&pg=PA247

with the different numbers and codes carrying information about
collection, book, sections, pages and so on.

Now my question: is there a way to partition based on properties of
the Subject (=Book fragment) URI (RegEx perhaps), or (which might even
be preferable) should we add "void:inDataset" statements to the data
itself to create explicit links between reference and subset?

Thanks in advance,
Rainer Simon

Rainer Simon

Jan 19, 2012, 6:45:04 AM
to void-di...@googlegroups.com
Dear list,

I'm a member of the Pelagios Project (http://pelagios-project.blogspot.com) which aims to interlink resources in Ancient World Research by means of aligning place references to a common Gazetteer. We intend to use VoID to express metadata about our individual datasets. The most important aspect for us is being able to divide sets into a hierarchy of subsets. void:subset seems to be the right way to do this. However I'm having problems expressing how the subsets then map to the triples in our datasets.

To give a concrete example: One of our datasets is a list of place references in a digitized book collection. In slightly simplified terms, this dataset is one large RDF dump containing a flat list of statements expressing something like this:

<Book-Fragment-URI> <is a Reference To the Place> <Place URI> .

Naturally, what we want to create with VoID is subset hierarchies such as

- Book Collection A
  - Book A1
    - Volume A1-1
      - Chapter A1-1-1
      - Chapter A1-1-2
      etc.
    - Volume A1-2
      etc.
  - Book A2

and so on.
     
Since the data is just a list of place references, I can't partition based on class (void:classPartition) or predicate (void:propertyPartition). In fact the only way to tell in which subset a particular reference belongs is implicit in the book fragment URI, which may look something like

<base URI>/text:1999.02.0025:book=3:poem=9

or

<base URI>?id=-C0BAAAAQAAJ&pg=PA247

with numbers and various attributes carrying information about collection, books, pages, etc.
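
To make the implicit structure concrete, here is a minimal Python sketch that reads a subset path out of the two URI styles shown above. The base URI (http://example.org/fragments) and the helper name hierarchy_path are hypothetical, and treating "text:<id>" as the work identifier is an assumption about the URI scheme rather than something stated in the thread.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical base URI; the thread only writes "<base URI>".
BASE = "http://example.org/fragments"


def hierarchy_path(uri):
    """Return the hierarchy components implied by a fragment URI, outermost first."""
    parsed = urlparse(uri)
    if parsed.query:
        # Query style, e.g. ?id=-C0BAAAAQAAJ&pg=PA247
        params = parse_qs(parsed.query)
        return [f"{key}={values[0]}" for key, values in params.items()]
    # Path style, e.g. /text:1999.02.0025:book=3:poem=9
    last_segment = parsed.path.rsplit("/", 1)[-1]
    parts = last_segment.split(":")
    # Assumption: the first two colon-separated parts form the work identifier.
    return [":".join(parts[:2])] + parts[2:]


print(hierarchy_path(BASE + "/text:1999.02.0025:book=3:poem=9"))
print(hierarchy_path(BASE + "?id=-C0BAAAAQAAJ&pg=PA247"))
```

Each successive component narrows the subset, so the list can be read as a path through a hierarchy like the one sketched above.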

Now my question: is there a way to define the mapping between place references and subset based on properties of the subject (=book fragment) URI? (RegEx perhaps?) Or should we add the void:inDataset property to each place reference to make the link subset<->reference explicit?

Thanks in advance,
Rainer

Richard Cyganiak

Jan 23, 2012, 10:30:15 AM
to void-di...@googlegroups.com
Hi Rainer,

Does void:uriRegexPattern solve your problem?
http://www.w3.org/TR/void/#pattern

Best,
Richard
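
As a rough illustration of the idea behind void:uriRegexPattern, the Python sketch below gives each subset one regular expression and tests fragment URIs against it. The base URI, the subset names, and the patterns themselves are hypothetical; only the URI shape comes from the thread. Real VoID descriptions would carry such patterns as literal values of void:uriRegexPattern on each subset.

```python
import re

# Hypothetical base URI; the thread only writes "<base URI>".
BASE = "http://example.org/fragments"

# Hypothetical subset names mapped to URI patterns, one pattern per subset.
# A nested subset simply uses a more specific pattern than its parent.
SUBSET_PATTERNS = {
    "text-1999.02.0025": "^" + re.escape(BASE) + r"/text:1999\.02\.0025:",
    "text-1999.02.0025/book-3": "^" + re.escape(BASE) + r"/text:1999\.02\.0025:book=3:",
}


def subsets_for(uri):
    """Return the names of all subsets whose pattern matches the URI."""
    return [
        name
        for name, pattern in SUBSET_PATTERNS.items()
        if re.search(pattern, uri)
    ]


print(subsets_for(BASE + "/text:1999.02.0025:book=3:poem=9"))
print(subsets_for(BASE + "/text:1999.02.0025:book=4:poem=1"))
```

A fragment in book 3 matches both the work-level and the book-level pattern, while a fragment in book 4 matches only the work-level one, which mirrors the parent/child relation expressed with void:subset.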

Rainer Simon

Jan 23, 2012, 11:08:48 AM
to void-discussion
Hi Richard,

Ah sorry, I totally overlooked that! Yes - I assume this should be
sufficient for most of our cases!

But just in case: is the addition of a 'void:inDataset' relation also
a viable option? Or does this property have a different intended use,
and I'd be misusing it that way?

Thanks,
Rainer


On Jan 23, 4:30 pm, Richard Cyganiak <rich...@cyganiak.de> wrote:
> Hi Rainer,
>
> Does void:uriRegexPattern solve your problem?
> http://www.w3.org/TR/void/#pattern

Richard Cyganiak

Jan 23, 2012, 11:17:02 AM
to void-di...@googlegroups.com

On 23 Jan 2012, at 16:08, Rainer Simon wrote:
> But just in case: is the addition of a 'void:inDataset' relation also
> a viable option? Or does this property have a different intended use,
> and I'd be misusing it that way?

It assumes the typical linked data deployment scenario where you have RDF documents (like http://dbpedia.org/data/Berlin) that describe entities of interest (like http://dbpedia.org/resource/Berlin - note the different URI!). void:inDataset would then be used to relate the RDF *document* to the void:Dataset. It is *not* intended for relating the described *entities* to void:Datasets.

I can't tell exactly how this maps to your case, but it sounds like


<base URI>/text:1999.02.0025:book=3:poem=9

is a URI that identifies an entity, so void:inDataset doesn't seem quite right.

Best,
Richard

Rainer Simon

Jan 23, 2012, 11:25:28 AM
to void-discussion
Ok thanks, that clarifies the use of void:inDataset! Yes, indeed, the
URI would usually identify an entity, not a document.

Regards,
Rainer


On Jan 23, 5:17 pm, Richard Cyganiak <rich...@cyganiak.de> wrote:
> On 23 Jan 2012, at 16:08, Rainer Simon wrote:
>
> > But just in case: is the addition of a 'void:inDataset' relation also
> > a viable option? Or does this property have a different intended use,
> > and I'd be misusing it that way?
>
> It assumes the typical linked data deployment scenario where you have RDF documents (like http://dbpedia.org/data/Berlin) that describe entities of interest (like http://dbpedia.org/resource/Berlin - note the different URI!). void:inDataset would then be used to relate the RDF *document* to the void:Dataset. It is *not* intended for relating the described *entities* to void:Datasets.