Issue 102 in void-impl: Describing a state of a dataset


Apr 2, 2011, 7:31:19 AM
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium Product-vocab

New issue 102: Describing a state of a dataset

AFAIK, it is currently not possible to describe the (deployment/publishing)
state of a dataset. This issue was raised in a thread on SemanticOverflow
[1]. I think VoID would be a good place to offer terms for this kind of
knowledge representation, e.g., starting with a simple property such as

Apr 2, 2011, 7:35:19 AM

Comment #1 on issue 102: Describing a state of a dataset

PS: it seems that I cannot change the properties of an issue. So don't
blame me for that ;)

Apr 3, 2011, 4:41:45 AM

Comment #2 on issue 102: Describing a state of a dataset

What would possible values of such a property be?

Apr 3, 2011, 5:01:48 AM

Comment #3 on issue 102: Describing a state of a dataset

A non-answer might be to look at the "state" property on a CKAN dataset,
which can take the values "active", "deleted" and "pending". This is
obviously underspecified; what is really going on is an attempt to describe
the workflow of dataset (or rather metadata, an important distinction)
curation. How deeply is it practical to model this? And how do we handle
the fact that different sources will have different workflows?

Not entirely unrelated, I've been searching for a suitable predicate to
hold the "revision_id", which in CKAN is a UUID such that the tuple
("package_id", "revision_id") uniquely identifies a particular version of a
metadata record. I can easily imagine a similar construct for talking
about the data themselves.
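The ("package_id", "revision_id") tuple behaves like a composite key: the package id names the record, the revision id names one version of it. A minimal in-memory sketch of that idea follows; the class and method names are hypothetical (this is not CKAN's API), and generating revision ids with random UUIDs is an assumption.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordVersion:
    """One version of a metadata record: the (package_id, revision_id) pair."""
    package_id: str
    revision_id: str  # assumption: a random UUID string


class RevisionStore:
    """Hypothetical in-memory store keyed by (package_id, revision_id); not CKAN's API."""

    def __init__(self) -> None:
        self._versions: dict[RecordVersion, dict] = {}

    def save(self, package_id: str, metadata: dict) -> RecordVersion:
        """Store a snapshot of the metadata under a fresh revision id and return its key."""
        key = RecordVersion(package_id, str(uuid.uuid4()))
        # Shallow copy, so later edits to `metadata` do not alter this stored version.
        self._versions[key] = dict(metadata)
        return key

    def get(self, key: RecordVersion) -> dict:
        """Look up exactly one version by its (package_id, revision_id) pair."""
        return self._versions[key]
```

Two saves of the same package yield two distinct keys that share a package_id, so every version stays addressable; the same keying would work for versions of the data themselves.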

I wonder whether this is more in the realm of DCAT than VoID...

Apr 3, 2011, 8:44:25 AM

Comment #4 on issue 102: Describing a state of a dataset

@cygri: judging from the question raised on SemanticOverflow, something
like "alpha", "beta", ...

Apr 3, 2011, 3:32:38 PM

Comment #5 on issue 102: Describing a state of a dataset

@zazi: No, alpha/beta were proposed for software, not for datasets.

I'd like to see:

a) an actual proposal for what the values of that property would be
b) a use case or two
c) some examples of real datasets where the publisher actually announces
some sort of status of the dataset; this doesn't seem to be common

Apr 3, 2011, 4:21:57 PM

Comment #6 on issue 102: Describing a state of a dataset

@cygri: yes, you are right; "draft" and "published" were the values proposed
for datasets. In general, it might be best to contact ngn about this issue,
since she requested it. I just thought it might be useful to raise the
issue here, too. However, I only relayed it to this place ;)

Apr 5, 2011, 4:32:04 AM

Comment #7 on issue 102: Describing a state of a dataset

Since I asked the question on SemanticOverflow, I'd like to explain
the context.

We are developing a framework of REST services in a particular domain. Every
REST resource is also an RDF resource and has an RDF representation
(among others), according to a certain ontology. One of the resource
types is a dataset (for the record, a dataset of chemical compounds and
associated experimental or calculated properties).

Since datasets and dataset content (the triples describing the dataset) are
dynamically generated (created, uploaded or modified) by clients (human
users, client applications or other services), we would like a means
to assign a status to a dataset (and to other resources as well, for
example new predictive models). For example, the resource in question
might be just a test dataset, uploaded by a user to see how the system
works, or one that has undergone a series of processing steps and whose
quality there is a consensus on.

Thus, what I am looking for:
1) is not the status of a fixed CKAN dataset that changes rarely (e.g.,
with the next release).

2) is not the status of an "RDF class or property" (although the simplicity
of that approach is appealing); we would like to label an RDF individual,
not a class.

3) is not exactly the status of an ontology. The classes and properties of
the domain ontology are defined and versioned as part of the definition of
the web services framework, but instances are dynamically generated and

4) The Publishing Status Ontology (PSO) looks like the closest match, but PSO
assumes the published work is a document, not an arbitrary RDF individual
(with a set of associated properties).

I agree this is not a widespread use case, mostly because there are not
many frameworks like ours that depart from the mainstream practice of
centralized triple stores with relatively stable content, where the
issue could be handled, for example, by labeling the entire release.

RDF being "one big graph" becomes an obstacle if one wants to version
or assign a status to a dynamically generated subset of individuals. I
am really interested in opinions on what the correct solution could be.
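One pattern often used for the "one big graph" problem, although nobody in this thread proposes it, is to put each dynamically generated dataset into its own named graph and attach the status to the graph rather than to every individual. The sketch below serializes that as N-Quads; the `ex:status` property and `ex:` namespace are hypothetical (not VoID or PSO terms), and for brevity every term, including objects, is treated as an IRI.

```python
# Hypothetical vocabulary for illustration; ex:status is NOT a VoID or PSO term.
EX = "http://example.org/ns#"

Triple = tuple[str, str, str]  # (subject, predicate, object), all IRIs in this sketch


def dataset_to_nquads(dataset_iri: str, triples: list[Triple], status: str) -> list[str]:
    """Put a dataset's triples in a named graph and state that graph's status.

    Literal objects are not handled; every term is written as an IRI.
    """
    # Each content triple carries the dataset IRI as its graph label (fourth term).
    lines = [f"<{s}> <{p}> <{o}> <{dataset_iri}> ." for s, p, o in triples]
    # The status statement has no fourth term, so it sits in the default graph
    # and talks about the named graph as a whole.
    lines.append(f"<{dataset_iri}> <{EX}status> <{EX}{status}> .")
    return lines
```

With this layout, moving a dataset from "draft" to "published" means rewriting one statement while the individuals stay untouched. Reusing the dataset IRI as the graph name is itself a modeling choice that a real vocabulary would have to pin down.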

Apr 5, 2011, 6:16:33 AM

Comment #8 on issue 102: Describing a state of a dataset

As far as I understand PSO, it is not restricted to documents;
see "PSO, the Publishing Status Ontology, is an ontology written in OWL 2
DL for characterizing the publication status of a document or other
publication entity ..." [1]. That is, every entity (resource) that is
published (in a publishing process) is a publication entity, isn't it?

[1] http:/

Oct 13, 2012, 5:10:26 AM

Comment #9 on issue 102: Describing a state of a dataset

I have a use case where I want to say that a previously existing dataset
has been “decommissioned”. The issue discussed here could offer a solution
for this use case.
