RDF* and SPARQL* study

Paul Warren

Jun 24, 2020, 7:14:20 AM

to Semantic Science

I would greatly appreciate your involvement in a study which I believe will be beneficial to our community, in particular supporting the further development of SPARQL and related languages.

The study investigates how people create models in RDF* and queries in SPARQL*. As some of you will be aware, RDF* and SPARQL* are extensions to RDF and SPARQL designed by Olaf Hartig (https://blog.liu.se/olafhartig) to enable reification, and the querying of reified statements.

The results of the study will be published in the research literature; all data will be anonymised.

The study consists of 15 questions relating to RDF* and SPARQL*, and will take between 30 minutes and one hour. Some of the questions depend purely on the features of RDF and SPARQL, as defined by the W3C. Other questions make use of the extended features in RDF* and SPARQL*. Whilst many of you will, I am sure, be very familiar with RDF and SPARQL, I provide an information 'handout' which describes everything you need to know to complete the study. In particular, the last page of the handout lists everything you need in compact form.

The study questionnaire is available at https://spice.kmi.open.ac.uk/pwsparql/sparql. The handout is appended to this email.

The study is available for approximately two weeks, i.e. until Wednesday 8th July.

The study has been reviewed by, and received a favourable opinion from, The Open University Human Research Ethics Committee, reference HREC/3568.

Many thanks.

Paul Warren

Visiting Research Fellow

Knowledge Media Institute, The Open University, U.K.

http://kmi.open.ac.uk/people/member/paul-warren

skype: paul.w.warren1

07379 114331

---------------------------------------------------------

RDF* and SPARQL* Usability Study 08/06/20

Paul Warren, paul....@open.ac.uk; Paul Mulholland, paul.mu...@open.ac.uk

N.B. this handout is to be read prior to undertaking the study, and to be retained for reference

at any time during the study. The final page contains a summary of all you need to know.

1 Introduction

The purpose of this study is to understand the difficulties people experience using a data

modelling format known as RDF* and the corresponding query language, SPARQL*. These

are extensions of the W3C standards, RDF and SPARQL. No prior knowledge or

experience is required for the study; all the necessary features are explained in this handout.

The object is to understand how people manipulate the data model and language constructs

mentally, so please do not use pen and paper.

The structure of this handout is as follows. Section 2 describes RDF, and Section 3 describes

how RDF has been extended to create RDF*. Section 4 then provides an overview of

SPARQL. Sections 5 and 6 describe some specific features of SPARQL which are used in

this study. Section 7 describes SPARQL*, the extension of SPARQL to query RDF*.

Section 8 gives an overview of how the study is organised. Finally, there is a one-page

summary which provides all the necessary information in compact form.

2 RDF

This section describes a simplified subset of the RDF data model, as used in the study

questions. The basic component of the data model is a triple, consisting of subject, predicate

and object. The subject and predicate will be resource identifiers, composed of a colon

followed by any combination of letters and digits, e.g. :Stephen, :marriedTo, :1801,

:year1801. In all subsequent discussion these will be referred to simply as 'identifiers'. The

object may also be an identifier. An example of such a triple is:

:Stephen :marriedTo :Mary .

As an alternative, the object may be a number or a character string. For the purposes of this

study, the former will be written as an integer in the normal way; the latter will be enclosed in

single quotes, e.g.

:Stephen :hasAge 32 .

:Mary :role 'lawyer' .

Note that in the example given in the first paragraph of this section, the predicate

(:marriedTo) connects two identifiers (:Stephen and :Mary). On the other hand, in the

examples in the second paragraph, the predicates are assigning attributes (age and job role) to

the two identifiers; the attributes themselves are either numbers or character strings. Note

that numbers can also be used to represent dates (e.g. 1801). Note also the difference

between 1801, which is a number; and :1801, which is an identifier.

A database consists of a set of such triples. In our study, triples are each written on a separate

line, with a period (full stop) at the end of each line. The same identifier can occur in more

than one triple, e.g. a simple database might consist of the following triples:

:Stephen :marriedTo :Mary .

:Stephen :hasAge 32 .

:Mary :role 'lawyer' .

:John :brotherOf :Mary .

:John :role 'accountant' .

:Mary :worksFor :BigCo .

:John :worksFor :BigCo .

A database such as the one above can be visualized as a graphical structure. The identifiers

(:Stephen, :Mary, :John, :BigCo) can be regarded as nodes. The predicates :marriedTo,

:brotherOf and :worksFor can be regarded as edges joining nodes. The predicates :hasAge and :role

associate attributes with nodes.
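
For readers who find code helpful, the graph view described above can be sketched in Python. This is an illustration only and not part of RDF: it assumes identifiers are stored as strings beginning with a colon, numbers as integers, and character strings as plain strings that never begin with a colon. The names DATABASE, is_identifier, edges, attributes and nodes are hypothetical.

```python
# Sketch only: identifiers are strings starting with ":", numbers are ints,
# and character strings are plain strings that do not start with ":".
DATABASE = {
    (":Stephen", ":marriedTo", ":Mary"),
    (":Stephen", ":hasAge", 32),
    (":Mary", ":role", "lawyer"),
    (":John", ":brotherOf", ":Mary"),
    (":John", ":role", "accountant"),
    (":Mary", ":worksFor", ":BigCo"),
    (":John", ":worksFor", ":BigCo"),
}


def is_identifier(term):
    """True for identifiers such as ":Mary" (hypothetical helper)."""
    return isinstance(term, str) and term.startswith(":")


# Triples whose object is an identifier are edges joining two nodes.
edges = {t for t in DATABASE if is_identifier(t[2])}
# Triples whose object is a number or character string attach an attribute.
attributes = DATABASE - edges
# Nodes are every subject plus every identifier used as an object.
nodes = {s for s, _, _ in DATABASE} | {o for _, _, o in edges}

print(sorted(nodes))  # [':BigCo', ':John', ':Mary', ':Stephen']
```

Under these assumptions, four of the seven triples are edges and the remaining three assign attributes.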

3 RDF*

RDF* is an extension to RDF which permits metadata to be associated with a triple. For

example, we may wish to say that Mary has worked for BigCo from 2000. We do this by

writing the triple associating Mary and BigCo, enclosed within double angled brackets,

within another triple:

<<:Mary :worksFor :BigCo>> :from 2000 .

In this example, the object of the inner triple is an identifier whereas the object of the outer

triple is a number. However, either object could have been an identifier, number or character

string. Another example would be:

<<:Mary :worksAs 'lawyer'>> :for :BigCo .

In principle, triples can be nested to any level of depth. In this study, triples are nested to a

maximum of two levels, e.g.:

<<<<:Mary :worksFor :BigCo>> :role 'lawyer'>> :from 2000 .

This is an example of the use of all three types of object, i.e. identifier (:BigCo), character

string ('lawyer'), and number (here representing a date, 2000). The example is indicating

that Mary has had the role of a lawyer with BigCo from 2000.
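
One way to picture nesting is as a tuple used as the subject of another tuple. The Python sketch below is an illustration under that assumption, not a definition of RDF*; the helper names nesting_levels, to_text and statement are hypothetical.

```python
# Sketch only: an embedded triple is a tuple used inside another tuple.
inner = (":Mary", ":worksFor", ":BigCo")
middle = (inner, ":role", "lawyer")
outer = (middle, ":from", 2000)


def nesting_levels(triple):
    """Count how many levels of embedded triples a triple contains."""
    return max(
        (1 + nesting_levels(part) for part in triple if isinstance(part, tuple)),
        default=0,
    )


def to_text(term):
    """Write one term in the notation used in this handout."""
    if isinstance(term, tuple):
        return "<<" + " ".join(to_text(part) for part in term) + ">>"
    if isinstance(term, str) and not term.startswith(":"):
        return "'" + term + "'"  # character strings go in single quotes
    return str(term)


def statement(triple):
    """Write a top-level triple followed by its closing period."""
    return " ".join(to_text(part) for part in triple) + " ."


print(statement(outer))
print(nesting_levels(outer))  # 2
```

The first print reproduces the two-level example above, which is the deepest nesting used in the study.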

4 Overview of SPARQL

SPARQL is a language for querying RDF databases. Each query begins with the SELECT

keyword, followed by the list of variables which we wish to output, e.g.

SELECT ?person1 ?person2

This is followed by a WHERE clause, i.e. the WHERE keyword and a set of 'triple patterns'

enclosed in curly brackets. In the simplest case, there is only one triple pattern e.g.

WHERE { ?person1 :brotherOf ?person2 }

In the database in section 2, the solution to this query will be:

?person1 = :John; ?person2 = :Mary

Note that, as with triples in the database, a triple pattern in a query consists of a subject,

predicate and object. These may be variables, e.g. ?person, or identifiers in the database, e.g.

:Stephen, :brotherOf. An object may also be a number or character string, e.g.

SELECT ?person

WHERE {?person :hasAge 32}

The query will output ?person = :Stephen.

The solution to a variable can also represent a number or character string, e.g.

SELECT ?age

WHERE {:Stephen :hasAge ?age}

Here, the solution is:

?age = 32

Sometimes we have two or more triples which share variables, and we need all triples to be

valid. Returning to our example database, assume we want to know who is Stephen's wife

and what company she works for:

SELECT ?wife ?company

WHERE { :Stephen :marriedTo ?wife . ?wife :worksFor ?company }

The dot between the triple patterns indicates that both must be satisfied; so that the same

identifier (represented by ?wife) has to be the object of the first triple and the subject of the second. The solution is:

?wife = :Mary, ?company = :BigCo

We could also write the two triple patterns the other way around:

WHERE {?wife :worksFor ?company . :Stephen :marriedTo ?wife}

The variables written after the SELECT keyword must appear in the WHERE clause.

However, there may be variables in the WHERE clause which do not appear after the

SELECT keyword. These are variables which are needed to create the query, but which we

do not wish to output. In the previous example, we may not be interested in Stephen's wife,

only her company. We can write:

SELECT ?company

WHERE {:Stephen :marriedTo ?wife . ?wife :worksFor ?company }

This query will output ?company = :BigCo.

There may be more than one solution for a variable, or for a combination of variables. For

example, we may wish to know what information the database contains about people and

their roles:

SELECT ?person ?role

WHERE {?person :role ?role}

Using the database described in Section 2, there are two sets of solutions:

?person = :Mary; ?role = 'lawyer'

?person = :John; ?role = 'accountant'
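
The matching described in this section can be imitated with a small Python sketch. It is a simplified illustration, not how a SPARQL engine is actually implemented; the functions match_pattern, where and select are hypothetical, and triples are assumed to be stored as Python tuples with variables written as strings beginning with a question mark.

```python
# Sketch only: the Section 2 database as a set of tuples.
DATABASE = {
    (":Stephen", ":marriedTo", ":Mary"),
    (":Stephen", ":hasAge", 32),
    (":Mary", ":role", "lawyer"),
    (":John", ":brotherOf", ":Mary"),
    (":John", ":role", "accountant"),
    (":Mary", ":worksFor", ":BigCo"),
    (":John", ":worksFor", ":BigCo"),
}


def is_variable(term):
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    binding = dict(binding)  # copy, so the caller's binding is untouched
    for p, t in zip(pattern, triple):
        if is_variable(p):
            if binding.setdefault(p, t) != t:
                return None  # variable already bound to something else
        elif p != t:
            return None  # fixed term does not match
    return binding


def where(patterns, database):
    """All bindings that satisfy every triple pattern (the dot means 'and')."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in database
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


def select(variables, patterns, database):
    """Keep only the variables listed after SELECT."""
    return [tuple(b[v] for v in variables) for b in where(patterns, database)]


# SELECT ?wife ?company
# WHERE { :Stephen :marriedTo ?wife . ?wife :worksFor ?company }
print(select(
    ["?wife", "?company"],
    [(":Stephen", ":marriedTo", "?wife"), ("?wife", ":worksFor", "?company")],
    DATABASE,
))  # [(':Mary', ':BigCo')]
```

Swapping the two triple patterns gives the same result, since every pattern must be satisfied whatever the order in which they are written.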

5 More about predicates in SPARQL

This section is about three operators which can be used with predicates in SPARQL. Firstly,

we can use the concatenation operator, represented by the slash symbol (/) to concatenate

predicates. In the previous section we showed a query which output the company for

which Stephen's wife works. It used the variable ?wife, which we did not wish to output and

was not in the SELECT clause. We can avoid using this variable, and create a more compact

query by using the concatenation operator:

SELECT ?company

WHERE {:Stephen :marriedTo / :worksFor ?company}

Here :Stephen is linked to ?company by a chain of predicates consisting of :marriedTo and

:worksFor.

Another operator we use is the reverse operator, represented by a hat symbol (^). This

reverses the directionality of a predicate. For example, we may wish to know who John is the

brother of. We can do this with the following query:

SELECT ?person

WHERE {:John :brotherOf ?person}

However, we can achieve the same thing with the query:

SELECT ?person

WHERE {?person ^:brotherOf :John}

Here, the directionality of :brotherOf is reversed. Of course, in this example there is no need

to introduce the extra complication of the reverse operator; one might as well use the first

query. However, the ability to reverse the directionality of a predicate comes into its own

when we use the reverse operator within a chain of concatenated predicates. For example, we

may wish to know who is the brother-in-law of Stephen. We can do this without using the

reverse or concatenation operators:

SELECT ?brotherInLaw

WHERE {:Stephen :marriedTo ?person . ?brotherInLaw :brotherOf ?person}

Here, there are two triple patterns connected by a dot, so they must both hold for a valid

solution. However, we can avoid using the variable ?person and create a more compact

query, by using the reverse and concatenation operators together:

SELECT ?brotherInLaw

WHERE {:Stephen :marriedTo / ^:brotherOf ?brotherInLaw}

This query would return ?brotherInLaw = :John.

A final feature relating to predicates, which we use in our study, is the ability to repeat a

given predicate any number of times. We do this by suffixing a predicate with a plus symbol

(+). We can illustrate this with a new database illustrating the relationship between animal

groups, with just two triples:

:Monkey :subGroupOf :Primate .

:Primate :subGroupOf :Mammal .

We wish to know all the subgroups, and sub-subgroups of :Mammal. We can write the

query:

SELECT ?subgroup

WHERE {?subgroup :subGroupOf+ :Mammal}

The plus symbol means that the predicate can appear once, or be concatenated with itself an

unlimited number of times, i.e.

:subGroupOf

:subGroupOf / :subGroupOf

:subGroupOf / :subGroupOf / :subGroupOf

...

Hence, the query will output:

?subgroup = :Primate; ?subgroup = :Monkey

Here, :Primate is a subgroup of :Mammal, and :Monkey is a subgroup of :Primate, and hence

a sub-subgroup of :Mammal. If the database contained a subgroup of :Monkey, that would

also be returned, and so on indefinitely.
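
The three predicate operators can be imitated in Python by treating each predicate as a set of (subject, object) pairs. This is a sketch under that assumption, not the way SPARQL defines or evaluates these operators; the function names step, reverse, concat and plus are hypothetical.

```python
# Sketch only: two small databases taken from the examples in this handout.
FAMILY = {
    (":Stephen", ":marriedTo", ":Mary"),
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":worksFor", ":BigCo"),
}
ANIMALS = {
    (":Monkey", ":subGroupOf", ":Primate"),
    (":Primate", ":subGroupOf", ":Mammal"),
}


def step(database, predicate):
    """Pairs (subject, object) linked directly by one predicate."""
    return {(s, o) for s, p, o in database if p == predicate}


def reverse(pairs):
    """The ^ operator: swap the direction of every pair."""
    return {(o, s) for s, o in pairs}


def concat(first, second):
    """The / operator: follow a pair from first, then a pair from second."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}


def plus(pairs):
    """The + operator: one or more repetitions of the same predicate."""
    closure = set(pairs)
    while True:
        longer = closure | concat(closure, pairs)
        if longer == closure:
            return closure  # nothing new was reached, so we are done
        closure = longer


# :Stephen :marriedTo / ^:brotherOf ?brotherInLaw
path = concat(step(FAMILY, ":marriedTo"), reverse(step(FAMILY, ":brotherOf")))
brothers_in_law = {o for s, o in path if s == ":Stephen"}
print(brothers_in_law)  # {':John'}

# ?subgroup :subGroupOf+ :Mammal
subgroups = {s for s, o in plus(step(ANIMALS, ":subGroupOf")) if o == ":Mammal"}
print(sorted(subgroups))  # [':Monkey', ':Primate']
```

Because plus() keeps extending the chain until nothing new is found, a subgroup of :Monkey added to the database would also be returned.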

6 Three functions: STR(), CONTAINS(), and FILTER()

We have already made a distinction between an identifier, e.g. :London and a character

string, e.g. ':London'. We may wish, however, to make comparisons between identifiers and

character strings. This requires the use of three functions.

The first of these functions is STR(), which we use to extract the character string of which the

identifier is composed. Thus STR(:London) returns the character string ':London'.

We may also wish to check whether a particular character string is contained within another

character string. We can do this using the CONTAINS() function. This function has two

string arguments, and returns a logical true if the second argument is contained within the

first, and a logical false otherwise. For example, the following will return a logical true:

CONTAINS(':London', 'London')

The following will return logical false:

CONTAINS(':London', 'Edinburgh')

Note that one or both of the arguments could be variables (i.e. beginning with a question

mark) representing character strings.

The function FILTER() takes one argument which is either true or false. It has the effect of

removing any solution for which the argument evaluates to false.

As an example of how these three functions can work together, consider the database:

:Joe :livesIn 'London' .

:Fred :livesIn 'Edinburgh' .

Then consider the following query:

SELECT ?person

WHERE {?person :livesIn ?city . FILTER(CONTAINS(STR(:London), ?city))}

The first triple pattern in the WHERE clause permits two possibilities: ?person = :Joe; and

?person = :Fred. However, of the two possibilities for ?city, only the string 'London' is

contained within the string ':London'. Hence, only the solution ?person = :Joe is valid,

because only Joe lives in London.

In summary, with the use of these three functions, we can compare identifiers (e.g. :London)

to character strings (e.g. 'London').
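
The way the three functions combine can be imitated in Python. This is a sketch following the simplified conventions of this handout only; it assumes identifiers are stored as strings beginning with a colon, so the Python function STR below simply returns that text. The Python functions STR and CONTAINS are hypothetical stand-ins, not the SPARQL functions themselves.

```python
# Sketch only: the two-triple database from the example above.
DATABASE = [
    (":Joe", ":livesIn", "London"),
    (":Fred", ":livesIn", "Edinburgh"),
]


def STR(identifier):
    """Return the characters of an identifier, e.g. ':London'."""
    return str(identifier)


def CONTAINS(first, second):
    """True if the second string occurs inside the first."""
    return second in first


# ?person :livesIn ?city  (two candidate solutions)
solutions = [
    {"?person": s, "?city": o} for s, p, o in DATABASE if p == ":livesIn"
]

# FILTER(CONTAINS(STR(:London), ?city)) removes solutions that test false.
kept = [b for b in solutions if CONTAINS(STR(":London"), b["?city"])]

people = [b["?person"] for b in kept]
print(people)  # [':Joe']
```

Note the argument order: the test asks whether ?city is contained within ':London', not the other way around.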

7 SPARQL*

In section 3 we introduced RDF*, an extension of RDF which permits triples to be embedded

within other triples. In order to fully query RDF*, we need to analogously extend SPARQL

to form SPARQL*.

Imagine we have a database containing:

<<:Mary :worksFor :BigCo>> :role 'lawyer' .

We may wish simply to know for whom Mary works. This we can do with a SPARQL

query:

SELECT ?company

WHERE {:Mary :worksFor ?company}

This will give the solution ?company = :BigCo. This is because any embedded triple in

RDF* is regarded as valid. Put another way, the triple above can be regarded as two triples:

<<:Mary :worksFor :BigCo>> :role 'lawyer' .

:Mary :worksFor :BigCo .
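
The idea that an embedded triple is itself regarded as valid can be sketched in Python. This is an illustration under the conventions of this handout, with triples stored as nested tuples; the helper asserted is hypothetical.

```python
def asserted(triple):
    """Yield the triple itself and every triple embedded inside it."""
    yield triple
    for part in triple:
        if isinstance(part, tuple):
            yield from asserted(part)


# <<:Mary :worksFor :BigCo>> :role 'lawyer' .
statement = ((":Mary", ":worksFor", ":BigCo"), ":role", "lawyer")
database = set(asserted(statement))  # the outer triple and the inner triple

# SELECT ?company WHERE {:Mary :worksFor ?company}
companies = [o for s, p, o in database if s == ":Mary" and p == ":worksFor"]
print(companies)  # [':BigCo']
```

The plain SPARQL query finds :BigCo because the inner triple is treated as a triple of the database in its own right.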

In the same way as RDF* can embed triples within triples, using double angled brackets, so

SPARQL* can embed triple patterns within triple patterns using double angled brackets. For

example, if we wish to know both the company Mary works for and her role in that company,

we need the SPARQL* query:

SELECT ?company ?role

WHERE {<<:Mary :worksFor ?company>> :role ?role}

Just as RDF* can embed triples to any level of nesting, so SPARQL* can embed triple

patterns to any level of nesting. Imagine we have a database containing:

<<<<:Mary :worksFor :BigCo>> :role 'lawyer'>> :from 2000 .

<<<<:Mary :worksFor :BigCo>> :role 'manager'>> :from 2010 .

We may wish to know for whom Mary works, what roles she has had, and when she began

each role. An appropriate query would be:

SELECT ?company ?role ?startdate

WHERE {<<<<:Mary :worksFor ?company>> :role ?role>> :from ?startdate}

This would output two solutions:

?company = :BigCo, ?role = 'lawyer', ?startdate = 2000

?company = :BigCo, ?role = 'manager', ?startdate = 2010

Note that the order in which the predicates appear in the WHERE clause must reflect the

order of the predicates in the RDF*. The following query would return no solutions:

SELECT ?company ?role ?startdate

WHERE {<<<<:Mary :worksFor ?company>> :from ?startdate>> :role ?role}
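
Why the order of the predicates matters can be seen in a small Python sketch of nested pattern matching. It is an illustration only, assuming triples and triple patterns are stored as nested tuples; it matches top-level statements only and does not treat embedded triples as separate triples. The functions unify and select are hypothetical.

```python
# Sketch only: the two nested statements from the example above.
DATABASE = {
    (((":Mary", ":worksFor", ":BigCo"), ":role", "lawyer"), ":from", 2000),
    (((":Mary", ":worksFor", ":BigCo"), ":role", "manager"), ":from", 2010),
}


def unify(pattern, term, binding):
    """Match a (possibly nested) pattern against a term; None if no match."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in binding and binding[pattern] != term:
            return None  # variable already bound to a different value
        return {**binding, pattern: term}
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(term) != len(pattern):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            binding = unify(sub_pattern, sub_term, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == term else None


def select(variables, pattern, database):
    """Return one sorted row of values per statement matching the pattern."""
    rows = []
    for triple in database:
        binding = unify(pattern, triple, {})
        if binding is not None:
            rows.append(tuple(binding[v] for v in variables))
    return sorted(rows)


variables = ["?company", "?role", "?startdate"]
good = (((":Mary", ":worksFor", "?company"), ":role", "?role"), ":from", "?startdate")
swapped = (((":Mary", ":worksFor", "?company"), ":from", "?startdate"), ":role", "?role")

print(select(variables, good, DATABASE))
# [(':BigCo', 'lawyer', 2000), (':BigCo', 'manager', 2010)]
print(select(variables, swapped, DATABASE))  # []
```

The swapped pattern fails because its shape no longer lines up with the stored statements: :from is compared with :role, so no match is possible.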

8 Overview of study

The study consists of 15 questions, the first two of which are practice questions to get you

used to the format. Your responses to these will not be used in the subsequent analysis.

There are two types of questions: modelling questions and querying questions.

In the modelling questions you are presented with some information in English; one or more

queries, in English, that one might want to ask; and some alternative models in RDF*. You

are asked to indicate which of these models are correct. Here, 'correct' means that the RDF*

model accurately represents the information and that SPARQL* queries can be constructed to

represent the English queries. You are also asked to rank the models. You can use whatever

criteria you wish to perform this ranking, and it is possible to give two or more models the

same rank. Optionally, you can add free format textual comment for each model.

In the querying questions you are provided with a model in RDF*; a query in English; and

some alternative queries in SPARQL*. You are asked to indicate which of the SPARQL*

queries correctly represent the English query, when applied to the RDF* model.

For each question, there may be any number of correct responses. None may be correct; or

they may all be correct; or some may be correct and some may be incorrect. For each

proposed response, you will need to click to indicate 'correct' or 'incorrect'. You can change

your response, by clicking on the alternative button, at any time until you click on 'Submit

and continue' at the bottom right. When you do click on 'Submit and continue', you move

on to the next page. It is not possible to return to a previous page. Please do not attempt to

do so with the browser 'back' button. Finally, please note that this project has been reviewed

by, and received a favourable opinion from, The Open University Human Research Ethics

Committee, reference HREC/3568.

THE FOLLOWING PAGE SUMMARISES EVERYTHING YOU NEED TO KNOW

Summary

Conventions used for writing RDF* and SPARQL* in this study

* Identifiers are composed of a colon, followed by any combination of letters and digits,

e.g. :John, :brotherOf, :1801, :year1801.

* Character strings are enclosed in quotes, e.g. 'lawyer'.

* Numbers, including years, are written in the normal way as integers, e.g. 2010.

* SPARQL* keywords and functions are written in capitals, e.g. SELECT and STR().

* Variables begin with a question mark, e.g. ?name. Note that variables can represent

identifiers, character strings or numbers.

RDF triples

* RDF triples consist of subject, predicate and object. Subjects and predicates are

identifiers. In this study, objects are identifiers, character strings or integers.

Overview of SPARQL

* SELECT clause indicates which variables are to be output, e.g. SELECT ?person

* WHERE clause indicates conditions to be satisfied, e.g.

WHERE {:Mary :worksFor ?company}

* A dot between triple patterns in a WHERE clause means that both must be valid, e.g.

WHERE { :Stephen :marriedTo ?wife . ?brotherInLaw :brotherOf ?wife }

Predicate operations

* / concatenates predicates in a WHERE clause, e.g.

WHERE {:Stephen :marriedTo / :worksFor ?company}

* ^ reverses the directionality of a predicate, e.g.

?person ^:brotherOf :John is equivalent to :John :brotherOf ?person

* + repeats a predicate an unlimited number of times

?subgroup :subGroupOf+ :Mammal means that ?subgroup can be a subgroup of

:Mammal, or a sub-subgroup, or a sub-sub-subgroup etc.

SPARQL functions

* STR() converts an identifier into a character string, e.g.

STR(:London) returns the string ':London'

* CONTAINS() returns a logical true if the first argument contains the second, false

otherwise.

* FILTER() means that the query will only output values which make its argument true, e.g.

FILTER(CONTAINS(STR(:London), ?city)) means that ?city must represent a

character string contained within the string ':London'.

Embedded triples and triple patterns

* Embedded triples in RDF* are placed within double angled brackets, e.g.

<<:Mary :worksFor :BigCo>> :role 'lawyer' .

N.B. this is considered as two triples, i.e. the inner triple also holds.

* Triples can be embedded to any level of nesting, e.g.

<<<<:Mary :worksFor :BigCo>> :role 'lawyer'>> :from 2000 .

* Embedded triple patterns in SPARQL* are also placed within double angled brackets, e.g.

WHERE {<<:Mary :worksFor ?company>> :role ?role}

* SPARQL* triple patterns can also be embedded to any level of nesting, e.g.

WHERE {<<<<:Mary :worksFor ?company>> :role ?role>> :from ?startdate}

Hartig, Olaf, and Bryan Thompson. 'Foundations of an Alternative Approach to Reification in RDF'. ArXiv

Preprint ArXiv:1406.3399, 2014.
