OBDA mapping with SPARQL source


Johan Van Noten

Jan 19, 2022, 12:18:39 PM
to ontop4obda
Dear Ontoppers,

On several occasions, I have been interested in formulating my OBDA mappings with a SPARQL source. 
Let me clarify.
  • Obviously, today the OBDA sources are expressed as a (potentially complex) SQL statement. This select statement performs a kind of "virtual curation" of the underlying data.
    For instance: I am retrieving bread orders from our ordering system. In our MES system, I know which orders were processed in which oven of our bakery. I can make a query that relates every bread to the oven in which it has been baked and the time period during which it was in the oven.
    This is a relatively complex query.
  • Sometimes an additional mapping specifies a further property to be associated.
    For instance: I want to associate every bread with the average temperature at which it was baked. I can find that in a specific table of temperature sensor information, which I have also already mapped to my ontology (every oven has links to time+value data).
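As an illustration, the "virtual curation" source of that first mapping could be a join along these lines (all table and column names invented):

select ol.order_id, e.oven_id, e.start_time, e.end_time
from order_lines ol
join mes_schedule s on s.order_id = ol.order_id
join mes_execution e on e.schedule_id = s.schedule_id
where ol.product_type = 'bread'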

If I want to implement the latter OBDA mapping, I have to re-establish the relationship between bread, oven and period (== the source of the first query) and then join it to the correct temperatures (== new for this source query).

Instead, it would be much easier if I could re-use the existing knowledge:

select ?myBread (avg(?temperatur) as ?avgTemp) {
  ?myBread :bakedIn ?anOven :from timeStart :to timeEnd.
  ?anOven :timedTemperature ?timedData .
  ?timedData :value ?temperature :timestamp ?timestamp.
  FILTER( ?timeStart < ?timestamp && ?timestamp < ?timeEnd)
}

This query is much more readable and maintainable than "redoing the original join".
Suppose that bread orders are determined differently in the future: I would only have to rewrite a single mapping, since my second mapping already uses the knowledge-level information instead of the base information.

Two questions:
  • Is that possible today? (I think I know that answer :-) )
  • Do you also feel this is useful? If not, why? If yes, is there any expectation about availability?
Thanks,
Johan

Benjamin Cogrel

Jan 20, 2022, 9:45:51 AM
to ontop...@googlegroups.com, Johan Van Noten

Hi Johan,

Sorry, I don't understand where the problem is.

What do you mean by "redoing the original join"? Is it currently done at the mapping level? Instead, would you like the joins to be introduced by the SPARQL query?

Having the joins introduced by the SPARQL query is the normal setting for OBDA. SQL queries used in the mapping entries are expected most of the time to be simple, without any join. The main things that are shared between mapping entries are IRI templates.

The main motivation for introducing a join in a mapping entry is that the triple pattern uses columns coming from different tables. In some particular cases, the join is introduced for filtering out results. It is not clear to me what would motivate a join in your case.
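For instance, a join is needed when a triple pattern combines columns from two tables (illustrative names, Ontop .obda syntax):

mappingId   bread-oven
target      :bread/{order_id} :bakedIn :oven/{oven_id} .
source      select o.order_id, e.oven_id
            from order_table o join execution_table e on e.order_id = o.order_id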

Could you please also fix your SPARQL query? It seems that it is missing semi-colons, some question marks and a GROUP BY clause. Also, a few mapping entries would help.


Best,

Benjamin

--
Please follow our guidelines on how to report a bug https://ontop-vkg.org/community/contributing/bug-report
---
You received this message because you are subscribed to the Google Groups "ontop4obda" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontop4obda+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ontop4obda/e9559708-a81f-4aef-8b8a-75ec636bc375n%40googlegroups.com.

Johan Van Noten

Jan 20, 2022, 3:02:55 PM
to ontop4obda

Hi Benjamin,

Sorry for my confusing and inaccurate explanation.
It seems I was too tired when I wrote it.
Allow me to clarify.

We are considering Ontop VKG as a virtual curation tool.
That means that we:

  • Create an ontology that sits close to the users’ problem domain.
    In the (fake) example that I gave you, this means:
    • The user is a baker
    • He knows about baking bread in an oven
  • Live with technical data structures, given by the nature of the underlying tools.
    Those are structured in a way that is irrelevant to the baker, but they are a fact of life.
    You can think of:
    • A database of customer orders, sitting in several different tables (customer_table, order_table, order_lines…)
    • A database of order scheduling and executions, which maintains which order is processed in which oven
    • A timeseries database of sensors spread throughout the bakery (e.g. oven temperatures…)
  • Use OBDA to map between both worlds.
Up to here, this fully matches the normal Ontop setting, right?

One of the mappings I need to make links a Bread to the Oven in which it is baked.
Because of the complexity of the order scheduling and execution database, this requires quite a few joins on the source tables.
(Alternatively, I should map the different “to-be-joined” tables as concepts in the ontology, but that is exactly what I want to avoid: they are irrelevant to my user’s problem domain and would needlessly render the ontology more complex.)
I can also, relatively easily, map all oven temperatures to the respective ovens based on the timeseries data.

Till now, I have populated my world with triples that look like:
  • ?myBread a :Bread
  • ?myOven a :Oven
  • ?myBread :bakedIn ?myOven
  • ?td a :TemperatureDataPoint; :timestamp ?aTimestamp; :temperature ?aTemperature
  • ?myOven :observedTemperature ?td
  • ?session a :BakingSession; :inOven ?myOven; :bakedBread ?myBread; :from ?startTime; :to ?endTime

Still no challenge.
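
The mapping entries behind these triples could look roughly like this in Ontop's .obda syntax (table and column names invented):

mappingId   baking-session
target      :session/{exec_id} a :BakingSession ; :inOven :oven/{oven_id} ; :bakedBread :bread/{order_id} ; :from {start_time} ; :to {end_time} .
source      select e.exec_id, e.oven_id, s.order_id, e.start_time, e.end_time
            from execution_table e
            join schedule_table s on s.exec_id = e.exec_id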

Finally, from the baker’s point of view, he knows that the average temperature the Bread experienced during the BakingSession is very important.
He can retrieve that information through a SPARQL query on the available ontology, but it involves quite a bit of work for him:

select ?myBread (avg(?temperature) as ?avgTemperature) {
  ?session a :BakingSession;
     :bakedIn ?anOven;
     :bakedBread ?myBread;
     :from ?startTime;
     :to ?endTime .

  ?anOven a :Oven;
    :observedTemperature ?timedData .

  ?timedData a :TemperatureDataPoint;
    :timestamp ?timestamp;
    :temperature ?temperature .

  FILTER( (?startTime < ?timestamp) && (?timestamp < ?endTime) )
} group by ?myBread


Feasible, but a bit too difficult for him, so we would prefer to provide him with a data property avgBakingTemperature on every Bread.
Essentially, this can be considered as a “derived property” since the information is already available, but not easily retrievable.

The current way to create an appropriate mapping for this new avgBakingTemperature would be to reconstruct a considerable number of the joins involved in the earlier mappings.
We have to again find and specify the relationship between a session and an oven, between the oven and temperatures, etc.
This involves the joins that we already did in previous mappings.

Obviously, it would be easier if we could reuse the previously mapped knowledge to specify this derived property.
I know that a mapping source can only be SQL today, but imagine that we could also use SPARQL in the mapping source.
Then we could just write the above SPARQL query as mapping source and construct the avgBakingTemperature property instances out of this.
Target:
   ?myBread a :Bread; :avgBakingTemperature ?avgTemperature
Source:
   The above SPARQL query

I know this target’s syntax is not “valid” with respect to the current rules, but you get my intention.

You are the OBDA expert, but conceptually it seems to match quite well with the scope of OBDA.
OBDA could rewrite the provided SPARQL query into a SQL query based on the other available mappings using the same mechanisms it uses for answering SPARQL queries in the first place.
(Well, I’m sure I oversimplify the complexity here…)

Alternatively, one could also put this problem on the shoulders of a reasoner, but as far as I know, this is way beyond what reasoners can do, no?

I hope I was able to make the question/suggestion somewhat clearer.
I'd be happy to learn your opinion on this.

Thanks for your time,
Johan

On Thursday, January 20, 2022 at 15:45:51 UTC+1, benjami...@bcgl.fr wrote:

mart...@atomgraph.com

Jan 20, 2022, 4:13:34 PM
to ontop4obda
Johan,

Aren't you describing SPARQL Update? Something like:

INSERT {
  ?myBread :avgBakingTemperature ?avgTemperature .
}
WHERE {
  SELECT ?myBread (AVG(?temperature) AS ?avgTemperature)
  WHERE {
    ?session a :BakingSession ;
             :bakedIn ?anOven ;
             :bakedBread ?myBread ;
             :from ?startTime ;
             :to ?endTime .
    ?anOven a :Oven ;
            :observedTemperature ?timedData .
    ?timedData a :TemperatureDataPoint ;
               :timestamp ?timestamp ;
               :temperature ?temperature .
    FILTER ( ( ?startTime < ?timestamp ) && ( ?timestamp < ?endTime ) )
  }
  GROUP BY ?myBread
}

Ontop is read-only AFAIK, so this will not work directly on Ontop. But you could set up a parallel triplestore, e.g. Fuseki, and store the inferred/aggregated triples in it. To retrieve combined data, you could federate Ontop and Fuseki using the SERVICE clause.
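
To combine both at query time, a federated query along these lines could work (the endpoint URLs are placeholders):

SELECT ?myBread ?avgTemperature
WHERE {
  # Virtual data served by Ontop (placeholder URL)
  SERVICE <http://localhost:8080/sparql> {
    ?myBread a :Bread .
  }
  # Materialized aggregates stored in Fuseki (placeholder URL)
  SERVICE <http://localhost:3030/bakery/sparql> {
    ?myBread :avgBakingTemperature ?avgTemperature .
  }
}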

Johan Van Noten

Jan 20, 2022, 4:46:34 PM
to ontop4obda

Hi,

Thanks for your suggestion.

That is certainly yet another approach, but it is what I would consider "hard curation": it would materialize all new tuples, while our goal is explicitly not to duplicate data.
While the "hard" approach would be reasonable for low-volume data, there are quite some challenges in our cases:
* Some data is seldomly requested, but high volume if generated upfront for all potential cases.
* Some data computations are (very) expensive, so you don't want to perform too many of them, unless really required.
* Our data sets (like many) continuously evolve, so we should implement an incremental update.

Do my remarks make sense or do you see it differently?

Johan

On Thursday, January 20, 2022 at 22:13:34 UTC+1, mart...@atomgraph.com wrote:

Martynas Jusevicius

Jan 20, 2022, 5:07:33 PM
to Johan Van Noten, ontop4obda
I’m not an Ontop developer, but what you’re suggesting goes beyond standard SPARQL.

The feasible approach you described makes the most sense to me, but I understand there can be performance issues.

Benjamin Cogrel

Jan 21, 2022, 8:24:35 AM
to ontop4obda

Hi,

Thank you, Johan, for this detailed explanation; the problem is now clear.

It is a perfectly legitimate case that fits well with the OBDA/VKG story. This reminds me of the role of data marts in data warehousing approaches. It is also related to several discussions, like https://github.com/ontop/ontop/issues/470

At first glance, from a technical point of view, I foresee 2 possible solutions:

  1. Create a new kind of Ontop view (https://ontop-vkg.org/guide/advanced/views), defined from SPARQL SELECT queries. These views would then be used directly in the mapping file. This goes in the direction of what Johan proposed.

  2. Internally augment the mapping using a set of SPARQL INSERT queries. These queries would be run on the mapping before augmentation, avoiding loop problems. This is more in the direction of Martynas's proposal.

At the moment, I would be more in favor of solution #2, as it looks simpler to implement.


Best,

Benjamin

Benjamin Cogrel

Dec 9, 2022, 6:12:43 AM
to ontop4obda
Hi all,

This feature has been implemented and will soon be released with Ontop 5.0.0: https://github.com/ontop/ontop/pull/576

Best,
Benjamin