LEFT OUTER JOINs for OPTIONAL

58 views
Skip to first unread message

Michał Pietrusiński

unread,
Jun 16, 2023, 9:00:52 AM6/16/23
to ontop4obda
Hi,

I am quite new to ontop and I am struggling with a performance problem.
I am trying to map just one Oracle table.
The table has a primary key, which is basically not used nor mapped to RDF. There is an additional column, ELNUM, which is unique, and is used to identify the row in the table (it plays a role of a business id). I include this column in the constraint file.

For SPARQL queries like
SELECT * WHERE {
  VALUES ?s { <http://.../12345>}
  ?s :pred1 ?val1.
  ?s :pred2 ?val2.
...
}
I get SQL without any joins.

However when I use OPTIONALs, like here:
SELECT * WHERE {
   VALUES ?s { <http://.../12345>}
  OPTIONAL {?s :pred1 ?val1. }
  OPTIONAL { ?s :pred2 ?val2. }   
..
}
I get SQL query with lots of LEFT OUTER JOINs. Can this be somehow optimized?

I do not pass ontology to ontop, and the table I am reading is actually a view in Oracle - the constrains are not on the view but on the tables the view is using, so ontop probably cannot see them, this is why I had to include the ELNUM column in the constraint file.


 

Benjamin Cogrel

unread,
Jun 16, 2023, 9:22:48 AM6/16/23
to Michał Pietrusiński, ontop4obda
Hi Michał,

Ontop embeds optimizations for left-joins (and a few more are coming soon with 5.1.0), which primarily make use of unique constraints and foreign keys.

What about the following query?

SELECT * WHERE {
   VALUES ?s { <http://.../12345>}
  ?s :pred1 ?val1.
  OPTIONAL { ?s :pred2 ?val2. }   
..
}

Does it simplify itself (no left/inner join)?

If you try 5.1.0-SNAPSHOT on DockerHub or on Maven, the query you showed should normally contain only one left-join.
One left-join is needed because if 12345 does not appear in your table, you should still have a result, even if there is no value for ?val1 and ?val2 .

If you can reformulate your query as 

SELECT * WHERE {
   VALUES ?s { <http://.../12345>}
  ?s a :C1 .
  OPTIONAL {?s :pred1 ?val1. }
  OPTIONAL { ?s :pred2 ?val2. }   
..
}

With the class C1 closely related to your table, then all (left/inner) joins are likely to go away.

Best,
Benjamin



--
Please follow our guidelines on how to report a bug https://ontop-vkg.org/community/contributing/bug-report
---
You received this message because you are subscribed to the Google Groups "ontop4obda" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontop4obda+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ontop4obda/47289e63-39ae-4b51-9fdf-5a6a25fb7c30n%40googlegroups.com.

Michał Pietrusiński

unread,
Jun 16, 2023, 12:12:29 PM6/16/23
to ontop4obda
Hi Benjamin,

Thanks so much for the very quick response. I have tested your suggestions and they work very good also on version 5.0.2! The query is executed now in 100ms, compared to 4 seconds before.

I still however see similar problem, when I want to read from several IRIs, like below:
SELECT * WHERE {
   VALUES ?s { <http://.../12345>  <http://.../11111> <http://.../11112>}
  ?s :pred1 ?val1.
  OPTIONAL { ?s :pred2 ?val2. }   
..
}
For six IRIs it takes 25 seconds, and the query uses again LEFT OUTER JOINS... I checked this also on the 5.1.0-SNAPHOT version. Is it possible to optimize this somehow?

Best,
Michal

Benjamin Cogrel

unread,
Jun 19, 2023, 2:47:09 AM6/19/23
to Michał Pietrusiński, ontop4obda
Hi Michał,

Could you please enable debugging (e.g. ONTOP_DEBUG = "true" on Docker) and send us the log for this query?
If you prefer to keep your log private, you can send it to me directly.

If you use FILTER(?s IN (...)) instead of the VALUES, does it change anything?

Best,
Benjamin

Michał Pietrusiński

unread,
Jun 19, 2023, 10:39:39 AM6/19/23
to ontop4obda
Hi Benjamin,

I will send the logs to you directly. Regarding the FILTER(?s IN..) it works perfectly - no joins and ORs are used to match the list of ids. Unfortunately I do not have much control on the query as it is automatically generated from GraphQL.

Best,
Michal

Benjamin Cogrel

unread,
Jun 21, 2023, 9:27:28 AM6/21/23
to Michał Pietrusiński, ontop4obda
Hi Michał,

Thanks for having shared the logs. The issue comes from the fact that IRI strings stored in VALUES nodes are not being decomposed, causing some joins to be done on concatenated IRI strings instead of the primary keys of the underlying tables. Because of that, many optimizations are not applied.

I opened a PR to address that issue:  https://github.com/ontop/ontop/pull/680 .
It is now merged into version5 and available as a snapshot (5.1.0-SNAPSHOT) on DockerHub and Maven.

Could you please give it a try?

Best,
Benjamin


Michał Pietrusiński

unread,
Jun 22, 2023, 7:01:26 AM6/22/23
to ontop4obda
Hi Benjamin,

Thanks so much for providing the fix so quickly. I tested two types of queries:
 
1. with all patterns optional
VALUES  ?s  { many IRIs..}
OPTIONAL {?s prop1 ?p1}
OPTIONAL { ?s prop2 ?p2}
OPTIONAL { ?s prop3 ?p3}
OPTIONAL { ?s prop4 ?p4}
..

2. with a non optional pattern
VALUES  ?s  { many IRIs..}
?s prop1 ?p1 .
OPTIONAL { ?s prop2 ?p2}
OPTIONAL { ?s prop3 ?p3}
OPTIONAL { ?s prop4 ?p4}


The type 1. is still not optimized (many left outer joins)
The type 2. is optimized and fast.

When just one IRI is passed in the VALUES, both types are optimized.

Best,
Michal

Benjamin Cogrel

unread,
Jun 23, 2023, 6:18:24 AM6/23/23
to Michał Pietrusiński, ontop4obda
Hi Michał,

Thanks for the feedback!

Indeed when only one IRI is passed in the VALUES, Ontop automatically replaces the VALUES node by something simpler and propagates the information
in the query tree.

I am working on improving the left-join merging optimization to handle the query type #1. After that, there should be only one left-join left.

Still, this type of query would need a novel form of VALUES node decomposition to avoid joining over IRI strings. 
Before planning to implement it, I would like to better understand how common this pattern is and in which contexts it is used.

In principle the values of ?s are not required to be present in the RDF graph. Does it happen sometimes?
Are there opportunities to attach the values of ?s to a class?

Best,
Benjamin


Benjamin Cogrel

unread,
Jun 27, 2023, 2:13:46 AM6/27/23
to Michał Pietrusiński, ontop4obda
Hi Michał,

I have improved the left-join merging optimization to handle query type #1 (https://github.com/ontop/ontop/pull/686).

It should help, but VALUES decomposition remains very likely needed to take advantage of indexes for these lookup queries.

Best,
Benjamin

Michał Pietrusiński

unread,
Jun 28, 2023, 9:47:57 AM6/28/23
to ontop4obda
Hi Benjamin,

I have just tested the query type #1 with the latest 5.1.0-SNAPSHOT docker image, but I still can see lots of LEFT OUTER JOINs and REPLACEs and it takes about 20 seconds.

When it comes to your question how often we can have situation that the IRIs present in the VALUES are not present in the graph it depends on the environment we will use. Our use case with Ontop is that we have our primary data in a triplestore and we use Ontop to enrich it with data from another system powered by Oracle. For the production data we will have almost 100% match. In case of development and testing environments we are dealing with a lot of mock data that do not have its counterparts in the Oracle database.

Thanks a lot,
Michal

Benjamin Cogrel

unread,
Jun 28, 2023, 10:48:30 AM6/28/23
to Michał Pietrusiński, ontop4obda
Hi Michał,

Thank you very much for your detailed reply. I now better understand the context.

May I ask you for the logs?

Best,
Benjamin

Benjamin Cogrel

unread,
Jun 30, 2023, 12:32:23 PM6/30/23
to ontop4obda
Hi all,

Just to inform you that the issue with the last optimization has been resolved.

Thanks again Michal for providing the logs and testing the fix!

Best,
Benjamin

Michał Pietrusiński

unread,
Jun 30, 2023, 3:47:29 PM6/30/23
to ontop4obda
Thanks so much Benjamin for introducing the optimizations. 

Best,
Michal

Thomas Taylor

unread,
Apr 12, 2024, 11:15:40 PM4/12/24
to ontop4obda
Hi All,

I seem to have an issue similar to this on Ontop 5.2.0-SNAPSHOT (pulled 3/14/2024).

I have a simple SQL Server view ("vw") with 115 columns ("Col1"..."Col115")... approximately 100,000 rows.

I defined an R2RML mapping to define a composite key template (.../col1={col1},col2={col2}) - both integers. The other columns are varchar(255). I provide Ontop both the R2RML and OWL ontology for the table.

When I execute a SPARQL query to select all values from the view (no optionals), the resulting SQL query seems to have a join for every column (~115) and takes 5-10 minutes to execute.

Can you point me to any tips or hints on how I could get Ontop to optimize this query better?

Thanks for the help.

Thomas Taylor

SPARQL:
SELECT (?vw_col1 as ?Col1) (?vw_col2 as ?Col2) ... (?vw_col115 as ?Col115)
WHERE ( ?cls_vw a <urn://rdb/vw> ;
  <urn://rdb/vw#Col1> ?vw_col1 ;
  <urn://rdb/vw#Col2> ?vw_col2 ;
  ...
  <urn://rdb/vw#Col115> ?vw_col115;

SQL:
SELECT  DISTINCT 
  v2."Col3" AS "Col31m213",
  v3."Col4" AS "Col41m245",
  v4."Col5" AS "Col51m169",
  ...
  v115."Col115" AS "Col1151m202"
FROM
  "vw" v1,
  "vw" v2,
   ...
   "vw" v115
WHERE
  v2."Col3" IS NOT NULL 
  AND v3."Col4" IS NOT NULL
 ...
  AND v1."Col1" = v2."Col1"  AND v1."Col2" = v2."Col2"
  AND v1."Col1" = v3."Col1"  AND v1."Col2" = v3."Col2"
  ...
  AND v1."Col1" = v115."Col1"  AND v115."Col2" = v2."Col2"
Reply all
Reply to author
Forward
0 new messages