Problem: SPARQL - Slow queries when mappings share a property with the same name/URI


Tulio Vidal

Dec 3, 2021, 2:57:44 PM
to ontop4obda
Hello,

I have a problem that I have not been able to solve despite trying many approaches, and I would like some help.

I have pasted the details below.

--------


A SPARQL query takes a long time to return results when it uses a property whose name/URI is shared by two or more mappings.

Steps to Reproduce
  1. Consider a large data source with two tables: RFB_EMPRESA (Organizations) [46 906 616 tuples] and RFB_ESTABELECIMENTO (Establishments) [49 419 313 tuples].
  2. Run a SPARQL query that uses a property name/URI (sefazma:cnpj, with prefix sefazma: <http://www.sefaz.ma.gov.br/ontology/>) shared by two or more mappings, here map:Organization and map:Establishment.
  3. Start the endpoint: ontop endpoint -p config.properties -m mappings.ttl (no ontology parameter is passed, only the mappings).
  4. SPARQL:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sefazma: <http://www.sefaz.ma.gov.br/ontology/>
SELECT * WHERE {
  ?org a foaf:Organization ;
       sefazma:cnpj ?cnpj.
} LIMIT 10
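To measure the query time outside the web UI, the query from step 4 can be sent to the endpoint with a short script. This is a minimal sketch, assuming the endpoint started in step 3 is reachable at http://localhost:8080/sparql (the URL is an assumption, not stated in this report; adjust host, port, and path as needed) and accepts the query as a `query` URL parameter, as described by the SPARQL 1.1 Protocol. The helper names `build_request` and `time_query` are hypothetical.

```python
import time
import urllib.parse
import urllib.request

# Assumption: endpoint URL; the report only says "localhost".
ENDPOINT = "http://localhost:8080/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sefazma: <http://www.sefaz.ma.gov.br/ontology/>
SELECT * WHERE {
  ?org a foaf:Organization ;
       sefazma:cnpj ?cnpj .
} LIMIT 10
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that carries the query and asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def time_query(query, endpoint=ENDPOINT):
    """Run the query once; return (elapsed seconds, raw response bytes)."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(query, endpoint)) as resp:
        body = resp.read()
    return time.perf_counter() - start, body

# Usage (needs a running endpoint):
#   elapsed, body = time_query(QUERY)
#   print(f"{elapsed:.2f} s, {len(body)} bytes")
```

Running the same script with sefazma:cnpj replaced by sefazma:cnpj_raiz gives a like-for-like comparison between the shared and the exclusive property.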


Query time (SLOW):

ontop_case0.PNG

Expected behavior:

  • Return ?org (an instance of foaf:Organization) and ?cnpj (a literal, e.g. 01244599) in normal time

Actual behavior:

-> The query is slow: it takes about 4 minutes or more. Queries that use a property with an exclusive name, e.g. sefazma:cnpj_raiz (mapped only by map:Organization), return quickly.
-> According to the log, the mapping map:Establishment is taken into account and added to the query with UNION ALL. This should not happen, since the triple pattern involves foaf:Organization, a type that is not mapped in map:Establishment.

Reproduces how often:

-> It always happens

Attached material
# Organization
map:Organization a rr:TripleMap;
    rr:logicalTable [ rr:sqlQuery """SELECT URI, CNPJ_RAIZ FROM "UFC2"."RFB_EMPRESA""""; ];
    rr:subjectMap [ rr:column "URI"; rr:class foaf:Organization; ];
    rr:predicateObjectMap [
        rr:predicate sefazma:cnpj, sefazma:cnpj_raiz;
        rr:objectMap [ rr:column "CNPJ_RAIZ"; rr:datatype xsd:string; ];
    ].

# Establishment
map:Establishment a rr:TripleMap;
    rr:logicalTable [ rr:sqlQuery """SELECT URI, CNPJ FROM "UFC2"."RFB_ESTABELECIMENTO""""; ];
    rr:subjectMap [ rr:column "URI"; rr:class sefazma:Estabelecimento; ];
    rr:predicateObjectMap [
        rr:predicate sefazma:cnpj, sefazma:cnpj_completo;
        rr:objectMap [ rr:constant "CNPJ"; rr:datatype xsd:string ];
    ].

Info: URI is a single-index column that holds full URI values (to improve response time for queries of the form ?p ?o). [This is not relevant to the example in this issue.]

Versions

Ontop Version: 4.1.1

Additional Information

Query Log:

{"@timestamp":"2021-12-03T12:23:15.395-03:00",
"message":"query:reformulated","application":"Ontop",
"payload":{"queryId":"c1d7a638-2641-47cb-abb8-0803291b064a",
"classesUsedInQuery":["http://xmlns.com/foaf/0.1/Organization"],
"propertiesUsedInQuery":["http://www.sefaz.ma.gov.br/ontology/cnpj"],
"tables":[
"(SELECT URI, CNPJ_RAIZ FROM \"UFC2\".\"RFB_EMPRESA\"@HOMOLOG)",
"(SELECT URI, CNPJ FROM \"UFC2\".\"RFB_ESTABELECIMENTO\"@HOMOLOG)"],
"reformulationDuration":194,
"reformulationCacheHit":false,
"httpHeaders":{},
"sparqlQuery":
"PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sefazma: <http://www.sefaz.ma.gov.br/ontology/>
SELECT * WHERE {
?org a foaf:Organization.
sefazma:cnpj ?cnpj.
} LIMIT 10 "
}
}
"reformulatedQuery":"ans1(org,cnpj)
CONSTRUCT [org, cnpj] [org/RDF(v0,IRI), cnpj/RDF(v1m2,xsd:string)]
NATIVE [v0, v1m2]
SELECT DISTINCT TO_CHAR(V1.URI) AS \"v0\", V6.\"v1m2\" AS \"v1m2\"
FROM (SELECT URI, CNPJ_RAIZ FROM \"UFC2\".\"RFB_EMPRESA\"@HOMOLOG) V1,

(SELECT V2.URI AS \"URI0m2\", TO_CHAR(V2.CNPJ_RAIZ) AS \"v1m2\"
FROM (SELECT URI, CNPJ_RAIZ FROM \"UFC2\".\"RFB_EMPRESA\"@HOMOLOG) V2
WHERE (V2.URI IS NOT NULL AND V2.CNPJ_RAIZ IS NOT NULL)

UNION ALL
SELECT V4.URI AS "URI0m2", 'CNPJ' AS "v1m2"
FROM (SELECT URI, CNPJ FROM "UFC2"."RFB_ESTABELECIMENTO"@homolog) V4
WHERE V4.URI IS NOT NULL
) V6
WHERE (TO_CHAR(V1.URI) = TO_CHAR(V6."URI0m2")
 AND V1.URI IS NOT NULL)
FETCH NEXT 10 ROWS ONLY
"}}

Shouldn't only the properties mapped for subjects matching the triple pattern ?org a foaf:Organization be considered? If so, the UNION ALL should not be there.
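The expectation stated above can be made concrete with a toy model. This sketch is not Ontop's algorithm; it only contrasts selecting mappings by predicate alone (which matches the two tables listed in the log) with also requiring the subject class from the query (what this report expects). The `TriplesMap` record and both helper functions are hypothetical names introduced for illustration. Note that the second strategy is only sound if no establishment URI can also be an organization URI, which the mappings as written do not declare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplesMap:
    """Toy stand-in for one mapping: its name, subject class, and predicates."""
    name: str
    subject_class: str
    predicates: frozenset


# The two mappings from the attached material, reduced to what matters here.
MAPPINGS = [
    TriplesMap("map:Organization", "foaf:Organization",
               frozenset({"sefazma:cnpj", "sefazma:cnpj_raiz"})),
    TriplesMap("map:Establishment", "sefazma:Estabelecimento",
               frozenset({"sefazma:cnpj", "sefazma:cnpj_completo"})),
]


def by_predicate(predicate):
    """Mappings that can produce triples with this predicate."""
    return [m.name for m in MAPPINGS if predicate in m.predicates]


def by_predicate_and_class(predicate, subject_class):
    """Mappings that produce the predicate AND type the subject as requested."""
    return [m.name for m in MAPPINGS
            if predicate in m.predicates and m.subject_class == subject_class]


# Shared property: both mappings qualify by predicate alone.
print(by_predicate("sefazma:cnpj"))
# Expected by this report: only the Organization mapping.
print(by_predicate_and_class("sefazma:cnpj", "foaf:Organization"))
# Exclusive property: a single mapping either way.
print(by_predicate("sefazma:cnpj_raiz"))
```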

--- Queries performed on an Ontop endpoint (localhost)

- The same query without the property (0.46 seconds):


ontop_without_properties.PNG

- Using properties with unique names, the problem does not occur:

example_ontop2.PNG

Note: UNION ALL is not present in the query plan

- In the relational database:

relational_time.PNG

What could be the real problem?

Thanks.

