Ontop bootstrap doesn't use specified schema

82 views
Skip to first unread message

André Teege

unread,
Jul 10, 2024, 1:33:16 PM7/10/24
to ontop4obda
Hi there,

I want to use Ontop bootstrap to automatically generate an ontology and mapping file from a database schema (Snowflake).

However, the process results in an ontology and mapping of all databases and schemas. I am using the "--db-url" parameter to provide the schema information, something like
./ontop bootstrap \
--base-iri ${BASE_IRI} \
--mapping ${MAPPING_FILE} \
--ontology ${ONTOLOGY_FILE} \
--db-driver ${DB_DRIVER} \
--db-url "jdbc:snowflake://${ACCOUNT}.snowflakecomputing.com/?user=${USERNAME}&role=${ROLE}&warehouse=${WAREHOUSE}&db=${DATABASE}&schema=${SCHEMA}&private_key_file=${PRIVATE_KEY_PATH}&private_key_file_pwd=${PRIVATE_KEY_PASSPHRASE}"

Since the process finishes, it seems to be fine, including authentication via pk file, but for some reason the selected schema is not considered (which takes ages and bloats the ontology). My parameters work in other scenarios like SnowSQL CLI.

I also tried specifying the parameters via the properties file but it didn't work either. What could be the problem?

Greetings, André

Benjamin Cogrel

unread,
Jul 12, 2024, 3:53:37 AM7/12/24
to André Teege, ontop4obda
Hi André,

Thanks for sharing this issue.

My understanding is that we are missing a feature at the bootstrapper level to only generate classes and properties for a given schema. This should be passed as a new parameter of the bootstrapper.

I think the current behavior is correct, as selecting a schema should probably not prevent you from querying tables outside of the schema (unless I misinterpreted it). So, from a DB metadata extraction perspective, we need to consider all the tables of the database, not just the ones of the selected schema.

If we agree on that, let's create a new feature request issue on the Github tracker.

Best,
Benjamin

--
Please follow our guidelines on how to report a bug https://ontop-vkg.org/community/contributing/bug-report
---
You received this message because you are subscribed to the Google Groups "ontop4obda" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontop4obda+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ontop4obda/33ae6244-d424-477e-b2b5-8fa08317a3d2n%40googlegroups.com.

André Teege

unread,
Jul 19, 2024, 4:24:23 AM7/19/24
to ontop4obda
Hi Benjamin,

Thanks for the quick reply. Right now, the bootstrapper creates a mapping/ontology over all databases and all schemas within these databases. That problem could also be solved by restricting the access of the database user to one required database (but can be tedious in some enterprise environments).

I agree that within one database, you need at least also the information schema besides the desired schema. In some databases, however, there are also schemas included which I am really not interested in, so additional DB and SCHEMA parameters on the bootstrapper level would be very helpful, indeed. 

BR, André

Benjamin Cogrel

unread,
Jul 22, 2024, 12:22:22 PM7/22/24
to André Teege, ontop4obda
Hi André,

Thanks for sharing your workaround.

I opened an issue to track this feature request: https://github.com/ontop/ontop/issues/817 .

Best regards,
Benjamin

Einar Clementz

unread,
Aug 12, 2024, 7:28:04 AM8/12/24
to ontop4obda

Hi.

An addition to this issue. 

I just experienced the same against Snowflake. I used a couple of rounds to get connections right.

The connection contains account, role, warehouse, database, schema, and user.

Going into the warehouse query history it show the simple command

show /* JDBC:DatabaseMetaData.getTables() */

objects in account

In my case the size of the result set is about 84 thousand, with xsmall WH it is about 1.5 minutes in duration. It downloads 12 chunks of JSON in 3 downloads. So, it iterate on each table to get columns. Each table query is fast, but the large number prohibits the functionality. 

show /* JDBC:DatabaseMetaData.getColumns() */

columns in table "<database>".">schema>"."<table>”

I have tried both with Protégé Ontop and with Ontop CLI. Same result – no result in the output.

Some way to tell Snowflake the scope/context is in need. 

 

PS. If there is some workaround anyone can think about, I am happy to use that.

 

Thanks. 

Best regards.

 

Einar. 

Einar Clementz

unread,
Aug 15, 2024, 6:59:19 AM8/15/24
to ontop4obda
Hi. 

I looked into the docs a little and tested a few commands. 
For Snowflake setup is it possible to change command to this. 

show objects 

That is the needed change. 
The snowflake jdbc connect with db=<db-name> issue a "USE database <db_name>". Equal with other properties like schema etc. 
The "in account" override any USE ... issued by the properties given. 
I am just eager to get going. 

Thanks. 
Einar. 

Benjamin Cogrel

unread,
Dec 5, 2024, 2:40:58 AM12/5/24
to Einar Clementz, ontop4obda
Hi Einar,

Have you considered assigning a role to the Snowflake JDBC connection to limit the scope exposed to Ontop?
If no role is assigned, it uses the default role of the user, which may have a wide access to tables.

Best,
Benjamin

Einar Clementz

unread,
Dec 6, 2024, 1:48:31 AM12/6/24
to ontop4obda
Hi Benjamin. 

We find a solution to handle different addition user/roles less practical and static in its nature. We are in a corporate world with rules and procedures, access stuff has many rules. 
Functionality to flexible set different scope for each use case need to be at hand. 
Snowflake connection works with using connection string parameters to se scope for session with use commands. 
Connection string - warehouse=uid_xsmall_wh&db=DBABC&schema=Afunc10 
Snowflake issue an command with content from the connection parameters, that set scope for commands. 
using "show objects" - will give only objects in scope set by Snowflake use command. 
using "show objects in account" - will disregard and override scope set by use command. 
The change to give users flexibility is to change the bootstrap command. And make ready a new version for us. Then we have flexibility to handle scope by connection string parameters. 
We hope that is possible. 

Thanks. 
Kind regards. 
Einar. 

Benjamin Cogrel

unread,
Dec 9, 2024, 1:51:49 AM12/9/24
to Einar Clementz, ontop4obda
Hi Einar,

Thanks for the feedback.

Indeed, if there are no already existing roles suitable for this purpose, it doesn't make sense creating new ones just for bootstrapping purposes.

I remain favorable to passing parameters to the bootstrapper or to the DB metadata extractor, rather than that focusing on the "active" database and schema, at least for a first implementation.
The parameter-based solution has the advantage of being dialect-independent, saving us from extending all the DB metadata extractor implementations.

But in the future it could become a new option for the bootstrapper/DB metadata extractor, that is restricting itself to the active database and schema.
If no active database is set or if Ontop is not able to extract this information for a given dialect, an exception would be thrown.

Best,
Benjamin

Reply all
Reply to author
Forward
0 new messages