DuckDB data not updating

163 views
Skip to first unread message

Ron Basumallik

unread,
Jan 1, 2024, 11:18:06 AM1/1/24
to ontop4obda
Hello - I've been evaluating OnTop for use with DuckDB and ran into some behavior which I'm not sure is a bug or a feature.

  1. Run OnTop Endpoint connected to DuckDB with driver on a Docker container. The DuckDB .db file lives on my local machine and is mounted to the Docker container where the OnTop process accesses it.
  2. I run a query in OnTop to get all the rows of the DuckDB table. It works great and returns 9 rows.
  3. In a separate local process unrelated to OnTop, I open a connection to the same DuckDB database and add a row to the DuckDB database. I commit and close the connection.
  4. In another separate local process unrelated to OnTop, I open a connection to the DuckDB database and query all rows, it returns 10 rows including the row I added in step 3.
  5. In OnTop, I run the same query as step 2, but it only returns 9 rows

My question is is this expected behavior? My understanding was the OnTop will query DuckDB every time so it should show the updated 10 rows. I looked into caching behavior but get the same 9 row result when I open OnTop in multiple browsers so it's not caching on the client side. When I directly query DuckDB I get 10 rows so it's not caching on the DuckDB side. I updated OnTop cache query size to 0 so it would reformulate every time and still get 9 rows.

Is there any other sort of internal caching mechanism on OnTop which may be causing this behavior? Additionally does OnTop possibly create a single connection with DuckDB throughout its entire running process or only when it needs to run a query?

```ontop.properties
jdbc.url = jdbc:duckdb:/opt/ontop/data/instance/tasks.db
jdbc.driver = org.duckdb.DuckDBDriver
ontop.cache.query.size=0
```

My understanding of OnTop was that I could update the underlying Data Sources in different processes without having to restart OnTop every time. Let me know if I'm incorrect in this assumption.

Thank you for taking the time to review my question and looking forward to working with OnTop.

Best,
- Ron Basumallik

Ron Basumallik

unread,
Jan 1, 2024, 11:38:26 AM1/1/24
to ontop4obda
I've attached the query logs for the first and second request. In between the 2 requests, I added a row to the DuckDB table. Just based on the totalDuration of 140 ->15, some sort of caching is going on. 
log.json
log2.json

Benjamin Cogrel

unread,
Jan 5, 2024, 9:59:37 AM1/5/24
to Ron Basumallik, ontop4obda
Hi Ron,

Thank you for sharing your experience with Ontop and DuckDB.

There is no data caching in Ontop, it is all on the DuckDB side. Ontop uses a connection pool (Tomcat by default) to keep some connections open (2 by default) and I suspect that DuckDB keeps some cache within the JDBC connection.
DuckDB has a very peculiar concurrency model, which makes it behave very differently from transactional databases: https://duckdb.org/faq.html#how-does-duckdb-handle-concurrency .

I think your current setup can be interpreted as a form of read-write multi-process concurrency, which is not supported by DuckDB.

Let's see if disabling the connection pool could address your issue. Could you please try to add the following entry to your properties file?
it.unibz.inf.ontop.answering.connection.pool.JDBCConnectionPool = it.unibz.inf.ontop.answering.connection.pool.impl.DummyJDBCConnectionPool

Best,
Benjamin

--
Please follow our guidelines on how to report a bug https://ontop-vkg.org/community/contributing/bug-report
---
You received this message because you are subscribed to the Google Groups "ontop4obda" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ontop4obda+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ontop4obda/7919a245-4c6c-4cc4-8a9c-feec1bc1baefn%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages