SQL ETL - On update INSERT-query hits DB before DELETE

43 views
Skip to first unread message

Anton Möller

unread,
Oct 12, 2021, 4:44:16 AM10/12/21
to RavenDB - an awesome database
Hi!

I'm having issues with SQL ETL to a Postgres DB where when a document is updated, RavenDB sends off the INSERT-query to the DB before the DELETE-query has hit, causing both the existing record and new record to be deleted.

For example:
  1. Document/1 is in DB,
  2. Document/1 gets updated in RavenDB
  3. INSERT INTO ... VALUES (Document/1) gets executed
  4. DELETE WHERE (Document/1) gets executed
whereas the correct order of operations would be to execute the delete well before the insert (as the delete simply looks for the ID column).

With this approach, there's no data integrity guaranteed at all...

Oren Eini (Ayende Rahien)

unread,
Oct 12, 2021, 7:23:49 AM10/12/21
to ravendb

--
You received this message because you are subscribed to the Google Groups "RavenDB - an awesome database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ravendb/affd815b-e4a4-4cf0-bcb0-bc07a1f60d0bn%40googlegroups.com.


--
Oren Eini
CEO   /   Hibernating Rhinos LTD
Skype:  ayenderahien
Support:  sup...@ravendb.net
  

Anton Möller

unread,
Oct 13, 2021, 2:20:01 AM10/13/21
to RavenDB - an awesome database
...and one final point in case, here are RavenDBs own internal logs showing the INSERT being sent before the DELETE:
2021-10-12T19:05:09.2716797Z, 57, Information, database, Raven.Server.Documents.ETL.Providers.SQL.RelationalWriters.RelationalDatabaseWriter, Insert took: 22ms, statement: INSERT INTO r_test (id, test) VALUES (@id, @test)

2021-10-12T19:05:09.2717588Z, 57, Information, database, Raven.Server.Documents.ETL.Providers.SQL.SqlEtl, [RavenDB to MariaDB/Script_4] Inserted 1 (out of 1) records to 'r_test' table from the following documents: Test/abc

2021-10-12T19:05:09.2943338Z, 57, Information, database, Raven.Server.Documents.ETL.Providers.SQL.RelationalWriters.RelationalDatabaseWriter, Delete took: 22ms, statement: DELETE FROM r_test WHERE id IN ('Test/abc')

2021-10-12T19:05:09.2943807Z, 57, Information, database, Raven.Server.Documents.ETL.Providers.SQL.SqlEtl, [RavenDB to MariaDB/Script_4] Deleted 1 (out of 1) records from 'r_test' table for the following documents: Test/abc

2021-10-12T19:05:09.3195366Z, 57, Information, database, Raven.Server.Documents.ETL.Providers.SQL.SqlEtl, SQL ETL process 'RavenDB to MariaDB/Script_4' processed the following number of items: Document - 1 items, last transformed etag: 48848648 in 00:00:00.1607751 (last loaded etag: 48848648) Batch completion reason: No items to process

2021-10-12T19:05:09.3228967Z, 24, Information, A, Raven.Server.Rachis.RachisConsensus, Apply of UpdateEtlProcessStateCommand with index 128250 was successful.

Anton Möller

unread,
Oct 13, 2021, 2:20:01 AM10/13/21
to RavenDB - an awesome database
Yes, I see that your code sends the commands in the right order, but I said that they are received in the wrong order on the server-side. I've tried now with multiple types of RDS (PostgreSQL / MySQL), and they both receive the statements in the wrong order. 

Here is 1 of many excerpts from our log clearly showing the order commands were executed:
2021-10-12 13:04:18 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: statement: DISCARD ALL
2021-10-12 13:04:18 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-10-12 13:04:18 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: execute <unnamed>: INSERT INTO r_tab ("id") VALUES ($1)
2021-10-12 13:04:18 UTC:20.52.47.2(65396):root@ravendb:[1304]:DETAIL: parameters: $1 = 'Tab/57708-A'
2021-10-12 13:04:19 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: execute <unnamed>: DELETE FROM r_tab WHERE "id" IN ('Tab/57708-A')
2021-10-12 13:04:19 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: statement: COMMIT

To entirely avoid this, isn't it possible to send the DELETE in its own transaction and commit that before the INSERT is sent?



On Tuesday, 12 October 2021 at 13:23:49 UTC+2 Oren Eini wrote:

Anton Möller

unread,
Oct 13, 2021, 2:20:01 AM10/13/21
to RavenDB - an awesome database
Another log excerpt, this time from a fresh MariaDB/MySQL DB receiving the commands in the wrong order:

211012 17:55:01 18 Init DB ravendb
18 Query SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
18 Query BEGIN
18 Query INSERT INTO r_test (id, test) VALUES ('Test/abc', 'def')
18 Query DELETE FROM r_test WHERE id IN ('Test/abc')
18 Query COMMIT

On Tuesday, 12 October 2021 at 13:23:49 UTC+2 Oren Eini wrote:

Oren Eini (Ayende Rahien)

unread,
Oct 13, 2021, 6:48:10 AM10/13/21
to ravendb
What does your ETL script look like? Do you call loadTo() multiple times?

Anton Möller

unread,
Oct 13, 2021, 7:08:35 AM10/13/21
to RavenDB - an awesome database

Anton Möller

unread,
Oct 14, 2021, 7:40:34 AM10/14/21
to RavenDB - an awesome database
For reference - this issue was caused by a mismatch between actual table names in the ETL configuration page and method called:
We had named tables "r_test" in the SQL Database as well as in the "table name" section of the ETL configuration, but we called the function "loadToR_Test" instead of "loadTor_test". If your table is called "abcDEFghi" you need to call the method "loadToabcDEFghi()".

Make sure your tables and loadTo´s match (case-sensitive) exactly for functioning SQL ETL. 

Oren Eini (Ayende Rahien)

unread,
Oct 14, 2021, 8:03:55 AM10/14/21
to ravendb
Or wait a few days for the PR that fixes this to land :-)

Ryan Heath

unread,
Oct 14, 2021, 8:59:22 AM10/14/21
to rav...@googlegroups.com
Interestingly how casing could cause a change in order of execution, great find!

// Ryan

Arkadiusz Palinski

unread,
Oct 15, 2021, 3:43:21 AM10/15/21
to RavenDB - 2nd generation document database
In general we're case insensitive, but in one location we had a case sensitive dictionary with table names. We execute DELETE and INSERT statements per table. Because of the issue we had 2 separate records in the dictionary ( r_test and R_Test). That made that we didn't send DELETE for the first record (just INSERT), while we sent only DELETE for the second record. Although effectively we send it against _the same_ table.



--
Arkadiusz Paliński
Team Leader   /   Hibernating Rhinos LTD

Oren Eini (Ayende Rahien)

unread,
Oct 15, 2021, 10:11:36 AM10/15/21
to ravendb
Reply all
Reply to author
Forward
0 new messages