MySQL connector - can source.ts_ms be used as event order?


David Ešner

Apr 23, 2024, 6:35:14 AM
to debezium
Hi, 

We have recently been testing the MySQL connector, and when I perform multiple updates on the same record, the source.ts_ms returned in the events has the same value, even though the updates are separate transactions. I would like to use that information to keep only the latest event for a record. Is my assumption correct that the source.ts_ms field can be used for this? Or should I rather rely on the order of the emitted records?

Apologies if this is a dumb question, I tried to search everywhere with no luck.

Attaching my Debezium Server config:

debezium.source.topic.prefix=testcdc
debezium.source.offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
debezium.source.offset.storage.file.filename=/testing_config/state.dat
debezium.source.connector.class=io.debezium.connector.mysql.MySqlConnector
debezium.source.database.hostname=localhost
debezium.source.database.port=3306
debezium.source.database.user=root
debezium.source.database.password=rootpassword
debezium.source.snapshot.max.threads=1
debezium.source.snapshot.fetch.size=10240
debezium.source.snapshot.mode=initial
debezium.source.decimal.handling.mode=string
debezium.source.time.precision.mode=connect
debezium.source.binary.handling.mode=hex
debezium.source.schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory
debezium.source.schema.history.internal.file.filename=/testing_config/schema_history.jsonl
debezium.source.schema.history.internal.store.only.captured.tables.ddl=true
debezium.source.database.include.list=inventory
debezium.source.table.include.list=inventory.sales
debezium.source.errors.max.retries=3
debezium.source.signal.enabled.channels=source
debezium.source.signal.data.collection=inventory.debezium_signals
debezium.sink.type=http
debezium.sink.http.url=http://localhost:8000/log_messages
debezium.sink.http.headers.Content-Type=application/json
debezium.format.value=json
debezium.format.key=json
quarkus.log.level=TRACE
quarkus.log.console.json=false


When I run the following queries, the source.ts_ms is the same:
UPDATE inventory.sales
SET price = 249.99
WHERE sku = 'SKU1';

UPDATE inventory.sales
SET price = 299.99
WHERE sku = 'SKU1';

UPDATE inventory.sales
SET price = 349.99
WHERE sku = 'SKU1';

David Ešner

Apr 29, 2024, 1:21:40 AM
to debezium
Hi, I just want to post an update on this in case someone runs into a similar issue. It turns out that the ts_ms timestamp doesn't need to be unique across multiple transactions, as in the example above. We opted to use the event delivery order in the topic to perform the dedupe. I also found examples where people combine the log position with the `ts_ms` timestamp, like this:

`... order by (t.record_content:ts_ms::integer*100000000000000 + t.record_content:source.pos::integer) desc nulls last)`

In our case we do something like the following to keep the latest event:

SELECT *
FROM "{table}"
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY {id_cols_str}
    ORDER BY "{order_by_column}"::BIGINT DESC
) = 1
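The composite-key idea above can be sketched in Python. This is a minimal illustration, not the poster's actual pipeline; the event dicts are hypothetical but follow Debezium's envelope field names (`source.ts_ms`, `source.pos`, `after`). Since MySQL binlog timestamps have whole-second precision, `source.ts_ms` collides for transactions committed within the same second, so the binlog position is used as a tie-breaker:

```python
# Sketch: deduplicate Debezium change events per primary key, keeping only
# the latest one. ts_ms has second precision, so it collides for rapid
# updates; source.pos (binlog position) breaks ties within one second.

def order_key(event):
    src = event["source"]
    # Same multiplier as the snippet above: timestamp dominates, pos tie-breaks.
    return src["ts_ms"] * 100_000_000_000_000 + src["pos"]

def latest_per_key(events, key_field):
    latest = {}
    for ev in events:
        k = ev["after"][key_field]
        if k not in latest or order_key(ev) > order_key(latest[k]):
            latest[k] = ev
    return latest

# Three updates of the same row within one second: identical ts_ms,
# increasing pos, so only the last one survives.
events = [
    {"after": {"sku": "SKU1", "price": "249.99"}, "source": {"ts_ms": 1713868514000, "pos": 100}},
    {"after": {"sku": "SKU1", "price": "299.99"}, "source": {"ts_ms": 1713868514000, "pos": 200}},
    {"after": {"sku": "SKU1", "price": "349.99"}, "source": {"ts_ms": 1713868514000, "pos": 300}},
]
result = latest_per_key(events, "sku")
print(result["SKU1"]["after"]["price"])  # 349.99
```

Note this key only orders events within a single binlog file; across file rotations the timestamp component carries the ordering.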


On Tuesday, April 23, 2024 at 12:35:14 UTC+2, David Ešner wrote:

David Ešner

Jul 9, 2024, 8:34:04 AM
to debezium
Hi, I would like to get back to this. We have implemented logic that stops the connector (Debezium Engine) after it receives an event with ts_ms larger than the time the task started; this is meant to stop replication once we have processed all events since the last execution. It works fine on our test instances, but in production we sometimes run into cases where source.ts_ms seems random and reflects times most likely in a completely different timezone than the MySQL (MariaDB) default.

Here's an example, where KBC_BATCH_EVENT_ORDER is the order in which the consumer received the event from the task, `updated_at` is a last-updated timestamp that comes directly from the database record, and KBC_EVENT_TIMESTAMP_MS is source.ts_ms, converted from epoch into the TIMESTAMP column. As you can see, there are weird differences, sometimes 4 hours apart, even though the events are chronologically ordered (the order of updated_at matches KBC_BATCH_EVENT_ORDER).
[attached image: event_order.png]
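The symptom described here, delivery order correct but binlog timestamps jumping backwards, can be checked mechanically. A minimal sketch (hypothetical event dicts with Debezium's `source.ts_ms` field; the sample values are made up to mimic the roughly 4-hour jump):

```python
# Sketch: sanity-check whether source.ts_ms is non-decreasing in the order
# events were delivered. A regression here points at skewed binlog
# timestamps rather than out-of-order delivery.

def find_ts_regressions(events):
    """Return (index, previous_ts, current_ts) for every backwards jump."""
    regressions = []
    prev = None
    for i, ev in enumerate(events):
        ts = ev["source"]["ts_ms"]
        if prev is not None and ts < prev:
            regressions.append((i, prev, ts))
        prev = ts
    return regressions

# Delivery order is fine, but the third event's binlog timestamp jumps
# back about 4 hours, like the case described above.
events = [
    {"source": {"ts_ms": 1720614000000}},
    {"source": {"ts_ms": 1720614916502}},
    {"source": {"ts_ms": 1720600863000}},  # earlier than its predecessor
]
print(find_ts_regressions(events))  # [(2, 1720614916502, 1720600863000)]
```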

Would you know what could be the cause of that? Is my understanding of source.ts_ms completely wrong? It is a MariaDB database.

Thank you

On Monday, April 29, 2024 at 7:21:40 UTC+2, David Ešner wrote:

jiri.p...@gmail.com

Jul 10, 2024, 4:46:43 AM
to debezium
Hi,

in this case I'd really recommend taking a look directly into the binlog to see what timestamps are stored there. That would help us identify whether the issue relates to MariaDB or Debezium.

Jiri

David Ešner

Jul 10, 2024, 9:46:18 AM
to debezium
Thank you for the answer! 

I checked, and this is the source object received from the connector in TRACE mode:
{before=xx,after=xx,source=Struct{version=2.5.4.Final,connector=mysql,name=testcdc,ts_ms=1720600863000,db=xxx,table=sales_order,server_id=1,gtid=0-1-5227847849,file=mariadb-bin.000665,pos=53498592,row=0},op=u,ts_ms=1720614916502}

It is related to this object
[attached image: row.png]

The ts_ms inside the source is 1720600863000, which corresponds to Wednesday, July 10, 2024 08:41:03 AM GMT.

The ts_ms in the payload root is 1720614916502 = Wednesday, July 10, 2024 12:35:16.502 PM, which matches the update time, but I suppose this is the time the event was captured. Interestingly, it also exactly matches the updated_at timestamp that was set during that transaction.

This is the related binlog record; it shows `10:41:03`, which I suppose is in the server's UTC+2 timezone, so 8:41 UTC, which does match source.ts_ms. I am utterly perplexed how this can happen, because the event was physically captured by Debezium at 12:35:16.502 UTC, and the updated_at column in that captured transaction changed from `2024-07-10T12:34:08Z` to `2024-07-10T12:35:16Z`. Would you have any idea what could cause this?



#240710 10:41:03 server id 1 end_log_pos 53497710 CRC32 0x7c6fb16b GTID 0-1-5227847849 trans
/*!100001 SET @@session.gtid_seq_no=5227847849*//*!*/;
START TRANSACTION
/*!*/;
# at 53497710
# at 53497803
#240710 10:41:03 server id 1 end_log_pos 53497803 CRC32 0xcb4d2f1a Annotate_rows:
#Q> UPDATE `sales_order` SET `picker` = '7738' WHERE (entity_id= 37161322)
#240710 10:41:03 server id 1 end_log_pos 53498592 CRC32 0x3a0b8795 Table_map: `dbxx`.`sales_order` mapped to number 42590
# at 53498592
#240710 10:41:03 server id 1 end_log_pos 53500622 CRC32 0x65bf6815 Update_rows: table id 42590 flags: STMT_END_F

BINLOG '
...
# Number of rows: 1
# at 53500622
#240710 10:41:03 server id 1 end_log_pos 53500653 CRC32 0xfcbb6b1d Xid = 129615152
COMMIT/*!*/;
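The gap between the two timestamps quoted above can be reproduced directly from the values in the event (this just redoes the arithmetic; both numbers are copied from the envelope printed earlier in this message):

```python
from datetime import datetime, timezone

source_ts_ms = 1720600863000    # source.ts_ms: binlog/transaction time
envelope_ts_ms = 1720614916502  # top-level ts_ms: Debezium capture time

# Convert explicitly in UTC so local machine timezone cannot skew the reading.
source_dt = datetime.fromtimestamp(source_ts_ms / 1000, tz=timezone.utc)
capture_dt = datetime.fromtimestamp(envelope_ts_ms / 1000, tz=timezone.utc)
gap_minutes = (envelope_ts_ms - source_ts_ms) / 60_000

print(source_dt)             # 2024-07-10 08:41:03 UTC
print(capture_dt)            # 2024-07-10 12:35:16.502 UTC (approx.)
print(round(gap_minutes))    # 234, i.e. roughly a 4-hour gap
```

The nearly 4-hour difference between binlog time and capture time is exactly the anomaly under discussion; note also that source_ts_ms ends in 000, reflecting the binlog's whole-second precision.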

On Wednesday, July 10, 2024 at 10:46:43 UTC+2, jiri.p...@gmail.com wrote:

David Ešner

Jul 10, 2024, 11:34:10 AM
to debezium
Jiri, may I ask: in general, is it possible for the lag between the time an event is captured by Debezium and the actual ts_ms to be as large as a couple of hours? I am asking because we are trying to run the connector in batch mode, relying on transactions made in the source reaching us almost in real time, so we wait for an event whose source.ts_ms is larger than the time the connector started. If the lag is a couple of minutes that's still fine, but if it's hours it may be a problem for us. Is that logic flawed, and should we not rely on ts_ms in that manner?
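The batch-stop condition described here can be sketched as a small predicate. This is a hypothetical illustration, not the poster's real handler or a Debezium Engine API; the fallback guard addresses the concern raised in this message, that a lagging or idle binlog could otherwise keep the batch running indefinitely:

```python
# Sketch of the described batch-stop logic: stop consuming once an event's
# source.ts_ms passes the task start time. Because source.ts_ms is the
# binlog (transaction) time, hours of replication lag delay the stop
# condition by hours, hence the optional wall-clock guard.

def should_stop(event, task_start_ms, max_wait_ms=None, now_ms=None):
    """Return True when the connector has caught up past the task start time."""
    if event["source"]["ts_ms"] > task_start_ms:
        return True
    # Guard so a lagging or idle server cannot block the batch forever.
    if max_wait_ms is not None and now_ms is not None:
        return now_ms - task_start_ms > max_wait_ms
    return False

start = 1_720_600_000_000
assert should_stop({"source": {"ts_ms": start + 1}}, start) is True
assert should_stop({"source": {"ts_ms": start - 1}}, start) is False
# Lag guard: even with only old events arriving, stop after max_wait_ms.
assert should_stop({"source": {"ts_ms": start - 1}}, start,
                   max_wait_ms=60_000, now_ms=start + 120_000) is True
```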

Thank you

On Wednesday, July 10, 2024 at 15:46:18 UTC+2, David Ešner wrote:

jiri.p...@gmail.com

Jul 11, 2024, 7:27:11 AM
to debezium
Hi,

my only theory, which would need to be proven, is that it relates to the length of the transaction. Could you please try to start a transaction, insert 3 changes, then wait 10 minutes and commit? I'd be interested in seeing the timestamp values in that case. Generally, we need to confirm whether the timestamp is assigned when the transaction is recorded to the binlog or when the change was made.

Jiri

Chris Cranford

Jul 11, 2024, 2:41:16 PM
to debe...@googlegroups.com
Hi David -

Jiri and I were discussing this a bit more earlier today, and you mentioned that this is Debezium 2.5.4 with the MySQL connector communicating with MariaDB. There was another MariaDB-related problem raised recently where a user was also using the MySQL connector and MySQL driver with MariaDB and noticed some other issues. After configuring the connector specifically for MariaDB, the problem disappeared.

So if you could indulge us, could you do one of the following:

Option 1:
Using Debezium 2.5.4, could you modify your connector configuration as follows:

    debezium.source.jdbc.driver=org.mariadb.jdbc.Driver
    debezium.source.database.protocol=jdbc:mariadb
    debezium.source.database.adapter=mariadb

This explicitly sets the connector to use the MariaDB connector strategy and driver, which were introduced as part of Debezium 2.5. This is the recommended approach for users connecting to MariaDB with Debezium 2.5 and 2.6.

Option 2:
Using Debezium 2.7, could you use the official MariaDB connector, modifying the configuration as follows:

    debezium.source.connector.class=io.debezium.connector.mariadb.MariaDbConnector

This uses the new official, standalone MariaDB connector designed specifically for MariaDB.

Let us know how either of these work out.

Thanks,
Chris


David Ešner

Jul 12, 2024, 2:04:19 AM
to debezium
Hi Chris, thank you very much for looking into this, I really appreciate it. We actually already run option 1 and use the MariaDB adapter/driver. I will try upgrading to Debezium 2.7.

The issue really seems to be that the binlog timestamps are somehow off; Debezium correctly captures the timestamp from the transaction. I tried to simulate what Jiri suggested on a different instance (a couple of changes in a long transaction), and the binlog records the transaction timestamp correctly as the time when I submitted the queries (not when I committed). The default current_timestamp column is also set correctly to that time. I am trying to run this test on the production database, but that is more difficult as I do not own it.

Do you think the assumption in https://groups.google.com/g/debezium/c/xVpOsOXHcJA/m/K0pHzBJMAAAJ, on which we base our connector logic, is correct? And could this just be some anomaly in the MariaDB setup?

Thank you.



On Thursday, July 11, 2024 at 20:41:16 UTC+2, Chris Cranford wrote:

Chris Cranford

Jul 12, 2024, 11:04:14 AM
to debe...@googlegroups.com
Hi David -

I did not see the option 1 settings in your configuration from the original post; were those just omitted?

Chris

David Ešner

Jul 13, 2024, 4:00:33 PM
to debe...@googlegroups.com
Hi Chris, the original post was about MySQL; now we use MySQL with the MariaDB adapter. I reused this thread because it didn't get an answer and it is related to the topic of how source.ts_ms works.
