Hi -
Events are emitted by the source connector in commit order, not in
the order the individual changes were executed. This is because
Oracle stores changes from concurrent transactions in an
interleaved pattern. Let's take the following example:
Transaction 1 starts
Transaction 1 modifies TABA row with primary key 1
Transaction 2 starts
Transaction 2 modifies TABA row with primary key 2
Transaction 1 modifies TABA row with primary key 3
Transaction 2 commits
Transaction 1 commits
With the topic having a single partition, events would be consumed
as: PK2, PK1, PK3. This reflects transaction commit order, not the
order of the individual changes. Two concurrent transactions cannot
modify the same row, so thanks to row-level locking there is no
concern here about events for the same primary key.
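As a rough illustration of that single-partition case, here is a
minimal consumer sketch; the broker address, group id, topic name,
and string keys are placeholders, not anything from your actual
setup. With one partition, the loop prints the keys as PK2, PK1,
PK3:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class CommitOrderDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "taba-demo");                // placeholder
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Hypothetical topic name for the TABA change events.
                consumer.subscribe(List.of("server1.SCHEMA.TABA"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> rec : records) {
                        // Single partition => strict commit order:
                        // PK2 (txn 2) first, then PK1 and PK3 (txn 1).
                        System.out.printf("offset=%d key=%s%n",
                                rec.offset(), rec.key());
                    }
                }
            }
        }
    }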
If we shift to your multi-partition scenario, these events could be
consumed by the sink in any order, and it all depends on which
partitions the events are stored in. If all events are stored in
the same partition, then you get the exact same behavior I
described above with a single partition; however, if they're
distributed across partitions, the order is not guaranteed by
Kafka and they can arrive at the consumer in any order, even out
of order within the context of the same transaction. The only
guarantee in this case is that events for the same primary key
will always be consumed in dispatch order, which, due to Oracle
row-level locking, is transaction commit order.
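That per-key guarantee holds as long as the message key is the
row's primary key, because Kafka's default partitioner routes
keyed records by a hash of the serialized key, so every event for
a given key lands in the same partition. A minimal sketch of that
calculation; the partition count is an assumption and the real key
bytes depend on your key converter:

    import org.apache.kafka.common.utils.Utils;

    public class KeyToPartition {
        // Mirrors the default partitioner's choice for keyed records:
        // murmur2 hash of the serialized key, modulo the partition count.
        static int partitionFor(byte[] serializedKey, int numPartitions) {
            return Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions;
        }

        public static void main(String[] args) {
            int partitions = 6; // assumed partition count
            for (String pk : new String[] {"1", "2", "3"}) {
                System.out.printf("PK %s -> partition %d%n",
                        pk, partitionFor(pk.getBytes(), partitions));
            }
        }
    }

Events for PK 1 always hash to the same partition, so they stay
ordered relative to each other, even though events for PK 2 and
PK 3 may sit on different partitions.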
One added note: the SCN is not unique, and within the context of
two transactions you can see changes for different rows of a table
come in where the SCN moves forward and backward across individual
inserts, updates, or deletes. However, for the same primary key,
the SCN should always be the same or increasing, by virtue of the
source table's row-level locking.
Are you performing any stream processing on the data set before
writing to Redshift where you base any logic on the SCN?
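If you are, only compare SCNs for the same primary key, never
across keys. A rough sketch of that kind of per-key guard; the
class, the in-memory map, and the example values are purely
illustrative, not anything the connector provides:

    import java.util.HashMap;
    import java.util.Map;

    public class ScnPerKeyGuard {
        // Last SCN applied per primary key (in-memory for illustration).
        private final Map<String, Long> lastScnByKey = new HashMap<>();

        // Apply a change only if it is not older than what was already
        // applied for the same key. Comparing SCNs across different keys
        // is meaningless, since the stream can move backward there.
        boolean shouldApply(String primaryKey, long scn) {
            Long last = lastScnByKey.get(primaryKey);
            if (last != null && scn < last) {
                return false; // stale event for this key
            }
            lastScnByKey.put(primaryKey, scn);
            return true;
        }

        public static void main(String[] args) {
            ScnPerKeyGuard guard = new ScnPerKeyGuard();
            System.out.println(guard.shouldApply("PK1", 1000L)); // true
            System.out.println(guard.shouldApply("PK1", 990L));  // false
            System.out.println(guard.shouldApply("PK2", 990L));  // true
        }
    }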
-cc