Hi Denzil.
So the screenshot labels "in kafka" appears to be an insert for
MMOID 4370932; however, on the GCP side what you showed does not
tell me whether that correlates to that insert, or its a row that
was later mutated by an update?
In databases, there are two types of storage. There is the data
that is stored in the data page, which is where most column data
is maintained. But for specific data types that consume
significant space like VARCHAR2(MAX), TEXT, CLOB, BLOB, XML,
VARBINARY, and JSON -- these are most often stored in what is
called external storage. In the Oracle connector we use the phrase
LOB and in PostgreSQL this is referred to as TOAST data.
On inserts, these column values are being persisted for the first
time, and so the change vector in the transaction log will always
have the value for these columns. This explains why your "In
Kafka" screenshot shows the value, because before is null and
after has a value (aka an insert).
But for updates and deletes, this is where this unavailable value
placeholder shows up. These LOB/TOAST columns are only ever
recorded by the database in the change vector if and only if the
column is modified by the update. If the update does not change
that externally stored value, the change vector will indicate the
field was unchanged but supplies no value for performance reasons.
In addition, if the field is modified, you will notice a value in
the after, but in the before block, it will have the unavailable
value placeholder. And much like updates, since deletes do not
mutate data, these column types generally will not ave a value in
the before block, again because external storage values are not
pre-fetched for the change vector recording.
The ReselectPostProcessor is a way to try and bridge this gap.
It's not always going to work in latent corner cases or for
deletes (unless you're using Oracle).
-cc