"payload":{
"after":"{\"_id\" : {\"$numberLong\" : \"1004\"},\"first_name\" : \"Anne\",\"last_name\" : \"Kretchmar\",\"email\" : \"an...@noanswer.org\"}",
"patch":null,
"source":{
"version":"2.0.0.Alpha1",
"connector":"mongodb",
"name":"fulfillment",
"ts_ms":1558965508000,
"snapshot":false,
"db":"inventory",
"rs":"rs0",
"collection":"customers",
"ord":31,
"h":1546547425148721999
},
"op":"c",
"ts_ms":1558965515240
}
IIUC, if we set data.format = RAW, we can have two options:
1. raw in the existing field
"payload":{
"after":"xxxxxxxxxx", <----- raw byte representation of the `after` field
"patch":null,
"source":{
"version":"2.0.0.Alpha1",
"connector":"mongodb",
"name":"fulfillment",
"ts_ms":1558965508000,
"snapshot":false,
"db":"inventory",
"rs":"rs0",
"collection":"customers",
"ord":31,
"h":1546547425148721999
},
"op":"c",
"ts_ms":1558965515240
}
2. raw in an extra field
"payload":{
"after":null,
"patch":null,
"source":{
"version":"2.0.0.Alpha1",
"connector":"mongodb",
"name":"fulfillment",
"ts_ms":1558965508000,
"snapshot":false,
"db":"inventory",
"rs":"rs0",
"collection":"customers",
"ord":31,
"h":1546547425148721999
},
"raw": "xxxxxxxxxxxxxxxxx", <------ raw byte representation of the change event document returned by MongoDB
"op":"c",
"ts_ms":1558965515240
}
I think both can be useful:
- #1 keeps the existing Debezium output format while avoiding some of the serde cost of going through a JSON string. However, because Debezium supports field modification (e.g., excluding fields), the document has to remain modifiable, i.e., we cannot take advantage of the ByteBuffer interface of RawBsonDocument. Converting a bson.Document to bytes is cheaper than producing a JSON string, but it is still quite expensive because it goes through a codec.
- #2 goes a step further and requires consumers to work directly with the bytes of MongoDB's change event document. I can imagine some advanced MongoDB users wanting to treat the document as immutable and simply pass it through. With this setup we can take advantage of RawBsonDocument and provide a high-performance connector for raw use cases.
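To make the cost difference between #1 and #2 concrete, here is a rough sketch using only JDK types. It is a hypothetical model, not Debezium or driver code: a mutable `Map` stands in for `bson.Document`, a toy `key=value;` encoding stands in for the BSON codec, and a read-only `ByteBuffer` plays the role of the byte view RawBsonDocument offers. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of options #1 and #2 using only JDK types (not Debezium or driver code).
class RawPassThroughSketch {

    // Option #1: the document stays mutable, so a field-exclusion step can run,
    // but the result must then be re-encoded. The "key=value;" format is a toy
    // stand-in for the BSON codec; the point is the extra copy plus encode per event.
    static byte[] excludeAndEncode(Map<String, String> doc, Set<String> excludedFields) {
        Map<String, String> filtered = new LinkedHashMap<>(doc); // copy; caller's map is untouched
        filtered.keySet().removeAll(excludedFields);
        StringBuilder encoded = new StringBuilder();
        for (Map.Entry<String, String> field : filtered.entrySet()) {
            encoded.append(field.getKey()).append('=').append(field.getValue()).append(';');
        }
        return encoded.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Option #2: the bytes received from MongoDB are wrapped without copying and
    // exposed read-only; no field can be excluded without decoding them first.
    static ByteBuffer passThrough(byte[] changeEventBytes) {
        return ByteBuffer.wrap(changeEventBytes).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("_id", "1004");
        doc.put("first_name", "Anne");
        doc.put("email", "an...@noanswer.org");
        byte[] encoded = excludeAndEncode(doc, Set.of("email"));
        System.out.println(new String(encoded, StandardCharsets.UTF_8)); // prints _id=1004;first_name=Anne;

        // 5-byte encoding of an empty BSON document: int32 length 5, then the 0x00 terminator.
        ByteBuffer view = passThrough(new byte[] {5, 0, 0, 0, 0});
        System.out.println(view.isReadOnly()); // prints true
    }
}
```

In this model #1 pays for a copy and an encode on every event in exchange for keeping field exclusion possible, while #2 only wraps the array, which is where the performance argument for a RawBsonDocument-based pass-through comes from.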
IMHO, data.type = raw fits #1 better, whereas a new `enable_raw` flag (or some other name) fits #2 better. WDYT?
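A hypothetical sketch of how the two settings could select between the payload layouts above. `data.type` and `enable_raw` are the names floated in this thread, not existing connector options; the precedence rule (enable_raw wins if both are set) is an arbitrary choice of the sketch, and `source`, `op` and `ts_ms` are omitted because they are identical in all layouts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical mapping from the proposed settings to the payload layouts discussed above.
class RawEnvelopeSketch {

    enum Mode { JSON, RAW_IN_AFTER, RAW_EXTRA_FIELD }

    // Property names are the ones proposed in this thread (assumptions, not real options).
    // If both are set, this sketch arbitrarily lets enable_raw take precedence.
    static Mode resolveMode(Properties config) {
        if (Boolean.parseBoolean(config.getProperty("enable_raw", "false"))) {
            return Mode.RAW_EXTRA_FIELD;
        }
        if ("raw".equalsIgnoreCase(config.getProperty("data.type", "json"))) {
            return Mode.RAW_IN_AFTER;
        }
        return Mode.JSON;
    }

    // Builds only the fields that differ between layouts; source, op and ts_ms are omitted.
    static Map<String, Object> buildPayload(Mode mode, String afterJson,
                                            byte[] afterBson, byte[] changeEventBson) {
        Map<String, Object> payload = new LinkedHashMap<>();
        switch (mode) {
            case RAW_IN_AFTER:
                payload.put("after", afterBson);     // option #1: bytes replace the JSON string
                payload.put("patch", null);
                break;
            case RAW_EXTRA_FIELD:
                payload.put("after", null);          // option #2: after stays null
                payload.put("patch", null);
                payload.put("raw", changeEventBson); // untouched change event bytes from MongoDB
                break;
            default:
                payload.put("after", afterJson);     // current behaviour: JSON string
                payload.put("patch", null);
        }
        return payload;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("enable_raw", "true");
        Map<String, Object> payload =
                buildPayload(resolveMode(config), "{}", new byte[] {1}, new byte[] {2, 3});
        System.out.println(payload.keySet()); // prints [after, patch, raw]
    }
}
```

Keeping the two switches separate, as proposed, means consumers of the existing format see no change unless they opt in to one of the raw layouts explicitly.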