Hi,
We do a lot of transaction processing, and we're following OpenTracing and observability closely as they mature. Many of our workloads involve exchanging transactions with business partners.
We get a lot of utility from measuring performance and watching for transaction failures. However, when transactions fail, we often need transaction replay: the ability to re-send a transaction after a failure.
I know what you're thinking. We already use transaction managers and robust solutions (Mule/Camel) to handle retries, so we do not need replay to recover from transient processing failures. Replay generally becomes necessary in these situations:
(1) one of our business partners has a problem and requests that we re-send a transaction to them
(2) a transaction fails because of a data issue, which must be fixed by a human before the transaction is retried.
Replay is strongly connected to observability, because the need to replay is discovered by observing transaction failures.
We currently use Datadog, and were it not for the replay requirement, using OpenTracing plus Datadog would be a no-brainer. Once you add the replay requirement (select a failed transaction and re-process it), Datadog can't handle it. I would hate to have to build our own UI for dashboarding and search just to handle this one small requirement.
How, if at all, does OpenTracing contemplate replay requirements?