I think there is a little more to it than allowing a TxContext to be created from an existing transaction. This may be a little verbose, but I am just writing down my line of thought:
TransactionContext has a method named start() which:
- starts a new transaction
- calls startTx() on TransactionAwares
and finish() which:
- checks for conflicts
- calls the private method persist() to make all participating TransactionAwares flush their changes
- attempts to commit
- calls postTxCommit() on all TransactionAwares
This is really centered around short transactions that run inside a single process and manage the entire lifecycle using the TransactionContext.
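To make that lifecycle concrete, here is a minimal sketch of the flow as I understand it. It is not the real code: TxAware, RecordingTxAware, ShortTxContext and ShortTxDemo are hypothetical names I made up, the transaction is reduced to a plain long id, and the conflict check and the commit itself are left as comments that are assumed to succeed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for TransactionAware (not the real interface).
interface TxAware {
    void startTx(long txId);
    boolean commitTx();     // persists buffered changes; false means failure
    void postTxCommit();
}

// Test double that records the order in which it is called.
class RecordingTxAware implements TxAware {
    final List<String> calls = new ArrayList<>();
    public void startTx(long txId) { calls.add("startTx"); }
    public boolean commitTx() { calls.add("commitTx"); return true; }
    public void postTxCommit() { calls.add("postTxCommit"); }
}

// Sketch of a context that owns the whole lifecycle inside one process.
class ShortTxContext {
    private final List<? extends TxAware> txAwares;
    private long nextTxId = 1;      // stands in for the transaction service
    private long currentTx = -1;    // -1 means no transaction in progress

    ShortTxContext(List<? extends TxAware> txAwares) { this.txAwares = txAwares; }

    void start() {
        currentTx = nextTxId++;                     // start a new transaction
        for (TxAware t : txAwares) t.startTx(currentTx);
    }

    void finish() {
        if (currentTx < 0) throw new IllegalStateException("no transaction in progress");
        // 1. conflict check: omitted in this sketch, assumed to pass
        // 2. persist: a single failure aborts the whole transaction
        for (TxAware t : txAwares) {
            if (!t.commitTx()) {
                currentTx = -1;
                throw new IllegalStateException("persist failed, transaction aborted");
            }
        }
        // 3. commit: omitted in this sketch, assumed to succeed
        // 4. tell every participant that the commit went through
        for (TxAware t : txAwares) t.postTxCommit();
        currentTx = -1;
    }
}

class ShortTxDemo {
    public static void main(String[] args) {
        RecordingTxAware table = new RecordingTxAware();
        ShortTxContext ctx = new ShortTxContext(Collections.singletonList(table));
        ctx.start();
        ctx.finish();
        System.out.println(table.calls);
    }
}
```

Running ShortTxDemo prints the recorded calls in order: startTx, commitTx, postTxCommit. The point is that one object in one process drives every step.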
But for a long-running transaction, it is more likely that multiple separate processes participate in the transaction (for example in M/R, the mappers and reducers). For a mapper to participate in the transaction, it must:
- call startTx() on all TransactionAwares before joining, and
- persist all changes in the TransactionAwares by calling commitTx() on each of them (which is actually misnamed; shouldn't it be persist()?) before leaving.
But if persisting fails for any single one of them, the transaction must be aborted across all other processes that participate. That requires coordination across all participants, and I don't see how this can be done with TransactionContext.
We can make TransactionContext support this case by adding a constructor that takes an existing transaction and a set of TransactionAwares, and calls startTx() on each of them. It will also have to expose a flush() method (or you name it) that does a little less than the current persist() - it should call commitTx() on all TxAwares, but not abort the transaction if any of them fails.
You also want to make sure that if the constructor with the existing transaction is used, then none of start(), finish(), or abort() may be called, because this TransactionContext does not "own" the transaction - or does it? Would we want to allow a mapper to commit or abort the transaction? More likely we want to prevent that, and rather depend on the coordination layer to communicate the failure to the process that "owns" the transaction, so that it can commit or abort it.
But now we get to a point where a mapper can only use the new constructor and the new flush() method, so the interface used by a mapper is disjoint from the interface used for short transactions. If they don't share any methods, we might as well define a separate class for the mapper to use. Maybe call it TransactionParticipantContext to indicate that this is for someone who participates in the transaction but does not control its lifecycle? Or maybe you know a better name...
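Roughly, I imagine the separate class looking like the sketch below. Everything in it is an assumption for illustration: TxAwareLike is a simplified stand-in for TransactionAware, the transaction is a plain long id, FakeTxAware exists only for the demo, and returning a list of failure messages from flush() is just one possible way to hand the problem to the coordination layer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for TransactionAware (not the real interface).
interface TxAwareLike {
    void startTx(long txId);
    boolean commitTx() throws Exception;   // persists buffered changes
}

// Sketch of the proposed class: it joins an existing transaction but never owns it,
// so it deliberately has no start(), finish() or abort().
class TransactionParticipantContext {
    private final long txId;
    private final List<? extends TxAwareLike> txAwares;

    TransactionParticipantContext(long existingTxId, List<? extends TxAwareLike> txAwares) {
        this.txId = existingTxId;
        this.txAwares = txAwares;
        for (TxAwareLike t : txAwares) t.startTx(existingTxId);   // join the transaction
    }

    long getTransactionId() { return txId; }

    // Calls commitTx() on every TxAware and collects failures instead of aborting.
    // An empty result means everything was persisted.
    List<String> flush() {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < txAwares.size(); i++) {
            try {
                if (!txAwares.get(i).commitTx()) failures.add("TxAware #" + i + " returned false");
            } catch (Exception e) {
                failures.add("TxAware #" + i + " threw " + e);
            }
        }
        return failures;
    }
}

// Demo-only TxAware that can be told to fail when persisting.
class FakeTxAware implements TxAwareLike {
    final boolean fail;
    long startedWith = -1;
    boolean commitCalled = false;
    FakeTxAware(boolean fail) { this.fail = fail; }
    public void startTx(long txId) { startedWith = txId; }
    public boolean commitTx() { commitCalled = true; return !fail; }
}

class ParticipantDemo {
    public static void main(String[] args) {
        FakeTxAware ok = new FakeTxAware(false);
        FakeTxAware bad = new FakeTxAware(true);
        TransactionParticipantContext ctx =
            new TransactionParticipantContext(42L, Arrays.asList(ok, bad));
        List<String> failures = ctx.flush();
        // The mapper would report these to whoever owns transaction 42.
        System.out.println("tx " + ctx.getTransactionId() + " failures: " + failures);
    }
}
```

Because the class simply has no lifecycle methods, a mapper cannot commit or abort by accident; the rule is enforced at compile time rather than by runtime checks in a shared TransactionContext.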
Does this make sense?