Following traditional transaction model

2 views
Skip to first unread message

Doug Judd

unread,
Apr 6, 2008, 2:27:09 AM4/6/08
to hyperta...@googlegroups.com
I'm reading through the book, "Transaction Processing: Concepts and Techniques", by Jim Gray and Andreas Reuter.  What we're trying to do here with the MetaLog is implement a form of transactions.  They describe transactions as a sequence of actions bracketed by a Begin-Commit or Begin-Rollback.  They also describe how transactions can declare savepoints - points within the transaction execution to which the application can later roll back.  Here's a diagram they include in the book that illustrates a partially rolled back transaction:

http://www.hypertable.org/PartialRollbackTransaction.jpg

This looks an awful lot like what we're trying to accomplish with the splits and the other "transactions".  Maybe we should adopt the well-established language that's been in use in the transaction processing field for many years.  So for example, the split "transaction" might look like:

BEGIN
SAVEPOINT(split_point, split_log)
SAVEPOINT(shrunk)
COMMIT

or something along those lines.  It's mainly just a change in language, but it might make it easier for newcomers to the project to understand better what's going on.

- Doug

Doug Judd

unread,
Apr 6, 2008, 12:54:56 PM4/6/08
to Hypertable Development
Taking it a step further, we could add ACTIONS into the log for
capturing the state changes. For example:

BEGIN
ACTION start-split
ACTION split-point-chosen( <split-point> )
ACTION split-log-installed ( <split-log> )
SAVE
ACTION split-shrunk
ACTION range-created
SAVE
COMMIT

Also, the API should allow you to write multiple entries in a single
call. So in the above example, there would be just the three
following MetaLog writes:

BEGIN
ACTION start-split
ACTION split-point-chosen( <split-point> )
ACTION split-log-installed ( <split-log> )
SAVE

followed by ...

ACTION split-shrunk
ACTION range-created
SAVE

followed by ...

COMMIT

The RangeStateInfo could be modified slightly to hold the RangeSpec
and a vector of log entries that represents all of the log entries for
any outstanding uncommitted transactions. For example,

class RangeStateInfo {
RangeSpec range;
vector<LogEntry> transaction_state;
};

If all transactions on the Range have committed, then the
transaction_state vector would be empty.

- Doug

On Apr 5, 11:27 pm, "Doug Judd" <d...@zvents.com> wrote:
> I'm reading through the book, "Transaction Processing: Concepts and
> Techniques", by Jim Gray and Andreas Reuter. What we're trying to do here
> with the MetaLog is implement a form of transactions. They describe
> transactions as a sequence of actions bracketed by a Begin-Commit or
> Begin-Rollback. They also describe how transactions can declare *savepoints
> * - points within the transaction execution to which the application can

Luke

unread,
Apr 6, 2008, 4:29:43 PM4/6/08
to Hypertable Development
Well, our metalogs are not really transaction logs (especially range
server meta logs) as they persist permanent states as well. Using the
terminology actually confuses the exact purpose of metalog. However
the approach/concept is very appropriate when we prepare both commit
log and meta log for general application level transactions. I think
we should design a more general purpose transaction log, when we
actually start to work on a distributed transaction server.

Some comments inline:

On Apr 6, 9:54 am, Doug Judd <nuggetwh...@gmail.com> wrote:
> Taking it a step further, we could add ACTIONS into the log for
> capturing the state changes. For example:
>
> BEGIN
> ACTION start-split
> ACTION split-point-chosen( <split-point> )
> ACTION split-log-installed ( <split-log> )
> SAVE
> ACTION split-shrunk
> ACTION range-created
> SAVE
> COMMIT

For real transaction log, you'll need a unique transaction id for each
transaction and have all the action save points associated with it.
Otherwise, you cannot interleave transaction which is a severe
restriction. For this particular application (range split), what's the
advantage of this format over the current:

split-start(range, split-point, split-log)
split-shrunk(range)
split-done(range)

Which supports interleave splits for different range as well. The
overhead to support general transaction (atomically generated uuid
etc.) doesn't seem to worth it for now.

>
> Also, the API should allow you to write multiple entries in a single
> call. So in the above example, there would be just the three
>
> following MetaLog writes:
>
> BEGIN
> ACTION start-split
> ACTION split-point-chosen( <split-point> )
> ACTION split-log-installed ( <split-log> )
> SAVE
>
> followed by ...
>
> ACTION split-shrunk
> ACTION range-created
> SAVE
>
> followed by ...
>
> COMMIT

Again, this is conceptually fine but doesn't work as a real log format
as it doesn't support interleaving transactions.

> The RangeStateInfo could be modified slightly to hold the RangeSpec
> and a vector of log entries that represents all of the log entries for
> any outstanding uncommitted transactions. For example,
>
> class RangeStateInfo {
> RangeSpec range;
> vector<LogEntry> transaction_state;
>
> };
>
> If all transactions on the Range have committed, then the
> transaction_state vector would be empty.

Well, this is the pretty much the same API we agreed upon last week
though:

struct RangeStateInfo {
RangeSpec range;
vector<MetaLogEntryPtr> log_entries; // could be renamed to
txn_state to reflect the semantics described (empty when commited)
}

Since MetaLogEntry is an abstract interface with no app specific
logic, it could be extended to support transactions when we have a
MetaLogTxnEntry implementation in the future.

__Luke

Doug Judd

unread,
Apr 6, 2008, 5:43:04 PM4/6/08
to hyperta...@googlegroups.com
Comments inline ...

On Sun, Apr 6, 2008 at 1:29 PM, Luke <vic...@gmail.com> wrote:

Well, our metalogs are not really transaction logs (especially range
server meta logs) as they persist permanent states as well. Using the
terminology actually confuses the exact purpose of metalog. However
the approach/concept is very appropriate when we prepare both commit
log and meta log for general application level transactions. I think
we should design a more general purpose transaction log, when we
actually start to work on a distributed transaction server.

I would argue that the split process is very much a transaction.  It's a long sequence of operations that can fail at any point along the way.  We use the log to keep track of when the transaction started, how far along it's gotten, and when it committed.  We use this for recovery purposes which is the job of a transaction log.  I think the savepoint analogy was not exactly correct since we don't allow them to be rolled back.  A more appropriate description would be nested transactions.  There is a top level "split-transaction" that has nested within it three subtransaction: "transfer-log-installation", "range-shrunk", and "master-notified'.  I think this terminology is inline with traditional usage.

As far as using the log for persisting permanent states goes, this is also inline with historical usage.  Here's an excerpt from the Log Manager chapter of the Transaction Processing book:

"The log was originally used only for transaction recovery.  Increasingly, it is being used for broader functions. [...]"

[...]
For real transaction log, you'll need a unique transaction id for each
transaction and have all the action save points associated with it.
Otherwise, you cannot interleave transaction which is a severe
restriction. For this particular application (range split), what's the
advantage of this format over the current:
[...]

Support for generating unique IDs already exists in our system with the atomic_t type.  Generating unique transaction IDs would be simple and would allow for interleaved transactions.

The design that we've come up with is solid and is functionally equivalent to the more typical transaction model.  However, our solution feels very "homegrown" to me.  The whole field of Transaction Processing was developed to solve exactly the kind of problem that we're trying to solve.

Having said all that, if you want to roll forward with our current plan and revisit later, that's perfectly fine with me.

- Doug

Luke

unread,
Apr 6, 2008, 6:19:42 PM4/6/08
to Hypertable Development
I read the book long time ago. I forgot many of the specifics, but the
metalog stuff I proposed is influenced by it.

More detailed comments inline:
No disagreement here. Metalog can be indeed viewed as a special
transaction log in this perspective.

>
> [...]
>
> > For real transaction log, you'll need a unique transaction id for each
> > transaction and have all the action save points associated with it.
> > Otherwise, you cannot interleave transaction which is a severe
> > restriction. For this particular application (range split), what's the
> > advantage of this format over the current:
> > [...]
>
> Support for generating unique IDs already exists in our system with the
> atomic_t type. Generating unique transaction IDs would be simple and would
> allow for interleaved transactions.

It's harder than that. Because of the persistent states, the unique
transaction id needs to survive the crash as well.

OTOH, I guess a sha1/md5 hash with the transaction type + the hires
time + random number is probably OK though.

> The design that we've come up with is solid and is functionally equivalent
> to the more typical transaction model. However, our solution feels very
> "homegrown" to me. The whole field of Transaction Processing was developed
> to solve exactly the kind of problem that we're trying to solve.

Hey, why can't we just thought of our solution as a valid variation
that's influenced by the traditional design? :)

> Having said all that, if you want to roll forward with our current plan and
> revisit later, that's perfectly fine with me.

Well, the API is pretty much exactly the same. I'll change the name
log_entries to txn_states to reflect the purpose.

The on disk format will probably go through a few iterations anyway.

I've been thinking about a general purpose transaction that uses a
redundant hyperspace based on berkeley db, which has all the goodies
to support XA compliant transactions, as I don't trust our potentially
varying underlying DFS to handle the transaction logs consistently.

__Luke

Doug Judd

unread,
Apr 7, 2008, 12:07:55 AM4/7/08
to Hypertable Development
This sounds good to me. I do like the idea of renaming log_entries to
txn_states. It will help make it clear that the range was in the
middle of a "transaction" that needs restarting.

- Doug
Reply all
Reply to author
Forward
0 new messages