Google Groups Home
Help | Sign in
Following traditional transaction model
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  6 messages - Collapse all
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
Doug Judd  
View profile
 More options Apr 6, 2:27 am
From: "Doug Judd" <d...@zvents.com>
Date: Sat, 5 Apr 2008 23:27:09 -0700
Local: Sun, Apr 6 2008 2:27 am
Subject: Following traditional transaction model

I'm reading through the book, "Transaction Processing: Concepts and
Techniques", by Jim Gray and Andreas Reuter.  What we're trying to do here
with the MetaLog is implement a form of transactions.  They describe
transactions as a sequence of actions bracketed by a Begin-Commit or
Begin-Rollback.  They also describe how transactions can declare *savepoints
* - points within the transaction execution to which the application can
later roll back.  Here's a diagram they include in the book that illustrates
a partially rolled back transaction:

http://www.hypertable.org/PartialRollbackTransaction.jpg

This looks an awful lot like what we're trying to accomplish with the splits
and the other "transactions".  Maybe we should adopt the well-established
language that's been in use in the transaction processing field for many
years.  So for example, the split "transaction" might look like:

BEGIN
SAVEPOINT(split_point, split_log)
SAVEPOINT(shrunk)
COMMIT

or something along those lines.  It's mainly just a change in language, but
it might make it easier for newcomers to the project to understand better
what's going on.

- Doug


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Doug Judd  
View profile
 More options Apr 6, 12:54 pm
From: Doug Judd <nuggetwh...@gmail.com>
Date: Sun, 6 Apr 2008 09:54:56 -0700 (PDT)
Local: Sun, Apr 6 2008 12:54 pm
Subject: Re: Following traditional transaction model
Taking it a step further, we could add ACTIONS into the log for
capturing the state changes.  For example:

BEGIN
ACTION start-split
ACTION split-point-chosen( <split-point> )
ACTION split-log-installed ( <split-log> )
SAVE
ACTION split-shrunk
ACTION range-created
SAVE
COMMIT

Also, the API should allow you to write multiple entries in a single
call.  So in the above example, there would be just the three

following MetaLog writes:

BEGIN
ACTION start-split
ACTION split-point-chosen( <split-point> )
ACTION split-log-installed ( <split-log> )
SAVE

followed by ...

ACTION split-shrunk
ACTION range-created
SAVE

followed by ...

COMMIT

The RangeStateInfo could be modified slightly to hold the RangeSpec
and a vector of log entries that represents all of the log entries for
any outstanding uncommitted transactions.  For example,

class RangeStateInfo {
  RangeSpec  range;
  vector<LogEntry>  transaction_state;

};

If all transactions on the Range have committed, then the
transaction_state vector would be empty.

- Doug

On Apr 5, 11:27 pm, "Doug Judd" <d...@zvents.com> wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Luke  
View profile
 More options Apr 6, 4:29 pm
From: Luke <vic...@gmail.com>
Date: Sun, 6 Apr 2008 13:29:43 -0700 (PDT)
Local: Sun, Apr 6 2008 4:29 pm
Subject: Re: Following traditional transaction model
Well, our metalogs are not really transaction logs (especially range
server meta logs) as they persist permanent states as well. Using the
terminology actually confuses the exact purpose of metalog. However
the approach/concept is very appropriate when we prepare both commit
log and meta log for general application level transactions. I think
we should design a more general purpose transaction log, when we
actually start to work on a distributed transaction server.

Some comments inline:

On Apr 6, 9:54 am, Doug Judd <nuggetwh...@gmail.com> wrote:

> Taking it a step further, we could add ACTIONS into the log for
> capturing the state changes.  For example:

> BEGIN
> ACTION start-split
> ACTION split-point-chosen( <split-point> )
> ACTION split-log-installed ( <split-log> )
> SAVE
> ACTION split-shrunk
> ACTION range-created
> SAVE
> COMMIT

For real transaction log, you'll need a unique transaction id for each
transaction and have all the action save points associated with it.
Otherwise, you cannot interleave transaction which is a severe
restriction. For this particular application (range split), what's the
advantage of this format over the current:

split-start(range, split-point, split-log)
split-shrunk(range)
split-done(range)

Which supports interleave splits for different range as well. The
overhead to support general transaction (atomically generated uuid
etc.) doesn't seem to worth it for now.

Again, this is conceptually fine but doesn't work as a real log format
as it doesn't support interleaving transactions.

> The RangeStateInfo could be modified slightly to hold the RangeSpec
> and a vector of log entries that represents all of the log entries for
> any outstanding uncommitted transactions.  For example,

> class RangeStateInfo {
>   RangeSpec  range;
>   vector<LogEntry>  transaction_state;

> };

> If all transactions on the Range have committed, then the
> transaction_state vector would be empty.

Well, this is the pretty much the same API we agreed upon last week
though:

struct RangeStateInfo {
  RangeSpec range;
  vector<MetaLogEntryPtr> log_entries; // could be renamed to
txn_state to reflect the semantics described (empty when commited)

}

Since MetaLogEntry is an abstract interface with no app specific
logic, it could be extended to support transactions when we have a
MetaLogTxnEntry implementation in the future.

__Luke


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Doug Judd  
View profile
 More options Apr 6, 5:43 pm
From: "Doug Judd" <d...@zvents.com>
Date: Sun, 6 Apr 2008 14:43:04 -0700
Local: Sun, Apr 6 2008 5:43 pm
Subject: Re: [hypertable-dev] Re: Following traditional transaction model

Comments inline ...

On Sun, Apr 6, 2008 at 1:29 PM, Luke <vic...@gmail.com> wrote:

> Well, our metalogs are not really transaction logs (especially range
> server meta logs) as they persist permanent states as well. Using the
> terminology actually confuses the exact purpose of metalog. However
> the approach/concept is very appropriate when we prepare both commit
> log and meta log for general application level transactions. I think
> we should design a more general purpose transaction log, when we
> actually start to work on a distributed transaction server.

I would argue that the split process is very much a transaction.  It's a
long sequence of operations that can fail at any point along the way.  We
use the log to keep track of when the transaction started, how far along
it's gotten, and when it committed.  We use this for recovery purposes which
is the job of a transaction log.  I think the savepoint analogy was not
exactly correct since we don't allow them to be rolled back.  A more
appropriate description would be nested transactions.  There is a top level
"split-transaction" that has nested within it three subtransaction:
"transfer-log-installation", "range-shrunk", and "master-notified'.  I think
this terminology is inline with traditional usage.

As far as using the log for persisting permanent states goes, this is also
inline with historical usage.  Here's an excerpt from the Log Manager
chapter of the Transaction Processing book:

"The log was originally used only for transaction recovery.  Increasingly,
it is being used for broader functions. [...]"

[...]

> For real transaction log, you'll need a unique transaction id for each
> transaction and have all the action save points associated with it.
> Otherwise, you cannot interleave transaction which is a severe
> restriction. For this particular application (range split), what's the
> advantage of this format over the current:
> [...]

Support for generating unique IDs already exists in our system with the
atomic_t type.  Generating unique transaction IDs would be simple and would
allow for interleaved transactions.

The design that we've come up with is solid and is functionally equivalent
to the more typical transaction model.  However, our solution feels very
"homegrown" to me.  The whole field of Transaction Processing was developed
to solve exactly the kind of problem that we're trying to solve.

Having said all that, if you want to roll forward with our current plan and
revisit later, that's perfectly fine with me.

- Doug


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Luke  
View profile
 More options Apr 6, 6:19 pm
From: Luke <vic...@gmail.com>
Date: Sun, 6 Apr 2008 15:19:42 -0700 (PDT)
Local: Sun, Apr 6 2008 6:19 pm
Subject: Re: Following traditional transaction model
I read the book long time ago. I forgot many of the specifics, but the
metalog stuff I proposed is influenced by it.

More detailed comments inline:

On Apr 6, 2:43 pm, "Doug Judd" <d...@zvents.com> wrote:

No disagreement here. Metalog can be indeed viewed as a special
transaction log in this perspective.

> [...]

> > For real transaction log, you'll need a unique transaction id for each
> > transaction and have all the action save points associated with it.
> > Otherwise, you cannot interleave transaction which is a severe
> > restriction. For this particular application (range split), what's the
> > advantage of this format over the current:
> > [...]

> Support for generating unique IDs already exists in our system with the
> atomic_t type.  Generating unique transaction IDs would be simple and would
> allow for interleaved transactions.

It's harder than that. Because of the persistent states, the unique
transaction id needs to survive the crash as well.

OTOH, I guess a sha1/md5 hash with the transaction type + the hires
time + random number is probably OK though.

> The design that we've come up with is solid and is functionally equivalent
> to the more typical transaction model.  However, our solution feels very
> "homegrown" to me.  The whole field of Transaction Processing was developed
> to solve exactly the kind of problem that we're trying to solve.

Hey, why can't we just thought of our solution as a valid variation
that's influenced by the traditional design? :)

> Having said all that, if you want to roll forward with our current plan and
> revisit later, that's perfectly fine with me.

Well, the API is pretty much exactly the same.  I'll change the name
log_entries to txn_states to reflect the purpose.

The on disk format will probably go through a few iterations anyway.

I've been thinking about a general purpose transaction that uses a
redundant hyperspace based on berkeley db, which has all the goodies
to support XA compliant transactions, as I don't trust our potentially
varying underlying DFS to handle the transaction logs consistently.

__Luke


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Doug Judd  
View profile
 More options Apr 7, 12:07 am
From: Doug Judd <nuggetwh...@gmail.com>
Date: Sun, 6 Apr 2008 21:07:55 -0700 (PDT)
Local: Mon, Apr 7 2008 12:07 am
Subject: Re: Following traditional transaction model
This sounds good to me.  I do like the idea of renaming log_entries to
txn_states.  It will help make it clear that the range was in the
middle of a "transaction" that needs restarting.

- Doug

On Apr 6, 3:19 pm, Luke <vic...@gmail.com> wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2008 Google