Lazy bitmap updates

Daniel Phillips

unread,

Mar 26, 2008, 5:30:50 AM3/26/08

to zuma...@googlegroups.com

Hi all,

Here is a yummy optimization patch to read before breakfast.

This patch attempts to reduce the expense of snapshot store metadata
updates for write requests by storing chunk allocation information in
the journal commit block instead of changing the allocation bitmaps,
then running a (hopefully big) bitmap flush episode each time the
journal gets about half full.

According to my calculations, this should eliminate two transfers along
with their seeks from each metadata update due to a copyout that had to
be performed in order to service a write request. The two transfers
saved are:

1) Write dirty bitmap block to journal
2) Write dirty bitmap block to bitmap

From time to time there will be an episode of bitmap updating and
associated journal updates as described above. This will be very
efficient if multiple bits happen to be located in the same or nearby
bitmap blocks, which really ought to be the case a lot of the time even
with our current, unintelligent cyclic allocation policy.

How it Works

Now, when we allocate a new chunk, the low level chunk allocator checks
the (in memory) super to see if the bitmap update can be deferfed, which
it can be if this is the first allocation in this write transaction. If
so it just writes the chunk address into the super instead of updating
the bitmap. On transaction commit, the sb is checked for any deferred
allocation, and if there is one, it is copied into the commit block,
the sb field is cleared, and the allocation (range, always 1 for now) is
recorded in a small table of deferred allocations. When the number of
deferred updates goes over half the journal size (magic number!) then
all the deferred updates are written into the bitmaps, then the bitmaps
are flushed to disk via a new journal transaction.

If the journal has to be replayed, then every commit in the journal is
scanned to update the bitmaps. (Bug here, did you spot it?)

This only defers one alloc per write commit just for now, but is ready
to handle more. Let's work with this a little then make it more
aggressive.

Untested. Can we have a look please, and feel free to throw mud at it
if anything looks borken. Probably doesn't work yet, so don't try it on
real data. Will be testing it here over the next few days before
throwing it at our stress tests.

Apply order is:
simplfy.replay.patch
lazy.bitmap.update.patch

diffstat /src/simplfy.replay.patch
ddsnapd.c | 169 +++++++++++++++++++++++++++++---------------------------------
1 file changed, 80 insertions(+), 89 deletions(-)

diffstat /src/lazy.bitmap.update.patch
ddsnapd.c | 174 ++++++++++++++++++++++++++++++++++++++------------------------
1 file changed, 109 insertions(+), 65 deletions(-)

Daniel

lazy.bitmap.update.patch

simplfy.replay.patch

Daniel Phillips

unread,

Mar 27, 2008, 3:01:33 PM3/27/08

to zuma...@googlegroups.com

This iteration of the lazy bitmap update patch closes a couple of subtle
holes in the algorithm:

1) If a block is deleted before the deferred allocation list is
flushed then the block will be erroneously marked allocated later
when the flush occurs.

2) Replaying a deferred allocation that is marked free later in the
journal will erroneously leave the block marked allocated.

Hole (1) is closed by flushing the deferred list before any delete.

Hole (2) is closed by specially flagging the commit block for an
allocation flush, and only replaying deferred allocations after
such a flagged commit.

Probably a couple of bugs remaining. This optimization was harder to
implement than expected, but I expected that.

Daniel

lazy.bitmap.update.patch

Daniel Phillips

unread,

Mar 28, 2008, 5:17:51 AM3/28/08

to zuma...@googlegroups.com

This updated patch includes a small refinement to allow a deferred
allocation to occur on the same journal transaction as a deferred
allocation flush (now called a "barrier"). The high bit of the
allocation count in a commit block is reserved to indicate that
deferred allocations were flushed at that commit, so that journal
replay will reload only allocations that were still deferred at the
time of the final commit. (Did that make sense?)

While it is probably not possible to have a deferred allocation on the
same transaction as a deferred allocation flush in the current code, we
would like to avoid subtle surprises later as the code evolves.

This brings the patch into pretty much its final form I think. The
approach seems to hold water, and the implementation is not completely
impenetrable. Now to see if it works and what kind of gains are to be
had.

Regards,

Daniel

lazy.bitmap.update.patch

Daniel Phillips

unread,

Mar 28, 2008, 6:10:14 PM3/28/08

to zuma...@googlegroups.com

Oops, sorry about that "final form" claim yesterday. Today's patch
includes a small amendment to coalesce a contiguous ranges of allocated
chunks into a single deferred alloc, which should increase the
effectiveness of the optimization measurable.

Regards,

Daniel

lazy.bitmap.update.patch

Dan Kegel

unread,

Mar 28, 2008, 6:12:52 PM3/28/08

to Daniel Phillips, zuma...@googlegroups.com

And now onto unit tests?

Daniel Phillips

unread,

Mar 29, 2008, 1:30:26 AM3/29/08

to zuma...@googlegroups.com

> effectiveness of the optimization measurably.

Same idea written more clearly.

Regards,

Daniel

lazy.bitmap.update.patch

Daniel Phillips

unread,

Apr 1, 2008, 3:52:33 PM4/1/08

to zuma...@googlegroups.com

As always, a subtle issue discovered in implementation...

It is necessary to remember the deferred allocations in some way so
that the allocator does not allocate the same block twice. This can
only happen if the cyclic allocation wraps all the way around before
deferred allocations are flushed to the bitmaps. Unfortunately, this
problem is more likely to occur than it first seems, when space is
nearly exhausted, which is a normal running condition for us.

A reasonable fix is not too hard: whenever the allocator finds a free
chunk, before deciding to allocate that chunk it checks the deferred
allocations and if the chunk is in the list, continues the search.

One other small detail that emerged is the possibility to simplify
the code somewhat by not having separate variables in the superblock
to record new deferred allocations, but use the deferred chunk list
directly for that purpose.

A big detail that came up: separate allocation of data/metadata adds
significant complexity to this optimization, so the optimization will
simply be disabled in the separate data/metadata case for the time
being.

Daniel

Daniel Phillips

unread,

Apr 2, 2008, 2:00:38 PM4/2/08

to zuma...@googlegroups.com

Double allocate hole described yesterday is closed in this patch.

Miscellaneous refinements including better handling of the deferred
allocation variables in superblock.

The plan is to split this (rather large) patch into three patches
for merging, when ready:

1) Cleanups
2) Payload
3) Unit tests

Daniel

lazy.bitmap.update.patch

Daniel Phillips

unread,

Apr 3, 2008, 12:50:31 PM4/3/08

to zuma...@googlegroups.com

Close another gaping hole: on a normal clean exit the superblock is
marked clean and the journal will not be replayed on restart. But
there may be deferred allocations recorded in journal commit blocks
so always flush deferred allocations before clean exit.