Here is a yummy optimization patch to read before breakfast.
This patch attempts to reduce the expense of snapshot store metadata
updates for write requests by storing chunk allocation information in
the journal commit block instead of changing the allocation bitmaps,
then running a (hopefully big) bitmap flush episode each time the
journal gets about half full.
According to my calculations, this should eliminate two transfers along
with their seeks from each metadata update due to a copyout that had to
be performed in order to service a write request. The two transfers
saved are:
1) Write dirty bitmap block to journal
2) Write dirty bitmap block to bitmap
From time to time there will be an episode of bitmap updating and
associated journal updates as described above. This will be very
efficient if multiple bits happen to be located in the same or nearby
bitmap blocks, which really ought to be the case a lot of the time even
with our current, unintelligent cyclic allocation policy.
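To make the clustering argument concrete, here is a rough sketch that
counts how many distinct bitmap blocks a batch of deferred allocations
would dirty. It is not taken from the patch: the function name, the
chunk_t typedef and the block_shift parameter (log2 of the number of
bits per bitmap block, 15 for an assumed 4K bitmap block) are all
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t chunk_t;	/* hypothetical chunk address type */

/*
 * Count the distinct bitmap blocks touched by a batch of deferred
 * allocations. block_shift is log2(bits per bitmap block).
 * Quadratic scan, which is fine for a small deferred table.
 */
unsigned count_dirty_bitmap_blocks(const chunk_t *chunks, unsigned count,
				   unsigned block_shift)
{
	unsigned dirty = 0;

	assert(block_shift < 64);
	for (unsigned i = 0; i < count; i++) {
		chunk_t block = chunks[i] >> block_shift;
		unsigned j;

		/* has an earlier chunk already dirtied this block? */
		for (j = 0; j < i; j++)
			if ((chunks[j] >> block_shift) == block)
				break;
		if (j == i)
			dirty++;
	}
	return dirty;
}
```

With cyclic allocation, consecutive chunks usually share one bitmap
block, so a whole flush episode may cost about one bitmap write.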
How it Works
Now, when we allocate a new chunk, the low level chunk allocator checks
the (in memory) super to see if the bitmap update can be deferred, which
it can be if this is the first allocation in this write transaction. If
so it just writes the chunk address into the super instead of updating
the bitmap. On transaction commit, the sb is checked for any deferred
allocation, and if there is one, it is copied into the commit block,
the sb field is cleared, and the allocation (range, always 1 for now) is
recorded in a small table of deferred allocations. When the number of
deferred updates goes over half the journal size (magic number!) then
all the deferred updates are written into the bitmaps, then the bitmaps
are flushed to disk via a new journal transaction.
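For anyone who prefers the flow above in code form, here is a minimal
sketch. It is not the code from lazy.bitmap.update.patch: the struct
layouts, the names (record_alloc, commit, flush_deferred) and the sizes
are made up for illustration, and the actual journal and disk writes
are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t chunk_t;

#define NO_CHUNK ((chunk_t)-1)
#define CHUNKS 64			/* hypothetical snapshot store size */
#define JOURNAL_SIZE 8			/* hypothetical journal size */
#define FLUSH_LIMIT (JOURNAL_SIZE / 2)	/* the magic number */

struct super {
	unsigned char bitmap[CHUNKS / 8];	/* in-memory allocation bitmap */
	chunk_t deferred_chunk;			/* deferred alloc of this transaction */
	chunk_t deferred[FLUSH_LIMIT + 1];	/* table of deferred allocations */
	unsigned deferred_count;
};

struct commit_block {
	chunk_t alloc;		/* deferred alloc carried by this commit, or NO_CHUNK */
};

void super_init(struct super *sb)
{
	memset(sb, 0, sizeof *sb);
	sb->deferred_chunk = NO_CHUNK;
}

int test_bit(const unsigned char *bitmap, chunk_t chunk)
{
	return (bitmap[chunk >> 3] >> (chunk & 7)) & 1;
}

void set_bit(unsigned char *bitmap, chunk_t chunk)
{
	bitmap[chunk >> 3] |= 1 << (chunk & 7);
}

/* Defer the first allocation of a transaction, update the bitmap for the rest. */
void record_alloc(struct super *sb, chunk_t chunk)
{
	assert(chunk < CHUNKS);
	if (sb->deferred_chunk == NO_CHUNK)
		sb->deferred_chunk = chunk;
	else
		set_bit(sb->bitmap, chunk);
}

/* Write all deferred allocations into the bitmaps and empty the table. */
void flush_deferred(struct super *sb)
{
	for (unsigned i = 0; i < sb->deferred_count; i++)
		set_bit(sb->bitmap, sb->deferred[i]);
	sb->deferred_count = 0;
	/* real code would now journal and write out the dirty bitmap blocks */
}

/* Move any deferred alloc into the commit block. Returns 1 if a flush ran. */
int commit(struct super *sb, struct commit_block *cb)
{
	cb->alloc = sb->deferred_chunk;
	if (sb->deferred_chunk != NO_CHUNK) {
		sb->deferred[sb->deferred_count++] = sb->deferred_chunk;
		sb->deferred_chunk = NO_CHUNK;
	}
	if (sb->deferred_count <= FLUSH_LIMIT)
		return 0;
	flush_deferred(sb);
	return 1;
}
```

The table holds FLUSH_LIMIT + 1 entries because the count has to go
over the limit before the flush empties it.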
If the journal has to be replayed, then every commit in the journal is
scanned to update the bitmaps. (Bug here, did you spot it?)
This only defers one alloc per write commit just for now, but is ready
to handle more. Let's work with this a little then make it more
aggressive.
Untested. Can we have a look please, and feel free to throw mud at it
if anything looks borken. Probably doesn't work yet, so don't try it on
real data. Will be testing it here over the next few days before
throwing it at our stress tests.
Apply order is:
simplfy.replay.patch
lazy.bitmap.update.patch
diffstat /src/simplfy.replay.patch
ddsnapd.c | 169 +++++++++++++++++++++++++++++---------------------------------
1 file changed, 80 insertions(+), 89 deletions(-)
diffstat /src/lazy.bitmap.update.patch
ddsnapd.c | 174 ++++++++++++++++++++++++++++++++++++++------------------------
1 file changed, 109 insertions(+), 65 deletions(-)
Daniel
1) If a block is deleted before the deferred allocation list is
flushed then the block will be erroneously marked allocated later
when the flush occurs.
2) Replaying a deferred allocation that is marked free later in the
journal will erroneously leave the block marked allocated.
Hole (1) is closed by flushing the deferred list before any delete.
Hole (2) is closed by specially flagging the commit block for an
allocation flush, and only replaying deferred allocations after
such a flagged commit.
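The replay rule for hole (2) might look roughly like the sketch below.
This is one reading of the rule, not the patch itself: the
commit_block layout, the flush flag field and replay_deferred are
hypothetical names. It also assumes that a flagged commit carries no
deferred allocation of its own; if one ever does, the sketch keeps it,
since setting an already set bit is harmless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t chunk_t;

#define NO_CHUNK ((chunk_t)-1)

struct commit_block {
	chunk_t alloc;	/* deferred allocation in this commit, or NO_CHUNK */
	int flush;	/* set if this commit is an allocation flush */
};

/*
 * Scan the commits in journal order and collect the deferred
 * allocations that still need to be applied to the bitmaps.
 * Anything seen before a flagged commit is dropped, because the
 * bitmap blocks journaled by that flush already include it.
 * out[] must have room for one entry per commit.
 */
size_t replay_deferred(const struct commit_block *journal, size_t commits,
		       chunk_t *out)
{
	size_t count = 0;

	assert(journal && out);
	for (size_t i = 0; i < commits; i++) {
		if (journal[i].flush)
			count = 0;	/* earlier allocs are already in the bitmaps */
		if (journal[i].alloc != NO_CHUNK)
			out[count++] = journal[i].alloc;
	}
	return count;
}
```

Because hole (1) forces a flush before any delete, an allocation that
survives this scan cannot have been freed later in the journal.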
Probably a couple of bugs remaining. This optimization was harder to
implement than expected, but I expected that.
Daniel
While it is probably not possible to have a deferred allocation on the
same transaction as a deferred allocation flush in the current code, we
would like to avoid subtle surprises later as the code evolves.
This brings the patch into pretty much its final form I think. The
approach seems to hold water, and the implementation is not completely
impenetrable. Now to see if it works and what kind of gains are to be
had.
Regards,
Daniel
And now on to unit tests?
It is necessary to remember the deferred allocations in some way so
that the allocator does not allocate the same block twice. This can
only happen if the cyclic allocation wraps all the way around before
deferred allocations are flushed to the bitmaps. Unfortunately, this
is more likely than it first seems: it happens when space is nearly
exhausted, which is a normal running condition for us.
A reasonable fix is not too hard: whenever the allocator finds a free
chunk, before deciding to allocate that chunk it checks the deferred
allocations and if the chunk is in the list, continues the search.
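A minimal sketch of that check, assuming a simple cyclic search over an
in-memory bitmap and a small array of deferred chunks. The names
(find_free_chunk, is_deferred) and the flat bitmap are hypothetical,
not lifted from ddsnapd.c.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t chunk_t;

#define NO_CHUNK ((chunk_t)-1)

int test_bit(const unsigned char *bitmap, chunk_t chunk)
{
	return (bitmap[chunk >> 3] >> (chunk & 7)) & 1;
}

/* Linear scan is fine here: the deferred table is small. */
int is_deferred(const chunk_t *deferred, unsigned count, chunk_t chunk)
{
	for (unsigned i = 0; i < count; i++)
		if (deferred[i] == chunk)
			return 1;
	return 0;
}

/*
 * Cyclic search starting at start. A chunk is only returned if it is
 * free in the bitmap and not in the deferred allocation list.
 * Returns NO_CHUNK if the search wraps all the way around.
 */
chunk_t find_free_chunk(const unsigned char *bitmap, chunk_t chunks,
			chunk_t start, const chunk_t *deferred, unsigned count)
{
	assert(chunks > 0);
	for (chunk_t n = 0; n < chunks; n++) {
		chunk_t chunk = (start + n) % chunks;

		if (test_bit(bitmap, chunk))
			continue;	/* allocated in the bitmap */
		if (is_deferred(deferred, count, chunk))
			continue;	/* allocated, but not yet in the bitmap */
		return chunk;
	}
	return NO_CHUNK;
}
```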
One other small detail that emerged is that the code can be simplified
somewhat by dropping the separate superblock variables that record
new deferred allocations and using the deferred chunk list directly
for that purpose.
A big detail that came up: separate allocation of data/metadata adds
significant complexity to this optimization, so the optimization will
simply be disabled in the separate data/metadata case for the time
being.
Daniel
Miscellaneous refinements including better handling of the deferred
allocation variables in the superblock.
The plan is to split this (rather large) patch into three patches
for merging, when ready:
1) Cleanups
2) Payload
3) Unit tests
Daniel