Proposed fix for Issue 71


Doug Judd

May 16, 2008, 5:12:10 PM
to hyperta...@googlegroups.com
Here's the description for Issue #71:

The way the system is currently implemented, scanners continue to reference their underlying CellStores until the scan completes. However, if the CellStores get compacted and then garbage collected by the Master, they will be deleted out from under the lagging scanners.

Here's my proposed solution:

Inside AccessGroup::create_scanner(), the code can gather the filenames of the CellStores that it is about to scan and write them into the METADATA table, in the row for the range, under the column Scanners:<scanner-id>. When the scan completes, it can update that column with an empty value. Also, when a Range is being loaded, it can delete the Scanners column so that there are no leaks in case the RangeServer was killed while it had outstanding scanners.

The garbage collector can read this column to figure out which CellStores are off limits due to outstanding scanners.

Comments?

- Doug

Luke

May 16, 2008, 5:55:25 PM
to Hypertable Development
Creating a metadata entry per read request is clearly problematic
(think about the random read case: that's 4000 scanner creations per
second). I think a more robust approach would be for the outstanding
scanners to catch read errors and reopen cellstores.

Doug Judd

May 16, 2008, 6:07:02 PM
to hyperta...@googlegroups.com
The problem with that approach is that it introduces a subtle race condition. Here's the scenario:

1. Slow scanner gets created on a range at time t
2. Some deletes occur on the range at t+n
3. Major compaction happens on range (which purges delete records), old CellStores are destroyed
4. Scanner throws an exception and re-opens the CellStores.
5. Scanner no longer sees the records that were deleted at time t+n

This approach wouldn't allow us to provide snapshot isolation of scanners, which will be needed when we implement the transaction engine.
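
The scenario above can be replayed with a toy model. This is only a sketch under stated assumptions: `CellStore`, `merge`, `visible_rows` and `replay_scenario` are hypothetical names invented for illustration, a cell store is reduced to a map from row key to value, and a delete record is modelled as an empty optional. It is not Hypertable code.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model (not Hypertable code): a cell store maps a
// row key to a value, and std::nullopt stands for a delete record.
using CellStore = std::map<std::string, std::optional<std::string>>;

// Merge stores from oldest to newest; newer entries shadow older ones.
// With purge_deletes set (a major compaction), delete records are dropped.
CellStore merge(const std::vector<CellStore>& stores, bool purge_deletes) {
    CellStore out;
    for (const auto& store : stores)
        for (const auto& entry : store)
            out[entry.first] = entry.second;
    if (purge_deletes) {
        for (auto it = out.begin(); it != out.end();) {
            if (it->second) ++it;
            else it = out.erase(it);
        }
    }
    return out;
}

// Row keys a scan over the given store would return (deleted rows hidden).
std::vector<std::string> visible_rows(const CellStore& store) {
    std::vector<std::string> rows;
    for (const auto& entry : store)
        if (entry.second) rows.push_back(entry.first);
    return rows;
}

// Replays steps 1-5 and returns {rows visible at time t, rows after reopen}.
std::pair<std::vector<std::string>, std::vector<std::string>> replay_scenario() {
    // Step 1: the scanner is created at time t over a single store.
    CellStore at_t{{"a", std::string("1")},
                   {"b", std::string("2")},
                   {"c", std::string("3")}};
    std::vector<std::string> snapshot = visible_rows(merge({at_t}, false));

    // Step 2: row "b" is deleted at t+n, which adds a delete record.
    CellStore deletes{{"b", std::nullopt}};

    // Step 3: the major compaction merges everything and purges deletes.
    CellStore compacted = merge({at_t, deletes}, true);

    // Steps 4-5: the scanner reopens and now reads the compacted store.
    std::vector<std::string> reopened = visible_rows(compacted);
    return {snapshot, reopened};
}
```

In this model the scanner's view at time t contains rows a, b and c, while the view after reopening contains only a and c. Row b has vanished from a scan that started before it was deleted, which is exactly the loss of snapshot isolation described above.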

Luke

May 16, 2008, 6:23:50 PM
to Hypertable Development
Another approach would be to use a two-phase update of the Files:<ag>
column:

1. After the compaction, we'll update Files:<ag>:<ts>. New scanners
created will use the value of this entry.
2. The CellStore destructor can notify an update listener so that, when
there are no outstanding scanners, it can update Files:<ag> and
clean up any Files:<ag>:<ts> entries.

Garbage collection doesn't have to be modified this way :)

Doug Judd

May 16, 2008, 6:45:33 PM
to hyperta...@googlegroups.com
Ok, this is close, but here's a solution that's simpler from a RangeServer implementation standpoint.  It involves a slight change to the garbage collector :) but should be just a one-liner.

The Files:<ag> column will contain the list of CellStores, but will also contain the list of old CellStores that were in use at the time the compaction finished.  These CellStores will be commented out by prefixing their name with the '#' character.

AccessGroup maintains a mapping of CellStore filename -> reference count which is updated whenever a scanner is created or deleted.  Whenever the reference count for an old CellStore in this map drops to zero, the Files:<ag> column will get updated to reflect this.

The only change in the garbage collector is to ignore a leading '#' character.  How does that sound?
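
A minimal sketch of the bookkeeping described above, under stated assumptions: `CellStoreTracker` and its methods are hypothetical names rather than the real AccessGroup interface, and the Files:<ag> value is assumed to hold one filename per line.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not Hypertable's actual AccessGroup class.
class CellStoreTracker {
public:
    explicit CellStoreTracker(std::set<std::string> live)
        : live_(std::move(live)) {}

    // Scanner creation: pin every CellStore that is live right now.
    std::vector<std::string> open_scanner() {
        std::vector<std::string> pinned(live_.begin(), live_.end());
        for (const auto& file : pinned) ++refcount_[file];
        return pinned;
    }

    // Scanner destruction: unpin its CellStores. Returns true when an old
    // (no longer live) CellStore dropped to zero references, meaning the
    // Files:<ag> column should be rewritten.
    bool close_scanner(const std::vector<std::string>& pinned) {
        bool column_changed = false;
        for (const auto& file : pinned) {
            auto it = refcount_.find(file);
            assert(it != refcount_.end() && it->second > 0);
            if (--it->second == 0) {
                refcount_.erase(it);
                if (live_.count(file) == 0) column_changed = true;
            }
        }
        return column_changed;
    }

    // Compaction finished: the live set is replaced by the new CellStores.
    void finish_compaction(std::set<std::string> new_live) {
        live_ = std::move(new_live);
    }

    // Value for Files:<ag>: live CellStores first, then old CellStores that
    // scanners still reference, commented out with a leading '#'.
    std::string files_column() const {
        std::string out;
        for (const auto& file : live_) out += file + "\n";
        for (const auto& entry : refcount_)
            if (live_.count(entry.first) == 0) out += "#" + entry.first + "\n";
        return out;
    }

private:
    std::set<std::string> live_;
    std::map<std::string, int> refcount_;
};
```

For example, if a scanner is opened over cs1 and cs2 and a compaction then produces cs3, `files_column()` yields cs3 followed by #cs1 and #cs2. Once that scanner closes, `close_scanner()` returns true and the column shrinks to just cs3.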

- Doug

Doug Judd

May 16, 2008, 7:08:25 PM
to hyperta...@googlegroups.com
Actually, maybe it's not so hard to implement your suggestion.  The AccessGroup can maintain a map of <ts> -> refcount.  The <ts> refers to compaction time.  When a scanner gets created, it can record the latest <ts> and update its refcount.  In the destructor it decrements the <ts> refcount and can delete the Files:<ag>:<ts> column if the refcount drops to zero.
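
Roughly, the <ts> -> refcount map could look like the sketch below. The class name `CompactionRefs`, its method names, and the use of a 64-bit integer for <ts> are assumptions made for illustration. The sketch follows the paragraph above literally: it reports that a Files:<ag>:<ts> column can be deleted as soon as its refcount reaches zero, and it does not model recovery after a RangeServer restart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical sketch; names and types are illustrative, not Hypertable's.
class CompactionRefs {
public:
    explicit CompactionRefs(std::int64_t initial_ts) : latest_(initial_ts) {}

    // A compaction finished at time ts; new scanners will pin this <ts>.
    void on_compaction(std::int64_t ts) {
        assert(ts > latest_);
        latest_ = ts;
    }

    // Scanner creation: record the latest <ts> and bump its refcount.
    std::int64_t acquire() {
        ++refs_[latest_];
        return latest_;
    }

    // Scanner destruction: decrement the refcount for ts. Returns true when
    // it dropped to zero, i.e. Files:<ag>:<ts> may now be deleted.
    bool release(std::int64_t ts) {
        auto it = refs_.find(ts);
        assert(it != refs_.end() && it->second > 0);
        if (--it->second > 0) return false;
        refs_.erase(it);
        return true;
    }

    // Number of distinct <ts> values still pinned by at least one scanner.
    std::size_t pinned_generations() const { return refs_.size(); }

private:
    std::int64_t latest_;
    std::map<std::int64_t, int> refs_;
};
```

Two scanners created before a compaction share one entry, so the column for the older <ts> is released only when the second of them is destroyed.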

The additional complexity happens in the case where a RangeServer gets killed with outstanding scanners.  When it comes up it will have to scan the Files: column and delete any existing Files:<ag>:<ts> columns.  It's a little more work on my end, but if it makes your life easier, I'm ok with it.  :)


Luke

May 16, 2008, 7:33:45 PM
to Hypertable Development
Yeah, this is actually a cleaner and more solid approach that allows
multiple compactions to happen with very long outstanding scans.
Otherwise, you'll have to parse Files:<ag> entries to ignore #
prefixed files yourself anyway :)

On May 16, 4:08 pm, "Doug Judd" <d...@zvents.com> wrote:
> Actually, maybe it's not so hard to implement your suggestion. The
> AccessGroup can maintain a map of <ts> -> refcount. The <ts> refers to
> compaction time. When a scanner gets created, it can record the latest <ts>
> and update its refcount. In the destructor it decrements the <ts> refcount
> and can delete the Files:<ag>:<ts> column if the refcount drops to zero.
>
> The additional complexity happens in the case where a RangeServer gets
> killed with outstanding scanners. When it comes up it will have to scan the
> Files: column and delete any existing Files:<ag>:<ts> columns.

You probably want to promote the latest Files:<ag>:<ts> to Files:<ag>
in this case.

> a little more work on my end, but if it makes your life easier, I'm ok with
> it. :)

Not only does it make my life easier, it's a more robust design as well. A
win-win situation :)

Doug Judd

May 16, 2008, 7:45:53 PM
to hyperta...@googlegroups.com
Maybe I'm misunderstanding your solution.  Are the files in the Files:<ag>:<ts> column the ones that should be GC'd but are "locked up" because there are scanners outstanding on them?  Or are they the files that are live as of the compaction with timestamp <ts>?

- Doug

Luke

May 16, 2008, 7:59:56 PM
to Hypertable Development
The latter: they're the files live as of <ts>. GC only looks at the
Files:<ag> column. And Range*Client would always look for the latest
Files:<ag>:<ts> entry, if one exists.

Doug Judd

May 16, 2008, 8:07:58 PM
to hyperta...@googlegroups.com
Ok, I'm a lot less enthusiastic about this solution.  It adds a lot of extra complexity in the RangeServer.  Instead of just reading the Files:<ag> to figure out which files are live for the <ag>, it has to read all of the columns to figure out the latest one.  Plus, when the Range gets loaded it will have to purge all of the Files:<ag>:<ts> except for the latest one.  It is a lot easier to just read the Files:<ag> column and skip the files that have been commented out.
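
To illustrate how little parsing that takes, here is a sketch of reading a Files:<ag> value. The function name `parse_files` is hypothetical, and the one-filename-per-line layout is an assumption rather than a confirmed format. The same function serves both readers: the RangeServer skips commented-out entries, while the garbage collector keeps them with the '#' stripped.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper. Splits a Files:<ag> value into filenames, assuming
// one filename per line.
//   include_commented == false: RangeServer view, '#' entries are skipped.
//   include_commented == true:  GC view, '#' entries are kept, minus the '#'.
std::vector<std::string> parse_files(const std::string& value,
                                     bool include_commented) {
    std::vector<std::string> files;
    std::istringstream in(value);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line[0] == '#') {
            if (!include_commented) continue;
            line.erase(0, 1);            // drop the comment marker
            if (line.empty()) continue;  // a bare '#' names no file
        }
        files.push_back(line);
    }
    return files;
}
```

Given a value holding cs3, #cs1 and #cs2 on separate lines, the RangeServer view returns only cs3, while the GC view returns cs3, cs1 and cs2, so the old CellStores stay off limits to garbage collection until their entries are removed.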

Luke

May 16, 2008, 8:20:19 PM
to Hypertable Development
OK, having to check both Files:<ag> and Files:<ag>:<ts> might be
troublesome. Files:<ag> entries with current and commented-out outstanding
files are actually not too bad to deal with in GC. So either way is
fine with me, if it makes your life easier :)

Doug Judd

May 16, 2008, 8:37:31 PM
to hyperta...@googlegroups.com
It does make my life easier.  Less coding.  :)  Thanks!