A cursor is an opaque, deletion-tolerant index into a B-tree keyed by source user ID and modification time. It brings you to a point in time in the reverse-chronologically sorted list. Since you can't change the past, other than by erasing it, a cursor is effectively stable. (Modifications bubble to the top.) But you have to deal with additions at the head of the list and with block shrinkage due to deletions, so your blocks begin to overlap quite a bit as the data ages. (If you cache cursors and read much later, you'll see the first few rows of cursor[n+1]'s block as duplicates of the last rows of cursor[n]'s block. The cardinality of the intersection equals the number of deletions in cursor[n]'s block.) Still, there may be value in caching these cursors and then heuristically rebalancing them when the overlap proportion crosses some threshold.
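To make the overlap concrete, here is a toy sketch in Python. It is not how the real service works: the list is a plain array of integer modification times, a cursor is simply a timestamp, and the names read_block, overlap, rebalance, PAGE and THRESHOLD are all made up for illustration. It only shows why re-reading cached cursors after deletions yields duplicates, and what a simple threshold-based rebalance might look like.

```python
PAGE = 10          # rows per block (illustrative value)
THRESHOLD = 0.25   # hypothetical overlap proportion that triggers a rebalance


def read_block(rows, cursor, size=PAGE):
    """Return up to `size` rows at or older than `cursor`.

    `rows` holds modification times sorted newest-first. The cursor is a
    point in time, not a row pointer, so it still resolves after the row
    it was taken from has been deleted.
    """
    return [t for t in rows if t <= cursor][:size]


def overlap(block_a, block_b):
    """Number of rows that appear in both blocks."""
    return len(set(block_a) & set(block_b))


def rebalance(rows, first_cursor, size=PAGE):
    """Re-walk the list from `first_cursor` and collect fresh cursors."""
    cursors, cursor = [], first_cursor
    while True:
        block = read_block(rows, cursor, size)
        if not block:
            return cursors
        cursors.append(block[0])
        cursor = block[-1] - 1  # assumes integer timestamps


# Cache one cursor per block while the list is intact.
rows_before = list(range(100, 0, -1))   # timestamps 100..1, newest first
cursors = rows_before[::PAGE]           # 100, 90, 80, ...

# Much later, three rows inside cursor[0]'s block have been deleted.
deleted = {97, 95, 92}
rows_after = [t for t in rows_before if t not in deleted]

# Reading with the cached cursors: block n now reaches into block n+1.
block_n = read_block(rows_after, cursors[0])
block_n1 = read_block(rows_after, cursors[1])
stale_overlap = overlap(block_n, block_n1)
print(stale_overlap)  # 3, one duplicate per deletion in cursor[0]'s block

# Heuristic rebalance once the overlap proportion crosses the threshold.
if stale_overlap / PAGE > THRESHOLD:
    cursors = rebalance(rows_after, cursors[0])
```

After the rebalance, adjacent blocks no longer share rows, at the cost of one extra walk through the list. A real client would only have the opaque cursor values returned by the API, so the timestamp arithmetic in rebalance stands in for following the next cursor of each page.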
-John Kalucki Infrastructure, Twitter Inc.
On Sat, Jan 16, 2010 at 10:40 PM, Marc Mims <marc...@gmail.com> wrote:
* John Kalucki <jo...@twitter.com> [091209 09:28]:
> A cursor should be valid forever, but as it ages and rows are removed, you
> might see some minor data loss and probably more duplicates.
Out of curiosity, what is a cursor? From our (the users') perspective,
it's just an opaque number. But I'm curious. How is it generated?
What does it represent internally?
-Marc