Hi Chris,
On Sat, Jan 23, 2016 at 10:57:31AM -0800, Chris Holland wrote:
> I watched the video. She seems to indicate the "client" is a proxy server
> for the user.
Yes, basically the same as Spanner: the assumption is that the client is
essentially the webserver. So it's fairly safe to assume it has broad
connectivity and doesn't have massive sudden spikes of growth either.
> So the connection issue would at least be relegated to your
> cluster. There were a couple of things from the video I'd question more. In
> the IR discussion they seem to say if there are multiple answers from
> servers, the "client" (proxy client) re-submits with one of the values.
> Which would be bad in the consistency sense from the original request. Its
> possible this language was just too high level to what they were trying to
> convey. I'd expect that to abort back to the user.
Nah, that's fine. I've read the paper too now and had some email contact
with Irene. GoshawkDB is the same: different nodes which have copies of
the same object can return different answers - for example if the
"question" is "I read x at version 3, is that ok?", one node with x could
say "yup, fine", whilst another could say "no, you're out of date, x is
now at version 4; here is version 4". This is inherent in a system where
you don't have a global total order. At some point you have to combine
all these answers and create one "final" answer. In this case, what
GoshawkDB would do is get back to the client and say "you need to
restart your client transaction but now with x at version 4". So that
reruns, and then the new outcome gets submitted again. GoshawkDB does
this combination in the acceptors, whereas TAPIR pushes it out to the
clients themselves (figure 9 in the paper). So there's much more logic
in the client, but there are also some performance benefits to doing
that.
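To make that concrete, here's a little sketch (names and shapes are mine, not TAPIR's or GoshawkDB's wire format): each node answers the "I read x at version 3, is that ok?" question, and the combining step either commits or tells the client to restart at the newest version it saw.

```python
def check_read(replica_version, read_version):
    """One replica's answer: ok if the read is current, else stale plus
    the newer version it holds."""
    if read_version >= replica_version:
        return ("ok", replica_version)
    return ("stale", replica_version)

def combine(replies):
    """Combine the per-replica answers into one final answer. On any
    stale reply the client must rerun its transaction at the newest
    version seen."""
    newest = max(version for _, version in replies)
    if all(status == "ok" for status, _ in replies):
        return ("commit", newest)
    return ("restart", newest)

# One node still has x at version 3, another has already seen version 4:
replies = [check_read(3, 3), check_read(4, 3)]
print(combine(replies))  # -> ('restart', 4)
```

In GoshawkDB this combination happens in the acceptors; TAPIR runs the equivalent logic in the client itself.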
This is not really much different from any database: even with postgres
you have the possibility of "concurrent modification exceptions" to
which the only sane response is to rerun the transaction. This is why
all transaction functions should always be side-effect free until after
the transaction has committed. In a lot of modern databases, the db
itself will automatically rerun the transaction function, whereas in
older designs that's often not done and it's up to the user to deal with
that sort of thing (and most likely also implement random binary
exponential backoff).
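The "rerun with backoff" loop is simple enough to sketch; this is the generic pattern, not any particular database's API, and the exception name is made up:

```python
import random
import time

class ConcurrentModification(Exception):
    pass

def run_transaction(txn_fn, max_attempts=8):
    """Rerun a side-effect-free transaction function on conflict, with
    randomised binary exponential backoff between attempts."""
    for attempt in range(max_attempts):
        try:
            return txn_fn()
        except ConcurrentModification:
            # sleep a random amount in [0, 2^attempt) milliseconds
            time.sleep(random.uniform(0, 2 ** attempt) / 1000.0)
    raise ConcurrentModification("gave up after %d attempts" % max_attempts)
```

The point of side-effect freedom is exactly this loop: `txn_fn` may run several times before one run commits, so it mustn't fire emails or write files until after the commit.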
> They also said something
> about timestamps, which would imply some kind of sync across the servers,
> they said there was a fallback/resubmit path if the timestamp was out of
> whack. Using the timestamps for anything meaningful across machines implies
> you have another problem of keeping the servers in sync. Maybe they mean
> the timestamps just have to be sane as a reference from the previous
> timestamp on the same node.
So they use timestamps where GoshawkDB uses vector clocks. The GoshawkDB
approach is more complex, but strictly speaking would allow more
transactions to commit in more circumstances than with TAPIR. How much
that actually matters in practice I've no idea just yet. I find it very
odd that the benchmarks they did for TAPIR were against
non-transactional stores, and that they implemented a k-v store for
TAPIR - just seems odd to me to apparently not want to take advantage of
the transactional support.
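The reason vector clocks admit more commits is that they only impose a partial order: two transactions that touched disjoint objects compare as "concurrent" rather than being forced into a single total order the way scalar timestamps do. A toy comparison (illustrative only, not GoshawkDB's actual representation):

```python
def compare(a, b):
    """Compare two vector clocks, given as dicts of node -> counter."""
    keys = set(a) | set(b)
    le = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    ge = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"  # neither ordered: no forced conflict

print(compare({"n1": 2}, {"n1": 3}))  # -> before
print(compare({"n1": 2}, {"n2": 1}))  # -> concurrent
```

With scalar timestamps that second pair would have to be ordered one way or the other, and one of the two transactions might get rejected for no real reason.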
I believe the basic idea is that a transaction can only be "prepared" by
an object if the txn clock is greater than the committed clock of the
object (this is figure 8 in the paper). This is almost exactly the same
logic as in GoshawkDB, it's just that in GoshawkDB, rather than there
being one clock for the whole transaction, there's one clock for each
object mentioned within the transaction, which just allows a slightly
more fine-grained modelling of dependencies between transactions. Also
in GoshawkDB, the clocks come from the object-copies instead of the
client, and they're entirely logical clocks and have nothing to do with
actual time.
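A rough sketch of that prepare rule (the names are mine, not the paper's): an object only votes to prepare a transaction whose clock is strictly ahead of the object's committed clock.

```python
class Obj:
    """One object's view of the prepare/commit protocol."""
    def __init__(self):
        self.committed_clock = 0
        self.prepared = None

    def prepare(self, txn_clock):
        # Only prepare if the txn's clock is ahead of what we've
        # committed, and nothing else is currently prepared here.
        if txn_clock > self.committed_clock and self.prepared is None:
            self.prepared = txn_clock
            return True
        return False

    def commit(self):
        self.committed_clock = self.prepared
        self.prepared = None

x = Obj()
print(x.prepare(5))  # -> True
x.commit()
print(x.prepare(3))  # -> False: behind the committed clock
```

In TAPIR there's one such clock for the whole transaction, supplied by the client; in GoshawkDB each object-copy keeps its own logical clock, which is what buys the finer-grained dependency modelling.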
In TAPIR, if the client clocks are badly out then you'll suffer
starvation: only the client with the biggest clock will be able to get
transactions committed, though I think they do also build in some
ability to resync: if you get your txn rejected because your clock is
too small, the rejection will contain the current max clock and
so you can bump yours forwards to at least that (which in some ways has
a fair amount in common with phase 1 of Paxos, which in itself is pretty
interesting).
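That resync step might look something like this (purely my sketch of the idea, not TAPIR's actual messages): a rejection carries the current max clock back, and the client bumps its own clock past it before retrying.

```python
def submit(client_clock, server_max_clock):
    """Server accepts the txn only if the client's clock is ahead;
    a rejection carries the current max clock back to the client."""
    if client_clock > server_max_clock:
        return ("ok", client_clock)
    return ("rejected", server_max_clock)

clock = 2
status, seen = submit(clock, 7)
if status == "rejected":
    clock = seen + 1  # bump forwards past the max we were told about
print(submit(clock, 7))  # -> ('ok', 8)
```

The Paxos-phase-1 resemblance is that a rejected proposer likewise learns a higher number from the acceptors and must retry with one bigger than it.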
Hopefully these ramblings make some sort of sense!
Matthew