Brewer's CAP conjecture is that a networked system cannot be consistent,
available, and partition tolerant. A long description can be found at
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
and a purported "proof" at
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf
The CAP conjecture, I am convinced, is false and can be proven false.
Both the conjecture and the purported "proof" share the same flaw,
which the counterexample exploits: each assumes that a system needs to
be serializable to be consistent. Serializability is a sufficient
condition for consistency, but it is not a necessary one. A
counterexample, a system that demonstrates CAP is false, has the
following characteristics (a sketch follows the list):
1. The system is transactional, i.e. all updates within a transaction
are tentative until committed atomically.
2. Inconsistency can be detected during execution of a transaction so
that a transaction that would otherwise introduce inconsistency is
prevented from committing.
3. Commits are executed in the context of a partition tolerant commit
algorithm (a transaction is not committed until it has been
reported as committable to all potential surviving partitions).
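To make point 3 concrete, here is a minimal sketch in Python of such a
commit, assuming a hypothetical Replica interface with prepare(),
commit(), and rollback() methods (the interface and names are
illustrative, not taken from the post):

    # Hedged sketch of a partition-tolerant commit (point 3 above).
    # The Replica interface is hypothetical; any object with prepare(),
    # commit(), and rollback() methods will do.

    class CommitAborted(Exception):
        pass

    def partition_tolerant_commit(txn_id, replicas):
        """Report txn_id committed only after every replica that could
        survive a partition has acknowledged it as committable."""
        prepared = []
        for replica in replicas:
            # prepare() returns True once the replica has durably recorded
            # the tentative updates and detected no consistency violation.
            if not replica.prepare(txn_id):
                for r in prepared:
                    r.rollback(txn_id)   # undo the tentative updates
                raise CommitAborted(f"replica rejected transaction {txn_id}")
            prepared.append(replica)
        # Every potential surviving partition now knows the transaction is
        # committable, so the outcome survives any subsequent partition.
        for replica in replicas:
            replica.commit(txn_id)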
The only serious question is whether a system can be both
non-serializable and consistent, and it isn't much of a question given
a formal definition of consistency (concurrent transactions can't
overwrite each other, unique constraints must appear unique, etc.). It
is easier, cheaper, and bulletproof to serialize updates to specific
elements within a transaction than to try to serialize transactions
themselves. For example, every database record might have a
deterministic resolution agent that actively prevents concurrent update
in a multi-version concurrency control (MVCC) database.
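As a hedged sketch of that resolution-agent idea (class and function
names here are invented for illustration), updates to a single record
are serialized through one agent, chosen deterministically, while the
transactions touching the record remain unordered with respect to each
other:

    # Hedged sketch: serialize updates to a single record, not transactions.

    import threading
    import zlib

    class RecordResolutionAgent:
        """Serializes updates to ONE record; transactions as a whole are
        never ordered against each other. Illustrative, not NimbusDB code."""

        def __init__(self):
            self._lock = threading.Lock()
            self.version = 0

        def try_update(self, based_on_version):
            """Admit an update only if it is based on the record's current
            version: a transaction may not update a version it didn't see."""
            with self._lock:
                if based_on_version != self.version:
                    return False    # a concurrent update already won; retry
                self.version += 1   # new version, tentative until commit
                return True

    def agent_for(record_key, agents):
        # Deterministic: every node maps the same key to the same agent.
        return agents[zlib.crc32(record_key.encode()) % len(agents)]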
The quintessential problem is that academic computer science has
conflated the concepts of consistency and serializability for so long
that it has forgotten they are not the same thing. If you substitute
"serializable" for "consistent", the CAP conjecture becomes the SAP
conjecture, and is probably true. So the trick is to be CAP without
being a SAP.
The CAP conjecture has been a theoretical millstone around the neck of
all ACID systems. Good riddance.
This is the first wooden stake for the heart of the noSQL movement.
There are more coming.
--
Jim Starkey
Founder, NimbusDB, Inc.
978 526-1376
I had thought the whole point of the CAP theorem was to describe
fundamental tradeoffs. No one who truly understands the implications
of that theorem would argue that relaxing C would enable improved
levels of both A and P. This is what I tried to explain in my
"Designing for the Cloud" slides: http://www.slideshare.net/StuC/oopsla-cloud-workshop-designing-for-the-cloud-elastra
In other words, it's not that the conjecture is false, it's that
people need to check their assumptions in applying it. It's not a
black and white scenario where consistency kills you, full stop.
Where I agree with you is the NoSQL crowd's tendency, like with all
fads, to over-generalize and to ignore evidence in front of their
noses. "Eventual consistency" a la Dynamo is only one way to skin the
scalability cat.
Stu
Sent from my iPhone
One question/comment:
Isn't it the case that your counterexample actually is serializable,
just not locally (within a transaction) serializable? Instead, it's
serializable at the transaction level.
Greg Pfister
http://perilsofparallel.blogspot.com/
On Feb 25, 12:38 pm, Jim Starkey <jstar...@nimbusdb.com> wrote:
> Brewer's CAP conjecture is that a networked system cannot be consistent,
> available, and partition tolerant. A long description can be found at
>
> http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
>
> and a purported "proof" at
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=...
a) NOSQL != !(SQL)
b) and so, !(NOSQL) != SQL
c) The unfortunate NOSQL term doesn't mean NoSQL (notice the capitalization). IMHO, what it says is that there are scenarios and characteristics which require less strict ACID, served by various flavors (key-value store, document store, data structure store, et al.)
d) Hence, saying the CAP theorem is false does not mean the end of the NOSQL debate
e) Granted, there is lots of rhetoric going on, but that is hardly a basis for sound theory, or for that matter practice. For example, a blog [http://blogs.forrester.com/appdev/2010/02/nosql.html] talks about NoSQL. I have no clue why folks write about things without researching them thoroughly, and that is a problem. In fact the blogger is going to write a report - I am not sure if it is a threat or a promise ;o)
f) From my perspective, NOSQL has value in terms of scalability, internet-scale distributability, and resilience (the requirement for which is a by-product of the internet infrastructure) ... I am not sure one would run Hadoop over Oracle (in its current form; I am aware that database vendors are adding MapReduce capability)
g) There is the NOSQL crowd and the NoSQL crowd, which of course are not the same ...
Cheers
<k/>
My slightly deeper point is that "database theory" is based as much on
folklore as on actual provable results. The consequence of this is
that progress in computing -- particularly in cloud computing, which has
these issues in spades -- has been unnecessarily retarded by shallow
thinking. The case in point is database scalability (insert shameless
plug for talk here). If you conflate consistency with serializability,
you can prove that a single database can't be run on multiple computers
without those computers executing in lock step. If you are willing to
knock over a sacred cow, it is apparent that a database can scale
elastically to meet an arbitrary demand and can be geographically
dispersed, each of which the CAP conjecture ('tain't a theorem no more)
says can't be done.
This is the point in history where database systems leave the rails and
start to fly.
>
> In other words, it's not that the conjecture is false, it's that
> people need to check their assumptions in applying it. It's not a
> black and white scenario where consistency kills you, full stop.
It is black and white. The CAP conjecture is, in fact, false. The SAP
conjecture is a different story, but who should care about
serializability if consistency can be achieved in other ways? If you
don't have transactions that can be rolled back, I'll concede some
validity to the CAP conjecture, but in the absence of transactions it's
next to impossible to achieve consistency anyway. On its face,
the CAP conjecture is just wrong.
I don't see serialization as a required implementation of
transactional consistency.
If P(artitioning) means loss of communication between transaction
participants, the transaction can only be rolled back, and a new one
cannot be started. If you can run transactions without requiring the
partitions to be consistent (i.e., without others knowing you are
running a transaction), you aren't really partitioned.
I think, MVCC or not, all parts of the system need to agree
(= Consistency) on the values of the attributes being updated before
the commit ends.
So starting new transactions can work. But how do you commit without
propagating the knowledge about this among interested but partitioned
parties?
Sassa
On Feb 25, 7:38 pm, Jim Starkey <jstar...@nimbusdb.com> wrote:
> Brewer's CAP conjecture is that a networked system cannot be consistent,
> available, and partition tolerant. A long description can be found at
>
> http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
>
> and a purported "proof" at
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=...
I guess it depends on how you define consistency.
In my opinion, if the C in CAP stands for strong consistency,
then the CAP theorem is correct. However, for relaxed
forms of consistency, e.g., eventual consistency,
http://en.wikipedia.org/wiki/Eventual_consistency
all three properties could hold at the same time.
Gabriel
Nice to see you're finally reading Lynch, Jim, only a month or two after
you clearly had no idea who she was or what the formal underpinnings of
CAP were. Unfortunately, you're a little premature in declaring victory
here. Refuting or limiting a proof is not the same as proving its
conclusion is false. If I say that 2+2=4 on Mondays, and you point out
that my proof is inapplicable six days of the week, you still haven't
proven that 2+2 equals anything but four on Tuesday through Sunday.
Similarly, even if Gilbert and Lynch's proof applies only to
serializable consistency, that does not preclude a separate and more
general proof of CAP. Most branches of mathematics are full of examples
where a theorem was proven for successively more general cases before it
was proven in its entirety.
Turning now to the substance of your supposed refutation, it seems not
to address the issue of when a transaction terminates - and it must
terminate in all cases to satisfy the Gilbert/Lynch definition of
availability (section 2.2). As soon as you need to ensure that
transactions terminate, you need to have a mechanism to coordinate
rollback of one or more transactions that might span multiple of your
"deterministic resolution agents" even in the presence of arbitrary
partitions. I'm sure many in the computer science community eagerly
await your explanation of how to solve that particular problem.
That's only the first problem that I noticed. I'll be glad to point out
more if you'd like. Someone might still manage to prove that there is
some useful set of definitions for which CAP is provably false, but it's
going to take an awful lot more than an "MVCC is magic" hand-wave. Be
careful you don't replace the C in CAP with Completely Ridiculous. ;)
My point is that there does not need to be a tradeoff at all. It is
perfectly possible to construct a system that is consistent, available,
and partition tolerant. It just can't be serializable.

Stuart Charlton wrote:
> Jim,
> I had thought the whole point of the CAP theorem was to describe fundamental tradeoffs. No one who truly understands the implications of that theorem would argue that relaxing C would enable improved levels of both A and P. This is what I tried to explain in my "Designing for the Cloud" slides: http://www.slideshare.net/StuC/oopsla-cloud-workshop-designing-for-the-cloud-elastra
Interesting! Can you define "consistency"?
In my opinion, the whole objective of "consistency" is to make reasoning about the application logic easy within a concurrent environment. Imagine our DB is a state machine and each user action triggers a state transition. Without isolation, the possible combinations of user actions (from different users) are a combinatorial explosion, which makes it almost impossible to reason about the application's behavior.
To simplify our analysis, we "flatten" the combinatorial explosion by disallowing user actions (within a logical unit of work) from interlacing with each other. In other words, the logical unit of work for each user runs as if it has exclusive access to all states. This is the foundation of serializability.
So I agree with you. From a pure theoretical perspective, "serializability" is a means to achieve the goal of "consistency". But practically, I don't see any better alternatives.
Think about the difference between programming in a multi-threaded environment using locks and programming with Software Transactional Memory; you come to the same picture.
Rgds,
Ricky
----- Original Message ----
From: Jim Starkey <jsta...@nimbusdb.com>
To: cloud-c...@googlegroups.com
Sent: Thu, February 25, 2010 11:38:13 AM
Subject: [ Cloud Computing ] Brewer's CAP Conjecture is False
Brewer's CAP conjecture is that a networked system cannot be consistent,
available, and partition tolerant. A long description can be found at
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
and a purported "proof" at
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf
The CAP conjecture, I am convinced, is false and can be proven false.
The counterexample to the conjecture and the flaw in the "proof" is that
each assumes that a system needs to be serializable to be consistent.
Greg Pfister wrote:
> Interesting and useful, Jim. I've always been leery of CAP, myself. One question/comment: Isn't it the case that your counterexample actually is serializable, just not locally (within a transaction) serializable? Instead, it's serializable at the transaction level.

Here's a proof that it isn't serializable:
In other words, you give up availability. Brewer made exactly this
point on slide 16 of his 2000 PODC keynote
(http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf) where
he introduced the CAP Theorem. Congratulations, you've invented CP.
I don't believe there is -- or can be -- a universal definition of
consistency. Consistency must be what you declare it to be. In a
database context, consistency is defined by the set of constraint
declarations: a set of unique constraints, a set of referential
integrity constraints, constraints on domains, etc. Other than simple
rules like "a record can't update a version of a record it couldn't
see", consistency should be explicit.
>
> In my opinion, the whole objective of "consistency" is to make reasoning the application logic easy within a concurrent environment. Imagine our DB is a state machine and each user action is a trigger of a state transition. Without such isolation, the possible combination of user action (from different users) is a combinatorial explosion which make it almost impossible to reason the application's behavior.
>
It is easy to demonstrate that a serializable database has at every
point a single definitive state. A non-serializable database, even a
consistent one, may never have a single definitive state due to skew
in the delivery of commit messages. On each node, a transaction sees
the set of transactions that have been reported committed, but since
commit messages are delivered asynchronously, concurrent transactions
on different nodes will see different sets of committed transactions,
and hence no definitive database state.
But given that there isn't a single definitive state, the state-machine
model doesn't quite apply.
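A toy illustration of that commit-message skew, assuming nothing beyond
asynchronous delivery: two nodes hear about commits in different
orders, so at a given instant their views differ even though each view
is internally consistent.

    # Toy illustration, assuming only asynchronous delivery of commit
    # messages: two nodes receive them in different orders, so there is
    # no single definitive database state at a given instant.

    class Node:
        def __init__(self, name):
            self.name = name
            self.committed = set()   # transactions this node knows committed

        def deliver(self, txn_id):
            self.committed.add(txn_id)

    a, b = Node("A"), Node("B")
    a.deliver("T1")                    # T1's commit message reaches A first...
    b.deliver("T2")                    # ...while B hears about T2 first
    assert a.committed != b.committed  # different, yet each is consistent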
> To simplify our analysis, we "flatten" the combinatorial explosion by disallowing user actions (within a logical unit of work) interlacing with each other. In other words, the logical unit of work for each user is as if it has exclusive access to all states. This is the fundamental of serializability.
>
> So I agree with you. From a pure theoretical perspective, "serializability" is a means to achieve the goal of "consistency". But practically, I don't see any other better alternatives.
>
MVCC gives a transaction a stable view of a volatile database,
subject to the rule that a transaction can't modify a version of a
record it didn't see. MVCC isn't serializable, but it can be made to
enforce any arbitrary definition of consistency.
> f) From my perspective, NOSQL has value in terms of scalability, internet-scale
> distributability, and resilience (the requirement for which is a by-product
> of the internet infrastructure) ... I am not sure one would run Hadoop over
> Oracle (in its current form; I am aware that database vendors are adding
> MapReduce capability)
Aren't we missing the other big motivations for NOSQL: extreme low latency
and extreme low cost (both update/insert cost and the cost of performing
aggregations/queries) at massive scale - which I think is orthogonal to
CAP?
If Jim can give me those things as well as solving the CAP problem - then
sign me up!
--
Murray
Can you explain why MVCC is not serializable?
In MVCC, every transaction maintains a private copy of all data it has read and updated. At commit time, the transaction submits the versions of this read/write set and compares them with the versions stored in the DB. The commit succeeds if the versions match; otherwise the transaction fails.
Let's say two concurrent transactions T1 and T2 overlap in time, and T1 commits first (of course it succeeds). When T2 then commits, it will only succeed if its (read set union write set) does not intersect T1's write set.
If T2 succeeds, then it is as if T2 started AFTER T1 committed, which is equivalent to some serial transaction order, either T1, T2 or T2, T1.
If T2 fails, then T2 needs to restart the transaction and read the latest updates from T1, which is equivalent to T1, T2.
So it is equivalent to some serial order of transaction execution; why is MVCC not serializable?
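Here is that validation rule as a small sketch (names are illustrative).
Note that, as stated, it validates reads as well as writes, which is
stronger than the write-write check many MVCC engines actually enforce;
that gap is where the anomalies discussed in the next message come from.

    # Sketch of the validation rule described above (names illustrative):
    # a committing transaction succeeds only if its read+write set is
    # disjoint from the write set of every transaction that committed
    # while it was in flight.

    def can_commit(reads, writes, overlapping_committed_write_sets):
        touched = set(reads) | set(writes)
        return all(touched.isdisjoint(ws)
                   for ws in overlapping_committed_write_sets)

    # T1 wrote {x} and committed while T2 was running:
    assert can_commit({"y"}, {"z"}, [{"x"}])      # disjoint: T2 commits
    assert not can_commit({"x"}, {"z"}, [{"x"}])  # T2 read x: restart as T1, T2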
There's a (somewhat) insightful description of the serialization anomalies that can arise with MVCC here:
http://en.wikipedia.org/wiki/Snapshot_isolation
It contains a description similar to one of Jim's prior posts, and hints that there may be new developments on this front.
The classic paper on the topic, which offers more detail, is here:
http://www.cs.umb.edu/~poneil/iso.pdf
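For concreteness, a minimal version of the classic write-skew anomaly
from that literature (the on-call example; the data is invented): two
snapshot transactions with disjoint write sets both commit, and a
constraint that each saw as satisfied ends up violated.

    # The on-call write-skew example: the constraint "at least one doctor
    # on call" holds in every snapshot each transaction sees, the write
    # sets {alice_on_call} and {bob_on_call} are disjoint so neither
    # commit is rejected, yet the final state violates the constraint.

    db = {"alice_on_call": True, "bob_on_call": True}

    t1_snapshot = dict(db)             # both transactions start before
    t2_snapshot = dict(db)             # either one commits

    t1_writes = {}
    if t1_snapshot["bob_on_call"]:     # T1: Bob is covering, Alice leaves
        t1_writes = {"alice_on_call": False}

    t2_writes = {}
    if t2_snapshot["alice_on_call"]:   # T2: Alice is covering, Bob leaves
        t2_writes = {"bob_on_call": False}

    db.update(t1_writes)               # disjoint write sets: first-committer-
    db.update(t2_writes)               # wins raises no conflict for either
    assert not (db["alice_on_call"] or db["bob_on_call"])  # nobody on call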
Stu
> On 2/25/10 1:59 PM, "Krishna Sankar" <ksan...@gmail.com> wrote:
>
>> f) From my perspective, NOSQL has value in terms of scalability, internet-scale
>> distributability, and resilience (the requirement for which is a by-product
>> of the internet infrastructure) ... I am not sure one would run Hadoop over
>> Oracle (in its current form; I am aware that database vendors are adding
>> MapReduce capability)
>
> Aren't we missing the other big motivations for NOSQL: extreme low latency
> and extreme low cost (both update/insert cost and the cost of performing
> aggregations/queries) at massive scale - which I think is orthogonal to
> CAP?
No, CAP is all about the tradeoffs involved in managing data at massive scale.
The use of SQL, however, is completely orthogonal to massive scalability. As is the use or avoidance of joins.
There's plenty of evidence, for example, that for aggregations & queries, MPP data warehouses are scalable and fast, and Hadoop really doesn't carry a performance edge. On the other hand, that's ignoring the relative work & time to do ETL into the warehouse vs. just writing Map & Reduce functions over flat data sets in HDFS.
For extreme low latency, I could see an argument against SQL or joins, yes. But you'd have to define "extreme". Oracle might take 100-400ms to do a write transaction, 50-250ms for a read, with full consistency, joins, etc. Cassandra might do 25ms for a read, maybe < 1ms for a write, but with relaxed consistency and key/value lookup. Which is better depends on your requirements.
That brings us to "low cost", where I would paraphrase Jamie Zawinski: "open source is only free if your time has no value". For a startup, it's a no-brainer to use lower-cost databases. For an enterprise, I think it's a more complicated question.
It's unfortunate that the punditry has conflated all of this. It's completely possible to build an RDBMS for massive scale; it just would look different, and the SQL would feel somewhat different. It wouldn't be "ACID" in the sense we think of today - there would need to be more settings on the isolation knob than we have now.
Cheers
Stu
> For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response.

In my counterexample, a non-failing node in the surviving partition can respond with success or failure, as appropriate, and a node in a non-surviving partition returns failure. A key element of the partition resolution algorithm is that a node can determine whether it is a member of a surviving partition or not, and respond accordingly.
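The post doesn't spell out the partition resolution algorithm itself; a
simple majority quorum is one plausible sketch (an assumption for
illustration, not necessarily NimbusDB's mechanism):

    # Assumed mechanism, sketched for illustration: a node treats itself
    # as a member of the surviving partition iff it can reach a majority
    # of the configured membership. Minority-side nodes still respond,
    # but always with failure, matching the availability definition
    # quoted above without risking divergence.

    def in_surviving_partition(reachable, membership):
        return len(reachable) > len(membership) // 2

    def handle(request, reachable, membership, execute):
        if in_surviving_partition(reachable, membership):
            return execute(request)   # success or failure, as appropriate
        return "FAILURE"              # non-surviving side: respond, but fail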
--
Jim Starkey
NimbusDB, Inc.
978 526-1376