Hi folks
Just read through http://research.google.com/archive/spanner.html "Google's scalable, multi-version, globally-distributed, and synchronously-replicated database"
Would be interested in the group's view, especially regarding
- does this provide true ACID?
- does it scale to cloud scale?
--
-- ~~~~~
Posting guidelines: http://bit.ly/bL3u3v
Follow us on Twitter @cloudcomp_group @khazret_sapenov @cloudslam @up_con
Download hundreds of recorded cloud sessions at
- http://cloudslam.org/register
- http://up-con.com/register
- http://cloudslam09.com/content/registration-5.html
- http://cloudslam10.com/content/registration
or get it on DVD at
http://www.amazon.com/gp/product/B002H07SEC, http://www.amazon.com/gp/product/B004L1755W, http://www.amazon.com/gp/product/B002H0IW1U
~~~~~
You received this message because you are subscribed to the Google Groups "Cloud Computing" group.
---
You received this message because you are subscribed to the Google Groups "Cloud Computing" group.
To post to this group, send email to cloud-c...@googlegroups.com.
To unsubscribe from this group, send email to cloud-computi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cloud-computing?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
On Saturday, September 29, 2012 2:19:58 AM UTC-7, Gilad wrote:
Hi folks
Just read through http://research.google.com/archive/spanner.html "Google's scalable, multi-version, globally-distributed, and synchronously-replicated database"
Would be interested in the group's view, especially regarding
- does this provide true ACID?
- does it scale to cloud scale?
"Yes" to cloud scale, as can be told from the first few paragraphs.
Durable - yes, it writes to durable memory, designed to be replicated across wide areas (including cross-continent). That's about as durable as anything, anywhere.
Isolated - this is essentially the same as serializable, which they claim at the top left of page 2 and discuss extensively.
Atomic - not explicitly stated, but their discussion of supporting transactions, which can succeed or abort, implies that it is atomic when you do a transaction.
Consistent - not clear to me that this is present. It's unclear what "consistent" means when the application can stuff what it wants wherever it wants in the tablets / Colossus file system. It's not like an RDBMS where you specify table content and relationships, and have triggers, etc. But maybe I'm misreading what "consistent" necessarily means in a general RDBMS.
So, is this the mythical cloud-scale ACID RDBMS? My take is "not exactly," but that's also not what they set out to implement.
Yes, it looks like the key design or the directory design is crucial to get good performance out of it OR extract ACID-ness out of it.
One thought that occurred to me was that a GUID+metadata [say, userId] for that system would have to encapsulate (almost) every other ‘sub-directory’. Then are we looking at a single giant directory tree schema, which represents the entire system? Because conversely, if you partition it, you are going to lose consistency.
Any thoughts?
Abhishek
Greg, thanks for the insight and analysis.
I agree consistency is the central issue.
My reading is that they accept "long latency" in place of "eventual consistency". However, like you, I do not really understand what consistency means in this context and was hoping for help from the group. I am particularly confused by what happens if there is a high rate of updates. Let's assume it is so high that the time between updates is shorter than the latency of updating the global system. What does consistency mean in that situation?
On Thu, Oct 4, 2012 at 1:49 PM, Jim Starkey <jsta...@nuodb.com> wrote:
Sorry, but I have to quibble a bit.
On 10/1/2012 7:39 PM, Greg Pfister wrote:
On Saturday, September 29, 2012 2:19:58 AM UTC-7, Gilad wrote:
Hi folks
Just read through http://research.google.com/archive/spanner.html "Google's scalable, multi-version, globally-distributed, and synchronously-replicated database"
Would be interested in the group's view, especially regarding
- does this provide true ACID?
- does it scale to cloud scale?
"Yes" to cloud scale, as can be told from the first few paragraphs.
Yes to cloud scale if you design the application and database to fit their model. They are quite unusual in that their data definition language allows specification of logical storage affinity. This means, for example, that a photo album record and its photographs will be stored together, so a join of a single album and its photographs will be quite fast. Without this feature -- and correct usage -- the photographs could be scattered over a thousand computers around the world. In other words, you have to be very careful what you ask for.
The Google applications that use Spanner are essentially single user applications in the sense that only one user is authorized to do updates. This, in conjunction with placement control, allows a high volume of unrelated updates.
Under a contentious load, however, Spanner performance can be expected to drop off dramatically when the Paxos consensus protocols kick in (think two phase commit on steroids).
So I guess a fair answer would be yes if your problem matches their solution, otherwise probably not.
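A minimal sketch of the storage-affinity idea described above, not Spanner's actual API. It assumes a single sorted key space in which each child row carries its parent's key as a prefix, so an album and its photographs end up next to each other. The table contents, the tuple key layout, and the `rows_for_album` helper are all invented for illustration.

```python
import bisect

# Hypothetical sorted key space: each key is a tuple that starts with the
# parent album id, so child photo rows sort right after their album row.
rows = sorted([
    ((1,), "album: Holiday"),
    ((1, 10), "photo: beach.jpg"),
    ((1, 11), "photo: sunset.jpg"),
    ((2,), "album: Work"),
    ((2, 20), "photo: whiteboard.jpg"),
])
keys = [k for k, _ in rows]


def rows_for_album(album_id):
    """Return the album row and its photos with one contiguous range scan."""
    lo = bisect.bisect_left(keys, (album_id,))
    hi = bisect.bisect_left(keys, (album_id + 1,))
    return [value for _, value in rows[lo:hi]]


print(rows_for_album(1))
```

If the photo keys did not share the album prefix, the same lookup would have to touch arbitrary positions in the key space, which is the "scattered over a thousand computers" case.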
Close enough in my book. Yup.
Durable - yes, it writes to durable memory, designed to be replicated across wide areas (including cross-continent). That's about as durable as anything, anywhere.
Isolated - this is essentially the same as serializable, which they claim at the top left of page 2 and discuss extensively.
Not at all the same as serializable. Serializability isn't required for ACID, just consistency.
Isolated is a necessary precondition for consistent, however, so there's no getting around it.
Atomic - not explicitly stated, but their discussion of supporting transactions, which can succeed or abort, implies that it is atomic when you do a transaction.
Their use of the timestamps and the Paxos protocol (http://en.wikipedia.org/wiki/Paxos_(computer_science)) guarantees atomicity.
Consistent - not clear to me that this is present. It's unclear what "consistent" means when the application can stuff what it wants wherever it wants in the tablets / Colossus file system. It's not like an RDBMS where you specify table content and relationships, and have triggers, etc. But maybe I'm misreading what "consistent" necessarily means in a general RDBMS.
Consistent means that a transaction starts and ends with a consistent state. Details include enforcing uniqueness of keys, referential integrity, etc. I don't believe Spanner has anything but (unique) primary keys, which makes this simpler.
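To make that definition concrete, here is a toy sketch in which the only invariant is primary-key uniqueness. It is not how Spanner or any real RDBMS implements this; the `run_transaction` helper and the dict-as-table model are assumptions for illustration. The transaction either moves the table from one valid state to another or leaves it untouched.

```python
class UniqueKeyViolation(Exception):
    """Raised when a transaction would duplicate a primary key."""


def run_transaction(table, inserts):
    """Apply all inserts or none, rejecting duplicate primary keys."""
    working = dict(table)  # work on a copy so an abort leaves table untouched
    for key, row in inserts:
        if key in working:
            raise UniqueKeyViolation(key)
        working[key] = row
    # Every insert passed the check: publish the new consistent state.
    table.clear()
    table.update(working)


users = {1: "alice"}
run_transaction(users, [(2, "bob")])
try:
    # The second insert reuses key 1, so the whole transaction is rejected.
    run_transaction(users, [(3, "carol"), (1, "mallory")])
except UniqueKeyViolation:
    pass
print(users)
```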
The mythical cloud-scale ACID RDBMS is approaching GA in Cambridge, Massachusetts.
So, is this the mythical cloud-scale ACID RDBMS? My take is "not exactly," but that's also not what they set out to implement.
Greg Pfister
Sent from my iPad with iMstakes
On Oct 4, 2012, at 22:07, "Jim Starkey" <jsta...@nuodb.com> wrote:
On 10/4/2012 5:29 PM, Abhishek Pamecha wrote:
Yes, it looks like the key design or the directory design is crucial to get good performance out of it OR extract ACID-ness out of it.
No tradeoff is necessary.
I would disagree by directly quoting from the paper:
---Begin quote section 2.3
This interleaving of tables to form directories is significant because it allows clients to describe the locality relationships that exist between multiple tables, which is necessary for good performance in a sharded, distributed database. Without it, Spanner would not know the most important locality relationships.
--end quote
One thought that occurred to me was that a GUID+metadata [say, userId] for that system would have to encapsulate (almost) every other ‘sub-directory’. Then are we looking at a single giant directory tree schema, which represents the entire system? Because conversely, if you partition it, you are going to lose consistency.
Partitioning doesn't sacrifice consistency. Performance, generality, sanity, simplicity, sure, but consistency can be preserved with a two phase commit and/or Paxos. There are more application-gentle alternatives, but that's a different question.
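A toy illustration of the two-phase commit point above. It assumes in-memory partitions and a hypothetical `Partition` class with `prepare`/`commit`/`abort` methods, and it ignores failures, timeouts, and the Paxos replication a real system would layer underneath. The "no negative values" rule is an invented invariant that gives a partition a reason to vote no.

```python
class Partition:
    """Hypothetical in-memory partition holding a slice of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.staged = None

    def prepare(self, writes):
        # Phase 1: vote yes only if every write is acceptable here.
        if any(value < 0 for value in writes.values()):
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        # Phase 2: make the staged writes visible.
        self.data.update(self.staged or {})
        self.staged = None

    def abort(self):
        # Discard anything staged during phase 1.
        self.staged = None


def two_phase_commit(plan):
    """plan maps each Partition to the writes it should apply."""
    prepared = []
    for partition, writes in plan.items():
        if partition.prepare(writes):
            prepared.append(partition)
        else:
            # Any "no" vote aborts the transaction on every partition.
            for p in prepared:
                p.abort()
            return False
    for partition in prepared:
        partition.commit()
    return True


a, b = Partition("a"), Partition("b")
ok = two_phase_commit({a: {"x": 5}, b: {"y": 7}})
bad = two_phase_commit({a: {"x": 1}, b: {"y": -1}})
print(ok, bad, a.data, b.data)
```

In this sketch the second transaction leaves both partitions unchanged, even though partition `a` had already voted yes, which is the sense in which consistency survives partitioning at a cost in latency and coordination.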
Going by Brewer's CAP conjecture, if we want to maintain the same availability (which I think would be the case), partitioning will sacrifice consistency.
Also, going by the example in the paper (again section 2.3), it seems that the USERS table would be the parent of all the other tables that you may need for your system. Put the other way round, all other tables would need to be interleaved in the USERS table. Otherwise, how do you store other data related to a user (for example videos, comments, address book, etc.) with locality of data in mind?
Any thoughts?
Abhishek