Hi,
I am sorry, but I often see silliness rising to new highs whenever the NoSQL theme comes up. As usual, I think that the radicalism around NoSQL is just BS, and the good stuff is the Eventually Consistent / BASE stuff.
Calling it "liberating" to have no SQL sounds like double BS to me, since I have lived through plenty of that: I had plenty of experience with "NoSQL" solutions and felt quite liberated when I finally had the chance to use SQL and relational databases.
Ricky Ho's reply is quite on the mark, and I just want to add some "historical" perspective.
The problem, you see, is that all this NoSQL crap is just history repeating itself, and BASE, as Ricky stated, is where all the meat is.
All the innovation that matters revolves around the BASE way of working.
"Moving away" from relational and SQL "just because" is silly, or just criminal, bordering on a "book burning" episode. And "Those who cannot remember the past are condemned to repeat it."
(http://wiki.answers.com/Q/Who_said_Those_who_ignore_history_are_bound_to_repeat_it)
At Amazon they did forgo SQL and relational databases because, with the current state of technology, they just HAD TO in order to get BASE instead of ACID.
If you think we always had SQL and relational databases around, you must be a newbie (or worse). Only a rookie - and not a very smart one at that - would skip getting the historical facts straight before talking this loudly about such an issue.
Or are you trying to sell us "dbm" as a new and revolutionary technology?
If we go back only 30 years, very few people had access to computers.
Some 22 years ago, the use of IBM PC clones exploded, democratizing computer use but NOT democratizing SQL and relational database use. Many of us used tools like Faircom's C-Tree or Borland's Turbo Database Toolbox, which were storage / B-Tree APIs much like the C-based Berkeley DB was in the beginning (not what it is becoming at Oracle).
Which means: many of the older guys know very well what living without relational databases and without SQL feels like.
Do you think Key/Value stores are a new idea? Hey, look at dbm, from 1979 (and its many successors):
http://en.wikipedia.org/wiki/Dbm
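For a feel of how little the basic key/value interface has changed, here is a minimal sketch using Python's standard-library `dbm` module, a generic front end over dbm-style backends (`dbm.gnu`, `dbm.ndbm`, or the pure-Python `dbm.dumb` fallback, depending on the platform). The database name `inventory` and its keys are made up for the example.

```python
import dbm
import os
import tempfile


def store_and_fetch(directory: str) -> bytes:
    """Write a couple of key/value pairs, close the file, then read one back."""
    path = os.path.join(directory, "inventory")  # hypothetical database name
    # "c" opens the database for reading and writing, creating it if missing.
    with dbm.open(path, "c") as db:
        db["sku-1001"] = "42 units"  # str keys/values are encoded to bytes
        db["sku-1002"] = "7 units"
    # Reopen read-only: the data survived on disk.
    with dbm.open(path, "r") as db:
        return db["sku-1001"]  # values always come back as bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(store_and_fetch(tmp))
```

There is no schema, no query language and no notion of a relationship between keys: everything beyond lookup by key is the application's job.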
I remember reading about Coherence (then Tangosol's, now Oracle's), which is arguably still the most sophisticated key/value store around, back in 2002, and it was already at version 1.3.
Sometimes I even read posts/articles talking about hierarchical databases as if they were something new, but the first popular implementation I know of dates from the sixties. Look, it's IBM's IMS, which first ran in 1968:
http://en.wikipedia.org/wiki/Information_Management_System
(I confess I didn't know it was THIS old, but I remember IMS was already there when I started.)
We had plenty of NoSQL before relational DBs became popular. Can you imagine how much of your daily life depends on COBOL code??? Do some of you think COBOL programs always had SQL? Think again:
http://en.wikipedia.org/wiki/IDMS
And do you think Object Database technology (of which graph databases are just a variant) is anything new? The first ODBMS systems were created in the eighties, before OOP became really popular:
http://www.odbms.org/Introduction/history.aspx
Graph databases are just a variation of Object Databases (which had also already been adapted to work with XML nodes...).
...and there are plenty of systems around:
http://en.wikipedia.org/wiki/Comparison_of_object_database_management_systems
Did you notice the "SQL support" column in that table?
Did you notice how it is filled most of the time?
Do you think it was always this way? They HAD to put SQL there!!!
Object Databases had plenty of advocacy too, and plenty of people pointed to ODBMS technology as the best thing since sliced bread and announced, again and again, that it was the next big thing... and it never happened. It was the kind of discussion that sold magazines (yeah, before the Internet).
ODBMSs are very useful in plenty of domains, enough for some ODBMS products to apparently be doing quite well. Look at the size (in the millions) of Caché's producer:
http://www.intersystems.com/aboutus/index.html
...but also look at how they sell it:
"a high-performance object database that runs SQL five times faster than relational databases."
...text from here:
http://www.intersystems.com/cache/index.html
You have to do commercial development to understand why relational is important. The "relational feature" + SQL lets you fit in relations as an afterthought. You just need to add an index or two, write a new SQL query and there you go.
The thing is that many relations are just an afterthought. Human brains are too limited to imagine, in a single step, all that a complex system will do and will be.
We have to build software incrementally (even when we plan to do it in a single step) and relational makes some of these increments easier. (And it is better to do the right amount of smaller increments before attempting to redesign / refactor.)
SQL is also a "fix it" tool for developers and DBAs. A very important one too.
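A small sketch of the "relation as an afterthought" point, using Python's built-in `sqlite3` module with an in-memory database. The `customers` / `orders` schema and the per-city report are hypothetical: the report was not planned when the tables were designed, yet supporting it only takes an index and a new query, with no change to how the data is stored.

```python
import sqlite3


def build_shop() -> sqlite3.Connection:
    """Original schema: nobody planned for per-city sales reports."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ana', 'Lisbon'), (2, 'Rui', 'Porto'),
                                     (3, 'Eva', 'Lisbon');
        INSERT INTO orders VALUES (10, 1, 20.0), (11, 2, 5.0),
                                  (12, 3, 15.0), (13, 1, 10.0);
        """
    )
    return conn


def sales_by_city(conn: sqlite3.Connection) -> list:
    """The afterthought: add an index, then write a new join query."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders(customer_id)"
    )
    return conn.execute(
        """
        SELECT c.city, SUM(o.total) AS revenue
        FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.city
        ORDER BY revenue DESC
        """
    ).fetchall()


if __name__ == "__main__":
    print(sales_by_city(build_shop()))  # [('Lisbon', 45.0), ('Porto', 5.0)]
```

With a plain key/value store, the same new question would usually mean building and maintaining a new lookup structure by hand, or scanning all the data.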
So, WHAT is new (that matters) in the NoSQL movement?
- BASE
BASE is all that matters in this whole new world. It shouldn't be called the NoSQL movement, it should be called the BASE movement.
And, by the way, that is not so new either. The first reference to BASE I know of is a 1997 paper whose authors include Eric A. Brewer and Armando Fox, of CAP Theorem fame. You can find a summary of the whole BASE / CAP history, with references to the interesting papers, here:
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
... including the 1997 "Cluster-Based Scalable Network Services" paper mentioned above, which presents the BASE concept:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.2034&rep=rep1&type=pdf
... the presentation of the CAP principle in the "Harvest, yield, and scalable tolerant systems" paper:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3690&rep=rep1&type=pdf
... and the more popular presentation of CAP and BASE at the 2000 PODC keynote:
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
It still took a few years (not that many, considering the history of other technologies) to get truly industrial proof that the concept worked, thanks to Amazon's Dynamo:
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
And this last URL is the real must-read for understanding BASE, especially because it quotes real numbers about what you can achieve.
Every "eventually consistent" database is just trying to copy Dynamo, with a variation here and another there.
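As a rough illustration of two ideas the Dynamo paper builds on, here is a sketch of consistent hashing with a per-key "preference list" (the N nodes found walking clockwise around the ring from the key's position), plus the R + W > N quorum condition. It is a simplification under stated assumptions: no virtual nodes, no sloppy quorum or hinted handoff, and the node names and key are hypothetical.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string onto the ring using a 128-bit MD5 hash."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent hashing ring (one position per node, no virtual nodes)."""

    def __init__(self, nodes):
        # Sorted (position, node) pairs around the ring.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str, n: int) -> list:
        """Return up to n nodes, walking clockwise from the key's position."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        count = min(n, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """With R + W > N, every read quorum intersects every write quorum."""
    return r + w > n


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
    print(ring.preference_list("cart:12345", 3))
    print(quorum_overlaps(3, 2, 2))  # True
```

The useful property is that adding a node only moves the keys on the arc it takes over; every other key keeps its place.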
Even Ricky Ho's excellent and well-illustrated posts are NOT a replacement for reading the Dynamo paper, although they may help in interpreting it.
BASE is the REAL innovation.
All the rest just sounds like history repeating.
And if you talk with some of the distributed computing veterans, they will start pointing out how many of the technologies used by Dynamo are old news too. Go and check the age of Vector Clocks (1988) (and of their predecessor, Lamport timestamps, 1978), Merkle Trees (1979 patent) or Gossip protocols (there is a first reference to the idea at Wikipedia which reads "Epidemic algorithms for replicated database management. Alan Demers, et al. Proc. 6th ACM PODC, Vancouver BC, 1987.").
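Vector clocks are how Dynamo tells whether one version of an object supersedes another or whether two versions conflict. A minimal sketch, assuming clocks are plain dictionaries from node name to counter (the node names `sx`, `sy`, `sz` are hypothetical) and ignoring the clock truncation a production system needs:

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with this node's counter bumped (a local write)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock of a version that has seen both a and b."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a has seen everything b has (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two versions: 'equal', 'after', 'before' or 'concurrent'."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"  # neither dominates: a conflict someone must reconcile


if __name__ == "__main__":
    v1 = increment({}, "sx")  # first write handled by node sx
    v2 = increment(v1, "sy")  # update via sy, based on v1
    v3 = increment(v1, "sz")  # independent update via sz, also based on v1
    print(compare(v2, v1))  # after
    print(compare(v2, v3))  # concurrent
    v4 = increment(merge(v2, v3), "sx")  # a client reconciles both branches
    print(compare(v4, v3))  # after
```

A "concurrent" result is exactly the case where a Dynamo-style store hands both versions back and lets the application reconcile them.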
Maybe there is even some "Dynamo" like database I do not know of.
Anyway, Amazon proved it works like no Inktomi project did: it was not only about visibility but also about SCALE.
And its target was not some bigger-than-life achievement like indexing the Internet but something much more prosaic: e-Commerce!!!
And what about SQL?
I don't think we need less than SQL and relational. Actually, I think we need much more.
We need:
- a query engine that knows how to calculate the cost of distributed joins, network data transfers (under current network conditions, whatever that means at the moment and place) and other operations specific to distributed databases;
- a database engine that can optimize the database through re-partitioning and de-normalization of data according to production needs;
- a new SQL language that covers the different topologies a distributed DB can have and is "BASE aware".
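To give a flavour of the first item, here is a deliberately naive, hypothetical cost model that picks between two classic distributed join strategies: broadcasting the smaller table to every node, or re-partitioning (shuffling) both tables by the join key. It only counts bytes crossing the network and assumes evenly spread data, uniform hashing and a single aggregate bandwidth figure standing in for "current network conditions"; CPU, skew and replica placement are ignored.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    size_bytes: int  # assumed to be spread evenly across all nodes


def broadcast_cost(small: Table, nodes: int) -> float:
    """Bytes sent if every node receives a full copy of the smaller table."""
    # Each node already holds 1/nodes of the table and must receive the rest,
    # so nodes * size * (nodes - 1) / nodes = size * (nodes - 1) bytes in total.
    return small.size_bytes * (nodes - 1)


def shuffle_cost(left: Table, right: Table, nodes: int) -> float:
    """Bytes sent if both tables are re-partitioned by a hash of the join key."""
    # With uniform hashing a row stays on its current node with probability 1/nodes.
    return (left.size_bytes + right.size_bytes) * (nodes - 1) / nodes


def choose_join(left: Table, right: Table, nodes: int, bandwidth_bytes_per_s: float):
    """Return (strategy, estimated seconds of network transfer) for the cheaper plan."""
    small = min(left, right, key=lambda t: t.size_bytes)
    plans = {
        "broadcast": broadcast_cost(small, nodes),
        "shuffle": shuffle_cost(left, right, nodes),
    }
    strategy = min(plans, key=plans.get)
    return strategy, plans[strategy] / bandwidth_bytes_per_s


if __name__ == "__main__":
    GB = 10**9
    orders = Table("orders", 100 * GB)  # hypothetical table sizes
    countries = Table("countries", 10**6)
    customers = Table("customers", 50 * GB)
    print(choose_join(orders, countries, nodes=10, bandwidth_bytes_per_s=GB))  # broadcast
    print(choose_join(orders, customers, nodes=10, bandwidth_bytes_per_s=GB))  # shuffle
```

Even this toy shows why the choice has to be made at query time: the same join flips strategy as table sizes, node counts or available bandwidth change.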
I have already found plenty of evidence of research projects dealing with just that kind of innovation.
But this is too complex to achieve in a short time. Most of the "ACID" SQL engines we know are still leaky abstractions, even without all this new complexity. The best we can do for distributed databases in the short term is the NoSQL solutions we know of.
And please notice that even in the NoSQL world, some query solutions are starting to look familiar, like Hadoop's Pig Latin:
http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html
CONCLUSION:
IMO "NoSQL" is only a liberation for newbies too lazy to learn SQL or developers too limited to get its power.
The awkwardness of using SQL-based database interfaces from programming languages - and the related ORM crap - is an interface issue that should have been solved ages ago... but we know how the embedded SQL solutions always failed and how most ORM APIs are crappy. (Maybe there is hope in LINQ-like techniques...)
BASE is the real liberation because it allows us to achieve performance / availability targets we cannot even dream of with standard ACID technologies.
(Unfortunately there is no query language for BASE databases as good as SQL is for relational.)
All the rest is hype and history repeating while, unfortunately, past teachings are ignored.
Have fun,
Paulo Gaspar
http://paulogaspar7.blogspot.com/
@paulogaspar7