Innovation in databases


Tim

Dec 8, 2009, 7:12:49 AM
to NOSQL
Recently there has been some discussion about the meaning of the NoSQL
movement, or at least its name.

For a group to thrive it should have a strong and clear idea of what
its desires are. Breadth of applicability needs to be balanced with
togetherness of aims, in order to reach a variety of people, and
achieve something great.

The notion of "no sql" is strong, creating togetherness, and still
broad enough to leave room for many to act on it.

Some have taken opposition to the name thinking that it demeans the
concepts that are embodied in a sql database. I don't believe this is
truly why the term "no sql" rings so true to the issue. I believe that
this confusion has largely been a result of the early days of the
movement where it was only a small portion of the sql problem that
finally prompted action and speaking out, specifically, the issue of
scalability for massive loads, and the realization that moving away
from sql could provide almost unexpected solutions.

In reality, the issue at play which lets "no sql" ring so true to us
is that standardization often comes at the expense of innovation.

That is why we have really moved away from sql: it is not for any
specific approach to scalability or data storage, but rather just the
ability to free ourselves from the standardized ideas encapsulated in
the standard query language.

I'm going to say it again, moving away from sql allows us to innovate.
I'm sure no one will have ill feeling towards the notion of
innovation, and standardization is almost the exact opposite, it is
the crystallization of previous innovation, so of course it would be
what we stand against.

We do not necessarily stand against any specific idea encapsulated by
the sql standardization, rather we just choose to open ourselves up to
investigating the elements of the system for the sake of making design
decisions which provide innovative solutions.

As more exploration into the options takes place we see different
approaches to achieving innovative solutions. Currently the movement
has been largely defined as "scalability, non-relational, schema-less,
base etc", which has kept its focus solely on scalability and, worse,
has often closed itself off to specific means of creating
valuable solutions for scalability. The wikipedia page currently reads
"promotes non-relational data stores that do not need a fixed schema,
and that usually avoid join operations". We need to increase the
breadth of applicability to developers and give a clearer picture of
this.

The issue of scalability is only one issue in the quest that starts
with freeing ourselves from standardization. There are many unseen
innovations to take place by freeing ourselves from these standardized
tools, and I say "freeing" because that is really what it is, we are
currently trapped with the tools that we have spent so many years
putting our effort into, at the expense of having little to go to
other than those tools.

As an application developer, and a framework developer, I have felt
the impact of the sql ideas and interfaces on the ability to create
efficient *and* beautiful database application code, whether that be
for massive scale operation, or for embedded applications.

I believe we need to be truer to the name that we have felt fits the
situation, "-no sql- is about re-opening the gates to innovation in
databases" this is the definition we need to bind ourselves on, and
make clear to those we want to attract.

Creating scalable solutions is only a subset of the story, and one
which will also be fueled by other tools which eschew the standardized
concepts even if those tools have little regard for scalability.



George James

Dec 8, 2009, 9:30:32 AM
to nosql-di...@googlegroups.com
Tim
That's a really nice positive spin to put on the term and correctly
identifies why this NoSQL movement is important.

SQL has indeed caused the database industry to stagnate, to the extent that
many people today are incapable of conceiving any other way of accessing
data except SQL.

Progress can only happen through innovation and SQL has stymied that for
over 20 years.

Regards
George

George James Software
42-44 High Street
Shepperton
TW17 9AU
United Kingdom
Registered in England number 2568792

+44-1932-252568
www.georgejames.com
--

You received this message because you are subscribed to the Google Groups
"NOSQL" group.
To post to this group, send email to nosql-di...@googlegroups.com.
To unsubscribe from this group, send email to
nosql-discussi...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/nosql-discussion?hl=en.

Asaf Ary

Dec 8, 2009, 10:09:59 AM
to NOSQL
This is a great view of the movement Tim, and one with which I
completely agree.
There is a need for us to present a list of common use cases where
relational databases either fail or make for some very hard and
inefficient work.

I was hoping the community would give rise to such issues, simply by
providing a bed for questions by frustrated users looking for
solutions to their SQL problems.
Perhaps it will take some time, I will try to provide a list of some
common use cases where NoSQL can be used in order to simplify data
storage and hopefully that will elicit some response from first time
users.

I know that there have been some discussions about the name and the
definition, I really like your definition of the NoSQL movement and
believe that it is truly about innovations in data storage.


Thank you for the great post
-- Asaf Ary

Ricky Ho

Dec 8, 2009, 10:35:20 AM
to nosql-di...@googlegroups.com
I think the term "NoSQL" is a bit misleading.  The only reason for the term is that SQL has been "the predominant way" of doing data storage and therefore anything else falls into the "NoSQL" camp.  For example, Key/Value stores and Graph DBs both fall under the NoSQL camp but I don't see any synergy between them.  In fact, I fell into the same trap when I blogged about how to do query processing in NoSQL in http://horicky.blogspot.com/2009/11/query-processing-for-nosql-db.html and noticed that I had at most covered the Key/Value store model.

In my opinion, it will be more constructive to categorize different data access patterns as well as different DB models, and then match them.  People certainly can use an RDBMS to model a network graph but then traversing the network results in many-level joins.  You can also use a query to do a key-based lookup where a much simpler Key/Value store suffices and performs much faster.
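
A minimal sketch of the mismatch described above, assuming a toy edge list. The names EDGES, two_hops_sql and two_hops_adjacency are hypothetical and only for illustration; the in-memory dict is a stand-in for a graph or key/value store, not any particular product. The point is that each extra hop costs the relational version another self-join, while the adjacency version just follows one more lookup.

```python
import sqlite3

# Hypothetical toy graph: (source, destination) edges.
EDGES = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")]


def two_hops_sql(start):
    """Nodes exactly two edges away from `start`, one self-join per hop."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", EDGES)
    rows = conn.execute(
        "SELECT DISTINCT e2.dst FROM edge e1 "
        "JOIN edge e2 ON e2.src = e1.dst "
        "WHERE e1.src = ?",
        (start,),
    ).fetchall()
    conn.close()
    return sorted(row[0] for row in rows)


def two_hops_adjacency(start):
    """Same question answered by following key lookups in an adjacency map."""
    adjacency = {}
    for src, dst in EDGES:
        adjacency.setdefault(src, []).append(dst)
    found = set()
    for middle in adjacency.get(start, []):
        found.update(adjacency.get(middle, []))
    return sorted(found)


if __name__ == "__main__":
    print(two_hops_sql("ann"), two_hops_adjacency("ann"))
```

A three-hop question would need a third copy of the edge table in the SQL version (or a recursive query), whereas the adjacency version only needs one more loop level.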

I personally think 90% of enterprise use cases can be fulfilled by SQL DB (a more accurate term is RDBMS).  There are some edge cases where another DB model will work better.  As a community, we should articulate what these use cases are.

Another confusion I have: are we really talking about (SQL vs NoSQL), or are we actually talking about (ACID vs BASE)?

Rgds,
Ricky

Jeremy Day

Dec 8, 2009, 10:40:17 AM
to nosql-di...@googlegroups.com
Ricky,

There has been a pretty good amount of discussion around the name nosql.  For instance, is it "NoSQL," meaning generally the lack of SQL is the defining characteristic?  Is it "NOSQL," which means "not only SQL?"  There was also a lengthy thread a while ago (sorry, I don't have a link handy) that was attempting to categorize and classify the features present in a number of nosql databases (both the key/value and graph varieties).

Jeremy

eprpa...@gmail.com

Dec 8, 2009, 10:44:27 AM
to nosql-di...@googlegroups.com
The "not only sql" concept was never really accepted by the group. It
was one person's attempt to get around their problem with the meaning of
NoSQL.

Chance

eprpa...@gmail.com

Dec 8, 2009, 10:52:59 AM
to nosql-di...@googlegroups.com
Ricky,

You figured out the issue. Sometimes we use the word SQL when we talk
about ACID semantics and sometimes for the language. We need to keep
them separate.

Personally, if you put the language to the side for a minute, every
enterprise application can be written without the need for a database.
After all, isn't that the reason for the object-based model of programming?

The real strength is language - you can make all sorts of connections
between data columns (even wrong ones!) - and you can quickly find some
datum. But it is only because we've spent 30+ years building the tools
to make it useful. To my way of thinking NoSQL is just at the beginning
of thinking of different ways of structuring data and how to manage and
find it.

Chance

Kingsley Idehen

Dec 8, 2009, 11:25:23 AM
to nosql-di...@googlegroups.com
All,

Here is an old blog post of mine titled: The Time for RDBMS Primacy
Downgrade is Nigh! It covers the core of the matter which has a lot to
do with RDBMS technology no longer being the optimal purveyor of
enterprise agility.

Links:

1. http://tr.im/H1B4
2. http://groups.google.com/group/business-of-linked-data-bold --
Business Of Linked Data discussion group (which is about post SQL era we
agility pursuit value pyramid).

--


Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com




Ricky Ho

Dec 8, 2009, 3:14:36 PM
to nosql-di...@googlegroups.com
Well, data structures (like HashTable, Queue, Graph, Tree) have been around for a long time.  It is unfair to say that everything is from scratch.  But it is true that we are just beginning to figure out how to persist them and provide some search capability over them.

Rgds,
Ricky

David Portas

Dec 8, 2009, 9:49:58 AM
to NOSQL
On Dec 8, 2:30 pm, George James <Geor...@georgejames.com> wrote:
>
> SQL has indeed caused the database industry to stagnate, to the extent that
> many people today are incapable of conceiving any other way of accessing
> data except SQL.
>

Agreed. However SQL != Relational. It seems odd to me that the
possibility of relational model alternatives to SQL isn't generally
considered part of the NoSQL movement. It is quite possible to
conceive of a Relational DBMS that does not suffer from the familiar
disadvantages of SQL. ie, one that is dynamic, partition tolerant and
with rich object type support.

I wonder if the NoSQL movement is also making the assumption that all
RDBMS must look like SQL? Or is there some fundamental reason why the
relational model is unsatisfactory? If so, I haven't seen that reason
clearly explained anywhere.

David

Paulo Gaspar

Dec 8, 2009, 9:38:03 PM
to nosql-di...@googlegroups.com
Hi,


I am sorry but I often see silliness rising to new heights when talking about the NoSQL theme. As usual, I think that the radicalisms around NoSQL are just BS and the good stuff is the Eventually Consistent / BASE stuff.


Talking about it being "liberating" to have no SQL sounds like double BS to me, since I experienced plenty of that: I had plenty of experience with "NoSQL" solutions and felt quite liberated when I finally had the chance to use SQL and relational databases.


Ricky Ho's reply is quite on the mark and I just want to add some "historic" perspective.


The problem, you see, is that all this NoSQL crap is just history repeating and BASE, as Ricky stated, is where all the meat is.

All the innovation that matters revolves around the BASE way of working.


"Moving away" from relational and SQL "just because" is silly, or just criminal, bordering on a "book burning" episode. And "Those who cannot remember the past are condemned to repeat it." (http://wiki.answers.com/Q/Who_said_Those_who_ignore_history_are_bound_to_repeat_it)

At Amazon they did forgo using SQL and relational because, with the current state of technology, they just HAD TO in order to have BASE instead of ACID.


If you think we always had SQL and relational databases around, you must be a newbie (or worse). Only a rookie - and not a very smart one at that - would skip getting the historic facts straight before talking this loud about such an issue.

Or are you trying to sell us "dbm" as a new and revolutionary technology?


If we go back just 30 years, very few had access to computers.

Some 22 years ago, IBM PC clone use exploded, democratizing computer use but NOT democratizing SQL and relational database use. Many of us used tools like Faircom's C-Tree or Borland's Turbo Database Toolbox, which were storage / B-Tree APIs much like the C-based Berkeley DB was in the beginning (not what it is becoming at Oracle).

Which means: many of the older guys know very well what living without relational databases and without SQL feels like.


Do you think Key/Value stores are a new idea? Hey, look at dbm, from 1979 (and its many successors):
http://en.wikipedia.org/wiki/Dbm

I remember reading about Coherence (then Tangosol's, now Oracle's) - which (arguably) is still the most sophisticated key/value store around - in 2002 and it was already version 1.3.


Sometimes I even read posts/articles talking about hierarchical databases as if they were something new, but the first popular implementation I know of dates from the sixties. Look, it's IBM's IMS, which worked for the first time in 1968:
http://en.wikipedia.org/wiki/Information_Management_System

(I confess I didn't know it was THIS old, but I remember IMS was already there when I started.)


We had plenty of NoSQL before relational DBs became popular. Can you imagine how much of your daily life depends on code written in COBOL??? Do some of you think COBOL used to have SQL? Think again:
http://en.wikipedia.org/wiki/IDMS


And do you think Object Database technology (of which graph databases are just a variant) is anything new? The first ODBMS systems were created before OOP became really popular, in the eighties:
http://www.odbms.org/Introduction/history.aspx

Graph databases are just a variation of Object Databases (which were also previously already adapted to work with XML nodes...).

...and there are plenty of systems around:
http://en.wikipedia.org/wiki/Comparison_of_object_database_management_systems

Did you notice the "SQL support" column in that table?
Did you notice how it is filled most of the time?

Do you think it was always this way? They HAD to put SQL there!!!

Object Databases had plenty of advocacy too, and plenty of people pointed to ODBMS technology as the best invention since sliced bread and announced that it was the next big thing again and again... and it never happened. It was the kind of discussion that sold magazines (yeah, before the Internet).

ODBMS are very useful in plenty of domains. Plenty enough to have some ODBMS products apparently doing quite well. Look at the size (the millions) of Caché's producer:
http://www.intersystems.com/aboutus/index.html

...but also look at how they sell it:
"a high-performance object database that runs SQL five times faster than relational databases."

...text from here:
http://www.intersystems.com/cache/index.html


You have to do commercial development to get why relational is important. The "relational feature" + SQL helps you to fit in relations as an afterthought. You just need to add an index or two, use a new SQL query and there you go.
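
To make the "afterthought" point concrete, here is a minimal sketch using SQLite from Python. The customer and invoice tables, the index name and the report are all hypothetical, invented for illustration: the schema was not designed around a per-customer total, yet one index and one new query are enough to add it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema as originally designed (hypothetical example data).
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO invoice VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 7.5);
    """
)

# The afterthought: a report nobody planned for.
# One index on the joining column, one new query, no data migration.
conn.execute("CREATE INDEX idx_invoice_customer ON invoice (customer_id)")
totals = conn.execute(
    "SELECT c.name, SUM(i.total) "
    "FROM customer c JOIN invoice i ON i.customer_id = c.id "
    "GROUP BY c.id, c.name "
    "ORDER BY c.name"
).fetchall()

print(totals)
```

In a store keyed only by invoice id, the same report would typically mean scanning every record or maintaining a second, hand-built index.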

The thing is that many relations are just an afterthought. Human brains are too limited to imagine, in a single step, all that a complex system will do and will be.

We have to build software incrementally (even when we plan to do it in a single step) and relational makes some of these increments easier. (And it is better to do the right amount of smaller increments before attempting to redesign / refactor.)

SQL is also a "fix it" tool for developers and DBAs. A very important one too.


So, WHAT is new (that matters) in the NoSQL movement?
- BASE

BASE is all that matters in this whole new world. It shouldn't be called the NoSQL movement, it should be called the BASE movement.

And, by the way, that is not so new either. The first reference to BASE I know of is from a 1997 paper whose authors include Eric A. Brewer and Armando Fox, of CAP Theorem fame. You can find a summary of the whole BASE / CAP history with references to the interesting papers here:
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

... including the mentioned 1997 presentation of the BASE concept in the "Cluster-Based Scalable Network Services" paper:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.2034&rep=rep1&type=pdf

... the presentation of the CAP principle in the "Harvest, yield, and scalable tolerant systems" paper:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3690&rep=rep1&type=pdf

... and the more popular presentation of CAP and BASE at the 2000 PODC keynote:
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf


It still took a few years (not that many, considering the history of other technologies) to have truly industrial proof that the concept worked, thanks to Amazon's Dynamo:
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

And this last URL is the real must read to understand BASE, especially so because it quotes real numbers about what you can achieve.

Every "eventually consistent" database is just trying to copy Dynamo, with a variation here and another there.


Even Ricky Ho's excellent and well illustrated posts are NOT a replacement for reading the Dynamo paper - although they may help in its interpretation.


BASE is the REAL innovation.

All the rest just sounds like history repeating.


And if you talk with some of the distributed computing veterans, they will start pointing out how so many of the technologies used by Dynamo are old news too. Go and check the age of Vector Clocks (1988) (and their predecessor, Lamport timestamps, 1978), Merkle Trees (1979 patent) or the Gossip protocols (there is a first reference of the idea at Wikipedia which reads "Epidemic algorithms for replicated database management. Alan Demers, et al. Proc. 6th ACM PODC, Vancouver BC, 1987.").
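
For readers who have not met vector clocks, a minimal sketch of the idea follows. It is a textbook-style illustration under my own assumptions, not how Dynamo or any specific database implements them; the function names increment, merge and compare are hypothetical. Each replica keeps a counter per node, and two versions conflict when neither clock dominates the other.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: the clock that has seen everything a and b have."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a, b):
    """Classify a against b: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


if __name__ == "__main__":
    v1 = increment({}, "x")   # first write, handled by node x
    v2 = increment(v1, "y")   # update of v1 handled by node y
    v3 = increment(v1, "z")   # another update of v1, handled by node z
    print(compare(v1, v2))    # v1 is an ancestor of v2
    print(compare(v2, v3))    # siblings: neither descends from the other
```

When compare reports "concurrent", the store cannot pick a winner on its own; the versions have to be reconciled (by the application or by some policy) and the merged clock attached to the result.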

Maybe there is even some "Dynamo" like database I do not know of.

Anyway, Amazon proved it works like no Inktomi project did: It was not only about visibility but also about SCALE.

And its target was not some larger than life achievement like indexing the Internet but something much more prosaic: e-Commerce!!!


And what about SQL?

I don't think we need less than SQL and relational. Actually, I think we need much more.

We need:
- a query engine that knows how to calculate the cost of distributed joins, network data transfers (under current network conditions, whatever that means at the moment and place) and other distributed database specific operations;
- a database engine that can optimize the database through re-partitioning and de-normalization of data according to production needs;
- a new SQL language that covers the different topologies a distributed DB can have and is "BASE aware".


I already found plenty of evidence of research projects dealing with just that kind of innovation.

But this is too complex to achieve in a short time. Most of the "ACID" SQL engines we know are still leaky abstractions, even without all this new complexity. The best we can do for distributed databases in a short period of time are the NoSQL solutions we know of.

And please notice that even in the NoSQL world, some query solutions are starting to look familiar, like with Hadoop's Pig Latin:
http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html


CONCLUSION:

IMO "NoSQL" is only a liberation for newbies too lazy to learn SQL or developers too limited to get its power.

The awkwardness of using SQL based database interfaces with programming languages - and related ORM crap - is an interface issue that should have been solved ages ago... but we know how the embedded SQL solutions always failed and how most ORM APIs are crappy. (Maybe there is hope in LINQ like techniques...)

BASE is the real liberation because it allows us to achieve performance / availability targets we cannot dream of with the standard ACID technologies.

(Unfortunately there is no query language for BASE databases as good as SQL is for relational.)

All the rest is hype and history repeating while, unfortunately, past teachings are ignored.


Have fun,
Paulo Gaspar

http://paulogaspar7.blogspot.com/
@paulogaspar7

eprpa...@gmail.com

Dec 9, 2009, 2:24:47 AM
to nosql-di...@googlegroups.com, Paulo Gaspar
Actually I think there are just objects. In the RDBMS model they are
stored in rows and columns. In the NoSQL case they are stored in k/v
sets. In other implementations there may be other ways to represent data.

But the reality is everything can be represented as an object with some
relation (implied or specified) to other objects. After all that is just
the model we use for programming.

ACID and BASE just show how consistency is to be maintained by various
parts of the objects and relations.

As for "SQL" it is just a restricted language applied to one model of
relational data.

Chance

Rob Tweed

Dec 9, 2009, 3:51:05 AM
to nosql-di...@googlegroups.com
Paulo

Despite your (IMHO) somewhat hectoring approach, I nevertheless found your posting useful and thought-provoking, particularly for the references and for pointing out to those of us who have been around some time that this is one of those frequent situations in IT where history comes back around: same concept, different name this time.

However, I'd particularly like to disagree with your rather patronising conclusion:


> IMO "NoSQL" is only a liberation for newbies too lazy to learn SQL or developers too limited to get its power.

Actually it's a liberation for many of us who go back before the rise of relational and who watched the many promises and much vaunted benefits of the RDBMS fail to meet the heavily marketed expectations.  As Bhaskar has pointed out, yes the RDBMS proved to be a good match for many straightforward business situations, but for other scenarios it plainly (to at least some of us) didn't, yet we had to watch while the RDBMS was force-fitted everywhere and we were told we didn't know what we were talking about.

Don't get me wrong - SQL has its place and is a useful language to graft onto many databases even if they aren't relational and I use it often.  Witness Amazon's SimpleDB where they latterly added on an SQL-like "select ...from...where" API for searching and querying.  Also witness InterSystems who layered SQL onto their Cache' Object database .....which under those shiny Object Oriented covers uses a Key/Value database engine by the way (http://gradvs1.mgateway.com/download/extreme1.pdf)

So yes, there's very little ultimately new in IT, but please don't assume that NoSQL is simply the interest area of a bunch of naive newbies or old hacks who never saw the light or couldn't cut it in the relational world.  For some of us it's a renaissance of an area of database technology that has proven to be appropriate for many of the new Internet-scale requirements and vindication that we were right all along to remain sceptical about the "all you ever need is an RDBMS" marketing BS.

Oh and for those who are new to the acronym BASE, it's Basically Available, Soft State, Eventually Consistent.  Here's another useful reference to those provided by Paulo: http://highscalability.com/drop-acid-and-think-about-data







--
Rob Tweed
Director
M/Gateway Developments Ltd

The Pursuit of Productivity : http://www.mgateway.com

Paulo Gaspar

Dec 9, 2009, 5:23:09 AM
to nosql-di...@googlegroups.com
Exactly. Those are the basic bricks.

We need as much to "avoid SQL" as to "avoid k/v stores" or ODBMS, all of which have been around for a long time already.

You should notice, however, that there is a trend (probably because there is a need) to have a Query Language for the less simple storage models, be it SQL, OQL, XQL, Pig Latin or any of the simplistic JSON-based ones used in some new NoSQL databases.


Regards,
Paulo Gaspar

Paulo Gaspar

Dec 9, 2009, 6:20:02 AM
to nosql-di...@googlegroups.com
Hi Rob,


Sorry but I was not accurate enough.

If I considered the whole NoSQL movement as lacking interest, I would not be on this list.

What I wanted to refer to in the sentence you quote was actually the "no SQL" attitude - refusing to use SQL at all - I often find going around (and I don't only mean online) and not the "NoSQL" as a movement.

There ALWAYS were well known alternatives to SQL / Relational. Relational databases were never the standard storage, just the most frequently used OPTION.

Even when looking at ORM products/standards one often finds those APIs were totally (or part of them) designed to be able to work on top (provided the proper implementation) of non relational, non SQL based stores. This is an example that people working on storage systems consider non relational, non SQL based alternatives.


So, my point is:

- The only meaningful novelty I find being popularized by the NoSQL movement is BASE;

- All other storage architectures and storage access OPTIONS (also including relational and SQL together with the NoSQL stuff) have been around for ages and we should not deny them but, instead, build on top of past experiences.


...and these past experiences encompass, obviously, the knowledge that relational is not good for everything and SQL is not always needed / desirable.


The High Scalability site is always a good starting point but it is frequently just introductory / provoking. One should always follow the reference links usually provided at the end of the posts for depth. Unfortunately the link to (at least) one of the fundamental papers at the post you point to is broken - the Dynamo paper. The original can be found here:
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html


Regards,
Paulo Gaspar

rtweed

Dec 9, 2009, 6:27:19 AM
to NOSQL
Paulo

I think you've hit the nail firmly on the head - total agreement with
you

Rob

On 9 Dec, 11:20, Paulo Gaspar <paulo.gas...@gmail.com> wrote:
> Hi Rob,
>
> Sorry but I was not accurate enough.
>
> If I considered the whole NoSQL movement as lacking interest, I would not be on this list.
>
> What I wanted to refer to in the sentence you quote was actually the "no SQL" attitude - refusing to use SQL at all - I often find going around (and I don't only mean online) and not the "NoSQL" as a movement.
>
> There ALWAYS were well known alternatives to SQL / Relational. Relational databases were never the standard storage, just the most frequently used OPTION.
>
> Even when looking at ORM products/standards one often finds those APIs were totally (or part of them) designed to be able to work on top (provided the proper implementation) of non relational, non SQL based stores. This is an example that people working on storage systems consider non relational, non SQL based alternatives.
>
> So, my point is:
>
> - The only meaningful novelty I find being popularized by the NoSQL movement is BASE;
>
> - All other storage architectures and storage access OPTIONS (also including relational and SQL together with the NoSQL stuff) are around for ages and we should not deny them but, instead, build on top of past experiences.
>
> ...and these past experiences encompass, obviously, the knowledge that relational is not good for everything and SQL is not always needed / desirable.
>
> The High Scalability site is always a good starting point but it is frequently just introductory / provoking. One should always follow the reference links usually provided at the end of the posts for depth. Unfortunately the link to (at least) one of the fundamental papers at the post you point to is broken - the Dynamo paper. The original can be found here:
>  http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
>
> Regards,
> Paulo Gaspar
>
> On 2009-12-09, at 08:51, Rob Tweed wrote:
>
> > Paulo
>
> > Despite your (IMHO) somewhat hectoring approach, I nevertheless found your posting useful and thought-provoking, particularly for the references and for pointing out to those of us who have been around some time that this is one of those frequent situations in IT where history comes back around: same concept, different name this time.
>
> > However, I'd particularly like to disagree with your rather patronising conclusion:
>
> > > IMO "NoSQL" is only a liberation for newbies too lazy to learn SQL or developers too limited to get its power.
>
> > Actually it's a liberation for many of us who go back before the rise of relational and who watched the many promises and much vaunted benefits of the RDBMS fail to meet the heavily marketed expectations.  As Bhaskar has pointed out, yes the RDBMS proved to be a good match for many straightforward business situations, but for other scenarios it plainly (to at least some of us) didn't, yet we had to watch while the RDBMS was force-fitted everywhere and we were told we didn't know what we were talking about.
>
> > Don't get me wrong - SQL has its place and is a useful language to graft onto many databases even if they aren't relational and I use it often.  Witness Amazon's SimpleDB where they latterly added on an SQL-like "select ...from...where" API for searching and querying.  Also witness InterSystems who layered SQL onto their Cache' Object database .....which under those shiny Object Oriented covers uses a Key/Value database engine by the way (http://gradvs1.mgateway.com/download/extreme1.pdf)
>
> > So yes, there's very little ultimately new in IT, but please don't assume that NoSQL is simply the interest area of a bunch of naive newbies or old hacks who never saw the light or couldn't cut it in the relational world.  For some of us it's a renaissance of an area of database technology that has proven to be appropriate for many of the new Internet-scale requirements and vindication that we were right all along to remain sceptical about the "all you ever need is an RDBMS" marketing BS.
>
> > Oh and for those who are new to the acronym BASE, it's Basically Available, Soft State, Eventually Consistent.  Here's another useful reference to those provided by Paulo:http://highscalability.com/drop-acid-and-think-about-data
>
> > 2009/12/9 Paulo Gaspar <paulo.gas...@gmail.com>
> > Hi,
>
> > I am sorry but I often see silliness raising to new highs when talking about the NoSQL theme. As usual, I think that the radicalisms around NoSQL are just BS and the good stuff is the Eventually Consistent / BASE stuff.
>
> > Talking about being "liberating" to have no SQL sounds like double BS to me, since I experienced plenty of that: I had plenty experience with "NoSQL" solutions and felt quite liberated when I finally had the chance to use SQL and relational databases.
>
> > Ricky Ho's reply is quite on the mark and I just want to add some "historic" perspective.
>
> > The problem, you see, is that all this NoSQL crap is just history repeating and BASE, as Ricky stated, is where all the meat is.
>
> > All the innovation that matters revolves around the BASE way of working.
>
> > "Moving away" from relational and SQL "just because" is silly, or just criminal, bordering on the "book burning" episode. And "Those who cannot remember the past are condemned to repeat it." (http://wiki.answers.com/Q/Who_said_Those_who_ignore_history_are_bound...)
>
> > At Amazon they did forego using SQL and relational because, with the current state of technology, they just HAD TO in order to have BASE instead of ACID.
>
> > If you think we always had SQL and relational databases around, you must be a newbie (or worse). Only a rookie - and not a very smart one at that - would skip getting the historic facts straight before talking this loud about such an issue.
>
> > Or are you trying to sell us "dbm" as a new and revolutionary technology?
>
> > If we only go back to 30 years ago,  only very few had access to computers.
>
> > Some 22 years ago, IBM PC clone use exploded, democratizing computer use but NOT democratizing SQL and relational database use. Many of us used tools like Faircom's C-Tree or Borland's Turbo Database Toolbox, which were storage / B-Tree APIs much like the C based Berkeley DB was in the beginning (not what it is becoming at Oracle).
>
> > Which means: many of the older guys know very well what living without relational databases and without SQL feels like.
>
> > Do you think Key/Value stores are a new idea? Hey, look at dbm, from 1979 (and its many successors):
> >  http://en.wikipedia.org/wiki/Dbm
>
> > I remember reading about Coherence (then Tangosol's, now Oracle's) - which (arguably) is still the most sophisticated key/value store around - in 2002 and it was already version 1.3.
>
> > Sometimes I even read posts/articles talking about hierarchical databases as if they were something new, but the first popular implementation I know of dates from the sixties. Look, it's IBM's IMS, which worked for the first time in 1968:
> >  http://en.wikipedia.org/wiki/Information_Management_System
>
> > (I confess I didn't know it was THIS old, but I remember IMS was already there when I started.)
>
> > We had plenty of NoSQL before relational DBs became popular. Can you imagine how much of your daily life depends on Cobol written code??? Do some of you think COBOL used to have SQL? Think again:
> >  http://en.wikipedia.org/wiki/IDMS
>
> > And do you think Object Database technology (of which graph databases are just a variant) is anything new? The first ODBMS systems were created before OOP became really popular, in the eighties:
> >  http://www.odbms.org/Introduction/history.aspx
>
> > Graph databases are just a variation of Object Databases (which were also previously already adapted to work with XML nodes...).
>
> > ...and there are plenty of systems around:
> >  http://en.wikipedia.org/wiki/Comparison_of_object_database_management...
>
> > Did you notice the "SQL support" column in that table?
> > Did you notice how it is filled most of the time?
>
> > Do you think it was always this way? They HAD to put SQL there!!!
>
> > Object Databases had plenty of advocacy too, and plenty of people pointed to ODBMS technology as the best invention since sliced bread and announced that it was the next big thing again and again... and it never happened. It was the kind of discussion that sold magazines (yeah, before the Internet).
>
> > ODBMS are very useful in plenty of domains. Plenty enough to have some ODBMS products apparently doing quite well. Look at the size (the millions) of Caché's producer:
> >  http://www.intersystems.com/aboutus/index.html
>
> > ...but also look at how they sell it:
> >  "a high-performance object database that runs SQL five times faster than relational databases."
>
> > ...text from here:
> >  http://www.intersystems.com/cache/index.html
>
> > You have to do commercial development to get why relational is important. The "relational feature" + SQL helps you to fit in relations as an afterthought. You just need to add an index or two, use a new SQL query and there you go.
>
> > The thing is that many relations are just an afterthought. Human brains are too limited to imagine, in a single step, all that a complex system will do and will be.
>
> > We have to build software incrementally (even when we plan to do it in a single step) and relational makes some of these increments easier. (And it is better to do the right amount of smaller increments before attempting to redesign / refactor.)
>
> > SQL is also a "fix it" tool for developers and DBAs. A very important one too.
>
> > So, WHAT is new (that matters) in the NoSQL movement?
> >  - BASE
>
> > BASE is all that matters in this whole new world. It shouldn't be called the NoSQL movement, it should be called the BASE movement.
>
> > And, by the way, that is not so new either. The first reference to BASE I know of is from a 1997 paper whose authors also include Eric A. Brewer and Armando Fox, of CAP Theorem's
>
> ...

Kingsley Idehen

Dec 9, 2009, 7:05:35 AM12/9/09
to nosql-di...@googlegroups.com
In a nutshell:

1. Multi-Model BASE (a Virtual Database) rather than a single model, which
always leads down the "one size fits all" cul-de-sac
2. Domain Specific Languages aligned with the relevant Data Model
3. Incorporate Network awareness into the records hosted by the BASE.

1-3 when applied to HTTP based Networks is basically the real story
behind the Linked Data meme :-)

SQL isn't going anywhere soon, but its days of primacy re. Ad-Hoc access
to data are well and truly over due to:

1. Data Model Heterogeneity
2. Data Access Protocol Heterogeneity
3. Move to Schema last as opposed to Schema First
4. Fluidity of Context (units of measurement across locales, for instance)
5. Need for Data Object Identifiers rather than Value based Identifiers
6. Network Scale Data Object Identifiers (e.g. HTTP URIs)
7. Increasing need to break down Data Silos (Enterprise and Web) to
enable meshing of disparately structured and hosted data sources.

I think the real message re. "No SQL" should be about the pursuit of
modern data access agility i.e., seeking new ways to attain or sustain
scalable Ad-hoc interaction with increasingly disparate data sources.

--


Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com




Ricky Ho

Dec 9, 2009, 10:34:32 AM12/9/09
to nosql-di...@googlegroups.com
I think the major difference is not between Relational vs Non-relational, nor between SQL vs Non-SQL.

The major difference, in my opinion, is that 20 - 30 years ago, when people started to implement the RDBMS, "distributed architecture" was never considered in the design.  In fact, companies were more focused on selling expensive hardware than software.  In other words, all the implementation optimization techniques that have been developed in our RDBMS technologies are based on a single-large-machine architecture.  And this model has been well accepted for the last 20 - 30 years.

But in the last 5 - 10 years, due to the trend of using commoditized hardware to save cost, the rise of open source ...  People started realizing the importance of "distributed architecture" to support such a trend.  On the other hand, most online web businesses prefer availability over data consistency.  So some of the design considerations of the traditional RDBMS model are being challenged.  To be fair, I personally think these are implementation decisions which are not fundamental to the Relational model itself.  In my opinion, it is a perfectly good choice to re-implement the RDBMS model in a highly distributed environment.

I guess the reason we haven't seen this happening a lot is that experts in the Relational model and experts in distributed computing are pretty much two disjoint sets of people.

Rgds,
Ricky

eprpa...@gmail.com

Dec 9, 2009, 10:43:56 AM12/9/09
to nosql-di...@googlegroups.com, Ricky Ho
The problem is in ACID vs. BASE. Even in a distributed system you still
run into 2PC or 3PC (two- or three-phase commit), and that just costs a lot
of performance. So it still comes down to the right way to divide up the
data to minimize the consistency needed. I believe that is the real meaning
behind NoSQL - not all systems require complete 2PC of all data.
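
To make the cost concrete, here is a minimal in-process sketch of two-phase commit in Python. Everything in it (the Participant class, the two_phase_commit function, counting one message per call) is a hypothetical illustration, not any real database's protocol implementation; a real coordinator would talk over the network, contact participants in parallel, and handle timeouts and crashes.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical node holding part of the data."""
    name: str
    can_commit: bool = True
    staged: dict = field(default_factory=dict)
    data: dict = field(default_factory=dict)

    def prepare(self, key, value):
        # Phase 1: stage the write and vote yes/no.
        if self.can_commit:
            self.staged[key] = value
        return self.can_commit

    def commit(self):
        # Phase 2 (success): make the staged write visible.
        self.data.update(self.staged)
        self.staged.clear()

    def abort(self):
        # Phase 2 (failure): throw the staged write away.
        self.staged.clear()


def two_phase_commit(participants, key, value):
    """Return (committed?, number of coordinator-to-participant calls)."""
    calls = 0
    votes = []
    for p in participants:          # phase 1: ask everyone to prepare
        votes.append(p.prepare(key, value))
        calls += 1
    decision = all(votes)
    for p in participants:          # phase 2: tell everyone the outcome
        if decision:
            p.commit()
        else:
            p.abort()
        calls += 1
    return decision, calls


nodes = [Participant("a"), Participant("b"), Participant("c")]
committed, calls = two_phase_commit(nodes, "x", 1)
```

Every write that spans all participants pays two calls per participant, and a single "no" vote blocks the whole write. That is the overhead you avoid when the data is divided so that most writes touch only one node.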

Chance
> <mailto:Geor...@georgejames.com>> wrote:
> >
> > SQL has indeed caused the database industry to stagnate, to the extent that
> > many people today are incapable of conceiving any other way of accessing
> > data except SQL.
> >
>
> Agreed. However SQL != Relational. It seems odd to me that the
> possibility of relational model alternatives to SQL isn't generally
> considered part of the NoSQL movement. It is quite possible to
> conceive of a Relational DBMS that does not suffer from the familiar
> disadvantages of SQL. ie, one that is dynamic, partition tolerant and
> with rich object type support.
>
> I wonder if the NoSQL movement is also making the assumption that all
> RDBMS must look like SQL? Or is there some fundamental reason why the
> relational model is unsatisfactory? If so, I haven't seen that reason
> clearly explained anywhere.
>
> David

Asaf Ary

Dec 9, 2009, 11:01:09 AM12/9/09
to NOSQL
Good point Ricky,

Allow me to extend it.
20-30 years ago (or even 10) the notion of distributed data stores
already existed. However, the bottlenecks were different,
as can be clearly seen in most / all of the old data stores which came
out of academia.

The focus in these distributed databases was always on the computing
power (CPU) of the machine, and on using distribution as a way to perform
more complex operations.
The current need of web applications is the converse of that: CPU is
usually not the bottleneck when handling web scale traffic,
but rather the storage of massive amounts of data (PB) or network
traffic.

If you get 1 million requests per second it is highly unlikely that a
single machine will be able to handle the network traffic (even at the
network layer); you will have to go into replication or sharding at
some point.
Algorithms designed to reduce computation at the expense of
network traffic, which used to be the norm, are irrelevant here.
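
As a rough illustration of the sharding idea, the Python sketch below routes each key to one of N machines by hashing it. The function name shard_for and the choice of MD5 are assumptions made for the example; real stores differ in how they partition.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards)."""
    # A stable hash (unlike Python's built-in hash(), which is salted
    # per process) so every client routes the same key the same way.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Spread some hypothetical user keys across 4 shards.
placement = {f"user:{i}": shard_for(f"user:{i}", 4) for i in range(1000)}
```

Plain modulo hashing moves most keys when the number of shards changes, which is why many distributed stores use consistent hashing instead.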

Besides, every week or so this discussion comes up. I'll say it again:

There is more to non-relational than scale!!

There are several use cases where the relational model doesn't fit the
needs of the programmer, and while it would be possible to use it, it
would make no sense.
For instance a graph based data model (like Neo4j) is designed to
provide an efficient implementation of operations such as shortest path.
This can be replicated in the relational model using a graph relation
with vertex objects and edge objects and using recursive join
operations on the vertices of the graph,
but this would be a most cumbersome and inefficient way to handle it. In
these cases the relational model is simply impractical and no level of
optimization will do.
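
To show the two formulations side by side, here is a small Python sketch. It is only an illustration under assumptions: a plain dict stands in for a graph store (this is not Neo4j's API), SQLite's recursive common table expression stands in for "recursive joins", and the five-edge graph is made up.

```python
import sqlite3
from collections import deque

# Hypothetical directed graph: a->b->c->d and a->e->d.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d")]


def shortest_path_graph(edges, start, goal):
    """Graph-style: breadth-first traversal over adjacency lists."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def shortest_hops_sql(edges, start, goal, max_hops=10):
    """Relational-style: an edge table joined to itself recursively."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    row = conn.execute(
        """
        WITH RECURSIVE walk(node, hops) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.hops + 1
            FROM walk AS w JOIN edge AS e ON e.src = w.node
            WHERE w.hops < ?
        )
        SELECT MIN(hops) FROM walk WHERE node = ?
        """,
        (start, max_hops, goal),
    ).fetchone()
    return row[0]


path = shortest_path_graph(EDGES, "a", "d")
hops = shortest_hops_sql(EDGES, "a", "d")
```

Both give the same answer on this toy graph. The difference is in how the work is expressed: the traversal stops as soon as it reaches the goal, while the recursive query as written enumerates every reachable (node, hops) pair up to the bound before taking the minimum.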

Even if you aren't a proponent of different data models, there are
still issues with handling your data which are problematic at best...
1) evolving schema - data first approach
2) multi-level aggregation - the need for recursive "GROUP BY"
operations
and many more...

In these cases the relational model needs an alternative that is better
adapted to the needs of today's consumers (developers). The fact that
all major online players use non-relational data stores (mostly in
conjunction with other SQL based data stores) should be evidence enough
that there is a real need for alternatives to the relational model and
that this is not simply a "trend" issue.

This discussion keeps repeating, and I think one of the issues that
people are having is knowing when a non-relational database provides a
better alternative (even in the long term) than a relational database.
I'll try to post a more concise list of use cases in the near future
and hopefully that will help shed some light on the issue.


Thanks
Asaf


George James

Dec 9, 2009, 12:52:23 PM12/9/09
to nosql-di...@googlegroups.com
Asaf Ary wrote:
> For instance a graph based data model (like Neo4j) is designed to provide
> efficient implementation of operations such as shortest path. This can be
> replicated in the relational model using a graph relation with vertex objects
> and edge objects and using recursive join operations on the vertices of the
> graph, but this would be the most cumbersome ineffective way to handle it.
> In these cases the relational model is simply impractical and no level of
> optimization will do.

To what extent is this a limitation of the relational model? I believe it's
actually a limitation of SQL, the query language that is invariably used
with relational databases, rather than with the relational model itself.

In many cases the underlying storage and representation are perfectly
adequate; it's in the tools and the method of expressing the task that the
problem lies. In SQL it manifests as a problem with cumbersome and
recursive joins.

Regards
George

Kingsley Idehen

Dec 9, 2009, 1:22:27 PM12/9/09
to nosql-di...@googlegroups.com
George James wrote:
> Asaf Ary wrote:
>> For instance a graph based data model (like Neo4j) is designed to provide
>> efficient implementation of operations such as shortest path. This can be
>> replicated in the relational model using a graph relation with vertex objects
>> and edge objects and using recursive join operations on the vertices of the
>> graph, but this would be the most cumbersome ineffective way to handle it.
>> In these cases the relational model is simply impractical and no level of
>> optimization will do.
>
> To what extent is this a limitation of the relational model? I believe it's
> actually a limitation of SQL, the query language that is invariably used
> with relational databases, rather than with the relational model itself.
>
Identity model in RDBMS engines is value based. In Object DBMS systems
it's Identifier based.

JOIN overhead is huge for RDBMS engines when dealing with complex
relationships (rare in the past, query 101 in today's heterogeneous world
with Web, GPS, and mobility as driving factors).

> In many cases the underlying storage and representation is perfectly
> adequate, it's the tools and method of expressing the task that is where the
> problem lies. In SQL it's manifest as a problem with cumbersome and
> recursive joins.
>
Yes, but the RDBMS engine implementations of Relational Model are value
based.

Ad-hoc query language availability snafu is what killed OODBMS uptake.
ODMG delivered OQL way too late. Also, there wasn't a ubiquitous
federated information space such as today's Web so the data
heterogeneity pain wasn't palpable.

Thus, the whole momentum you see around RDF Quad Stores, the Linked Data
meme etc. is really about many old OODBMS and Distributed Data Objects
ideas rehashed in the new context created by the Web and Internet.

EAV-CR is the underlying Graph Model that delivers the dexterity
required for today's challenges. NoSQL, RDF, Key Stores etc.. are just
monikers. The game is about the BASE with records endowed with Network
Oriented Identifiers (e.g. Generic HTTP URIs as per Linked Data meme) +
REST style of data access.

As stated earlier, we need to look towards multi-model DBMS engines that
are capable of heterogeneous data virtualization.

Links:

1.
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/clamen/OODBMS/Manifesto/htManifesto/node2.html
2. http://www.odbms.org
3. http://www.openlinksw.com/weblog/oerling -- posts about the technical
aspects of OpenLink Virtuoso (a multi-model DBMS engine with in-built
heterogeneous data virtualization across Relational (SQL), Graph
(SPARQL), Hierarchical (XQuery), and Full Text)
4. http://dbpedia.org -- live instance of Virtuoso
5. http://dbpedia-live.openlinksw.com/stats -- live instance of Virtuoso
with live updates from Wikipedia.

Jason Dusek

Dec 9, 2009, 5:58:27 PM12/9/09
to nosql-discussion
2009/12/09 Kingsley Idehen <kid...@openlinksw.com>:
> George James wrote:
> > To what extent is this a limitation of the relational model?
> > I believe it's actually a limitation of SQL, the query
> > language that is invariably used with relational databases,
> > rather than with the relational model itself.
>
> Identity model in RDBMS engines is value based. In Object DBMS
> systems its Identifier based.

Why does that actually matter? SQLite, for example, maintains
a notion of "row identity" that is independent of any values
in a row, even when the row has no primary key.
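
A quick way to see this is with Python's built-in sqlite3 module. The table and values below are made up for the example; the point is only that two rows with identical values still get distinct rowids in an ordinary SQLite table with no declared primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key declared; SQLite still keeps an implicit rowid per row.
conn.execute("CREATE TABLE person (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Ann", "Oslo"), ("Ann", "Oslo")],  # two value-identical rows
)
rows = conn.execute(
    "SELECT rowid, name, city FROM person ORDER BY rowid"
).fetchall()

# Address exactly one of the two identical rows by its identity.
conn.execute("DELETE FROM person WHERE rowid = ?", (rows[1][0],))
remaining = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

The two rows cannot be told apart by value, yet the delete removes only one of them.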

--
Jason Dusek

Alex Woodhead

Dec 9, 2009, 7:18:24 PM12/9/09
to nosql-di...@googlegroups.com
> Identity model in RDBMS engines is value based. In Object DBMS
> systems its Identifier based.

Similar to the SQLite notion, I believe Oracle also uses ROWID to uniquely identify a row within the context of a single table, irrespective of the values contained within the row. It also uses the pseudocolumn OBJECT_ID for object primary keys.

Intersystems Cache has %ROWID from the SQL view which interoperates also as an object identifier.

Are these then both RDBMS and Object DBMS? What's going on?

Kingsley Idehen

Dec 10, 2009, 12:25:45 AM12/10/09
to nosql-di...@googlegroups.com
It matters because it provides the basis for explicit relations that are
highly navigable. What I describe is much more powerful when you
have identifiers for:

1. Row
2. Fields
3. Field Values (optionally)
4. Table

1-3 give you the 3-tuple or triple. 1-4 the 4-tuple or quad.

Make or Map those identifiers to Generic HTTP URIs, and you have
something really special re. a federated Entity-Attribute-Value Graph
Model. Basically, you can then point to records across engine, operating
system, and network boundaries. In a nutshell, that's what the whole
Linked Data meme is fundamentally about.
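
The mapping described above can be sketched in a few lines of Python. The base URI http://example.org/db, the URI layout, and the function names are assumptions made purely for illustration; they are not how any particular engine mints its identifiers.

```python
BASE = "http://example.org/db"  # hypothetical namespace for the example


def row_to_quads(table, row_id, row):
    """Turn one relational row into (row, field, value, table) 4-tuples."""
    table_uri = f"{BASE}/{table}"            # 4. table identifier
    row_uri = f"{table_uri}/{row_id}"        # 1. row identifier
    return [
        (row_uri, f"{table_uri}#{field}", value, table_uri)  # 2. field, 3. value
        for field, value in row.items()
    ]


def quads_to_triples(quads):
    """Drop the table component to get plain 3-tuples."""
    return [(s, p, o) for s, p, o, _graph in quads]


quads = row_to_quads("person", 7, {"name": "Ann", "city": "Oslo"})
triples = quads_to_triples(quads)
```

Once the row and field identifiers are URIs, a record in one engine can be referenced from a record in another simply by using its URI as a value.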

Kingsley Idehen

Dec 10, 2009, 12:30:15 AM12/10/09
to nosql-di...@googlegroups.com
Yes re. the hybrid nature. Basically you get Relational Model and EAV [1]
model capabilities in one place. Note that systems that have these
Identifiers typically espouse their use for optimized data access for
certain types of queries that would be taxing re. value oriented joins.

Links:

1. http://en.wikipedia.org/wiki/Entity-attribute-value_model

Peter Neubauer

Dec 10, 2009, 4:24:52 AM12/10/09
to nosql-discussion
I totally concur, Kingsley.

It highlights the fact that the really interesting point is the
underlying data model, querying capabilities and ease of use that
determine what you can do (in reasonable amounts of time and with
reasonable skillz) with your data. A lot of the NOSQL approaches are
focused on the sheer data size of things (IMHO rather a bug-fix of
just one of the RDBMS problems), and for that matter on simplifying the
data model to e.g. Key-Value models, which I do not think is a valid
long-term approach. RDBMS are going to solve the sheer scaling and
data size problem in one way or another, e.g. using RAM-clouds or
other technologies.

As it stands now, there are two common ways to properly normalize data
into a data model without duplication and with the possibility for
data reuse: the relational model as in the RDBMS, and the Graph
models, as in RDF, Semantic Web, LinkedData and Graph Databases like
Neo4j, Pygr, Sones, and TripleStores.

A lot of time will not only be spent in the pure storing and
structuring of your data but even in making sense and extracting value
from it. Traditionally that has been done via reporting and simple
statistic clustering etc, which RDBMSes are very good at and SQL is
designed for. Nowadays we are seeing value being created via
recommender systems traversing the social network, correlation
analysis, co-occurrence graphs and other deeply context-aware
analytics. The lack of a capable query language for graph analytics is
a problem - SPARQL etc. does not fit the bill as it is purely focused
on finding patterns. We are working on better ways to do this, like
Gremlin (http://gremlin.tinkerpop.com), inspired by a crossover of
XPath2 and graph-syntax to cope with this.

So, even if RDBMS might get better in terms of scalability, still
there are big differences in modeling data, querying, evolving schemas
etc that will not go away and are another driving factor for the
emergent crop of NOSQL databases.

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Relationships count.
http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
http://www.linkedprocess.org - Computing at LinkedData scale.

Kingsley Idehen

Dec 10, 2009, 7:36:20 AM12/10/09
to nosql-di...@googlegroups.com
Peter Neubauer wrote:
> I totally concur, Kingsley.
>
> It highlights the fact that the really interesting point is the
> underlying data model, querying capabilities and ease of use that
> determine what you can do (in reasonable amounts of time and with
> reasonable skillz) with your data. A lot of the NOSQL approaches is
> focused on the sheer data size of things (IMHO rather a bug-fix of
> just one of the RDBMS problems), and for that matter simplifying the
> data model to e.g. Key-Value models which I do not think is a long
> term valid approach. RDBMS are going to solve the sheer scaling and
> data size problem in one way or another, e.g. using RAM-clouds or
> other technologies.
>
Yes, but I think we end up with more DBMS (BASE) and less traditional
RDBMS i.e. more multi-model hybrids that used to be one model or the
other, solely.


> As it stands now, there are two common ways to properly normalize data
> into a data model without duplication and with the possibility for
> data reuse: the relational model as in the RDBMS, and the Graph
> models, as in RDF, Semantic Web, LinkedData and Graph Databases like
> Neo4j, Pygr, Sones, and TripleStores.
>
> A lot of time will not only be spent in the pure storing and
> structuring of your data but even in making sense and extracting value
> from it. Traditionally that has been done via reporting and simple
> statistic clustering etc, which RDBMSes are very good at and SQL is
> designed for. Nowadays we are seeing value being created via
> recommender systems traversing the social network, correlation
> analysis, co-occurrence graphs and other deeply context-aware
> analytics.

Yep!
> The lack of a capable query language for graph analytics is
> a problem - SPARQL etc does not fit the bill as it is purely focused
> on finding patterns. We are working on better ways to do this, like
> Gremlin (http://gremlin.tinkerpop.com), inspired by a crossover of
> XPath2 and graph-syntax to cope with this.
>
Also look at what we are doing, I am assuming you are aware of the live
instances we have that showcase many enhancements we've made to basic
SPARQL?
> So, even if RDBMS might get better in terms of scalability, still
> there are big differences in modeling data, querying, evolving schemas
> etc that will not go away and are another driving factor for the
> emergent crop of NOSQL databases.
>
Hybrids are the key :-)

Links:

1. http://lod.openlinksw.com/fct -- this hosts 8 Billion 3-tuples
based on the Linked Open Cloud Data Sets
2. http://lod.openlinksw.com/sparql -- this is the SPARQL endpoint (note
we even let you hook SQL into SPARQL queries, so that the Relational side
of Virtuoso, which also has Spatial or R-Indexes for GeoSpatial SQL or
SPARQL etc., is available too)



Peter Neubauer

Dec 10, 2009, 5:28:55 PM12/10/09
to nosql-discussion
Kingsley
On Thu, Dec 10, 2009 at 1:36 PM, Kingsley Idehen <kid...@openlinksw.com> wrote:
> Yes, but I think we end up with more DBMS (BASE) and less traditional
> RDBMS i.e. more multi-model hybrids that used to be one model or the
> other, solely.
True that.

> Also look at what we are doing, I am assuming you are aware of the live
> instances we have that showcase many enhancements we've made to basic
> SPARQL?
Any info on that?

> 1. http://lod.openlinksw.com/fct -- this hosts 8 Billion 3-tuples
> based on the Linked Open Cloud Data Sets
> 2. http://lod.openlinksw.com/sparql -- this is the SPARQL endpoint (note
> we even let you hook SQL into SPARQL queries, so that the Relational side
> of Virtuoso, which also has Spatial or R-Indexes for GeoSpatial SQL or
> SPARQL etc., is available too)
Sites seem to be down for me ...

/peter

Kingsley Idehen

Dec 10, 2009, 5:37:40 PM12/10/09
to nosql-di...@googlegroups.com
The sites have been a little volatile due to maintenance this week.

Here are a variety of live instances:

1. http://lod.openlinksw.com/sparql -- LOD Cloud Cache
2. http://dbpedia.org/sparql -- DBpedia
3. http://bbc.openlinksw.com/sparql -- BBC Programmes and Music
4. http://uriburner.com/sparql -- URIBurner Service

In all cases above, substitute "sparql" with "fct" and you get the
Faceted Search and Navigation UI (a front to Faceted Navigation Engine).
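
For anyone scripting against these endpoints, the substitution can be written as a tiny Python helper. The function name faceted_url is made up for the example, and it only rewrites the URL string; it does not contact the sites.

```python
def faceted_url(sparql_endpoint: str) -> str:
    """Swap a trailing 'sparql' path segment for 'fct'."""
    base, _sep, last = sparql_endpoint.rpartition("/")
    if last != "sparql":
        raise ValueError(f"not a /sparql endpoint: {sparql_endpoint}")
    return f"{base}/fct"


endpoints = [
    "http://lod.openlinksw.com/sparql",
    "http://dbpedia.org/sparql",
    "http://bbc.openlinksw.com/sparql",
    "http://uriburner.com/sparql",
]
faceted = [faceted_url(url) for url in endpoints]
```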

Kingsley


Peter Neubauer

Dec 10, 2009, 5:43:59 PM12/10/09
to nosql-discussion
Thanks Kingsley,
do you have an overview of your SPARQL enhancements? Would be great to
look at in order to get ideas and input for interoperability and
completeness of problems that need solving!

Cheers,

/peter neubauer


Kingsley Idehen

Dec 10, 2009, 5:53:31 PM12/10/09
to nosql-di...@googlegroups.com
Peter Neubauer wrote:
> Thanks Kingsley,
> do you have an overview of your SPARQL enhancements? Would be great to
> look at in order to get ideas and input for interoperability and
> completeness of problems that need solving!
>

Links:

1. http://spreadsheets.google.com/pub?key=tl2FDWghDKDc3G70xKkNoNg&gid=5

2. http://delicious.com/kidehen/sparql_tutorial

Kingsley

Peter Neubauer

Dec 11, 2009, 5:00:07 AM12/11/09
to nosql-discussion
Very cool,
thanks Kingsley!

Cheers,

/peter neubauer


Alex Popescu

Dec 15, 2009, 3:13:13 AM
to NOSQL
To me NoSQL is a tag line very similar to the Salesforce.com "No
Software" badge, marking a new (or more consistent) approach in the
industry. "No Software" did not refer to the absence of software but
rather to a new model of delivering software services, and I see the
term NoSQL in a similar light: a new model for storage, querying, etc.

I have read that some are complaining about the breadth of the term.
While not a storage expert myself, I would say that the whole
storage space has always been broad, and it was only the introduction of
the relational model based on Codd's principles that brought some sort
of "unification".

I think what is important for the whole NoSQL space is to focus on
delivering clear descriptions of the space it operates in
(including constraints, use cases, classifications, etc.). It is also
my intention with MyNoSQL (http://nosql.mypopescu.com) to help as much
as possible in this direction.

./alex

PS: Before MyNoSQL (http://nosql.mypopescu.com), I published a
couple of other articles trying to shed some light on this space:

- Quick Reference to Alternative data storages:
http://themindstorms.blogspot.com/2009/05/quick-reference-to-alternative-data.html
- Alternative Data Storage Status Quo:
http://jots.mypopescu.com/post/167285950/alternative-data-storage-status-quo
- A Schema-less Relational Database:
http://themindstorms.blogspot.com/2009/06/schema-less-relational-database.html

Seth Johnson

Dec 19, 2009, 9:45:00 PM
to nosql-di...@googlegroups.com

SQL is a convention. Before SQL, XBASE was a general convention, a BASIC-like, procedural, record-oriented approach to relational modeling. XBASE fell out of use largely because it wasn't clear how to extend it to do relational modeling over networks. Among SQL's advantages was the very significant one that it represented a way to do relational modeling for networked data, albeit through a simple client-server model: a smart daemon with central access to all tables and their relations returns cursor sets in response to encapsulated, set-oriented queries. CAP tradeoffs are becoming an issue as we deal with larger and larger datasets, and some approaches consider that solutions might not be consistent with the conventions of SQL. One could imagine an approach that sets up an SQL daemon that stores data in a distributed architecture that simulates related physical entity tables in the traditional style, but nobody's writing that app because the considerations related to CAP tradeoffs and different architectures might be best surfaced in the query interface -- whether by extending SQL or by devising new conventions for how queries should be designed. Consider, for instance, a combination of Pig Latin and Map/Reduce -- one thing that might happen is that we revisit procedural and record-oriented approaches.
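
To make the contrast concrete, here is a minimal sketch, not taken from any of the systems named above. It computes the same totals twice: once with a set-oriented SQL query (using Python's built-in sqlite3), and once with explicit record-at-a-time map, group and reduce steps in the spirit of Map/Reduce. The table name, column names and sample records are made up for illustration.

```python
import sqlite3
from collections import defaultdict
from functools import reduce

rows = [("alice", 3), ("bob", 5), ("alice", 4)]  # made-up (name, clicks) records

# Set-oriented: one declarative query; the engine decides how to evaluate it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clicks (name TEXT, n INTEGER)")
db.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
sql_totals = dict(db.execute("SELECT name, SUM(n) FROM clicks GROUP BY name"))

# Record-oriented: the program spells out each step over individual records.
def map_record(record):
    """Map step: emit a (key, value) pair for one record."""
    name, n = record
    return (name, n)

groups = defaultdict(list)
for key, value in map(map_record, rows):   # group ("shuffle") by key
    groups[key].append(value)

# Reduce step: fold each key's values into a single total.
mr_totals = {key: reduce(lambda a, b: a + b, values)
             for key, values in groups.items()}

print(sql_totals == mr_totals)
```

In the second form the grouping step is visible in the program itself, which is where decisions about partitioning or distribution could plausibly be surfaced.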


Seth

David Portas

Dec 20, 2009, 3:54:36 AM
to NOSQL
On Dec 20, 2:45 am, Seth Johnson <seth.p.john...@gmail.com> wrote:
> SQL is a convention. Before SQL, XBASE was a general convention, a
> BASIC-like, procedural, record-oriented approach to relational modeling.
> XBASE fell out of use largely because it wasn't clear how to extend it to do
> relational modeling over networks. Among SQL's advantages was the very
> significant one that it represented a way to do relational modeling for
> networked data, albeit through a simple client-server model, setting up a
> smart daemon with central access to all tables and their relations, which
> returns cursor sets in response to encapsulated, set-oriented queries. CAP
> tradeoffs are becoming an issue as we deal with larger and larger datasets,
> and some approaches consider that solutions might not be consistent with the
> conventions of SQL. One could imagine an approach that sets up an SQL
> daemon that stores data in a distributed architecture that simulates related
> physical entity tables in the traditional style, but nobody's writing that
> app because the considerations related to CAP tradeoffs and different
> architectures might be best surfaced in the query interface -- whether
> extending SQL or devising new conventions for how queries should be
> designed. Consider, for instance, a combination of Pig Latin and Map/Reduce
> -- one thing that might happen is we might revisit procedural and
> record-oriented approaches.
>
> Seth

SQL's deficiencies are pretty well known. However, I would suggest that
a (non-SQL) *relational* language is a perfectly good interface for a
CAP-respecting database model. In fact the Relational Model has
characteristics that I think make it as well suited as, or better suited
than, navigational models.

The fact that the RM doesn't depend on navigational structure makes it
very suitable for late-arriving / eventually-consistent data. In a
hierarchy a node cannot be created before its parent. In RM no such
constraint need apply.
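
A small sketch of that difference, under the assumption that the hierarchy is modeled either as a tree of nodes or as a set of (child, parent) tuples. All names here (`TreeNode`, `tree_add`, `parent_of`, the sample keys) are hypothetical.

```python
class TreeNode:
    """Navigational model: a node is reachable only through its parent."""
    def __init__(self, name):
        self.name = name
        self.children = {}

def tree_add(root, path):
    """Attach the last element of path; every ancestor must already exist."""
    node = root
    for name in path[:-1]:
        node = node.children[name]   # KeyError if the parent has not arrived
    node.children[path[-1]] = TreeNode(path[-1])

# Navigational: the child arrives before its parent "orders" and is rejected.
root = TreeNode("root")
try:
    tree_add(root, ["orders", "order-17"])
    tree_accepted = True
except KeyError:
    tree_accepted = False

# Relational: the hierarchy is just a set of (child, parent) tuples.
parent_of = set()
parent_of.add(("order-17", "orders"))   # child first: accepted
parent_of.add(("orders", "root"))       # parent arrives later

# Once all data has arrived, no tuple refers to a missing parent.
known = {child for child, _ in parent_of} | {"root"}
orphans = {child for child, parent in parent_of if parent not in known}
```

Whether a dangling parent reference is tolerated in the meantime then depends on which constraints are declared, not on the structure itself.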

In principle relations are typed variables - subject to inheritance
and type casting like other variables. So at the logical level the
model is very dynamic. (SQL doesn't implement the relational model in
this and many other respects, and so it has always imposed severe
limitations that make schemas less dynamic.)

RM is agnostic as to physical storage structures, so a structure
suited to the task in hand can always be used. Representation of
hierarchies in RM does not have to imply the join overhead that many
people have come to expect from SQL. If the relation being queried is
actually stored as a pointer-based structure then navigation through
that hierarchy is exactly as efficient as in a navigation-based
database - but without the logical restrictions that a navigational
model imposes.
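
One way to picture this, as a sketch only: the class below exposes a relation of (child, parent) tuples at the logical level while storing it physically as adjacency lists. The class and method names are invented for illustration and do not describe any particular product.

```python
from collections import defaultdict

class ParentRelation:
    """Logical view: a set of (child, parent) tuples.
    Physical storage: adjacency lists, i.e. a pointer-like structure."""

    def __init__(self, tuples=()):
        self._children = defaultdict(list)   # parent -> [child, ...]
        for child, parent in tuples:
            self.insert(child, parent)

    def insert(self, child, parent):
        self._children[parent].append(child)

    def tuples(self):
        """Relational access: the full set of tuples, no traversal order implied."""
        return {(c, p) for p, cs in self._children.items() for c in cs}

    def descendants(self, node):
        """Hierarchy query answered by following stored links, not by joins."""
        found, stack = [], [node]
        while stack:
            for child in self._children.get(stack.pop(), []):
                found.append(child)
                stack.append(child)
        return found

rel = ParentRelation([("b", "a"), ("c", "a"), ("d", "b")])
print(sorted(rel.descendants("a")))
```

The caller sees only tuples and queries; swapping the adjacency lists for another storage layout would not change the logical interface.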

Unfortunately the RM seems to get unjustly maligned because of SQL's
failings. Many people assume that "NoSQL" means "non-relational" and
yet all of the arguments claiming to support that position actually
read like criticisms of the SQL model and not the relational one.

David

Seth Johnson

Dec 20, 2009, 8:36:55 AM
to nosql-di...@googlegroups.com

Well observed. I tend to generalize the notion of relation and put schemas and data according to that generalization in a uniform pointer-based structure. Depending on what you mean by "navigation-based", I'm not sure a contrast needs to be drawn between that and whatever approach you're taking. I think in terms of column-oriented approaches, whereas it sounds to me like you're dealing with a less structured key-value store, or maybe a graph-oriented approach. I also tend to think that different CAP tradeoffs correlate well with the enterprise-architecture distinction between operational/transactional systems and analytical systems.

