NoSQL Journal?


Martin Streicher

Jul 3, 2009, 3:28:34 PM
to NOSQL
Is there a need to publish regularly on NoSQL topics? A desire?

I was Editor-in-Chief of Linux Magazine from 2002-2007 and am
currently its SW Developer Executive Editor. I can see a couple of
possible avenues, depending on the requirements and goals of this
community: A standalone journal to promulgate the cause and evangelize
the technologies and techniques, or a part of Linux Magazine, where
folks in the community can have an available and established
platform.

I'd be happy to consider and helm either. So, if you have thoughts,
let me know. Between the base technologies and the various language
APIs, there could be a receptive audience to cultivate and plenty of
advice to share.

Martin

Adam Wiggins

Jul 3, 2009, 6:05:57 PM
to nosql-di...@googlegroups.com
On Fri, Jul 3, 2009 at 12:28 PM, Martin
Streicher<martin.s...@gmail.com> wrote:
> A standalone journal to promulgate the cause and evangelize
> the technologies and techniques, or a part of Linux Magazine, where
> folks in the community can have an available and established
> platform.

I think either of these would be excellent. Printed materials go a
long way toward legitimizing a movement; so far as I know, the only
thing in the pipe on this is O'Reilly's CouchDB book.

Adam

William Newport

Jul 3, 2009, 6:39:32 PM
to nosql-di...@googlegroups.com, NOSQL
Something like highscalability.com would work also; let's save some
trees. A reviewed blog would work very well, I think.

Sent from my iPhone


Thomas Lockney

Jul 3, 2009, 6:46:02 PM
to nosql-di...@googlegroups.com
Well, there are also a few books on Hadoop and it looks like they cover
a bit of the nosql side of that stack. If anything, though, that just
suggests more legitimacy to this idea. I see at least 3 areas that I
would enjoy reading more about:

* the technical decision making and design that went into these systems
* an architectural look at how to integrate them into systems
* tutorials on their usage and APIs

It would also open doors for more in depth looks at some of the "old"
ideas (i.e., CAP and such) that seem to be getting revitalized by way of
the nosql "movement."

Brendan W. McAdams

Jul 3, 2009, 6:51:42 PM
to nosql-di...@googlegroups.com
Agreed. A blog with frequent updates is a much more valuable resource.
It's also a great place to archive all the various videos from
meetings, lectures etc.

A big part of what's needed isn't just here are the tools but here's
how to use them. Use cases, etc would be a great thing to have.

Thomas Lockney

Jul 3, 2009, 6:59:22 PM
to nosql-di...@googlegroups.com
Brendan W. McAdams wrote:
> A big part of what's needed isn't just here are the tools but here's
> how to use them. Use cases, etc would be a great thing to have.
>
Agreed. If you take a look at the new O'Reilly Hadoop book, chapter 14
is all case studies -- reading that was perhaps the most helpful part of
the book for me in understanding how I could make the most effective use
of it. The same thing applies with the posts on highscalability.com --
the posts about how a particular company is using some technology are
always the ones I end up reading more closely than anything else on there.

~thomas

Johannes Ernst

Jul 3, 2009, 7:15:30 PM
to nosql-di...@googlegroups.com
A wiki instead / in addition perhaps?

Enables more brains to produce more and better content faster ...
Johannes Ernst
NetMesh Inc.


William Newport

Jul 3, 2009, 7:33:55 PM
to nosql-di...@googlegroups.com, nosql-di...@googlegroups.com
Wikis are overrated, I'd just do a reviewed blog.

Sent from my iPhone


idgaf

Jul 4, 2009, 3:04:18 AM
to NOSQL
I have set up http://nosql.wordpress.com/ in addition to the group I
set up on LinkedIn -- I encourage anyone interested in this community
to register there and I'll give them access to the blog.



Martin Streicher

Jul 4, 2009, 1:09:35 PM
to NOSQL

Linux Magazine is now online-only. Perhaps I should have stated that
in my post. Any journal would also have an online site and a PDF
version. There is no print required. I think the important aspect of
something like a journal is the review process to yield high-quality
content.


David Timothy Strauss

Jul 4, 2009, 4:15:45 PM
to nosql-di...@googlegroups.com
I think it would be better to set up a Planet NoSQL and simply aggregate posts. Posts from my company blog (the primary place where I write) are aggregated at Planet Bazaar and Planet Drupal, based on my tags for the posts. This saves me from logging into multiple systems, learning multiple systems, and keeps my work in one place.
--
David Strauss
| da...@fourkitchens.com
| +1 512 577 5827 [mobile]
Four Kitchens
| http://fourkitchens.com
| +1 512 454 6659 [office]
| +1 512 870 8453 [direct]

Emil Eifrém

Jul 4, 2009, 4:35:12 PM
to nosql-di...@googlegroups.com
On Sat, Jul 4, 2009 at 10:15 PM, David Timothy
Strauss<da...@fourkitchens.com> wrote:
>
> I think it would be better to set up a Planet NoSQL and simply aggregate posts. Posts from my company blog (the primary place where I write) are aggregated at Planet Bazaar and Planet Drupal, based on my tags for the posts. This saves me from logging into multiple systems, learning multiple systems, and keeps my work in one place.

+1 on that. At the neo4j project we set up a planet a while back:

http://planet.neo4j.org

This has worked out reasonably well. We could syndicate it directly to
a Planet NoSQL or -- probably better -- only pull in posts tagged with
'nosql.'

I think the point about review process is well made though. How do we
stop a NoSQL blog or planet from turning into a venue for
advertisement for the different projects? Maybe review process is too
heavyweight for a blog or planet but it seems like at least some
guidelines are needed.

Cheers,

-EE

Jan Lehnardt

Jul 4, 2009, 4:35:28 PM
to nosql-di...@googlegroups.com

On 4 Jul 2009, at 21:15, David Timothy Strauss wrote:

>
> I think it would be better to set up a Planet NoSQL and simply
> aggregate posts. Posts from my company blog (the primary place where
> I write) are aggregated at Planet Bazaar and Planet Drupal, based on
> my tags for the posts. This saves me from logging into multiple
> systems, learning multiple systems, and keeps my work in one place.

I was thinking along the same lines. +1 as we say in the Apache world :)

Cheers
Jan
--

Thomas Lockney

Jul 4, 2009, 4:39:03 PM
to nosql-di...@googlegroups.com
Martin Streicher wrote:
> Linux Magazine is now online-only. Perhaps I should have stated that
> in my post. Any journal would also have an online site and a PDF
> version. There is no print required. I think the important aspect of
> something like a journal is the review process to yield high-quality
> content.
>
I agree and I think that's the point some are missing on the whole nosql
blog idea. The blog/planet/wiki/whatever is great and definitely could
serve as a source for finding things that could be developed further,
but a well-reviewed and edited journal would be a wonderful addition.

David Timothy Strauss

Jul 4, 2009, 4:46:15 PM
to nosql-di...@googlegroups.com
I agree with the importance of having something reviewed and curated. I think a planet *and* a journal would be ideal.

arschles

Jul 15, 2009, 12:26:04 AM
to NOSQL
A lot of organizations that I've heard of need to (or will need to
very soon), but don't want to, ditch MySQL because they feel like
they're stepping into the unknown. I think the "Planet NoSQL" idea
could really go a long way toward bringing so-called "NoSQL
datastores" into the public eye and reduce all of the fears that
people have.


Thomas Vial

Jul 15, 2009, 4:00:42 AM
to nosql-di...@googlegroups.com
Hi all!

As a consultant interested in receiving information about non-relational DBs (not yet proficient enough to contribute), I like that idea, too.
I work for a small tech consulting firm that especially likes new ideas and creative ways of doing things. We strive to see beyond the old, ingrained patterns that are used over and over without ever being questioned. The RDBMS is one of those, though of course it can be perfectly suitable in many cases.

But there are many points that need to be clarified before NRDBMS can challenge RDBMS in big organizations:
* what kind of projects could benefit from them (adoption vs. risk or small innovative projects vs. strategic ones)
* serious evaluation of products (there are so many announcements these days!)
* what's inside those products, how they work (servers / APIs / frameworks, storage, modeling patterns over a regular DBMS, ...). I think that behind the "No SQL" slogan there is a wide variety of ideas and techniques; a matrix might not even be meaningful
* what the "DBA of the future" will look like, and more generally, what about the "run" side (that's where the fears are)

In my opinion a blog, though essential in its own right, is not enough to meet these requirements. It's aimed at developers and hobbyists, but companies (or customers ;-) want more.

What do you think?

Thomas

Seth Johnson

Jul 15, 2009, 8:06:02 AM
to nosql-di...@googlegroups.com
I'm mostly watching for the killer tool that will support huge honking
fact tables.


Seth Johnson

David Timothy Strauss

Jul 15, 2009, 10:51:24 AM
to nosql-di...@googlegroups.com
Are these fact tables for Chuck Norris? There aren't enough molecules in the universe for that kind of storage.

More seriously: what kind of fact tables? It seems like storage is easy but synthesis is hard.


Seth Johnson

Jul 15, 2009, 1:20:22 PM
to nosql-di...@googlegroups.com
Okay, for star schemas. It's a denormalized relational structure
where you have a central flat table with one record for every fact,
time stamped, plus as many key fields as you need for attributes,
pointing to "dimension" tables only one level out, in general terms no
deeper. You keep it flat like that so you can slice and dice the core
fact table with simple queries using any of the attribute key fields
freely. You're designating attributes of highly granular events. For
instance, a fact table for events associated with email campaigns,
where each row represents one open, click, transaction, etc. event
associated with marketing emails sent out. Then multiple attribute
key values on each event row pointing to dimension tables one level
out for say, email recipient, sender, type of email campaign, specific
email campaign, etc. However many attribute dimensions you can add to
the core fact table tells you how many unique ways you can analyze the
events for trends and behaviors.
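
To make the star schema described above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are all hypothetical; only the shape (one flat fact table whose key columns point to dimension tables one level out) comes from the description.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_recipient (recipient_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE dim_campaign  (campaign_id  INTEGER PRIMARY KEY, name TEXT, kind TEXT);
-- One row per event; each attribute is a key into a dimension one level out.
CREATE TABLE fact_email_event (
    event_time   TEXT,
    event_type   TEXT,      -- 'open', 'click', 'transaction', ...
    recipient_id INTEGER REFERENCES dim_recipient(recipient_id),
    campaign_id  INTEGER REFERENCES dim_campaign(campaign_id)
);
""")
conn.executemany("INSERT INTO dim_recipient VALUES (?, ?)",
                 [(1, "a@example.com"), (2, "b@example.com")])
conn.executemany("INSERT INTO dim_campaign VALUES (?, ?, ?)",
                 [(10, "spring-sale", "promo"), (11, "newsletter-07", "newsletter")])
conn.executemany("INSERT INTO fact_email_event VALUES (?, ?, ?, ?)", [
    ("2009-07-01T09:00", "open", 1, 10),
    ("2009-07-01T09:05", "click", 1, 10),
    ("2009-07-01T10:00", "open", 2, 10),
    ("2009-07-02T08:30", "open", 2, 11),
])

def events_by(dimension_column):
    """Slice the fact table by one attribute key and count events per type."""
    allowed = {"recipient_id", "campaign_id"}
    if dimension_column not in allowed:
        raise ValueError(dimension_column)
    sql = (f"SELECT {dimension_column}, event_type, COUNT(*) "
           f"FROM fact_email_event "
           f"GROUP BY {dimension_column}, event_type "
           f"ORDER BY {dimension_column}, event_type")
    return conn.execute(sql).fetchall()

by_campaign = events_by("campaign_id")
by_recipient = events_by("recipient_id")
```

Each additional key column on the fact table gives `events_by` one more way to slice the same events, which is the property the message is after.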

Say you've got 1000 clients you do email campaigns for, say the most
active clients send a billion emails a year. Every individual email
generates multiple event rows. The fact table gets huge.

That's the basic idea. You want to be able to slice and dice data
across several years, and you want to be able to add more and more
clients. So honking huge.

(Then for fun I have ideas about really special uses of such star
schemas to -- perversely I'm sure it must seem to all who look on --
represent generic entity types that can hold all relations in one
generic architecture, but still stored in a denormalized flat fact
table.)


Seth




David Timothy Strauss

Jul 15, 2009, 1:30:07 PM
to nosql-di...@googlegroups.com
So this is basically the architecture FriendFeed uses?

Tracy Reed

Jul 15, 2009, 1:45:43 PM
to nosql-di...@googlegroups.com
On Wed, Jul 15, 2009 at 01:20:22PM -0400, Seth Johnson spake thusly:

> Okay, for star schemas. It's a denormalized relational structure
> where you have a central flat table with one record for every fact,

I'm relatively new to NoSQL but have been following couchdb (and previously
zodb and other non-SQL dbs) with great interest.

Clearly, this sort of data warehouse situation is the very place where
non-SQL databases could shine due to the non-scalability of SQL
db's. Use a non-SQL db, spread the work over many commodity servers,
use MapReduce style processing to get your answers.

Has anyone written up an example or tutorial on how one would migrate
their existing non-scaling dimensional data warehouse to something
like CouchDB or other non-SQL database? That would make for a very
awesome case study.
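
As a rough illustration of the "MapReduce style processing" idea, the sketch below counts events per campaign and type over a list of event documents. It is a single-process stand-in, not CouchDB or Hadoop code; the document fields and function names are assumptions made for the example.

```python
from collections import defaultdict

def map_event(doc):
    """Map step: emit ((campaign, event_type), 1) for each event document."""
    yield (doc["campaign"], doc["type"]), 1

def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return sum(values)

def map_reduce(docs, mapper, reducer):
    """Single-process stand-in for a distributed MapReduce run."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)   # plays the role of the shuffle phase
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical event documents, one per fact row.
events = [
    {"campaign": "spring", "type": "open"},
    {"campaign": "spring", "type": "click"},
    {"campaign": "spring", "type": "open"},
    {"campaign": "news", "type": "open"},
]
totals = map_reduce(events, map_event, reduce_counts)
```

In a real cluster the map calls would run on many servers and the grouping would happen over the network; the per-document logic would look much the same.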

I know people facing this very same SQL based dimensional data
warehouse non-scalability problem. Currently their only option is to
buy very expensive hardware, very expensive software, and invest a lot
of time and money only to push the scalability barrier out a little
further but still not far enough.

--
Tracy Reed
http://tracyreed.org

Joydeep Sarma

Jul 15, 2009, 2:05:09 PM
to nosql-di...@googlegroups.com
not to be an ass - but there seems to be substantial confusion here on a language (sql) versus how and what it's implemented over (distributed hash table, distributed file system, shared-disk versus shared-nothing and so on).

Hive is a data warehouse implemented over Hadoop. It provides sql as a language interface - but uses a distributed file system and map-reduce as storage and execution engine. at facebook it's used to ingest 15TB of fact and dimension data every day and process thousands of queries. GQL/AppEngine provide SQL like constructs over Google's Bigtable, along with secondary indices and transactions over a DHT.

the point being that while most implementations of SQL have historically not been massively scalable - that is not the fault of SQL the language. it is not even so much the fault of the relational model of data (as Hive and GQL show). SQL itself does not imply transactional semantics like ACID (which is one of the things that many of the new generation of systems relax).

on the contrary side - one of the best ways to make systems like Cassandra et al. amenable to a large audience is to wrap them in sql like interfaces, provide transactions (and things like secondary indices using them).

as someone said - know your friends well, know your enemies better. sql is not your enemy - it's your friend. your enemy is the shared-disk, non-scalable, non fault tolerant, non-extensible proprietary commercial DBMS system that costs an arm and a leg and has a business model that does not work in the age of cheap and big data.

Joydeep

Todd Hoff

Jul 15, 2009, 2:11:14 PM
to nosql-di...@googlegroups.com
Well said.

Tracy Reed

Jul 15, 2009, 2:18:11 PM
to nosql-di...@googlegroups.com
On Wed, Jul 15, 2009 at 11:05:09AM -0700, Joydeep Sarma spake thusly:

> Hive is a data warehouse implemented over Hadoop. It provides sql as a
> language interface - but uses a distributed file system and map-reduce as
> storage and execution engine. at facebook it's used to ingest 15TB of fact

I note that http://wiki.apache.org/hadoop/Hive says:

It provides a mechanism to put structure on this data and it also
provides a simple query language called QL which is based on SQL
and which enables users familiar with SQL to query this data. At
the same time, this language also allows traditional map/reduce
programmers to be able to plug in their custom mappers and
reducers to do more sophisticated analysis which may not be
supported by the built in capabilities of the language.

Why only based on SQL and not something they can actually call SQL
even if not fully ANSI SQL? Making provisions for custom mapreducers
also indicates they don't have a whole lot of faith in SQL.

This is not to say that I disagree with your actual premise that SQL
itself is not necessarily the problem.

> the point being that while most implementations of SQL have historically not
> been massively scalable - that is not the fault of SQL the language. it is
> not even so much the fault of the relational model of data (as Hive and GQL
> show). SQL itself does not imply transactional semantics like ACID (which
> is one of things that many the new generation of systems relax).

Duly noted and of course absolutely correct.

Thomas Vial

Jul 15, 2009, 2:18:38 PM
to nosql-di...@googlegroups.com

I know people facing this very same SQL based dimensional data
warehouse non-scalability problem. Currently their only option is to
buy very expensive hardware, very expensive software, and invest a lot
of time and money only to push the scalability barrier out a little
further but still not far enough.


I think that OLAP cubes are more appropriate! They rely on optimized storage mechanisms and a large amount of RAM to store pre-calculated aggregates. Querying a cube is just a matter of fetching the right aggregates at the deepest level available, and combining them together to get exactly the projection you asked for.
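
A minimal sketch of the pre-calculated aggregate idea follows. As a simplification it materializes a sum for every combination of dimensions, so a query becomes a single lookup rather than a combination of deeper aggregates; real OLAP engines are far more selective about what they store. The dimension names and sample facts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("client", "campaign", "event_type")  # hypothetical dimensions

def build_cube(facts):
    """Precompute SUM(count) for every subset of dimensions."""
    cube = defaultdict(int)
    for fact in facts:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = (dims, tuple(fact[d] for d in dims))
                cube[key] += fact["count"]
    return cube

def query(cube, **fixed):
    """Look up the aggregate for the dimensions pinned in `fixed`."""
    dims = tuple(d for d in DIMENSIONS if d in fixed)
    return cube.get((dims, tuple(fixed[d] for d in dims)), 0)

facts = [
    {"client": "acme", "campaign": "spring", "event_type": "open", "count": 3},
    {"client": "acme", "campaign": "spring", "event_type": "click", "count": 1},
    {"client": "acme", "campaign": "news", "event_type": "open", "count": 2},
    {"client": "zeta", "campaign": "spring", "event_type": "open", "count": 5},
]
cube = build_cube(facts)
grand_total = query(cube)
acme_total = query(cube, client="acme")
spring_opens = query(cube, campaign="spring", event_type="open")
```

The trade-off is visible even here: each fact updates 2^3 = 8 aggregate cells, so the storage cost grows quickly with the number of dimensions.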

SQL is very awkward with hierarchical dimensions (think trees), that's why it has a multidimensional fellow: MDX.

Thomas

Ted Cui

Jul 15, 2009, 2:29:01 PM
to nosql-di...@googlegroups.com
Excellent, Seth explained it very well. I want to add a little more.

"Star schema" is a SQL-style design concept. Basically, you cannot
easily add/remove attributes without high cost, especially when the
attributes are not commonly shared by most of the objects.

In NoSQL design, you should think schema-free or flexible-schema. Of
course, retrieving information from NoSQL storage will also be
different. It may not be as convenient as typing a SQL statement. It requires
scripting and extra facilities to post-process the data, because
most NoSQL products do not support "group by" or "order by" by themselves.
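
The kind of client-side post-processing meant here might look like the sketch below, which applies "group by" and "order by" to rows after they have been fetched. The rows and helper names are hypothetical; no particular store's API is implied.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, as a client might receive them from a key-range scan.
rows = [
    {"campaign": "spring", "type": "open", "count": 3},
    {"campaign": "news", "type": "open", "count": 2},
    {"campaign": "spring", "type": "click", "count": 1},
]

def group_by_sum(rows, key, value):
    """Client-side GROUP BY key with SUM(value)."""
    ordered = sorted(rows, key=itemgetter(key))  # groupby needs sorted input
    return {k: sum(r[value] for r in grp)
            for k, grp in groupby(ordered, key=itemgetter(key))}

def order_by(rows, key, descending=False):
    """Client-side ORDER BY key."""
    return sorted(rows, key=itemgetter(key), reverse=descending)

totals = group_by_sum(rows, "campaign", "count")
ranked = order_by(rows, "count", descending=True)
```

This works while the result set fits in one process; beyond that, the same grouping is what a MapReduce job distributes across machines.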

Since most NoSQL systems have a multi-node distributed architecture, for "data
warehouse" style jobs you typically need a Hadoop cluster or something
like it in front of them.

I think the Google Bigtable paper provides a lot of insight:

http://labs.google.com/papers/bigtable.html

In summary, in NoSQL, data are stored as

- big, usually meaningful key
- flexible-schema attributes describing the key
- multi-version for attributes to track the time dimension.
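
The three points above can be sketched as a small in-memory structure: a meaningful row key, a free-form set of attributes per key, and several timestamped versions per attribute. This is a toy model under those assumptions, not Bigtable's or any product's actual API; the class and method names are made up.

```python
import bisect
from collections import defaultdict

class VersionedStore:
    """Toy store: row key -> attribute -> sorted list of (timestamp, value)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, attribute, value, timestamp):
        # Keep versions ordered by timestamp so lookups can use bisect.
        bisect.insort(self._rows[row_key][attribute], (timestamp, value))

    def get(self, row_key, attribute, as_of=None):
        """Return the latest value, or the latest one at or before `as_of`."""
        versions = self._rows.get(row_key, {}).get(attribute, [])
        if not versions:
            return None
        if as_of is None:
            return versions[-1][1]
        idx = bisect.bisect_right([ts for ts, _ in versions], as_of)
        return versions[idx - 1][1] if idx else None

store = VersionedStore()
store.put("com.example/index.html", "title", "Old title", timestamp=1)
store.put("com.example/index.html", "title", "New title", timestamp=5)
store.put("com.example/index.html", "lang", "en", timestamp=2)  # extra attribute

latest = store.get("com.example/index.html", "title")
earlier = store.get("com.example/index.html", "title", as_of=3)
```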


CouchDB's javascript defined "view" is also a very interesting idea:

http://couchdb.apache.org/docs/overview.html


My 2 cents,

-Ted

David Rosenstrauch

Jul 15, 2009, 3:09:02 PM
to nosql-di...@googlegroups.com
Joydeep Sarma wrote:
> as someone said - know your friends well, know your enemies better. sql is
> not your enemy - it's your friend. your enemy is the shared-disk,
> non-scalable, non fault tolerant, non-extensible proprietary commercial DBMS
> system that costs an arm and a leg and has a business model that does not
> work in the age of cheap and big data.

Just my $0.02:

SQL is the enemy - or at least one aspect of it, even if shared-disk,
non-scalable, etc. are the more important aspects.

Conceptually SQL has many shortcomings which make it difficult to use in
modern software development. e.g.:

* large queries generally need to be written as a single, massive,
compound statement

* SQL (or at least as it's currently practiced, using normalized "flat"
database tables) is incompatible with OO development, which results in
the "Object-relational impedance mismatch" that everybody keeps butting
heads against.

* Performing queries on graph-based data is extremely unwieldy in SQL.

etc.
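
For the graph point, here is one way to compare the two styles on a tiny example: finding every node reachable from a start node, first with a plain traversal and then with a recursive common table expression in SQLite. The edge data is hypothetical, and whether the SQL form counts as unwieldy is left to the reader; recursive queries are also not available in every SQL database.

```python
import sqlite3
from collections import deque

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]  # hypothetical graph

def reachable_bfs(start):
    """Breadth-first traversal over an adjacency map."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

def reachable_sql(start):
    """Same question asked through a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute("""
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edge WHERE src = ?
            UNION            -- UNION (not UNION ALL) drops repeats, so cycles end
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach
    """, (start,)).fetchall()
    return {row[0] for row in rows} - {start}

from_a_bfs = reachable_bfs("a")
from_a_sql = reachable_sql("a")
```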

DR

Puneet Lakhina

Jul 15, 2009, 3:36:54 PM
to nosql-di...@googlegroups.com
Hi,

I'm new to the NoSQL movement so pardon me if my questions are dumb.
How important is the schema-less part of NoSQL databases? Is the draw of NoSQL databases more to do with their distributed key-value store (or some other mechanism of dividing up the data) implementations, which make them more scalable for read-heavy uses? Or does the schema-free part add value? I guess what I'm confused about is whether the schema-less thing is a side effect or a feature.
--
Regards,
Puneet

Flinn Mueller

Jul 15, 2009, 3:41:04 PM
to nosql-di...@googlegroups.com
It's a feature.  Try adding a column to an Oracle table in production... it leads to a full table lock, a big no go in a heavy traffic situation.

David Timothy Strauss

Jul 15, 2009, 3:45:44 PM
to nosql-di...@googlegroups.com

Puneet Lakhina

Jul 15, 2009, 3:46:45 PM
to nosql-di...@googlegroups.com
I understand that adding a column to an Oracle/MySQL table is a no go, but my question is more the following:

Is the schema freeness of the NoSQL Databases a conscious design decision to have this feature or is it a side effect of the distributed data model?
--
Regards,
Puneet

David Timothy Strauss

Jul 15, 2009, 3:53:14 PM
to nosql-di...@googlegroups.com
I wouldn't say "schema freeness" is a feature or a requirement of the distributed model.

Flinn Mueller

Jul 15, 2009, 3:56:01 PM
to nosql-di...@googlegroups.com
On Jul 15, 2009, at 3:46 PM, Puneet Lakhina wrote:

I understand that adding a column to an Oracle/MySQL table is a no go, but my question is more the following:

Is the schema freeness of the NoSQL Databases a conscious design decision to have this feature or is it a side effect of the distributed data model?


Depends on the DB.  Tokyo Cabinet, for example, implements two types: a pure key/value store and a table store.  The table store allows schema-free interaction, which is a design decision (afaik).

If you can get away without defining a schema at the object storage level then why impose that limitation without significant trade off in performance or convenience?
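
A minimal sketch of what schema-free interaction with a table store can look like: each row carries its own set of columns, so adding a column is just another write. This is a plain-Python toy, not the Tokyo Cabinet API; the class and method names are hypothetical.

```python
class TableStore:
    """Toy schema-free table: each row is a dict with its own set of columns."""

    def __init__(self):
        self._rows = {}

    def put(self, key, **columns):
        # Merge the given columns into the row; there is no table-wide schema.
        self._rows.setdefault(key, {}).update(columns)

    def get(self, key):
        return dict(self._rows.get(key, {}))

    def search(self, column, value):
        """Return keys of rows whose `column` equals `value` (full scan)."""
        return sorted(k for k, row in self._rows.items()
                      if row.get(column) == value)

table = TableStore()
table.put("u1", name="ann", city="Oslo")
table.put("u2", name="bob")          # this row has no city column at all
table.put("u2", twitter="@bob")      # a new column appears without any migration
```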

Utkarsh Srivastava

Jul 15, 2009, 4:15:15 PM
to nosql-di...@googlegroups.com
Hi,

Many of us always thought nosql was a misnomer, and the confusion is now beginning to show up. 

FWIW, here's my attempt to resolve the confusion on this thread. There are inherently 2 very different kinds of data stores (think OLTP vs OLAP, some posts on this thread were confusing the two):

a. Those used for online, live serving, e.g., Cassandra, HBase, Voldemort etc. 
b. Those used for offline data analytics, e.g., Hadoop, Hive, Pig

There are 2 distinct questions that the community is trying to answer right now (some posts are mixing these up):

Q1. what's the right architecture? (RDBMSs have usually been shared-disk)
Q2. what's the right API and data model? (RDBMSs use SQL and fixed-schema, flat tables)

The NoSQL summit was mainly about answering Q1 for OLTP-like systems--- replacing the unscalable traditional RDBMS design by a shared-nothing, fault-tolerant design. As regards Q2, the new systems basically offer the following:
- flexible schemas, sometimes non-flat data models
- explicit get/set/delete/scan APIs as opposed to declarative SQL. 
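
To show what an explicit get/set/delete/scan API looks like next to declarative SQL, here is a small single-node sketch. It keeps keys sorted so that range scans are possible; the class is hypothetical and does not mirror the client API of Cassandra, HBase, or Voldemort.

```python
import bisect

class KeyValueStore:
    """Toy ordered key-value store with get/set/delete/scan."""

    def __init__(self):
        self._keys = []   # sorted list of keys, enables range scans
        self._data = {}

    def set(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start, end):
        """Yield (key, value) for start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key]

kv = KeyValueStore()
kv.set("user:1", "ann")
kv.set("user:3", "cy")
kv.set("user:2", "bob")
kv.set("order:9", "book")
kv.delete("user:2")
# ";" sorts right after ":", so this range covers every "user:" key.
users = list(kv.scan("user:", "user;"))
```

The caller spells out how to find the data (which keys, which range), where a SQL query would only state what is wanted and leave the access path to the engine.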

For OLAP-like systems, Hadoop is widely agreed on to be the right architecture. As regards data model and API for such systems, the jury is still out. The strengths of SQL are its familiarity to developers, and the rich set of tools around it. At the same time, as pointed out by David, SQL is not always the best fit for non-flat data, or large, complicated queries. 

Consequently, you will find 2 languages over hadoop:
- Hive (similar to SQL)
- PigLatin (more procedural language, flexible, not-necessarily-flat data model)

It's possible that one of the above is the right answer, or it's possible that there is room for both languages (and more). As a shameless plug, here is a paper we wrote contrasting PigLatin to SQL http://research.yahoo.com/node/2200

Utkarsh


All the talks at the NoSQL summit were about systems in the first category, i.e., online serving

Joydeep Sarma

Jul 15, 2009, 5:17:15 PM
to nosql-di...@googlegroups.com
lots of good comments here. since i was one of the people who started the Hive project - i can address some of the points (and acknowledge some others):

- schema evolution (adding columns is hard): Hive does not enforce schema. A schema (as implemented in Hive) is just a way of interpreting some underlying data. As a result - the data and schema can evolve (currently one can add columns and older data that does not have the newly declared columns return nulls). More work is required in this direction (currently Hive always uses the latest schema and not the schema that's most relevant to the partitions being queried).
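
The "schema as a way of interpreting underlying data" idea can be sketched in a few lines: stored rows are read through whatever schema is current, and columns that older data lacks come back as nulls. This is an illustration of the concept only, not how Hive is implemented; the function and data are hypothetical.

```python
def read_with_schema(raw_rows, schema):
    """Interpret stored rows through `schema`; missing columns read as None."""
    return [{col: row.get(col) for col in schema} for row in raw_rows]

# Older data was written before the "country" column was declared.
old_partition = [{"user": "ann", "clicks": 3}]
new_partition = [{"user": "bob", "clicks": 1, "country": "SE"}]

latest_schema = ["user", "clicks", "country"]
rows = read_with_schema(old_partition + new_partition, latest_schema)
```

Nothing is rewritten when the schema changes; only the interpretation applied at read time differs.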

- how to handle object data: Hive's object model is similar to Java object model. Currently we run Hive queries over complex object types defined and serialized using Thrift. Hive's expression syntax supports dotted notation, array and map indexing. In fact one of the design goals of Hive was to support objects. Granted we are not there fully yet. Writing interesting new serdes (the things that make blobs look like objects to hive) is a pain and it's hard/impossible to assemble objects in Hive queries.

- why does Hive support map/reduce if sql is so great: the ability to write scripts in the language of your choice and to be able to plug that into the dataflow at any point is absolutely fabulous. Hence it's an integral part of Hive. I have been told that this is not that novel (Oracle has had external callouts and table functions apparently for a while) - but the Hadoop's ability to run these things (and be able to recover from failures and such) is probably much better.

- sql gets complicated after a while: this is certainly true. but views make life a lot better - no more nasty complicated sql. we are going to have views in hive at some point.

- sql not good for graph analytics: +1 on that. we need choice in languages - but sql gets a lot done for the problems that are appropriate to it. in fact hadoop as a platform itself is not great for graph analytics (from personal experience).

- why is Hive not ANSI sql: cause FB does not make money selling database software and we built what we needed to get by (with a very limited set of resources). it's open source - trust community and time to bridge the gap.

--

schema-freeness: it's nice to have schemas even when the base abstraction is a blob. One of the things i really liked about Tokyo Cabinet was its integration with Lua and the ability to insert application logic in the data path. this should mean (without looking at the code) - that one should be able to treat TC values as structured data and write projection, filters etc. as Lua extensions within TC. One can have secondary index keys whose values point to primary ones. in short - it doesn't seem implausible at all to be able to implement a good subset of SQL (hopefully an OO one) as a layer over TC (or if u don't like SQL - whatever suits your fancy).
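
A rough sketch of secondary index keys whose values point to primary ones, layered over plain key-value maps. Everything here (class name, the "idx:" key prefix, the sample records) is an assumption for illustration; it is not Tokyo Cabinet code and it ignores the transactional concerns a real layer would face.

```python
class IndexedStore:
    """Toy key-value layer that maintains one secondary index."""

    def __init__(self, indexed_field):
        self._primary = {}   # primary key -> record (dict)
        self._index = {}     # "idx:<value>" -> set of primary keys
        self._field = indexed_field

    def _index_key(self, value):
        return f"idx:{value}"

    def put(self, key, record):
        self.delete(key)     # drop any stale index entry before overwriting
        self._primary[key] = record
        if self._field in record:
            idx = self._index_key(record[self._field])
            self._index.setdefault(idx, set()).add(key)

    def delete(self, key):
        old = self._primary.pop(key, None)
        if old is not None and self._field in old:
            idx = self._index_key(old[self._field])
            self._index[idx].discard(key)
            if not self._index[idx]:
                del self._index[idx]

    def find_by(self, value):
        """Primary keys of records whose indexed field equals `value`."""
        return sorted(self._index.get(self._index_key(value), set()))

people = IndexedStore("city")
people.put("u1", {"name": "ann", "city": "Oslo"})
people.put("u2", {"name": "bob", "city": "Austin"})
people.put("u3", {"name": "cy", "city": "Oslo"})
in_oslo_before = people.find_by("Oslo")
people.put("u1", {"name": "ann", "city": "Austin"})   # update moves the index entry
in_oslo_after = people.find_by("Oslo")
```

In a distributed store the primary write and the index write can land on different nodes, which is one reason the thread pairs secondary indices with a transactional layer.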

i guess what i am trying to say is that a well architected/layered framework can easily allow different language bindings (like Hadoop has shown with Hive and Pig) - and then one doesn't get into this sql/nosql argument. People are free to choose the right abstraction for their application. i don't see why this should not be generally true for DHT implementations as well (as Google has already shown with GQL)

imho - it would be great to have a GQL kind of language (and an accompanying transactional layer) implemented in open source (which can use different key-value stores already out there). It would go a long way in furthering the cause of this infrastructure beyond its limited audience today.

Michael Miller

Jul 15, 2009, 5:43:30 PM
to nosql-di...@googlegroups.com
I absolutely agree with this. One nice example of Microsoft's
success here is LINQ, which they have been able to leverage internally. I
haven't used it much myself, but they are able to define the basic
verbs (where, select, etc.) and provide a common query language that
can be mapped to drastically different storage engines (SQL Server,
Dryad, etc.).
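
The idea of a few common verbs mapped onto different engines can be sketched as below: one backend-neutral query description, run once against an in-memory list and once against SQLite. This only illustrates the general shape; it is not how LINQ itself works, and every name in it is hypothetical.

```python
import sqlite3

class Query:
    """Backend-neutral query: equality filters plus a list of fields."""

    def __init__(self):
        self.filters = []   # list of (field, value) equality tests
        self.fields = []

    def where(self, field, value):
        self.filters.append((field, value))
        return self

    def select(self, *fields):
        self.fields = list(fields)
        return self

def run_in_memory(query, rows):
    matches = (r for r in rows
               if all(r.get(f) == v for f, v in query.filters))
    return [tuple(r[f] for f in query.fields) for r in matches]

def run_sqlite(query, conn, table):
    # Field and table names are trusted here; only values are bound.
    sql = f"SELECT {', '.join(query.fields)} FROM {table}"
    if query.filters:
        sql += " WHERE " + " AND ".join(f"{f} = ?" for f, _ in query.filters)
    return conn.execute(sql, [v for _, v in query.filters]).fetchall()

rows = [
    {"name": "ann", "city": "Oslo"},
    {"name": "bob", "city": "Austin"},
    {"name": "cy", "city": "Oslo"},
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany("INSERT INTO people VALUES (:name, :city)", rows)

q = Query().where("city", "Oslo").select("name")
from_memory = run_in_memory(q, rows)
from_sqlite = run_sqlite(q, conn, "people")
```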

-Mike

Kunthar

Jul 15, 2009, 5:48:13 PM
to nosql-di...@googlegroups.com
I sometimes really wonder why we have to solve the problem with
tools that don't fit well.
As far as I can see, the main bottleneck is on the hardware side, because
disk I/O is evil.
Google bypasses the problems by using the "solve it in the real world" way.
But should we do this every time?
We as human beings have billion-dollar companies running huge
operations around the world, and yet
hardware is not developing as fast as we want.
I am really asking myself why Yahoo, Google and others do not start
some hardware R&D programmes.
Imagine new disks, a new bus system, running at light speed. What about
quantum computing? We have the Linux kernel, we have a bunch of hardware
drivers, we have talented engineers all around the world. But we
can't come together for mankind.
Instead we watch Intel's boring new quarterly innovation bullsh*t and
AMD's responses, getting stuck on memory limits, mutexes, shared memory
holes.
Sorry, this is kind of off topic, but I wanted to explain my feelings.

This is an old idiom from Turkey: "if your bed does not fit your
length, extend it..."

Peace
\|/ Kunthar

eprpa...@gmail.com

Jul 15, 2009, 5:57:53 PM
to nosql-di...@googlegroups.com
If you want the "first" NOSQL like approach go back to the work on
Associative Memory from the early 60's. There are a few of us around
that worked with these systems. If you look you will see Firestone (yes,
the tire company) actually had some of the coolest hardware out there.

One of the professors from U of Texas at Austin had even applied it, in the
80's to work in RAM chips. I tried to actually get the patents but
lawyers who owned the IP got in the way. By now the patents have expired!

What I think would be cool is to start a group that would take ideas,
reduce them down and put them into the OpenCORE project. The idea would
be to take current RAM size chips, add in associative memory and be able
to search GBs of data in mere microseconds (in fact some analysis I did
at the time indicated that you could search disk drives at data
transfer speeds!).

If any of you are interested let me know. We should take the
conversation off line since I don't really think this group is for that
(but I could be wrong).

Chance

Todd Hoff

Jul 15, 2009, 6:01:47 PM
to nosql-di...@googlegroups.com
I've thought the same thing Chance and have had discussions with
friends about using true associative memory. But it came down to
commodity hardware + smart algorithms would probably always win in the
end, so it's not worth the investment.

eprpa...@gmail.com

Jul 15, 2009, 6:15:01 PM
to nosql-di...@googlegroups.com
Todd,

While I can believe that searching a 1TB disk, even at 300Mb/s, would
take 8 seconds to find anything on it (even with compound search
patterns), the only thing you prove is that you need to use smart
algorithms. But I would venture to say that using associative memory
doesn't preclude using smart algorithms. Well perhaps it would just let
the mediocre programmers look like "stars".
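
For readers who want to check scan times themselves, a back-of-the-envelope calculation is sketched below. It assumes decimal units, a single drive, and a sustained sequential transfer rate; the function name is made up, and "300Mb/s" is tried under both a megabyte and a megabit reading.

```python
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes

def scan_seconds(size_bytes, bytes_per_second):
    """Time to read size_bytes sequentially at a sustained transfer rate."""
    return size_bytes / bytes_per_second

# Reading "300Mb/s" as 300 megabytes per second:
as_megabytes = scan_seconds(1 * TB, 300 * MB)       # roughly 3,333 s
# Reading it literally as 300 megabits per second (divide by 8 for bytes):
as_megabits = scan_seconds(1 * TB, 300 * MB / 8)    # roughly 26,667 s
# Amount of data that can be read in 8 s at 300 MB/s:
bytes_in_8_seconds = 8 * 300 * MB                   # 2.4e9 bytes
```

Under these assumptions a full single-drive pass over 1 TB takes far longer than 8 seconds, so that figure presumably rests on different assumptions than the ones used here, which the message does not spell out.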

But I am really curious as to your proof that commodity hardware +
smart algorithms would beat associative memory and smart algorithms. Can
you elaborate?

Chance

Todd Hoff

Jul 15, 2009, 6:43:41 PM
to nosql-di...@googlegroups.com
I'm not saying I necessarily bought the argument, Chance. Not being the
hardware guy, it wasn't really my call. I think the issue is that
building a large enough cascading CAM requires network fabric that
won't scale, is expensive, is quickly outdated, and is risky enough
that it won't be attractive to anyone but the high end. Do you think
differently?

Kunthar

Jul 15, 2009, 7:13:31 PM
to nosql-di...@googlegroups.com
People are paying around 400K bucks to Vertica and others (including
for bzipping your data).
And they have already paid around $40k-$200k for their own SAN LUNs to
protect their data and run their calcs faster.
We have to go with DAS and commodity hardware because we have no other option.
If a RAM array were as cheap as a commodity DL380, around $2k-3k,
things would start to change.
We had no idea about ECC RAM before, but we desperately depend on it
now :)
This is technology; we could use it if it is effective enough.
My 2 cents.

\|/ Kunth

Johan Oskarsson

Jul 16, 2009, 6:44:28 AM
to nosql-di...@googlegroups.com
Great post Joydeep.

Agreed that the name "nosql" is a bit unfortunate; it was the
tongue-in-cheek name of the meetup from a month ago. The name has sadly
taken on a life of its own and turned the discussion in an unfortunate
direction at times.

It seemed appropriate since it's fairly easy to say, remember, and
google for, as well as search for on Twitter etc. It also helped that it
was a bit "out with the old, in with the new" to grab people's
attention; that part worked quite well.

I find it hard to summarize projects like dynamo and bigtable into a
single, "unique" word that explains it well without getting confused
with other things. If anyone wants to take a stab at it, feel free.

/Johan

Seth Johnson

Jul 16, 2009, 7:15:09 AM
to nosql-di...@googlegroups.com
On Thu, Jul 16, 2009 at 6:44 AM, Johan Oskarsson<jo...@oskarsson.nu> wrote:
>
> Great post Joydeep.
>
> Agreed that the name "nosql" is a bit unfortunate; it was the
> tongue-in-cheek name of the meetup from a month ago. The name has sadly
> taken on a life of its own and turned the discussion in an unfortunate
> direction at times.
>
> It seemed appropriate since it's fairly easy to say, remember, and
> google for, as well as search for on Twitter etc. It also helped that it
> was a bit "out with the old, in with the new" to grab people's
> attention; that part worked quite well.
>
> I find it hard to summarize projects like dynamo and bigtable into a
> single, "unique" word that explains it well without getting confused
> with other things. If anyone wants to take a stab at it, feel free.


No single word answer, but this is the blurb I wrote when I sent the
links around:

FYI: I think because of all the open government activity, there's been
a gathering of minds in the space of HUGE, distributed database
solutions.

Got these two links on Twitter. The second is really useful:

No to SQL? Anti-database movement gains steam
http://www.linuxworld.com/news/2009/070209-no-to-sql-anti-database-movement.html?fsrc=rss-linux-news

NOSQL Debrief:
http://blog.oskarsson.nu/2009/06/nosql-debrief.html

The reaction against SQL is basically an expression of a turn to
specialized, denormalized, and distributed database architectures to
support the needs of huge data projects.

But the second link is from a meeting of all the main free software
solutions out there, with explanations of what each is about. Very
useful for getting a picture of what tools are out there and what
their different strengths are.

---


Seth

Emil Eifrém

Jul 16, 2009, 8:03:08 AM
to nosql-di...@googlegroups.com
On Thu, Jul 16, 2009 at 12:44 PM, Johan Oskarsson<jo...@oskarsson.nu> wrote:
> Agreed that the name "nosql" is a bit unfortunate,
[snip]

> I find it hard to summarize projects like dynamo and bigtable into a
> single, "unique" word that explains it well without getting confused
> with other things. If anyone wants to take a stab at it, feel free.

I usually go with "non-relational." It describes the only common trait
I find between these new db systems: they all have a data model that
differs from Codd's relational model. There are two drawbacks with
that term:

1. While non-relational is a good description of this emerging crop
of databases, it's also a good description of previous generations' db
contenders like OODB and CODASYL/network model.

2. Non-relational is sliiightly too long to be perfect in this
140-char world we're living in.

Despite that, it's the most accurate word I've found and reasonably
google-able. It clearly trumps "anti-database" (we're all databases
here), "no sql" (see Joydeep's post) and "distributed" (there are many
new databases where scale-out isn't the prime focus).

Then if I want to create a stir (cf. nosql's goal of "grabbing
attention") I use "post-relational," which is good and controversial
but gives a lot of google false positives (mainly from psychology).

Cheers,

-EE

Seth Johnson

Jul 16, 2009, 8:08:38 AM
to nosql-di...@googlegroups.com
You mean this isn't the XBASE google group?

:-P


Seth

eprpa...@gmail.com

Jul 16, 2009, 8:08:47 AM
to nosql-di...@googlegroups.com
Well, there are still relationships in the data. Instead I used
"federated" as the operational word. Relational makes one think of
traditional DBs whereas "Federated data" tends to leave them in
awe...and wondering what it is.

Chance

Seth Johnson

Jul 16, 2009, 8:12:52 AM
to nosql-di...@googlegroups.com
How about HUGE DBs?

Isn't that one common denominator here?


Seth

Thomas Vial

Jul 16, 2009, 8:17:18 AM
to nosql-di...@googlegroups.com
"Relations" is just what tables were called in the first relational systems.
The term is still around, and it doesn't refer to links between tables, e.g. foreign keys.

eprpa...@gmail.com

Jul 16, 2009, 8:18:30 AM
to nosql-di...@googlegroups.com
Not necessarily. That's the place that felt the pain first, but it isn't
the only place. My project is geared to building "federated" databases
for replicated, fault-tolerant environments and a lot of what is
discussed here will impact that work. But I doubt it will be for "huge DBs".

Just my opinion, your mileage will probably vary.
Chance

Emil Eifrém

Jul 16, 2009, 8:20:44 AM
to nosql-di...@googlegroups.com
On Thu, Jul 16, 2009 at 2:08 PM, <eprpa...@gmail.com> wrote:
> Well there are still relationships in the data.

Yep, but a *relationship* (not a relation) is a link between two
information entities (i.e. an edge in a graph). I agree that there's a
LOT of that out there, in fact my company's motto is "value in
relationships" because we believe that the *most important* part of a
data set is frequently the relationships between entities. Hence a
graph model.

But the "relational" part in my "non-relational" suggestion refers to
the relational algebra that is the foundation of RDBMS. That model is
typically known as "the relational model" as we all know, and people
I've talked to have interpreted non-relational in that setting.
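
To make the distinction concrete, here is a minimal sketch with made-up
data: a *relation* as a set of tuples (a table), queried by joining,
versus a *relationship* as an edge in a graph, followed directly. All
names (`Person`, `Knows`, `friends_relational`, `friends_graph`) are
hypothetical and not taken from any particular database.

```python
from typing import NamedTuple


# Relational view: each relation is a set of tuples over named attributes.
class Person(NamedTuple):
    id: int
    name: str


class Knows(NamedTuple):
    person_id: int
    friend_id: int


people = {Person(1, "Ada"), Person(2, "Grace"), Person(3, "Edgar")}
knows = {Knows(1, 2), Knows(2, 3)}


def friends_relational(person_id: int) -> list[str]:
    """Join the knows relation with the people relation on friend_id."""
    return sorted(
        p.name
        for k in knows
        if k.person_id == person_id
        for p in people
        if p.id == k.friend_id
    )


# Graph view: a relationship is an edge stored with the node it starts from.
graph = {"Ada": ["Grace"], "Grace": ["Edgar"], "Edgar": []}


def friends_graph(name: str) -> list[str]:
    """Follow outgoing edges directly; no join is needed."""
    return list(graph[name])


print(friends_relational(1))
print(friends_graph("Ada"))
```

Both calls answer the same question; the difference is whether the link
is derived by matching values across tables or stored as a first-class
edge.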

Cheers,

-EE

eprpa...@gmail.com

Jul 16, 2009, 8:21:26 AM
to nosql-di...@googlegroups.com
It has nothing to do with foreign keys, since that is an implementation
issue. But the data will have a relationship - between rows in a column
and other rows in other columns. If there isn't any obvious relationship
it would be unstructured data.

As I said I like "federation" since it implies a more or less loose
relationship. And usually spread around.

Chance

Emil Eifrém

Jul 16, 2009, 8:24:21 AM
to nosql-di...@googlegroups.com
On Thu, Jul 16, 2009 at 2:12 PM, Seth Johnson<seth.p....@gmail.com> wrote:
> How about HUGE DBs?
>
> Isn't that one common denominator here?

Hi Seth,

I actually don't think so. I often talk about "scaling to complexity"
AND "scaling to size." I think both are important issues to address.
As an example of non-relational use cases in non-HUGE settings, I know
the couch folks talk about running on small devices, inside the
browser, etc, and we in the Neo4j crew certainly think that graph
databases also have a lot of virtues for smaller data sets.

Cheers,

-EE

Seth Johnson

Jul 16, 2009, 8:27:00 AM
to nosql-di...@googlegroups.com
Seems to me that federation for fault tolerance is a more particular
area of concern, whereas size issues as an inspiration for federation
still describes the fact that what's motivating people is new demands
for honking huge data processing. :-)

Isn't this confluence of different approaches to post-Codd DBs
inspired by open government initiatives and social media -- and now
everybody wants to set up their own stores and perhaps integrate them?


Seth

Seth Johnson

Jul 16, 2009, 8:29:06 AM
to nosql-di...@googlegroups.com
But wouldn't it be true to say that DB technologies addressing more
particular technical concerns would have remained in largely separate
projects, except that we're in the middle of a big shift because of
open government and social media stuff?


Seth

Seth Johnson

Jul 16, 2009, 8:32:19 AM
to nosql-di...@googlegroups.com
Seems to me that federation for fault tolerance is a more particular
area of concern, whereas size issues as an inspiration for federation
still describes the fact that what's motivating people is new demands
for honking huge data processing. :-)

Isn't this convergence/gathering of different approaches to post-Codd
DBs inspired by open government initiatives and social media -- and
now everybody wants to set up their own stores and perhaps integrate
them?


Seth

On Thu, Jul 16, 2009 at 8:21 AM, <eprpa...@gmail.com> wrote:
>

Jan Lehnardt

Jul 16, 2009, 9:01:25 AM
to nosql-di...@googlegroups.com

I think nosql is here to stay. It is short, concise and controversial
enough to get people to notice. The mere fact that people object
means we're onto something. I don't think a scientifically correct
title (if that can be found at all) buys us anything.

Long live nosql!

Cheers
Jan
--


Seth Johnson

Jul 16, 2009, 10:49:37 AM
to nosql-di...@googlegroups.com
I would endorse that! :-)


Seth

eprpa...@gmail.com

Jul 16, 2009, 11:01:15 AM
to nosql-di...@googlegroups.com
What is needed is the "elevator pitch" - the 15 or 20 second statement
which identifies what it is all about. Attaching a single word to it
reminds me of "Supercalifragilisticexpialidocious" - just a word. Though
I would rather have something "more correct" than NOSQL, which this is
not about.

Chance

Jan Lehnardt

Jul 16, 2009, 11:36:39 AM
to nosql-di...@googlegroups.com

On 16 Jul 2009, at 17:01, eprpa...@gmail.com wrote:

>
> What is needed is the "elevator pitch" - the 15 or 20 second statement
> which identifies what it is all about. Attaching a single word to it
> reminds me of "Supercalifragilisticexpialidocious" - just a word. Though
> I would rather have something "more correct" than NOSQL, which this is
> not about.

“SQL is what gets taught in universities, NOSQL is what people actually
use.” :)

Cheers
Jan
--

eprpa...@gmail.com

Jul 16, 2009, 11:51:26 AM
to nosql-di...@googlegroups.com
What? Sorry I use both. What does this have to do with what I wrote?

Chance

Jan Lehnardt

Jul 16, 2009, 12:12:23 PM
to nosql-di...@googlegroups.com
Oh sorry, #quotingfail, this was supposed to be a funny attempt
at the elevator pitch.

Cheers
Jan
--

David Rosenstrauch

Jul 16, 2009, 4:57:12 PM
to nosql-di...@googlegroups.com
Kunthar wrote:
> there is no development in hardware arena as fast as wanted.
> I am really asking myself, why Yahoo, Google and others do not start
> some hardware r&d programmes?
> Imagine new disks, a new bus system, running on lightspeed.

Google has been pushing the hardware envelope a bit in recent years in
terms of things like motherboard design, CPU power/heat usage, and data
center design/architecture. I'm sure if you search around you can turn
up info about this, as it's been discussed in a number of places in the
tech press in recent years.

That said, they're (obviously) primarily focused on things that are of
direct benefit to their business (i.e., things that will lower the costs
of operating a data center full of servers) as opposed to general
hardware r&d like CPU design and such.

DR

Kunthar

Jul 16, 2009, 7:01:41 PM