Use / business cases

Johannes Ernst

Nov 1, 2009, 10:21:23 PM

to nosql-di...@googlegroups.com

Question to everybody in order to move to a more constructive discussion, hopefully ;-)

When asked, what top three use or business cases would you list for (any, or all) NoSQL technologies?

My take (strawman)

1. amounts of data so massive that a massively distributed architecture is needed (e.g. need for BigTable)

2. query load / complexity too large for joins even with band-aids (e.g. Digg's case)

3. too large gap between schema complexity and flexibility requirements vs. physical data modeling in a relational database (e.g. need for a graph database)

There are a lot more than three, I think, but I wonder how people would prioritize them.

Cheers,

Johannes Ernst.

NetMesh Inc.

InfoGrid.org

David Timothy Strauss

Nov 1, 2009, 11:33:48 PM

to nosql-di...@googlegroups.com

#2 is the main one for me. Consolidating and indexing (via denormalizing) data is complex, error-prone, and inefficient in SQL systems.

Martin Bruse

Nov 2, 2009, 4:53:48 AM

to nosql-di...@googlegroups.com

I can think of a fourth case: too many updates per second on a single table for a single master. Of course you can shard your SQL servers, but then you lose much of the power of SQL joins, and queries will have to run across all servers, etc. That one is probably the killer app for me when it comes to distributed masterless systems.
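
To make that cost concrete, here is a minimal Python sketch of a sharded table. It assumes a hypothetical hash-by-key layout over in-memory dicts; the names `shard_for`, `put`, `get`, and `find_by_city` are invented for illustration and do not come from any real product. A lookup by key touches one shard, while a query on any other attribute has to visit every shard and merge the results in the application:

```python
import zlib

NUM_SHARDS = 4
# Each shard stands in for one SQL server; here it is just a dict.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id: str) -> int:
    """Pick a shard from a stable hash of the key."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS

def put(user_id: str, row: dict) -> None:
    shards[shard_for(user_id)][user_id] = row

def get(user_id: str):
    # Key lookup: exactly one shard is consulted.
    return shards[shard_for(user_id)].get(user_id)

def find_by_city(city: str) -> list:
    # Non-key query: scatter to all shards, then gather and sort in the app.
    hits = []
    for shard in shards:
        hits.extend(uid for uid, row in shard.items() if row["city"] == city)
    return sorted(hits)

put("alice", {"city": "Berlin"})
put("bob", {"city": "Austin"})
put("carol", {"city": "Berlin"})
print(find_by_city("Berlin"))
```

A join between two such tables would need the same scatter-and-gather step for each side, which is the loss of SQL power described above.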

//Martin

Jan Lehnardt

Nov 2, 2009, 6:38:24 AM

to nosql-di...@googlegroups.com

Hi Johannes,

I'm adding one that might be specific to CouchDB, but it's a good one
(IMHO) so others might follow suit (some are already, to be fair :)

Ease of use: You shouldn't need a PhD in computer science to store
your data. Sure, a lot of us hacked our way into SQL (myself included)
but it took quite some time to figure out all the kinks, and then
some. Nonrelational databases have the great chance of letting the
developer "just store data" and concentrate on what they are best at:
Developing applications that rock their users' or customers' worlds.

Cheers
Jan
--

eprpa...@gmail.com

Nov 2, 2009, 6:53:45 AM

to nosql-di...@googlegroups.com

I think they are right on....

Chaz

eprpa...@gmail.com

Nov 2, 2009, 6:55:10 AM

to nosql-di...@googlegroups.com

I will agree that is very true. Why do I need a schema, and have the
need to define a table just to store data.

Chaz

Thomas Vial

Nov 2, 2009, 7:18:47 AM

to nosql-di...@googlegroups.com

On Mon, Nov 2, 2009 at 12:55 PM, <eprpa...@gmail.com> wrote:

> I will agree that is very true. Why do I need a schema, and have the
> need to define a table just to store data.
>
> Chaz

For the same reasons you need classes, just to instantiate objects ;-)

At least changing a relational schema is a well-documented procedure. After 5 years of production, your old records were forcefully refactored to fit into all successive schema changes.
With unstructured storage, my guess is that the *code* needs to know how to map any record (old or new, whatever the contents) to the current core domain.

That's a trade-off and it needs a bit more thinking than "SQL sucks"!
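
The trade-off can be sketched in a few lines of Python. This assumes each stored record carries a hypothetical `schema_version` field and that the application keeps one small upgrade function per version; all field names are invented for illustration. Old records are never rewritten in the store, so the code has to carry every historical shape forward at read time:

```python
def v1_to_v2(rec: dict) -> dict:
    # v2 split the single "name" field into first and last name.
    first, _, last = rec["name"].partition(" ")
    out = {k: v for k, v in rec.items() if k != "name"}
    out.update(first_name=first, last_name=last, schema_version=2)
    return out

def v2_to_v3(rec: dict) -> dict:
    # v3 added an optional "fax" field, defaulting to None.
    out = dict(rec)
    out.setdefault("fax", None)
    out["schema_version"] = 3
    return out

UPGRADERS = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3

def load(rec: dict) -> dict:
    """Map a record of any known version to the current shape."""
    rec = dict(rec)
    rec.setdefault("schema_version", 1)  # the oldest records have no version
    while rec["schema_version"] < CURRENT_VERSION:
        rec = UPGRADERS[rec["schema_version"]](rec)
    return rec

print(load({"name": "Ada Lovelace"}))
```

With a relational migration the same two steps would run once against the whole table; here they run on every read of an old record, and the chain of upgraders only grows.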

Thomas

Jan Lehnardt

Nov 2, 2009, 8:28:50 AM

to nosql-di...@googlegroups.com

On 2 Nov 2009, at 13:18, Thomas Vial wrote:

>
>
> On Mon, Nov 2, 2009 at 12:55 PM, <eprpa...@gmail.com> wrote:
>
> I will agree that is very true. Why do I need a schema, and have the
> need to define a table just to store data.
>
> Chaz
>
>
> For the same reasons you need classes, just to instantiate objects ;-)

Only in class-based OOP systems (cf. http://en.wikipedia.org/wiki/Prototype-based_programming).

Personally, prototypical inheritance works a lot like my brain and I
have a harder time getting class-based OO right.

I believe it is important to understand that type != structure. A
business card is a business card, whether it lists a fax number or
not. My business card doesn't say `fax = NULL`, yet it is a valid
instance of the type business card just like any business card that
lists a fax number. Coming from a Java & RDBMS background, type and
structure are very much the same. I find it liberating to not have the
two mangled together by my tools.
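
As a rough illustration of "type != structure", here is a Python sketch using plain dicts as documents. The `type` label, the field names, and the sample data are all assumptions made up for this example. Both records are business cards; one simply has no fax entry rather than a NULL placeholder:

```python
cards = [
    {"type": "business_card", "name": "Ann Example",
     "email": "ann@example.org"},
    {"type": "business_card", "name": "Bob Example",
     "email": "bob@example.org", "fax": "+1 555 0100"},
]

def is_business_card(doc: dict) -> bool:
    # Type is an explicit label plus the one field every card needs;
    # it says nothing about which optional fields are present.
    return doc.get("type") == "business_card" and "name" in doc

def render(doc: dict) -> str:
    lines = [doc["name"]]
    for field in ("email", "fax"):
        if field in doc:  # absent is simply absent, no NULL placeholder
            lines.append(f"{field}: {doc[field]}")
    return "\n".join(lines)

for card in cards:
    print(render(card))
```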

> At least changing a relational schema is a well-documented
> procedure. After 5 years of production, your old records were
> forcefully refactored to fit into all successive schema changes.

Migration tools don't really care if the storage is fixed or flexible.
A tool can migrate old records to a new schemaless design just as well.

> With unstructured storage, my guess is that the *code* needs to know
> how to map any record (old or new, whatever the contents) to the
> current core domain.

Your views will already have something like:

if (model.faxnumber != NULL) {
    // see how bad my code is :D
    display_fax_no(model.faxnumber);
}

This doesn't change when switching to schemaless designs.

--

Thanks for bringing up these points. They force us to think harder how
to present why NoSQL databases can be a good idea.

Cheers
Jan
--

Thomas Vial

Nov 2, 2009, 9:35:32 AM

to nosql-di...@googlegroups.com

> Migration tools don't really care if the storage is fixed or flexible.
> A tool can migrate old records to a new schemaless design just as well.

Agreed. But if your code requires your data to be at least partially structured to be maintainable, RDBMS are definitely worth being considered. Of course in some cases non-RDBMS may be the best fit.

> Your views will already have something like:
>
> if (model.faxnumber != NULL) {
>     // see how bad my code is :D
>     display_fax_no(model.faxnumber);
> }
>
> This doesn't change when switching to schemaless designs.

That's a simple case. I bet a change in cardinality (one-to-one -> one-to-many relationship) will be a lot more problematic without data migration!

I'm not saying that non-RDBMS suck, either. I think they bring their own lot of problems, different from RDBMS problems, but I cannot tell what they are without some hindsight.

Thomas

Johannes Ernst

Nov 2, 2009, 11:36:55 AM

to nosql-di...@googlegroups.com

Let me understand correctly what you are saying. Are you saying the
SQL learning curve is too high, or the learning curve of a particular
RDBMS product is too high (to figure out how to more effectively use
it), or ...?

Or are you referring to some kind of impedance mismatch where, say,
your data is JSON and you would have to somehow fit that into tables?

Johannes Ernst

Nov 2, 2009, 11:44:19 AM

to nosql-di...@googlegroups.com

There is the very old (in language design) controversy about strong
vs. weak typing and how beneficial or evil it might be.
http://en.wikipedia.org/wiki/Type_system#Controversy

Personally I think this is an orthogonal discussion: one can use a SQL
database to store uninterpreted blobs, and one can use a strong type
system even with a simple key-value store -- enforcement rules left as
exercise for the programmer given the state of most projects today,
but I'd bet more strongly typed systems will show up in response to
market demand from those who like strong type systems.

So personally I don't think that this is a strong driver for NoSQL,
but I can be convinced otherwise. Shoot ... ;-)

Dwight Merriman

Nov 2, 2009, 11:52:20 AM

to nosql-di...@googlegroups.com

Agreed that not all NoSQL solutions will be weakly typed, but when it comes into play it feels important to me. With many of these solutions you can manipulate the internals of these weakly typed documents (queries and indexes on secondary/embedded fields), which makes it not the same as key->blob.

The original driver of NoSQL was scale, but now that nonrelational has some traction these other aspects will come into play and have some benefits. In particular with weakly typed languages, a weakly typed store can fit elegantly. It is also nice with agile methodologies, as we have lower (albeit still some) data migration efforts.
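
The difference from key->blob can be shown with a small Python sketch. It is a toy model rather than the API of any particular document store; `get_path` and `build_index` are hypothetical helpers. Because the store can see inside each document, it can build a secondary index on an embedded field, which is impossible when the value is just bytes:

```python
import json
from collections import defaultdict

docs = {
    "u1": {"name": "Ann", "address": {"city": "Berlin"}},
    "u2": {"name": "Bob", "address": {"city": "Austin"}},
    "u3": {"name": "Cy"},  # no address at all
}

def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; None if missing."""
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def build_index(docs, path):
    """Secondary index: embedded value -> list of document keys."""
    index = defaultdict(list)
    for key in sorted(docs):
        value = get_path(docs[key], path)
        if value is not None:
            index[value].append(key)
    return dict(index)

city_index = build_index(docs, "address.city")
print(city_index)

# An opaque store only ever sees bytes like these and cannot index inside them.
blob = json.dumps(docs["u1"]).encode("utf-8")
```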

Jan Lehnardt

Nov 2, 2009, 12:36:27 PM

to nosql-di...@googlegroups.com

On 2 Nov 2009, at 17:36, Johannes Ernst wrote:

>
> Let me understand correctly what you are saying. Are you saying the
> SQL learning curve is too high, or the learning curve of a particular
> RDBMS product is too high (to figure out how to more effectively use
> it), or ...?

SQL has a fairly steep learning curve. Specific
implementations add to that. I'm not saying it is "too high"
in general. For what SQL gives you, the learning curve is
just right because the power is great.

But often you don't need all the power of SQL and just want
to persist some data. In that case, a lower curve would be
great :)

> Or are you referring to some kind of impedance mismatch where, say,
> your data is JSON and you would have to somehow fit that into tables?

That's a separate issue. Just as an example, comparing bears to oranges:
Rails's ActiveRecord is ~25k lines of Ruby, while CouchDB and Riak clock
in at about 1/2 and 1/4 of that, in Erlang. While ORMs might work well in
certain or a lot of situations, they are generally heavy.

Cheers
Jan
--

Jan Lehnardt

Nov 2, 2009, 12:38:57 PM

to nosql-di...@googlegroups.com

On 2 Nov 2009, at 17:44, Johannes Ernst wrote:

>
> There is the very old (in language design) controversy about strong
> vs. weak typing and how beneficial or evil it might be.
> http://en.wikipedia.org/wiki/Type_system#Controversy
>
> Personally I think this is an orthogonal discussion: one can use a SQL
> database to store uninterpreted blobs, and one can use a strong type
> system even with a simple key-value store -- enforcement rules left as
> exercise for the programmer given the state of most projects today,
> but I'd bet more strongly typed systems will show up in response to
> market demand from those who like strong type systems.
>
> So personally I don't think that this is a strong driver for NoSQL,
> but I can be convinced otherwise. Shoot ... ;-)

The data on the web is messy and in no way fits a schema.

*headshot* :D

Cheers
Jan
--

Jan Lehnardt

Nov 2, 2009, 1:13:10 PM

to nosql-di...@googlegroups.com

On 2 Nov 2009, at 17:52, Dwight Merriman wrote:

> Agreed that not all NoSQL solutions will be weakly typed, but when it
> comes into play it feels important to me. With many of these
> solutions you can manipulate the internals of these weakly typed
> documents (queries and indexes on secondary/embedded fields) which
> makes it not the same as key->blob.

Yeah, I think we are collecting attributes of any NoSQL solution
here, not only the ones that fit all NoSQL solutions. By the nature of
NoSQL, that'd be silly, since we are all about diversity. If a certain
NoSQL solution is strongly typed, that's cool!

I was just adding to the use-cases that were posted in the initial mail.

Cheers
Jan
--

Johannes Ernst

Nov 2, 2009, 1:12:43 PM

to nosql-di...@googlegroups.com

Fair enough. I think my bucket #3 is probably a bit larger than I thought -- including the whole range from very expressive strongly typed models to very weakly typed models. Certainly SQL put the industry into a smaller box there than would have been preferable. Great if this movement could manage to fix that.

Johannes Ernst

Nov 2, 2009, 1:15:04 PM

to nosql-di...@googlegroups.com

On Nov 2, 2009, at 9:38, Jan Lehnardt wrote:

> On 2 Nov 2009, at 17:44, Johannes Ernst wrote:
>
> > There is the very old (in language design) controversy about strong
> > vs. weak typing and how beneficial or evil it might be.
> >
> > http://en.wikipedia.org/wiki/Type_system#Controversy …
>
> The data on the web is messy and in no way fits a schema.
>
> *headshot* :D

I'm not dead yet ;-) Seems like my defense is bucket #3: we want and need a lot more flexibility in the type system, see previous response to Dwight.

David Timothy Strauss

Nov 2, 2009, 2:54:14 PM

to nosql-di...@googlegroups.com

----- "Jan Lehnardt" <j...@apache.org> wrote:

> The data on the web is messy and in no way fits a schema.
>
> *headshot* :D

OK, it's time for someone to put their foot down regarding schema. :-) The absence of schema has nothing to do with any technical advantages mentioned on this thread (or even list).

It just happens that most current nosql systems lack schema support. That's something we should lament, not celebrate. People have been using "schema" like a dirty word, an embodiment of all that is evil about the rigid, flat, and over-simplified tuple layout in many RDBMS tools, but schema is not *that* any more than data storage is *SQL*.

A schema is simply an ontology, a way to structure data and define what's valid and what's not. Whether or not the data store understands the ontology, the application must. What the data store cannot guarantee, the application must itself validate. Worse, without schema support, the system degrades from structure guarantees about *every* item in the store to the application merely being able to validate *specific* items. Schema enforcement is a gatekeeper of validity, an especially important role when data stores are shared or applications are buggy.
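
The gatekeeper role can be sketched in Python as follows. The `SCHEMA` table, the `validate` function, and the toy `Store` class are all hypothetical and only illustrate the idea: when validation happens at write time inside the store, every stored item is known to be valid, whereas an application-side check only covers the items that particular application happens to touch.

```python
SCHEMA = {
    "name": (str, True),    # field -> (expected type, required?)
    "email": (str, True),
    "fax": (str, False),
}

def validate(doc: dict) -> list:
    """Return a list of problems; an empty list means the doc is valid."""
    errors = []
    for field, (typ, required) in SCHEMA.items():
        if field not in doc:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(doc[field], typ):
            errors.append(f"wrong type for field: {field}")
    for field in doc:
        if field not in SCHEMA:
            errors.append(f"unknown field: {field}")
    return errors

class Store:
    """Toy store that refuses invalid writes, so every stored item is valid."""

    def __init__(self):
        self._items = {}

    def put(self, key: str, doc: dict) -> None:
        errors = validate(doc)
        if errors:
            raise ValueError("; ".join(errors))
        self._items[key] = doc

    def get(self, key: str) -> dict:
        return self._items[key]
```

A buggy second application writing to the same `Store` would be stopped at `put`, which is the shared-store case mentioned above.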

A schema-aware data store provides many benefits to the application(s) using it, and I am loath to think that the nosql world will brand itself as anti-schema. It's OK for us to argue that strong schema support is less important than other data storage innovations being pursued in the nosql world, but we should not pretend its absence is good.

We should focus our efforts on schema models that make application development faster, more pleasant, and more reliable.

More of my schema ranting here:
http://fourkitchens.com/blog/2009/07/05/how-schema-got-bad-name

David Timothy Strauss

Nov 2, 2009, 3:03:23 PM

to nosql-di...@googlegroups.com

----- "Johannes Ernst" <jernst+g...@netmesh.us> wrote:

> Personally I think this is an orthogonal discussion: one can use a SQL
> database to store uninterpreted blobs, and one can use a strong type
> system even with a simple key-value store -- enforcement rules left as
> exercise for the programmer given the state of most projects today,
> but I'd bet more strongly typed systems will show up in response to
> market demand from those who like strong type systems.

It's not really orthogonal. Many nosql systems use eventual-consistency models. Presence or absence of a schema (one understood by the data store) directly affects the methods available for conflict resolution. Systems like CouchDB also expect valid JSON with certain values populated in the resultant data structure (for, say, calculating views).

Our capabilities quickly hit a limit when we assume the values for key/value pairs are opaque to the data store.
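
A hedged Python sketch of that limit: the per-field `MERGE_RULES` below are invented for illustration and are not how CouchDB or any other named system resolves conflicts. With opaque values, the store can only pick one replica wholesale; once it understands the fields, it can merge them individually.

```python
# Hypothetical per-field merge rules that a schema-aware store could apply.
MERGE_RULES = {
    "tags": lambda a, b: sorted(set(a) | set(b)),  # set union
    "login_count": max,  # take the larger value (simplified counter)
}

def merge_opaque(a: bytes, b: bytes, a_time: int, b_time: int) -> bytes:
    # With opaque values the store can only pick a winner wholesale.
    return a if a_time >= b_time else b

def merge_structured(a: dict, b: dict) -> dict:
    """Merge two conflicting replicas field by field."""
    merged = {}
    for field in sorted(set(a) | set(b)):
        if field in a and field in b and field in MERGE_RULES:
            merged[field] = MERGE_RULES[field](a[field], b[field])
        else:
            # No rule known, or field on one side only: prefer a, then b.
            merged[field] = a[field] if field in a else b[field]
    return merged

replica_a = {"tags": ["nosql"], "login_count": 3}
replica_b = {"tags": ["games"], "login_count": 5}
print(merge_structured(replica_a, replica_b))
```

In the opaque case one of the two tag lists would be lost entirely; in the structured case both survive.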

Esko Luontola

Nov 2, 2009, 4:12:42 PM

to NOSQL

I have one more use case that is different from all the other NoSQL
systems that I've noticed being spoken about here:

Online games and virtual worlds, especially MMO games. What makes this
category special is these characteristics:

- Low latency is of utmost importance (games are soft real-time), even
more important than throughput. As a result, the data being accessed
should be available locally, without need for network access (even the
latency between servers in a LAN should be avoided).
- The data access patterns are typically 50% reads 50% writes.
- It's acceptable to lose some recent work (last couple of seconds) in
case of a server failure, as long as the data stays in a consistent
state. The transaction durability can be relaxed to achieve better
scalability and latency.
- The data access patterns of the players are mostly local, but also
constantly changing as the players move around in the virtual world.
The system should balance the load so that all players accessing the
same data are located on the same server, in order to minimize
latency.
- Scalability is important, because a popular MMO game might have tens
or hundreds of thousands of concurrent players. Traditionally MMO
games have resorted to sharding in order to scale (for example,
according to one site, in WoW each server can have 10k+ players of
which 3-4k can be online concurrently), but we want to achieve
shardless scalability, so that the players can interact with anybody
they want (EVE Online is one of the few big MMOs where all players are
in the same world, without shards).
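
The relaxed-durability point in the list above can be sketched as a write-behind store in Python. This is a toy model and not how Darkstar or Dimdwarf are implemented; the class name, the two-second interval, and the explicit `now` argument (used here instead of a real clock so the behaviour is easy to follow) are all assumptions. Reads and writes hit local memory, changes are persisted in periodic batches, and a crash loses only what was written since the last flush:

```python
class WriteBehindStore:
    """Serve reads and writes from memory; persist in periodic batches."""

    def __init__(self, flush_interval: float = 2.0):
        self.flush_interval = flush_interval
        self.memory = {}    # working state, always local and fast
        self.durable = {}   # stand-in for disk or a remote database
        self._dirty = set()
        self._last_flush = 0.0

    def write(self, key, value, now: float) -> None:
        self.memory[key] = value
        self._dirty.add(key)
        if now - self._last_flush >= self.flush_interval:
            self.flush(now)

    def read(self, key):
        return self.memory.get(key)

    def flush(self, now: float) -> None:
        # Persist all pending changes together as one batch.
        for key in self._dirty:
            self.durable[key] = self.memory[key]
        self._dirty.clear()
        self._last_flush = now

    def crash_and_recover(self) -> None:
        # Everything written since the last flush is lost.
        self.memory = dict(self.durable)
        self._dirty.clear()

store = WriteBehindStore(flush_interval=2.0)
store.write("hp", 100, now=0.0)   # buffered only
store.write("hp", 90, now=2.5)    # interval elapsed, batch is flushed
store.write("hp", 80, now=3.0)    # buffered only
store.crash_and_recover()
print(store.read("hp"))
```

A real system would also need each batch to be applied atomically, so that recovery always lands on a consistent state.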

I know two systems that are focused on that category (both are open
source). The first one is Project Darkstar
(http://www.projectdarkstar.com/), developed by Sun Labs (its
development is at beta stage). The second one is Dimdwarf
(http://dimdwarf.sourceforge.net/), developed by me (its development is
at pre-alpha stage). I hinted to the Darkstar developers that there
might be people here on the NoSQL list who would be interested in it.
I might also get more involved later, once my system is ready enough
to be usable.

Dwight Merriman

Nov 2, 2009, 5:37:15 PM

to nosql-di...@googlegroups.com

Wow, that is a good one; I had not considered games.

I would definitely investigate MongoDB there; it has good write/update performance, which seems important there.

Kristina Chodorow

Nov 2, 2009, 5:47:08 PM

to nosql-di...@googlegroups.com

In fact, EA is already using MongoDB for caching game data.

Esko Luontola

Nov 2, 2009, 7:26:14 PM

to NOSQL

On Nov 3, 12:37 am, Dwight Merriman <dwi...@10gen.com> wrote:
> I would definitely investigate MongoDB there; it has good write/update
> performance, which seems important there.

Thanks for pointing that out.

The interesting stuff in Darkstar happens at a higher level than the
database, but some database (a key-value store) is needed anyway for
persisted data storage. Right now Darkstar uses BerkeleyDB as its
database backend, but that is an implementation detail that can be
changed.

The current plan for Darkstar's multi-node setup is that there is a
central server node with the database, and multiple application nodes
which execute the application logic and to which the clients connect.
After the people at Sun manage to get the system to scale near-linearly
as more application nodes are added, then the next step will probably
be to distribute the database on multiple nodes. At that point it will
be worth considering which database backend to use - maybe
replacing BerkeleyDB with MongoDB or something else will give better
scalability.
