I think either of these would be excellent. Printed materials go a
long way toward legitimizing a movement; so far as I know, the only
thing in the pipe on this is O'Reilly's CouchDB book.
Adam
* the technical decision making and design that went into these systems
* an architectural look at how to integrate them into systems
* tutorials on their usage and APIs
It would also open doors for more in depth looks at some of the "old"
ideas (i.e., CAP and such) that seem to be getting revitalized by way of
the nosql "movement."
~thomas
+1 on that. At the neo4j project we set up a planet a while back:
This has worked out reasonably well. We could syndicate it directly to
a Planet NoSQL or -- probably better -- only pull in posts tagged with
'nosql.'
I think the point about review process is well made though. How do we
stop a NoSQL blog or planet from turning into a venue for
advertisement for the different projects? Maybe review process is too
heavyweight for a blog or planet but it seems like at least some
guidelines are needed.
Cheers,
-EE
I'm relatively new to NoSQL but have been following couchdb (and previously
zodb and other non-SQL dbs) with great interest.
Clearly, this sort of data warehouse situation is the very place where
non-SQL databases could shine due to the non-scalability of SQL
db's. Use a non-SQL db, spread the work over many commodity servers,
use MapReduce style processing to get your answers.
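As a rough illustration of what "MapReduce style processing" could look like for a dimensional roll-up, here is a minimal single-process sketch in Python. The fact rows, the three partitions standing in for commodity servers, and all function names are hypothetical; a real system such as CouchDB or Hadoop would run the map step on each node and do the grouping itself.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical fact rows, already partitioned across three "servers".
PARTITIONS = [
    [{"region": "east", "product": "widget", "amount": 10},
     {"region": "west", "product": "widget", "amount": 5}],
    [{"region": "east", "product": "gadget", "amount": 7}],
    [{"region": "west", "product": "widget", "amount": 3},
     {"region": "east", "product": "widget", "amount": 2}],
]


def map_sale(row):
    """Map step: emit one (key, value) pair per fact row."""
    yield (row["region"], row["product"]), row["amount"]


def run_map(partition):
    """Run the mapper over one partition (one server's share of the data)."""
    return [pair for row in partition for pair in map_sale(row)]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sales(key, values):
    """Reduce step: collapse all values for one key into a total."""
    return key, sum(values)


def total_sales(partitions):
    """Sum amounts per (region, product) across all partitions."""
    mapped = chain.from_iterable(run_map(p) for p in partitions)
    return dict(reduce_sales(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    for key, total in sorted(total_sales(PARTITIONS).items()):
        print(key, total)
```

Because each partition is mapped independently, adding servers only adds more partitions; the grouping key plays the role that GROUP BY on the dimension columns plays in a warehouse query.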
Has anyone written up an example or tutorial on how one would migrate
their existing non-scaling dimensional data warehouse to something
like CouchDB or other non-SQL database? That would make for a very
awesome case study.
I know people facing this very same SQL based dimensional data
warehouse non-scalability problem. Currently their only option is to
buy very expensive hardware, very expensive software, and invest a lot
of time and money only to push the scalability barrier out a little
further but still not far enough.
--
Tracy Reed
http://tracyreed.org
I note that http://wiki.apache.org/hadoop/Hive says:
It provides a mechanism to put structure on this data and it also
provides a simple query language called QL which is based on SQL
and which enables users familiar with SQL to query this data. At
the same time, this language also allows traditional map/reduce
programmers to be able to plug in their custom mappers and
reducers to do more sophisticated analysis which may not be
supported by the built in capabilities of the language.
Why only based on SQL and not something they can actually call SQL
even if not fully ANSI SQL? Making provisions for custom mapreducers
also indicates they don't have a whole lot of faith in SQL.
This is not to say that I disagree with your actual premise that SQL
itself is not necessarily the problem.
> the point being that while most implementations of SQL have historically not
> been massively scalable - that is not the fault of SQL the language. it is
> not even so much the fault of the relational model of data (as Hive and GQL
> show). SQL itself does not imply transactional semantics like ACID (which
> is one of the things that many of the new generation of systems relax).
Duly noted and of course absolutely correct.
> I know people facing this very same SQL based dimensional data
> warehouse non-scalability problem. Currently their only option is to
> buy very expensive hardware, very expensive software, and invest a lot
> of time and money only to push the scalability barrier out a little
> further but still not far enough.
Just my $0.02:
SQL is the enemy - or at least one aspect of it, even if shared-disk,
non-scalable, etc. are the more important aspects.
Conceptually SQL has many shortcomings which make it difficult to use in
modern software development. e.g.:
* large queries generally need to be written as a single, massive,
compound statement
* SQL (or at least as it's currently practiced, using normalized "flat"
database tables) is incompatible with OO development, which results in
the "Object-relational impedance mismatch" that everybody keeps butting
heads against.
* Performing queries on graph-based data is extremely unwieldy in SQL.
etc.
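To make the graph point concrete: a query like "everyone reachable from X within N hops" typically needs one self-join per hop or a recursive query in SQL, while on data held as a graph it is a short traversal. The sketch below is an assumption-laden illustration in Python; the adjacency list, the names in it, and the function `within_hops` are made up for the example and do not reflect any particular graph database's API.

```python
from collections import deque

# Hypothetical "knows" relationships stored as an adjacency list.
EDGES = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the start node is not part of the answer
    return seen


if __name__ == "__main__":
    print(within_hops(EDGES, "alice", 2))
```

Changing the hop limit is a change to one argument here, whereas the join-based SQL version changes shape with every extra hop.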
DR
I understand that adding a column to an Oracle/MySQL table is a no-go, but my question is more the following:
Is the schema freeness of the NoSQL Databases a conscious design decision to have this feature or is it a side effect of the distributed data model?
-Mike
No single word answer, but this is the blurb I wrote when I sent the
links around:
FYI: I think because of all the open government activity, there's been
a gathering of minds in the various HUGE, Distributed database
solutions space.
Got these two links on Twitter. The second is really useful:
No to SQL? Anti-database movement gains steam
http://www.linuxworld.com/news/2009/070209-no-to-sql-anti-database-movement.html?fsrc=rss-linux-news
NOSQL Debrief:
http://blog.oskarsson.nu/2009/06/nosql-debrief.html
The reaction against SQL is basically an expression of a turn to
specialized denormalized and distributed database architectures to
support needs for huge data projects.
But the second link is from a meeting of all the main free software
solutions out there, with explanations of what each is about. Very
useful for getting a picture of what tools are out there and what
their different strengths are.
---
Seth
I usually go with "non-relational." It describes the only common trait
I find between these new db systems: they all have a data model that
differs from Codd's relational model. There are two drawbacks with
that term:
1. While non-relational is a good description of this emerging crop
of databases, it's also a good description of previous generations' db
contenders like OODB and CODASYL/network model.
2. Non-relational is sliiightly too long to be perfect in this
140-char world we're living in.
Despite that, it's the most accurate word I've found and reasonably
google-able. It clearly trumps "anti-database" (we're all databases
here), "no sql" (see Joydeep's post) and "distributed" (there are many
new databases where scale-out isn't the prime focus).
Then if I want to create a stir (cf. nosql's goal of "grabbing
attention") I use "post-relational," which is good and controversial
but gives a lot of google false positives (mainly from psychology).
Cheers,
-EE
Yep, but a *relationship* (not a relation) is a link between two
information entities (i.e. an edge in a graph). I agree that there's a
LOT of that out there, in fact my company's motto is "value in
relationships" because we believe that the *most important* part of a
data set is frequently the relationships between entities. Hence a
graph model.
But the "relational" part in my "non-relational" suggestion refers to
the relational algebra that is the foundation of RDBMS. That model is
typically known as "the relational model" as we all know, and people
I've talked to have interpreted non-relational in that setting.
Cheers,
-EE
Hi Seth,
I actually don't think so. I often talk about "scaling to complexity"
AND "scaling to size." I think both are important issues to address.
As an example of non-relational use cases in non-HUGE settings, I know
the couch folks talk about running on small devices, inside the
browser, etc, and we in the Neo4j crew certainly think that graph
databases have a lot of virtues also for smaller data sets.
Cheers,
-EE
I think nosql is here to stay. It is short, concise and controversial
enough to get people to notice. The mere fact that people object
means we're onto something. I don't think a scientifically correct
title (if that can be found at all) buys us anything.
Long live nosql!
Cheers
Jan
--
Google has been pushing the hardware envelope a bit in recent years in
terms of things like motherboard design, CPU power/heat usage, data
center design/architecture. I'm sure if you search around you can turn
up info about this, as it's been discussed in a number of places in the
tech press in recent years.
That said, they're (obviously) primarily focused on things that are of
direct benefit to their business (i.e., things that will lower the costs
of operating a data center full of servers) as opposed to general
hardware R&D like CPU design and such.
DR