I think either of these would be excellent. Printed materials go a
long way toward legitimizing a movement; so far as I know, the only
thing in the pipe on this is O'Reilly's CouchDB book.
Adam
* the technical decision making and design that went into these systems
* an architectural look at how to integrate them into systems
* tutorials on their usage and APIs
It would also open doors for more in-depth looks at some of the "old"
ideas (e.g., CAP and such) that seem to be getting revitalized by way of
the nosql "movement."
~thomas
+1 on that. At the neo4j project we set up a planet a while back:
This has worked out reasonably well. We could syndicate it directly to
a Planet NoSQL or -- probably better -- only pull in posts tagged with
'nosql.'
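Here is a minimal sketch of that tag filter in Python. It assumes the member blogs publish Atom feeds that mark tags with `<category term="...">` elements. The `entries_tagged` helper and the sample feed are hypothetical, and this is not how any particular Planet implementation does it.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace; ElementTree needs it spelled out.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

def entries_tagged(feed_xml: str, tag: str = "nosql") -> list[str]:
    """Return titles of Atom entries carrying <category term=tag> (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    titles = []
    for entry in root.iter(f"{ATOM_NS}entry"):
        # Compare tags case-insensitively so "NoSQL" and "nosql" both match.
        terms = {c.get("term", "").lower() for c in entry.findall(f"{ATOM_NS}category")}
        if tag.lower() in terms:
            titles.append(entry.findtext(f"{ATOM_NS}title", default=""))
    return titles

# Hypothetical sample feed for illustration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Graph traversal in neo4j</title><category term="nosql"/></entry>
  <entry><title>Office party photos</title><category term="misc"/></entry>
</feed>"""

print(entries_tagged(SAMPLE))  # ['Graph traversal in neo4j']
```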
I think the point about the review process is well made, though. How do we
stop a NoSQL blog or planet from turning into a venue for advertising
the different projects? Maybe a formal review process is too
heavyweight for a blog or planet, but it seems like at least some
guidelines are needed.
Cheers,
-EE
I'm relatively new to NoSQL but have been following CouchDB (and previously
ZODB and other non-SQL DBs) with great interest.
Clearly, this sort of data warehouse situation is the very place where
non-SQL databases could shine, given the non-scalability of SQL
DBs: use a non-SQL DB, spread the work over many commodity servers,
and use MapReduce-style processing to get your answers.
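To make the MapReduce idea a bit more concrete, here is a rough, single-process Python sketch of the map, shuffle and reduce phases computing a typical warehouse rollup (total sales per region). The fact rows, partitions and function names are all hypothetical. In a real system (Hadoop, CouchDB views, etc.) the map step would run on whichever server holds each partition.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical fact rows: (region, product, amount). Each inner list stands
# in for a partition that would live on a separate commodity server.
PARTITIONS = [
    [("east", "widget", 10.0), ("west", "widget", 5.0)],
    [("east", "gadget", 7.5), ("north", "widget", 2.0)],
    [("west", "gadget", 1.5)],
]

def map_rows(rows):
    """Map phase: emit (group key, measure) for every fact row in one partition."""
    for region, _product, amount in rows:
        yield region, amount

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_groups(groups):
    """Reduce phase: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}

def total_by_region(partitions):
    # Each partition is mapped independently (the parallelizable part),
    # then the emitted pairs are merged, grouped and reduced.
    mapped = chain.from_iterable(map_rows(p) for p in partitions)
    return reduce_groups(shuffle(mapped))

print(total_by_region(PARTITIONS))  # {'east': 17.5, 'west': 6.5, 'north': 2.0}
```

This computes the same thing as a `SELECT region, SUM(amount) ... GROUP BY region` against a hypothetical sales table. The point is that the map step only ever looks at one partition, so it spreads naturally across machines.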
Has anyone written up an example or tutorial on how one would migrate
their existing non-scaling dimensional data warehouse to something
like CouchDB or other non-SQL database? That would make for a very
awesome case study.
I know people facing this very same SQL based dimensional data
warehouse non-scalability problem. Currently their only option is to
buy very expensive hardware, very expensive software, and invest a lot
of time and money only to push the scalability barrier out a little
further but still not far enough.
--
Tracy Reed
http://tracyreed.org
I note that http://wiki.apache.org/hadoop/Hive says:
It provides a mechanism to put structure on this data and it also
provides a simple query language called QL which is based on SQL
and which enables users familiar with SQL to query this data. At
the same time, this language also allows traditional map/reduce
programmers to be able to plug in their custom mappers and
reducers to do more sophisticated analysis which may not be
supported by the built in capabilities of the language.
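For reference, here is a sketch of what such a custom mapper can look like. It assumes Hive's TRANSFORM/streaming convention: rows are written to the script's stdin as tab-separated text, and output rows are read back from stdout the same way. The word-count logic, the file name and the `--stream` guard (which lets the module be imported without consuming stdin) are hypothetical choices for illustration.

```python
import sys

# Hypothetical Hive usage:
#   SELECT TRANSFORM(line) USING 'python wordcount_mapper.py --stream'
#   AS (word, cnt) FROM docs;

def transform(lines):
    """Yield one tab-separated (word, 1) output row per word in each input row."""
    for line in lines:
        # Each input row arrives as one line; columns are tab-separated,
        # and this sketch only looks at the first column.
        text = line.rstrip("\n").split("\t")[0]
        for word in text.split():
            yield f"{word.lower()}\t1"

if __name__ == "__main__" and sys.argv[1:2] == ["--stream"]:
    for row in transform(sys.stdin):
        print(row)
```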
Why only based on SQL and not something they can actually call SQL
even if not fully ANSI SQL? Making provisions for custom mapreducers
also indicates they don't have a whole lot of faith in SQL.
This is not to say that I disagree with your actual premise that SQL
itself is not necessarily the problem.
> the point being that while most implementations of SQL have historically not
> been massively scalable - that is not the fault of SQL the language. it is
> not even so much the fault of the relational model of data (as Hive and GQL
> show). SQL itself does not imply transactional semantics like ACID (which
> is one of the things that many of the new generation of systems relax).
Duly noted and of course absolutely correct.
Just my $0.02:
SQL is the enemy, or at least one aspect of it, even if shared-disk
architectures, non-scalability, etc. are the more important aspects.
Conceptually, SQL has many shortcomings that make it difficult to use in
modern software development, e.g.:
* large queries generally need to be written as a single, massive,
compound statement
* SQL (or at least as it's currently practiced, using normalized "flat"
database tables) is incompatible with OO development, which results in
the "Object-relational impedance mismatch" that everybody keeps butting
heads against.
* Performing queries on graph-based data is extremely unwieldy in SQL.
etc.
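To illustrate the graph point: even a basic "who can I reach from here?" question needs a recursive common table expression in SQL. Below is a minimal Python/sqlite3 sketch. The `edge` table, the node names and the `reachable` helper are hypothetical.

```python
import sqlite3

def reachable(edges, start):
    """Return every node reachable from `start`, computed with a recursive SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    # A recursive CTE is needed just to walk the graph; UNION (not UNION ALL)
    # discards rows already seen, so cycles terminate.
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach WHERE node <> ? ORDER BY node
        """,
        (start, start),
    ).fetchall()
    conn.close()
    return [node for (node,) in rows]

EDGES = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "erin")]
print(reachable(EDGES, "alice"))  # ['bob', 'carol']
```

A plain breadth-first search over an adjacency list does the same job in a few lines of ordinary code, and graph databases expose traversal directly. MySQL, for one, did not support WITH RECURSIVE at all until 8.0.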
DR
I understand that adding a column to an Oracle/MySQL table is a no-go, but my question is really the following:
Is the schema-freeness of NoSQL databases a conscious design decision, or is it a side effect of the distributed data model?
-Mike