----
http://www.springerlink.com/index/2682X0T22T30141J.pdf
Tradeoffs between Parallel Database Systems, Hadoop, and HadoopDB as Platforms for Petabyte-Scale Analysis
Daniel J. Abadi
Yale University
d...@cs.yale.edu
Abstract. As the market demand for analyzing data sets of increasing variety and scale continues to explode, the software options for performing this analysis are beginning to proliferate. No fewer than a dozen companies have launched in the past few years that sell parallel database products to meet this market demand. At the same time, MapReduce-based options, such as the open source Hadoop framework, are becoming increasingly popular, and there have been a plethora of research publications in the past two years that demonstrate how MapReduce can be used to accelerate and scale various data analysis tasks.
Both parallel databases and MapReduce-based options have strengths and weaknesses that a practitioner must be aware of before selecting an analytical data management platform. In this talk, I describe some experiences in using these systems, and the advantages and disadvantages of the popular implementations of these systems. I then discuss a hybrid system that we are building at Yale University, called HadoopDB, that attempts to combine the advantages of both types of platforms. Finally, I discuss our experience in using HadoopDB for both traditional decision support workloads (i.e., TPC-H) and also scientific data management (analyzing the Uniprot protein sequence, function, and annotation data).
Keywords: MapReduce, parallel databases, scalable systems, fault-tolerant systems, analytical data management.
Here's a blog post by the author himself with no strings attached :-)
http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html
This is a *must-read* article, by the way, if you want to understand
the trade-offs between SQL and NoSQL.
Tom