Distributing large data

Tor

Jul 29, 2010, 12:50:31 PM
to VSCSE Big Data for Science 2010
So, this is in a general theme with Mahidhar's presentation, but I'm
hoping for a bit of discussion so I'm sending it to the list.

I work for AGRE (www.agre.org), and we're essentially a data warehouse
for genetic and phenotypic information relating to autism. We work
with a number of unrelated institutions, mostly in the US but a few
in Europe as well. Our datasets are quickly approaching the terabyte
range and beyond, and we are trying to decide how best to approach
distribution between sites that do not have a shared resource like the
DC WAN or high-bandwidth backbone. Is anyone else in a similar sharing
situation? If so, how are you approaching it? If not, any suggestions?

Cheers,
-Tor

ffoe...@gmail.com

Jul 29, 2010, 5:00:44 PM
to vscse-big-data-...@googlegroups.com
Tor,

I work with a similar type of data (I work for Genus Plc.), so we also have SNP and phenotype data to store.
Our needs are very similar, but we have a local cluster we use for our genetic discovery research.

I'm looking into a NoSQL alternative to house genotype data and, hopefully, to add phenotype data and mine it for all sorts of things.
Our datasets are pretty large too. I've been looking at a few DBs: MongoDB, HadoopDB and Cassandra. So far I like Cassandra the most, but I'm still testing it.

Sorry I'm not really answering your question other than to say: we're also looking at this. :)

I'd like to stay in touch, perhaps even work together on developing a solution.

-Fernie

Geoffrey

Jul 29, 2010, 5:09:27 PM
to VSCSE Big Data for Science 2010, Geoffrey Fox
Is http://www.cloudera.com/blog/2010/03/how-raytheon-researchers-are-using-hadoop-to-build-a-scalable-distributed-triple-store/
interesting?
It describes a cloud table that supports a triple store.

A slightly related technology (as it uses a NoSQL table) is Fusion Tables
from Google:
http://tables.googlelabs.com/Home

ffoe...@gmail.com

Jul 29, 2010, 5:27:12 PM
to vscse-big-data-...@googlegroups.com
I just downloaded the Cloudera distro yesterday, actually!