Stability as a permanent datastore


Jeremy Jongsma

Dec 29, 2011, 5:13:31 PM
to terrastore-discussions
I'm looking at using Terrastore/ElasticSearch as a datastore for a
realtime news aggregation server. Everything I've read so far here
looks like it should be good for my needs, but I keep running across
this slide and it worries me:

http://nosql.mypopescu.com/post/1403577624/terrastore-sweet-spot

Does recommending it for "throw-away data" mean that it is not ideal
as a permanent datastore for some reason? Are there any drawbacks to
using Terrastore as a database replacement in a production environment
(versus, say, MongoDB)? Right now we are storing > 16GB of news
stories, and plan on growing that significantly.

I also have a question on cluster configuration: our current
configuration has one master server that feeds the datastore, and
multiple slaves that are query-only. We have two physical locations,
which have high latency between them. Is there a way to configure
two independent clusters fed by the same master? (i.e., each physical
location's cluster should have a full set of data to avoid an
expensive trip to the other location.)

Sergio Bossa

Jan 12, 2012, 12:29:09 PM
to terrastore-...@googlegroups.com
Hi Jeremy,

Sorry for the late response; your email somehow fell off my radar.

> Does recommending it for "throw-away data" mean that it is not ideal
> as a permanent datastore for some reason?

Absolutely not. The "throw-away data" remark just stems from two Terrastore characteristics:

1) Terrastore servers keep everything in memory, so the most frequently accessed data (the hot spots) has to fit in the heap to maintain good performance.
2) All writes have to go through the Terrastore master, so writes must be scaled by deploying a Terrastore ensemble, which is currently not elastic (that is, you can't dynamically add or remove clusters).

That doesn't mean you cannot use Terrastore as a general purpose database, provided your use case fits (or overcomes) the characteristics above. 
 
> Right now we are storing > 16GB of news
> stories, and plan on growing that significantly.

What do you mean by "growing significantly"?
We have a single-cluster Terrastore production deployment storing 10 to 20 GB of data.
If you have more data, you can go with a Terrastore multi-cluster ensemble. As of now, though, you have to plan the ensemble capacity carefully, because the ensemble size cannot be changed unless you back up all data and restore it on a larger ensemble.

In other words, if you have a continuously growing data set and you want to keep everything in the "same" database, all the time, forever, Terrastore may not be the right choice.
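
Since the ensemble size is fixed once deployed, it can help to estimate up front how many clusters the projected data set would need. The following is only a rough sketch: the linear growth model, the 2 GB/month growth rate, the 24-month horizon, and the 20 GB per-cluster figure (taken from the upper end of the deployment mentioned above, not a hard Terrastore limit) are all assumptions for illustration.

```python
import math


def clusters_needed(current_gb, monthly_growth_gb, months, per_cluster_gb):
    """Estimate the ensemble size needed at the end of a planning horizon.

    Assumes linear growth and an even spread of data across clusters;
    both are simplifications, not documented Terrastore behavior.
    """
    if per_cluster_gb <= 0:
        raise ValueError("per_cluster_gb must be positive")
    projected_gb = current_gb + monthly_growth_gb * months
    # Round up: a partially filled cluster is still a whole cluster.
    return max(1, math.ceil(projected_gb / per_cluster_gb))


# 16 GB today, assumed 2 GB/month growth, 24-month horizon, 20 GB per cluster.
print(clusters_needed(16, 2, 24, 20))  # projected 64 GB -> 4 clusters
```

If the estimate keeps climbing as the horizon is extended, that is a hint the data set falls into the "continuously growing" case described above.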
 
> I also have a question on cluster configuration: our current
> configuration has one master server that feeds the datastore, and
> multiple slaves that are query-only. We have two physical locations,
> which have high latency between them. Is there a way to configure
> two independent clusters fed by the same master? (i.e., each physical
> location's cluster should have a full set of data to avoid an
> expensive trip to the other location.)

Yep, use the Terrastore event bus infrastructure to replicate data across the different locations: http://code.google.com/p/terrastore/wiki/Developers_Guide#Events
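
To illustrate the idea only: the flow is roughly "listen for changes on the local cluster, then forward each change to the cluster at the other location". Real Terrastore event listeners are Java classes registered on the server, and their exact interface is described in the Developers Guide linked above. The Python sketch below is not that API; the class name, the method names, and the `/bucket/key` URL layout of the remote cluster are all assumptions made for the example.

```python
import json
import urllib.request
from urllib.parse import quote


class RemoteReplicator:
    """Hypothetical listener that mirrors local changes to a remote cluster."""

    def __init__(self, remote_base_url, buckets, send=None):
        self.remote_base_url = remote_base_url.rstrip("/")
        self.buckets = set(buckets)
        # The sender is injectable so the forwarding logic can be tried
        # without a real remote cluster.
        self.send = send or self._http_send

    def observes(self, bucket):
        # Only replicate the buckets we were asked to mirror.
        return bucket in self.buckets

    def on_value_changed(self, bucket, key, value):
        if self.observes(bucket):
            body = json.dumps(value).encode("utf-8")
            self.send("PUT", self._url(bucket, key), body)

    def on_value_removed(self, bucket, key):
        if self.observes(bucket):
            self.send("DELETE", self._url(bucket, key), None)

    def _url(self, bucket, key):
        # Assumed URL layout: <base>/<bucket>/<key>, with both parts escaped.
        return "{}/{}/{}".format(
            self.remote_base_url, quote(bucket, safe=""), quote(key, safe="")
        )

    @staticmethod
    def _http_send(method, url, body):
        request = urllib.request.Request(url, data=body, method=method)
        if body is not None:
            request.add_header("Content-Type", "application/json")
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
```

With a high-latency link between the two locations, a real listener would probably also want queuing and retries, so that a slow or failed remote write does not get lost; that part is left out of the sketch.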

Hope that helps. Feel free to get back with more questions; I'll hopefully come back with a faster answer :)
Cheers,

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob