Thanks for sharing your experience. I'm glad you're liking Terrastore
and going into production with it :)
We're also using Terrastore in production here, so I'll try to share
some of my knowledge below.
> I am prepared to take a gamble with running Terrastore in a production
> environment. I feel I have taken adequate precaution by implementing
> connectivity failure detection, a caching layer, an automated backup
> agent, and some alerting mechanisms (I won't be able to keep a
> watchful eye on the system at all hours of the day).
That sounds good and necessary.
I've found performance monitoring very important too: that's because
Terrastore is memory-based, and performance of memory-based systems
tends to degrade under one of the following circumstances:
1) Large memory blobs are faulted in, either from disk or network.
2) Memory saturates and full garbage collection kicks in.
So first, always monitor your memory and garbage collector logs.
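As a rough illustration of the garbage collector side, here is a minimal sketch that counts full collections in a GC log and decides whether to raise an alert. It assumes the JVM writes classic HotSpot-style GC logs in which full collections show up as lines containing "Full GC"; the function names, the marker default and the threshold are all hypothetical, so adjust them to whatever your JVM actually prints.

```python
from typing import Iterable


def count_full_gcs(lines: Iterable[str], marker: str = "Full GC") -> int:
    """Count the log lines that mention a full collection.

    ASSUMPTION: full collections are logged on lines containing `marker`.
    """
    return sum(1 for line in lines if marker in line)


def should_alert(lines: Iterable[str], max_full_gcs: int = 0,
                 marker: str = "Full GC") -> bool:
    """Return True when more full collections than tolerated were logged."""
    return count_full_gcs(lines, marker) > max_full_gcs


# Usage (hypothetical log path):
# with open("gc.log", encoding="utf-8") as log:
#     if should_alert(log, max_full_gcs=3):
#         print("too many full collections, check heap usage")
```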
Also, we set up several metrics to track the execution times of the
most expensive Terrastore operations (i.e., range queries and map
reduce processing), so that we're alerted when performance degrades:
we're using Nimrod for that (see https://github.com/sbtourist/nimrod).
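To give an idea of what tracking execution times can look like, here is a minimal sketch of a timer that records how long an operation takes and calls back when it exceeds a threshold. This is not Nimrod's API: the class name, the callback and the threshold are hypothetical, and the wrapped call stands in for whatever client call you use for range queries or map reduce.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, List


class OperationTimer:
    """Records execution times per operation and flags the slow ones."""

    def __init__(self, threshold_seconds: float,
                 on_slow: Callable[[str, float], None]) -> None:
        self.threshold_seconds = threshold_seconds
        self.on_slow = on_slow
        self.samples: Dict[str, List[float]] = {}

    @contextmanager
    def track(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the sample even if the operation raised.
            elapsed = time.perf_counter() - start
            self.samples.setdefault(name, []).append(elapsed)
            if elapsed > self.threshold_seconds:
                self.on_slow(name, elapsed)


# Usage (run_range_query is a hypothetical client call):
# timer = OperationTimer(2.0, lambda name, secs: print(name, "took", secs))
# with timer.track("range_query"):
#     run_range_query()
```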
Talking about memory-expensive operations, the most critical ones are
surely those which deal with a huge number of keys, i.e., range
queries over large buckets.
That's because the master needs to send the whole key set to the
server, which in turn needs to materialize it in its heap: so large
buckets are costly, and if you run range queries over them, I
strongly suggest you monitor those queries.
> Most notably, I am very nervous about the apparent uncontrolled growth
> of the objectdb. Dropping and recreating Terrastore buckets routinely
> during development has lead to a great deal of presumably
> unnecessary .jdb files in the tc-data/objectdb directory with date
> stamps reaching all the way back to the start of my project.
Apart from the number of jdb files, doesn't the db size decrease when
you delete buckets and documents?
> I also have not yet found the time to exercise the Terracotta
> clustering to observe its real-world behavior when the master goes
> offline, the slave takes over, and then the master is restored. By
> contrast, in development, I have routinely started and stopped
> multiple Terrastore servers so I know that those join and exit their
> cluster fairly smoothly.
The Terracotta master failover works pretty well ... unless you have a
huge db :)
The main problem with huge databases is active -> passive
synchronization when the passive gets back online after a failure:
we've found that if the database gets over 10 gigabytes,
synchronization tends to require lots of memory and take several
hours; there are a few tricks to tune the synchronization process, but
they're fairly specific to the production setup, so get back with
more details about yours if you need more info.
Other than that, don't forget the golden rule: if you have
active/passive masters, and for some reason they all go offline,
always start the *latest* active first (which may obviously be
different from the one you originally started as active), and then
all the passives.
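The golden rule can be sketched as a small helper that computes the restart order. It assumes you keep track, for instance through your monitoring, of the last time each master was seen in the active role; the function name, the master names and the timestamps are hypothetical.

```python
from typing import Dict, List


def restart_order(last_active_at: Dict[str, float]) -> List[str]:
    """Return master names in start order: latest active first, then the rest.

    ASSUMPTION: `last_active_at` maps each master name to the timestamp at
    which it was last observed as the active master.
    """
    if not last_active_at:
        return []
    latest = max(last_active_at, key=last_active_at.get)
    passives = sorted(name for name in last_active_at if name != latest)
    return [latest] + passives


# Usage: master-b took over after master-a failed, so it starts first.
# restart_order({"master-a": 100.0, "master-b": 250.0})
```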
> I expect my site to see very little traffic but nevertheless I want it
> to be as solid as possible within reason. I plan to deploy a master-
> slave Terracotta server and two Terrastore instances, one of each on
> two physical servers with 32GB of memory (considerably more than
> necessary for my application).
That's plenty of memory ... we have more modest machines :)
Our masters run with 6 gigabytes, while servers run with 3 gigabytes.
With such a setup, we handle several million (compressed) documents
and a database ranging from 10 to 20 gigabytes.
> How much pain am I in store for if I deploy v0.81 to production now?
> Will 0.82 incur some substantial changes?
Everything I said refers to the (unfortunately still not officially
released) 0.8.2 :)
It contains lots of fixes and enhancements, so I strongly suggest you
use 0.8.2.
> In theory, even if I need
> to reinstall Terracotta and Terrastore, I should be able to restore my
> data from my bucket().backup() files. Right?
The backup format for 0.8.1 is different from the 0.8.2 format, so
you'll have to write a backup tool yourself (which should be fairly
straightforward, by the way) ... sorry for that.
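Just to show how small such a tool can be, here is a minimal sketch that dumps a bucket to a file with one plain JSON object per line. It is built on assumptions, not on a confirmed API: that the server answers GET <base_url>/<bucket> with a single JSON object mapping keys to documents, and that it listens on localhost:8080. The function names and the output layout are hypothetical too.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict


def fetch_bucket(bucket: str,
                 base_url: str = "http://localhost:8080") -> Dict[str, object]:
    # ASSUMPTION: GET <base_url>/<bucket> answers with one JSON object that
    # maps each key to its document. Check this against your server version.
    url = f"{base_url}/{urllib.parse.quote(bucket, safe='')}"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def export_bucket(bucket: str, path: str,
                  fetch: Callable[[str], Dict[str, object]] = fetch_bucket) -> int:
    """Write one plain JSON object per line and return the document count."""
    documents = fetch(bucket)
    with open(path, "w", encoding="utf-8") as out:
        for key in sorted(documents):
            out.write(json.dumps({"key": key, "value": documents[key]}) + "\n")
    return len(documents)


# Usage (hypothetical bucket and file names):
# export_bucket("users", "users.jsonl")
```

Note that this reads the whole bucket in one go, so for large buckets you'd want to export in smaller key ranges instead.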
> I was a tiny bit
> alarmed that the files were not explicitly human-readable JSON. They
> are approximately human-readable, but I believe they have been very
> slightly serialized; no?
Somewhat ... also, 0.8.2 backup files are compressed too.
> To put a fine point on it: if I decide that for whatever reason a
> pre-1.0 Terrastore is unworkable for me, I will need to write my own
> export functionality to get the data out in a plain JSON form. Or am
> I missing something? (Note: I don't plan to give up on Terrastore
> very easily! I really like it. I'd have to be in a desperate
> situation where I've lost data and can't blame myself.)
Data integrity is paramount, so I won't blame you for such a
defensive practice ;)
> Finally, I again solicit anyone who has done a production Terracotta/
> Terrastore environment to share any best practices or lessons
> learned. I hope to do the same once I'm into the thick of it myself.
I'm soliciting too ... it would be great to know of other production
or almost-production experiences :)
Also, should anyone be interested to know more about our production
deployment ... just get back with questions ;)
Cheers!
Sergio B.
--
Sergio Bossa
http://www.linkedin.com/in/sergiob
> Good idea. I think that the load is going to be much lower than the
> server's capacity such that I will likely avoid the memory contention
> you've described for a long time. But it's good to hear you have some
> ideas for this that I can put to use if/when that becomes an issue.
Great.
> Each .jdb file (aside from the most recent) is pegged at 9,766 KB for
> what appears to be eternity. As I mentioned earlier, 00000000.jdb has
> create/modified dates reaching back to the very start of my project.
Regarding *old* jdb files still sitting there, that's not a problem:
a few pieces of information are currently never deleted, such as
client lock tables and tombstones for deleted buckets. Also, log
files are only cleaned/deleted if their utilization percentage falls
under a given limit, so it may be that your old jdb files still
reference active data.
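To make the utilization rule concrete, here is a deliberately simplified model of it: a log file only becomes a cleaning candidate once the share of still-referenced data in it drops below a limit. The class, the 50% limit and the byte counts are illustrative assumptions, not the actual cleaner implementation or its configured value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogFile:
    name: str
    total_bytes: int
    live_bytes: int  # bytes still referenced by active data

    @property
    def utilization_percent(self) -> float:
        if self.total_bytes == 0:
            return 0.0
        return 100.0 * self.live_bytes / self.total_bytes


def cleanable(files: List[LogFile],
              min_utilization_percent: float = 50.0) -> List[str]:
    """Return the names of files whose utilization is under the limit."""
    return [f.name for f in files
            if f.utilization_percent < min_utilization_percent]


# Usage: an old file that is still 60% live stays on disk, however old it is.
# cleanable([LogFile("00000000.jdb", 10_000_000, 6_000_000),
#            LogFile("00000001.jdb", 10_000_000, 1_000_000)])
```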
> Since then, dozens of .jdb files have been created, but none has been
> deleted, even when I delete buckets. During development, I have
> deleted all buckets and created new buckets dozens of times.
That could be a problem, though: deleting buckets should free up lots of space.
> When you say the "does the db size decrease," it occurs to me that you
> may be referring to something other than the .jdb files. Is there
> another file or set of files I should be looking at?
I'm referring to the tc-data/server-data/objectdb directory.
If the objectdb directory size keeps increasing even after bucket
removal, there may be some (unknown) bug in Terrastore 0.8.1, or some
configuration problem with the Berkeley DB JE cleaner and checkpointer
(as BDBJE is used by the Terracotta master).
So I'd suggest you update to the latest Terrastore and see if the
problem is still there: if it is, we'll go ahead and discuss the
BDBJE configuration.
> This gives me a great deal of confidence that I'm significantly over-
> provisioned on memory. Memory is so astonishingly cheap right now
> that I see no reason to not buy gobs of it. But if you're doing that
> kind of load with 3 GB allocated to the servers, I am not concerned
> about my application--at least from a memory standpoint.
Cool :)
> I just saw the 0.8.2 announcement! Congratulations. I'll be
> switching over soon.
Thanks! Let us know how it goes.
> Do you suspect that the backup format will become stable over time,
> maybe at 1.0?
I'd really like to speed up the backup import/export process, but I'll
try to keep the current format unchanged.
> I've successfully upgraded to Terrastore 0.8.2. Upgrading was a
> cinch.
Cool :)
> I restored some buckets and the objectdb file was 4.4MB.
> I then dropped all buckets and confirmed that the buckets list was
> returning an empty set.
> Oddly enough, I observed that having dropped the buckets, the objectdb
> had grown to 4.7MB.
> I shut down and restarted Terracotta and Terrastore, the objectdb grew
> to 6.8MB just by restarting.
> The bucket list still shows nothing.
> After restoring the same data again, now the objectdb files now total
> 10MB.
> The size never seems to decrease.
There's an easy explanation for your test results.
The Terrastore master database is based on BDBJE, which is a
log-structured database where deleted entries are actually removed
asynchronously by two dedicated threads: the cleaner and the
checkpointer. For performance reasons, as a good-enough heuristic,
they're currently configured in Terrastore to run after 20 MB of
written data for the former, and 50 MB for the latter (so that the
checkpointer doesn't run too often and also compacts data from several
cleaning rounds). In your case, you wrote too little data, so they
never ran and nothing was cleaned.
They're configured in the terracotta-config xml file under the
following properties:
l2.berkeleydb.je.cleaner.bytesInterval
l2.berkeleydb.je.checkpointer.bytesInterval
For more information about the cleaner and checkpointer threads, and
some more properties you may want to tweak, you can start with the
following, and then get back here with questions ;)
http://download.oracle.com/docs/cd/E17277_02/html/GettingStartedGuide/backgroundthreads.html
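To see why your test never triggered any cleanup, here is a back-of-the-envelope sketch using the two intervals mentioned above. It is a simplified model that only looks at bytes written since startup; the real threads have other triggers as well, and the function name is hypothetical.

```python
from typing import Dict

MB = 1024 * 1024


def background_runs(bytes_written: int,
                    cleaner_interval: int = 20 * MB,
                    checkpointer_interval: int = 50 * MB) -> Dict[str, int]:
    """Estimate how many times each thread wakes up for a given write volume.

    ASSUMPTION: each thread runs once per full interval of written bytes.
    """
    return {
        "cleaner": bytes_written // cleaner_interval,
        "checkpointer": bytes_written // checkpointer_interval,
    }


# Usage: with about 10 MB written, as in the test above, neither thread runs.
# background_runs(10 * MB)
```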
Sorry for the late response.
You are correct, the Terrastore server startup sequence actually
causes some (~1000) synchronous disk writes on the master side:
technically speaking, it creates 1024 clustered locks needed to manage
internal concurrency levels, and it does so synchronously to work
around a Terracotta bug.
Apart from the heavy (but limited in time) disk usage, the only issue
is that those locks are (currently) never removed from disk, even if
the owner server dies, but that shouldn't cause any significant growth
of the jdb files, unless you have a very high server restart rate; and
btw, this will be fixed in future versions.
Hope that answers your concerns :)
Cheers,
Sergio B.
Excellent. Just knowing that these things are known and that fixes
are in the queue helps. Thanks again!