Questions about production deployments

bhauer

Sep 1, 2011, 10:50:05 AM
to terrastore-discussions
I had wanted to post this as a continuation of my previous thread, but
apparently Google Groups doesn't allow you to reply to old threads?
How bizarre.

Hi Sergio:

It has been a while; I am happy to report that I am approaching a
launch-ready version of my site based on Terrastore. The past few
months have been primarily focused on elements not immediately related
to Terrastore. But as I reach a point where a production deployment
is near--perhaps two months away or thereabouts--I return to these
same thoughts.

I am prepared to take a gamble with running Terrastore in a production
environment. I feel I have taken adequate precaution by implementing
connectivity failure detection, a caching layer, an automated backup
agent, and some alerting mechanisms (I won't be able to keep a
watchful eye on the system at all hours of the day). However, I still
feel a little at odds with Terracotta, and not so much Terrastore
specifically. Although I've sifted through the Terracotta
documentation, it remains quite opaque to me--the documentation does
not appear to directly address my questions.

Most notably, I am very nervous about the apparent uncontrolled growth
of the objectdb. Dropping and recreating Terrastore buckets routinely
during development has led to a great many presumably
unnecessary .jdb files in the tc-data/objectdb directory, with date
stamps reaching all the way back to the start of my project. I don't
know how to force these to be garbage collected, if only for the peace
of mind that they will not continue to grow until they consume the
entire storage system. :) I also don't know if their persistence will
cause my Terracotta servers to eventually suffer performance
degradation, which would be particularly frustrating if it were caused
in part by data that my application had "deleted."

I also have not yet found the time to exercise the Terracotta
clustering to observe its real-world behavior when the master goes
offline, the slave takes over, and then the master is restored. By
contrast, in development, I have routinely started and stopped
multiple Terrastore servers so I know that those join and exit their
cluster fairly smoothly.

I expect my site to see very little traffic but nevertheless I want it
to be as solid as possible within reason. I plan to deploy a master-
slave Terracotta server and two Terrastore instances, one of each on
two physical servers with 32GB of memory (considerably more than
necessary for my application).

How much pain am I in store for if I deploy v0.8.1 to production now?
Will 0.8.2 incur some substantial changes? In theory, even if I need
to reinstall Terracotta and Terrastore, I should be able to restore my
data from my bucket().backup() files. Right? I was a tiny bit
alarmed that the files were not explicitly human-readable JSON. They
are approximately human-readable, but I believe they have been very
slightly serialized; no?

To put a fine point on it: if I decide that for whatever reason a
pre-1.0 Terrastore is unworkable for me, I will need to write my own
export functionality to get the data out in a plain JSON form. Or am
I missing something? (Note: I don't plan to give up on Terrastore
very easily! I really like it. I'd have to be in a desperate
situation where I've lost data and can't blame myself.)

Finally, I again solicit anyone who has done a production Terracotta/
Terrastore environment to share any best practices or lessons
learned. I hope to do the same once I'm into the thick of it myself.

Sergio Bossa

Sep 2, 2011, 5:00:45 AM
to terrastore-...@googlegroups.com
Hi bhauer,

thanks for sharing your experience, I'm glad you're liking Terrastore
and going into production with it :)
We're also using Terrastore in production here, so I'll try to share
some of my knowledge below.

> I am prepared to take a gamble with running Terrastore in a production
> environment.  I feel I have taken adequate precaution by implementing
> connectivity failure detection, a caching layer, an automated backup
> agent, and some alerting mechanisms (I won't be able to keep a
> watchful eye on the system at all hours of the day).

That sounds good and necessary.
I've found performance monitoring very important too, because
Terrastore is memory-based, and the performance of memory-based systems
tends to degrade under either of the following circumstances:
1) Large memory blobs are faulted in, either from disk or network.
2) Memory saturates and full garbage collection kicks in.
So first, always monitor your memory and garbage collector logs.
Also, we set up several metrics to track the execution times of the most
expensive Terrastore operations (i.e., range queries and map reduce
processing), so that we are alerted when performance degrades: we're using
Nimrod for that (see https://github.com/sbtourist/nimrod).

Speaking of memory-expensive operations, the most critical ones
are surely those that deal with a huge number of keys: i.e., range
queries over large buckets.
That's because the master needs to send the whole key set to the
server, which in turn needs to materialize it in its heap. So large
buckets are costly, and if you have them, and run range queries over
them, I strongly suggest you do some monitoring.
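
As a rough illustration of tracking the execution time of an expensive operation and raising an alert when it degrades, here is a minimal sketch. It is not how Nimrod works; `run_range_query` is a hypothetical stand-in for whatever client call issues the range query, and the threshold is an arbitrary assumption.

```python
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def timed(name: str, operation: Callable[[], T], threshold_seconds: float,
          alerts: List[str]) -> Tuple[T, float]:
    """Run `operation`, measure wall-clock time, and record an alert if it is slow."""
    start = time.perf_counter()
    result = operation()
    elapsed = time.perf_counter() - start
    if elapsed > threshold_seconds:
        alerts.append(
            f"{name} took {elapsed:.3f}s (limit {threshold_seconds:.3f}s)")
    return result, elapsed


def run_range_query() -> List[str]:
    # Hypothetical placeholder: a real version would call the Terrastore
    # client here. The sleep only simulates a slow query.
    time.sleep(0.05)
    return ["key1", "key2"]


alerts: List[str] = []
keys, elapsed = timed("range_query", run_range_query, 0.01, alerts)
for message in alerts:
    print("ALERT:", message)
```

In practice the alert list would be replaced by whatever alerting mechanism is already in place (mail, pager, log scraper).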

> Most notably, I am very nervous about the apparent uncontrolled growth
> of the objectdb.  Dropping and recreating Terrastore buckets routinely
> during development has led to a great deal of presumably
> unnecessary .jdb files in the tc-data/objectdb directory with date
> stamps reaching all the way back to the start of my project.

Apart from the number of jdb files, doesn't the db size decrease when
you delete buckets and documents?

> I also have not yet found the time to exercise the Terracotta
> clustering to observe its real-world behavior when the master goes
> offline, the slave takes over, and then the master is restored.  By
> contrast, in development, I have routinely started and stopped
> multiple Terrastore servers so I know that those join and exit their
> cluster fairly smoothly.

The Terracotta master failover works pretty well ... unless you have a
huge db :)
The main problem with huge databases is active -> passive
synchronization when the passive gets back online after a failure:
we've found that if the database gets over 10 gigabytes,
synchronization tends to require lots of memory and lasts several
hours; there are a few tricks to tune the synchronization process, but
they're fairly specific to the production setup, so get back to me later
with more details if you need more info.
Other than that, don't forget the golden rule: if you have
active/passive masters, and for some reason they all go offline, always
start the *latest* active first (which may obviously be different from
the one you originally started), and then all the passives.

> I expect my site to see very little traffic but nevertheless I want it
> to be as solid as possible within reason.  I plan to deploy a master-
> slave Terracotta server and two Terrastore instances, one of each on
> two physical servers with 32GB of memory (considerably more than
> necessary for my application).

That's plenty of memory ... we have more modest machines :)
Our masters run with 6 gigabytes, while servers run with 3 gigabytes.
With such a setup, we handle several million (compressed) documents
and a database ranging from 10 to 20 gigabytes.

> How much pain am I in store for if I deploy v0.81 to production now?
> Will 0.82 incur some substantial changes?

Everything I said refers to the (unfortunately still not officially
released) 0.8.2 :)
It contains lots of fixes and enhancements, so I strongly suggest you
use 0.8.2.

> In theory, even if I need
> to reinstall Terracotta and Terrastore, I should be able to restore my
> data from my bucket().backup() files.  Right?

The backup format for 0.8.1 is different from the 0.8.2 format, so you
should write your own backup tool (which should be fairly
straightforward, by the way) ... sorry for that.

> I was a tiny bit
> alarmed that the files were not explicitly human-readable JSON.  They
> are approximately human-readable, but I believe they have been very
> slightly serialized; no?

Somewhat ... also, 0.8.2 backup files are compressed too.

> To put a fine point on it: if I decide that for whatever reason a
> pre-1.0 Terrastore is unworkable for me, I will need to write my own
> export functionality to get the data out in a plain JSON form.  Or am
> I missing something?  (Note: I don't plan to give up on Terrastore
> very easily!  I really like it.  I'd have to be in a desperate
> situation where I've lost data and can't blame myself.)

Data integrity is paramount, so I'll not blame you for such a
defensive practice ;)

> Finally, I again solicit anyone who has done a production Terracotta/
> Terrastore environment to share any best practices or lessons
> learned.  I hope to do the same once I'm into the thick of it myself.

I'm soliciting too ... it would be great to know of other production
or almost-production experiences :)

Also, should anyone be interested to know more about our production
deployment ... just get back with questions ;)

Cheers!

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

bhauer

Sep 18, 2011, 10:52:47 AM
to terrastore-discussions
Hi Sergio,

Thanks for the detailed reply. Here are some additional thoughts
based on your feedback.

On Sep 2, 2:00 am, Sergio Bossa <sergio.bo...@gmail.com> wrote:
> That sounds good and necessary.
> I've found performance monitoring very important too: that's because
> Terrastore is memory-based, and performance of memory-based systems
> tends to degrade under one of the following circumstances:
> 1) Large memory blobs are faulted in, either from disk or network.
> 2) Memory saturates and full garbage collection kicks in.
> So first, always monitor your memory and garbage collector logs.
> Also, we set up several metrics to track execution times of most
> expensive Terrastore operations (i.e., range queries and map reduce
> processing), so to be alerted when performance degrades: we're using
> Nimrod for that (see https://github.com/sbtourist/nimrod).

Good idea. I think that the load is going to be much lower than the
server's capacity such that I will likely avoid the memory contention
you've described for a long time. But it's good to hear you have some
ideas for this that I can put to use if/when that becomes an issue.

> Apart from the number of jdb files, doesn't the db size decrease when
> you delete buckets and documents?

Each .jdb file (aside from the most recent) is pegged at 9,766 KB for
what appears to be eternity. As I mentioned earlier, 00000000.jdb has
create/modified dates reaching back to the very start of my project.
Since then, dozens of .jdb files have been created, but none has been
deleted, even when I delete buckets. During development, I have
deleted all buckets and created new buckets dozens of times.

When you ask "doesn't the db size decrease," it occurs to me that you
may be referring to something other than the .jdb files. Is there
another file or set of files I should be looking at?

Am I missing something that's very obvious to someone familiar with
Terracotta? I feel like there has to be some way to tell Terracotta,
"Hey, you can trim out the unreferenced objects in those old .jdb
files."

> > I also have not yet found the time to exercise the Terracotta
> > clustering to observe its real-world behavior when the master goes
> > offline, the slave takes over, and then the master is restored.  By
> > contrast, in development, I have routinely started and stopped
> > multiple Terrastore servers so I know that those join and exit their
> > cluster fairly smoothly.
>
> The Terracotta master failover works pretty well ... unless you have a
> huge db :)

That shouldn't be a problem for me. At least for now. :)

> That's plenty of memory ... we have more modest machines :)
> Our masters run with 6 gigabytes, while servers run with 3 gigabytes.
> With such a setup, we handle several millions (compressed) documents
> and a database ranging from 10 to 20 gigabytes.

This gives me a great deal of confidence that I'm significantly over-
provisioned on memory. Memory is so astonishingly cheap right now
that I see no reason to not buy gobs of it. But if you're doing that
kind of load with 3 GB allocated to the servers, I am not concerned
about my application--at least from a memory standpoint.

That is, for the time being and for my application, my goal is to do
whatever I can to maximize reliability/durability. Performance seems
to be no problem for now.

> Everything I said is referred to (unfortunately still not officially
> released) 0.8.2 :)
> It contains lots of fixes and enhancements, so I strongly suggest you
> to use 0.8.2.

I just saw the 0.8.2 announcement! Congratulations. I'll be
switching over soon.

> The backup format for 0.8.1 is different from the 0.8.2 format, so you
> should write a backup tool by yourself (which should be fairly
> straightforward by the way) ... sorry for that.

No worries!

Do you suspect that the backup format will become stable over time,
maybe at 1.0?

Sergio Bossa

Sep 20, 2011, 3:52:39 AM
to terrastore-...@googlegroups.com
On Sun, Sep 18, 2011 at 4:52 PM, bhauer <bhs...@tsotech.com> wrote:

> Good idea.  I think that the load is going to be much lower than the
> server's capacity such that I will likely avoid the memory contention
> you've described for a long time.  But it's good to hear you have some
> ideas for this that I can put to use if/when that becomes an issue.

Great.

> Each .jdb file (aside from the most recent) is pegged at 9,766 KB for
> what appears to be eternity.  As I mentioned earlier, 00000000.jdb has
> create/modified dates reaching back to the very start of my project.

Regarding *old* jdb files still sitting there, that's not a problem:
there are a few pieces of information that are currently never deleted,
such as client lock tables and tombstones for deleted buckets. Also, log
files are only cleaned/deleted if their utilization percentage falls under a
given limit, so it may be that your old jdb files still reference
active data.
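
To make the utilization rule concrete, here is a deliberately simplified model of it: a log file becomes a cleaning candidate only once the share of still-live bytes in it drops below some limit. The 50 percent limit and the byte counts are assumptions chosen for illustration, not values taken from this setup, and the real cleaner's selection logic is more involved.

```python
def utilization_percent(live_bytes: int, total_bytes: int) -> float:
    """Share of a log file that still holds live (referenced) data."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * live_bytes / total_bytes


def eligible_for_cleaning(live_bytes: int, total_bytes: int,
                          min_utilization_percent: float = 50.0) -> bool:
    """True when utilization has fallen below the (assumed) limit."""
    return utilization_percent(live_bytes, total_bytes) < min_utilization_percent


# A roughly 10 MB .jdb file (about the 9,766 KB reported in this thread).
FILE_SIZE = 10_000_000

# Mostly obsolete data: a cleaning candidate under this model.
print(eligible_for_cleaning(1_000_000, FILE_SIZE))

# Mostly live data (for example lock tables): kept as-is.
print(eligible_for_cleaning(9_000_000, FILE_SIZE))
```

Under this model an old file can survive indefinitely as long as enough of its contents are still referenced.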

> Since then, dozens of .jdb files have been created, but none has been
> deleted, even when I delete buckets.  During development, I have
> deleted all buckets and created new buckets dozens of times.

That, on the other hand, could be a problem: deleting buckets should free up lots of space.

> When you say the "does the db size decrease," it occurs to me that you
> may be referring to something other than the .jdb files.  Is there
> another file or set of files I should be looking at?

I'm referring to the tc-data/server-data/objectdb directory.
If the objectdb directory size keeps increasing even after bucket
removal, there may be some (unknown) bug in Terrastore 0.8.1, or some
configuration problem with the Berkeley DB JE cleaner and checkpointer (as
BDBJE is used by the Terracotta master).
So I'd suggest you update to the latest Terrastore and see if the
problem is still there: if so, we'll go on to discuss the
BDBJE configuration.

> This gives me a great deal of confidence that I'm significantly over-
> provisioned on memory.  Memory is so astonishingly cheap right now
> that I see no reason to not buy gobs of it.  But if you're doing that
> kind of load with 3 GB allocated to the servers, I am not concerned
> about my application--at least from a memory standpoint.

Cool :)

> I just saw the 0.8.2 announcement!  Congratulations.  I'll be
> switching over soon.

Thanks! Let us know how it goes.

> Do you suspect that the backup format will become stable over time,
> maybe at 1.0?

I'd really like to speed up the backup import/export process, but I'll
try to keep the current format unchanged.

bhauer

Sep 20, 2011, 9:50:09 PM
to terrastore-discussions
Hi again Sergio!

I've successfully upgraded to Terrastore 0.8.2. Upgrading was a
cinch. After your previous replies, I implemented a custom backup and
restore application that writes each bucket to Kryo-serialized maps
stored to a gzip stream.

So I dumped each bucket to said file format, removed the contents of
the Terracotta server's tc-data directory, overwrote the libs of both
Terracotta and Terrastore, and picked up any configuration file
changes. Fired things back up and restored the buckets.
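
For readers who want a starting point, the dump-and-restore round trip described above might look roughly like the sketch below. It swaps in JSON lines for the Kryo serialization actually used, and it works on an in-memory dictionary; reading the documents out of Terrastore and writing them back is left out, and the function names are hypothetical.

```python
import gzip
import json
import os
import tempfile
from typing import Dict


def dump_bucket(documents: Dict[str, dict], path: str) -> None:
    """Write one JSON record per line to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for key, value in documents.items():
            out.write(json.dumps({"key": key, "value": value}) + "\n")


def load_bucket(path: str) -> Dict[str, dict]:
    """Read the records back into a key -> document mapping."""
    documents: Dict[str, dict] = {}
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            record = json.loads(line)
            documents[record["key"]] = record["value"]
    return documents


# Example data standing in for the contents of one bucket.
bucket = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}
backup_path = os.path.join(tempfile.mkdtemp(), "users.jsonl.gz")

dump_bucket(bucket, backup_path)
restored = load_bucket(backup_path)
print(restored == bucket)
```

One file per bucket keeps a restore independent of the store's own backup format, at the cost of a slower export than a native one.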

This is development data, so I wasn't actually worried about losing it
if it didn't go smoothly. But it did go smoothly, so I was happy.

On Sep 20, 12:52 am, Sergio Bossa <sergio.bo...@gmail.com> wrote:
> I'm referring to the tc-data/server-data/objectdb directory.
> If the objectdb directory size keeps increasing even after buckets
> removal, there may be some (unknown) bug in Terrastore 0.8.1, or some
> configuration problems with Berkeley DB JE cleaner and checkpointer (as
> BDBJE is used by Terracotta master).
> So I'd suggest you to update to the latest Terrastore and see if the
> problem is still there: in such a case, we'll go ahead discussing
> BDBJE configuration.

I just confirmed the same behavior with 0.8.2. As a test, from a
fresh Terracotta tc-data directory:

I restored some buckets and the objectdb file was 4.4MB.
I then dropped all buckets and confirmed that the buckets list was
returning an empty set.
Oddly enough, I observed that having dropped the buckets, the objectdb
had grown to 4.7MB.
I shut down and restarted Terracotta and Terrastore, the objectdb grew
to 6.8MB just by restarting.
The bucket list still shows nothing.
After restoring the same data again, the objectdb files now total
10MB.
The size never seems to decrease.
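
A test like this is easier to repeat with a small script that totals the .jdb files after each step. The sketch below is one way to do it; the demo runs against a temporary directory with fake files, and pointing it at the real tc-data/server-data/objectdb directory is left to the reader.

```python
import os
import tempfile
from typing import Dict


def jdb_sizes(objectdb_dir: str) -> Dict[str, int]:
    """Map each .jdb file name in the directory to its size in bytes."""
    sizes: Dict[str, int] = {}
    for entry in os.scandir(objectdb_dir):
        if entry.is_file() and entry.name.endswith(".jdb"):
            sizes[entry.name] = entry.stat().st_size
    return sizes


def total_megabytes(sizes: Dict[str, int]) -> float:
    """Total size of all listed files, in MB (1 MB = 1024 * 1024 bytes)."""
    return sum(sizes.values()) / (1024 * 1024)


# Demo against a throwaway directory with fake log files.
demo_dir = tempfile.mkdtemp()
for name, size in (("00000000.jdb", 4096), ("00000001.jdb", 1024),
                   ("notes.txt", 10)):  # non-.jdb file, should be ignored
    with open(os.path.join(demo_dir, name), "wb") as handle:
        handle.write(b"\0" * size)

sizes = jdb_sizes(demo_dir)
for name in sorted(sizes):
    print(name, sizes[name])
print(f"total: {total_megabytes(sizes):.4f} MB")
```

Running it after each restore, drop, and restart gives a per-file view rather than just a directory total.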

Have you been able to conduct a similar test?

> I'd really like to speed up the backup import/export process, but I'll
> try to keep the current format unchanged.

At least from my perspective, I'd say don't sweat it. :)

Sergio Bossa

Sep 23, 2011, 3:13:41 PM
to terrastore-...@googlegroups.com
On Wed, Sep 21, 2011 at 3:50 AM, bhauer <bhs...@tsotech.com> wrote:

> I've successfully upgraded to Terrastore 0.8.2.  Upgrading was a
> cinch.

Cool :)

> I restored some buckets and the objectdb file was 4.4MB.
> I then dropped all buckets and confirmed that the buckets list was
> returning an empty set.
> Oddly enough, I observed that having dropped the buckets, the objectdb
> had grown to 4.7MB.
> I shut down and restarted Terracotta and Terrastore, the objectdb grew
> to 6.8MB just by restarting.
> The bucket list still shows nothing.
> After restoring the same data again, now the objectdb files now total
> 10MB.
> The size never seems to decrease.

There's an easy explanation for your test results.
The Terrastore master database is based on BDBJE, which is a
log-structured database where deleted entries are actually deleted
asynchronously by two dedicated threads: the cleaner and the
checkpointer. For performance reasons, as a good-enough heuristic,
they're currently configured in Terrastore to run after 20 MB of
written data for the former, and 50 MB for the latter (so that the
checkpointer doesn't run too often and also compacts data from more
cleaning rounds). In your case, you wrote too little data, so they
neither ran nor cleaned anything.
They're configured in the terracotta-config xml file under the
following properties:
l2.berkeleydb.je.cleaner.bytesInterval
l2.berkeleydb.je.checkpointer.bytesInterval
For more information about the cleaner and checkpointer threads, and
some more properties you may want to tweak, you can start with the
following, and then get back here with questions ;)
http://download.oracle.com/docs/cd/E17277_02/html/GettingStartedGuide/backgroundthreads.html
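
The arithmetic behind that explanation can be sketched as follows. This is a simplified model that treats each thread as waking once per configured interval of written bytes; it uses the 20 MB and 50 MB figures quoted above and assumes, for illustration only, that the test wrote roughly as many bytes as the final objectdb size of about 10 MB.

```python
MB = 1024 * 1024

# Intervals quoted in this thread for the two properties.
CLEANER_BYTES_INTERVAL = 20 * MB       # l2.berkeleydb.je.cleaner.bytesInterval
CHECKPOINTER_BYTES_INTERVAL = 50 * MB  # l2.berkeleydb.je.checkpointer.bytesInterval


def wakeups(bytes_written: int, bytes_interval: int) -> int:
    """Number of times a thread would have been triggered (simplified model)."""
    return bytes_written // bytes_interval


# Assumption: about 10 MB written during the whole test.
test_bytes = 10 * MB
print("cleaner runs:", wakeups(test_bytes, CLEANER_BYTES_INTERVAL))
print("checkpointer runs:", wakeups(test_bytes, CHECKPOINTER_BYTES_INTERVAL))

# A larger workload, for comparison.
larger_bytes = 120 * MB
print("cleaner runs:", wakeups(larger_bytes, CLEANER_BYTES_INTERVAL))
print("checkpointer runs:", wakeups(larger_bytes, CHECKPOINTER_BYTES_INTERVAL))
```

Under these assumptions neither thread is triggered during the small test, which is consistent with the directory never shrinking.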

bhauer

Nov 15, 2011, 10:36:58 PM
to terrastore-discussions
Hi again Sergio,

Although I'm still feeling uncertain about what I observe as seemingly
unchecked growth to the .jdb files, an interest of mine at present is
the Terrastore start-up sequence. I was just now observing the
sequence using Process Monitor so that I could attempt to understand
why it seems to hit the hard drives so hard during start-up.

During the start-up of a Terrastore server, the Terracotta java
process enters a phase where it makes hundreds/thousands of small
writes to the .jdb files in rapid succession. By small I mean 720
bytes to 4096 bytes. It seems each write is fully committed and
flushed to the file system, causing a great deal of disk activity (as
opposed to allowing the writes to be buffered). During the start-up
cycle of one Terrastore instance, the Terracotta server topped off an
existing .jdb file to its capacity and created the next .jdb file,
writing 556 KB to that file in these tiny fully-committed/flushed
chunks.

In other words, let's say the start-up adds approximately 2 MB to
the .jdb files. Putting aside my confusion about why that happens
generally, I am presently curious if there is a reason it happens as
hundreds/thousands of tiny writes over the course of ~15 seconds (on
my admittedly meager workstation) when 2 MB would otherwise be written
by the file system in a few milliseconds.

It's probably by design, but I am nevertheless curious if you could
confirm.

If we were speaking of a conventional database, I'd associate this
sort of thing with running many INSERTs as separate transactions
rather than using a batch. Is there even such an analogue here?
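
The difference between the two write patterns can be shown with a small sketch. It is only an analogy for the behavior observed in Process Monitor, not Terracotta's code: one function forces every small chunk to disk before writing the next, the other writes the same chunks and forces them to disk once at the end. The chunk count is scaled down from the "hundreds/thousands" mentioned above.

```python
import os
import tempfile
from typing import List


def write_synced(path: str, chunks: List[bytes]) -> int:
    """Flush and fsync after every chunk; returns the number of fsync calls."""
    syncs = 0
    with open(path, "wb") as handle:
        for chunk in chunks:
            handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # force this chunk to stable storage
            syncs += 1
    return syncs


def write_batched(path: str, chunks: List[bytes]) -> int:
    """Write all chunks, then fsync once; returns the number of fsync calls."""
    with open(path, "wb") as handle:
        for chunk in chunks:
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # a single sync for the whole batch
    return 1


work_dir = tempfile.mkdtemp()
chunks = [bytes(2048)] * 128  # 128 small writes of 2 KB each

synced_path = os.path.join(work_dir, "synced.bin")
batched_path = os.path.join(work_dir, "batched.bin")
synced_calls = write_synced(synced_path, chunks)
batched_calls = write_batched(batched_path, chunks)
print(synced_calls, batched_calls)
```

Both files end up with identical contents; the per-chunk variant simply pays the cost of a disk sync for every write, which is what makes it slow on rotating disks.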

This is mostly curiosity, but there's a tiny bit of paranoia about the
wear on my workstation's disks. :)

Sergio Bossa

Nov 17, 2011, 8:37:16 AM
to terrastore-...@googlegroups.com
Hi bhauer,

sorry for the late response.

You are correct, the Terrastore server startup sequence actually
causes some (~1000) synchronous disk writes on the master side:
technically speaking, it creates 1024 clustered locks needed to manage
internal concurrency levels, and it does so synchronously to address a
Terracotta bug.
Apart from the heavy (but time-limited) disk usage, the only issue
is that those locks are (currently) never removed from disk, even if
the owner server dies, but that shouldn't cause any particular growth of
the jdb files unless you have a very high server restart rate; and btw,
this will be fixed in future versions.

Hope that answers your concerns :)
Cheers,

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

bhauer

Nov 18, 2011, 10:08:39 PM
to terrastore-discussions
Hi Sergio,

Excellent. Just knowing that these things are known and that fixes
are in the queue helps. Thanks again!
