On Mon, 2008-10-13 at 10:39 -0700, Parand Darugar wrote:
> Thinking through the need for persistence -
>
> Having a log based system would be useful for things beyond dealing
> with the power going out - eg. you could do auditing with logs - but
> putting that aside and thinking only of failure scenarios:
>
> I'm wondering what is the real-world probability of power going out -
> ie. failures that are truly sudden and can't be detected / dealt with
> - vs. failures that can be detected.
Ever heard of Murphy's Law?
> Point being, if the case where something goes wrong but the process
> still has some control is the more common case, perhaps building a
> system to dump state before crashing would provide most of the benefit
> of persistence without the performance penalty.
This may be useful for a controlled shutdown and restart without losing
jobs. But there are hardly any real-world failure conditions that can be
safely trapped. What happens during a hardware failure is pretty much
undefined, at least on the standard hardware that we're all using.
> This dump-state / read-state capability would probably be the same thing
> that'd be used for gracefully shutting down and restarting a
> beanstalkd.
>
> I haven't looked at the innards of beanstalkd and have no real-world
> experience with it so I have no idea what the common failure scenarios
> are, but I thought I'd throw it out there to see what people think.
Well, I'd prefer to see the development resources focussed on full-blown
persistence, as per the roadmap. Real persistence will also enable
dump/restore but, more importantly, it will mitigate the reliability
issue.
Alternatively (or additionally) I could imagine some sort of tandem or
"cluster"-mode where all jobs are sent to at least two beanstalkd
instances that live on separate servers. Only one instance (the
"master") would actually be processing jobs, the others would
serve as hot-standby and pick up in case of failure.
The ultimate goal should be to make the queue safe for true "fire and
forget" operation from the job-producer's point of view. Once a job has
entered the queue, it is the queue's responsibility to keep it safe until
it is consumed - even in the face of hardware failure.
regards, jj
For those interested in taking a look I have created a fork on github
with my changes.
http://github.com/gbarr/beanstalkd/commits/binlog
It still requires more work, but it does work. However I have not done
any performance testing.
Graham.
At first glance I like this a lot. It's almost exactly what I was
thinking of doing, plus the brilliant idea of using ref counts to
expire old files.
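
To illustrate the ref-count idea mentioned above, here is a minimal sketch.
It is not the code from the binlog branch: the struct and the function
names (logfile, logfile_ref, logfile_unref) are made up for this example.
The assumption is that each log file counts the live jobs whose records it
holds and can be deleted once that count drops to zero.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bookkeeping for one log file (not beanstalkd's real struct). */
struct logfile {
    char path[256];   /* name of the file on disk */
    unsigned refs;    /* live jobs whose records are stored in this file */
};

/* A job record was written to this file, so the file must be kept. */
static void logfile_ref(struct logfile *f)
{
    f->refs++;
}

/*
 * A job recorded in this file was deleted.
 * Returns 1 if that was the last reference and the file was removed,
 * 0 if other jobs still need the file.
 */
static int logfile_unref(struct logfile *f)
{
    assert(f->refs > 0);
    if (--f->refs > 0) return 0;
    remove(f->path);  /* best effort; errors are ignored in this sketch */
    return 1;
}
```

In a scheme like this, the file currently being written would presumably
hold one extra reference of its own, so that it is not removed while it is
still open for appending.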
The copyright notice in binlog.c lists Keith Rarick and Philotic as
copyright holders, but you wrote the code in there -- it should list
only Graham Barr! And you probably want to change the year to 2008.
I'd love to get this in to 1.2. I'll read it more carefully tomorrow,
do some testing, and possibly push some work onto the binlog branch,
right after I get 1.1 out the door.
kr
Good. I think there is a lot we can build on top of this.
> The copyright notice in binlog.c lists Keith Rarick and Philotic as
> copyright holders, but you wrote the code in there -- it should list
> only Graham Barr! And you probably want to change the year to 2008.
OK, I changed that.
> I'd love to get this in to 1.2. I'll read it more carefully tomorrow,
> do some testing, and possibly push some work onto the binlog branch,
> right after I get 1.1 out the door.
It has been a while since I have written any C code, so be gentle :-)
I tried to follow your indentation style, but maybe got it wrong at
times.
Do you use GNU indent? If so, you might want to think about adding
a .indent.pro to the repository for the style you use.
Graham.
I don't use it, but I'm happy to add a .indent.pro file.
kr
On second thought, indent seems unable to put code on the same line as
the if statement, such as
    if (!j) return NULL;
and that style occurs throughout the code. This seems to make indent
pretty much unusable.
Instead, I'll just document the conventions that exist and leave it up
to humans to do the right thing. :) I'm not inclined to be super-picky
about this stuff. It's easy enough to clean up later if readability is
an issue.
kr
It looks mostly okay. There are a couple of bugs to fix and a few
things I'd like to change before merging this into master.
> I tried to follow your indentation style, but maybe got it wrong at
> times.
It follows the existing style pretty well, except for a few minor
things. The only one worth mentioning here is that I'd like to keep
lines under 80 columns wide.
kr
I just merged the binlog branch into master and pushed that out. It is
ready for developers and adventurous users to test. Please try it out
and give us feedback, but don't trust your production data to it yet.
As for a timeframe, I can only say it'll be out when it's ready.
> Also, is the log format externally digestible? For example, how hard
> would it be to write scripts to analyze type and source of messages?
The format is very simple, but subject to change until the next
release. We'll have to decide on a policy about binlog format
compatibility between releases. I would prefer to declare that the
format can change incompatibly at any time, so don't rely on it to
store your jobs across beanstalkd upgrades -- it's mostly for crash
recovery. That said, I doubt it'll change very often, and beanstalkd
will tell you if the format has changed and refuse to run (since
there's a format version number in the binlog). That means you can
safely use a binlog across any version of beanstalkd as long as it
lets you.
You can certainly write external tools to analyze the binlog -- the
format is just a version number followed by a sequence of records. But
it'll likely never be documented extensively (unlike the network
protocol), so authors of such tools will just have to look at the code
in binlog.c that reads the file (it's at the top of binlog_replay()
and not that complicated).
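
As a rough idea of what such an external tool could look like, here is a
sketch of a reader that checks the version number and then walks the
records. The record layout is an assumption made purely for illustration
(a native-endian 32-bit version, then records that each start with a
32-bit payload length); the real layout is whatever binlog.c reads, and
SKETCH_BINLOG_VERSION is a made-up constant.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_BINLOG_VERSION 1  /* hypothetical format version */

/* Read and discard len bytes. Returns 1 on success, 0 on a short read. */
static int skip_bytes(FILE *f, uint32_t len)
{
    char buf[4096];

    while (len > 0) {
        size_t want = len < sizeof buf ? len : sizeof buf;
        if (fread(buf, 1, want, f) != want) return 0;
        len -= (uint32_t)want;
    }
    return 1;
}

/*
 * Count the complete records in an open log file.
 * Returns -1 if the version is missing or not the one we understand,
 * mirroring the "refuse to run" behaviour described above.
 */
static long binlog_count_records(FILE *f)
{
    int32_t version;
    uint32_t len;
    long n = 0;

    assert(f != NULL);
    if (fread(&version, sizeof version, 1, f) != 1) return -1;
    if (version != SKETCH_BINLOG_VERSION) return -1;

    while (fread(&len, sizeof len, 1, f) == 1) {
        /* A short payload is treated as a write cut off by a crash. */
        if (!skip_bytes(f, len)) break;
        n++;
    }
    return n;
}
```

Treating a truncated final record as the end of the log is a design choice
in this sketch, on the reasoning that the file is mostly there for crash
recovery; a real tool might prefer to report it.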
> Btw, much thanks for working on persistence, it's going to make
> beanstalkd even more applicable.
Huge thanks to Graham Barr for kick-starting this feature with an
essentially complete implementation. All that's left is a little
polish and a lot of testing.
kr
Hmm, there's no API per se. The code to read the binlog format is
really about ten lines. But maybe documentation would be a good idea.
It won't take long and it'll be nice to have for historical binlog
files after the format has changed.
So, let's say you should look in doc/binlog.txt to find the current
format in each release. And look in doc/historical-binlog.txt for old
formats. I'll make sure to update those files if necessary before each
release.
kr
The value of 10 MB was an arbitrary choice. The original intent was actually
to add a command-line option to set the size and just have 10 MB as the
default.
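
Something along these lines would probably do for the option handling.
This is only a sketch: the function name parse_binlog_size is made up, no
particular option letter is implied, the argument is assumed to be a plain
byte count, and "10 MB" is assumed to mean 10 * 1024 * 1024 bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed default: 10 MB taken as 10 * 1024 * 1024 bytes. */
#define DEFAULT_BINLOG_SIZE (10UL * 1024 * 1024)

/*
 * Turn an option argument into a log file size in bytes.
 * arg == NULL means the option was not given, so the default is used.
 * Returns 0 on success and -1 if arg is not a positive decimal number.
 */
static int parse_binlog_size(const char *arg, size_t *out)
{
    char *end;
    unsigned long n;

    assert(out != NULL);
    if (!arg) {
        *out = DEFAULT_BINLOG_SIZE;
        return 0;
    }

    /* strtoul accepts a leading sign and whitespace; reject those here. */
    if (arg[0] < '0' || arg[0] > '9') return -1;

    errno = 0;
    n = strtoul(arg, &end, 10);
    if (errno || *end != '\0' || n == 0) return -1;

    *out = (size_t)n;
    return 0;
}
```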
Graham.
Having smaller files saves a little disk space on average, since they
can be deleted with finer granularity. But it also ought to make
things a little slower on average, since there are more disk seeks
when creating a new file and during recovery.
I predict these differences will be small for most people, unless you
are pushing the limits of your hardware.
kr