> Now, this is just a guess, but I'm pretty sure swamp doesn't worry too
> much about fault tolerance at this point. I mean no offense by my
> presumption, but for starters there is no persistence of the token
> returned by a script submission. If the server were to crash, the
> data would be lost to the client since there is no longer a reference
> to it via the token (but the data would actually be living in the
That's absolutely correct. In a previous version, all job state
(submission, progress, etc.) was tracked in a SQLite database, but the
concurrency issues of that implementation were pretty horrifying for
performance. That said, there's definitely room to store at least the
high-level job state for fault tolerance. Putting only the
high-level stuff in the db should be fine for performance. You could
do periodic checkpoints of the finer-grained job state, as well.
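
To make the idea concrete, here is a minimal sketch of storing only the
high-level state, assuming Python's standard sqlite3 module. The
JobStateStore class, the table layout, and the state names are made up
for illustration and are not part of swamp; treat it as a starting
point rather than how the server actually works.

```python
import sqlite3
import time


class JobStateStore:
    """Hypothetical persistent map from submission token to job state."""

    def __init__(self, path="jobs.db"):
        # One row per job keeps writes rare: only coarse state changes
        # (submitted, running, finished, ...) touch the database.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            " token TEXT PRIMARY KEY,"
            " state TEXT NOT NULL,"
            " updated REAL NOT NULL)"
        )
        self.conn.commit()

    def record(self, token, state):
        # 'with' wraps the write in a transaction and commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO jobs (token, state, updated)"
                " VALUES (?, ?, ?)",
                (token, state, time.time()),
            )

    def unfinished(self):
        # After a restart, these are the tokens that still need attention.
        rows = self.conn.execute(
            "SELECT token, state FROM jobs"
            " WHERE state NOT IN ('finished', 'discarded')"
        )
        return dict(rows.fetchall())
```

On startup the server could call unfinished() to find the tokens whose
jobs were interrupted, then either resume them or report them as lost.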
> publish space.) Please correct me if I'm wrong on that last point. I
> think an evaluation of swamp's fault tolerance would be a fantastic
> exercise.
Sure, feel free. Fault tolerance hasn't been a huge priority, because
it's the kind of thing that matters only once the larger goal
(getting results from (remote) data faster) has been accomplished.
> There's a catch, however. It's likely that I won't be able to achieve
> the 15-20 pages required for the writeup based on my evaluation alone,
> but who knows. That being said, I'd like to possibly add fault
> tolerance to swamp via custom code. I could always develop this code
> independent of swamp proper, but if it turns out to add value to the
> project I'm hoping that I could contribute it back to you.
Sure, that sounds good. FYI, I'm working on a major change to the
underlying execution engine, so that part is a little unstable right
now, but the front-end API seems workable. I'll merge the engine
change as soon as it's usable.
> I'd like your opinion on a few things.
> 1) Do you think this fault tolerance evaluation is a worthwhile
> endeavor?
Of swamp's fault tolerance in particular? Probably. I'd be
interested in what you find. At this point, it hasn't been a big deal
to just clean everything up and restart the jobs, but I could see how
people could find it a hassle.
> 2) Do you see this as adding value to swamp if some code were
> contributed?
Yes, sure. Fault tolerance would be a great thing in SWAMP. And the
prettier and better documented the code is, the less chance I'll have
to accidentally break it while implementing feature X, Y, or Z. (That
goes for my own code, as well.)
Take care,
-Daniel