Mrs version 0.9 released


Andrew McNabb

Nov 14, 2012, 3:32:22 PM
to mrs-ma...@googlegroups.com
I'm pleased to announce the release of Mrs version 0.9, which represents
a new step forward in stability and performance. Prominent changes in
this release include:

* Relicensing under the Apache Software License instead of the GPL.

* Support for data larger than available RAM, configurable with a new
--mrs-max-sort-size option.

* Extended documentation including a new User Guide.

* Support for specifying custom serializers in addition to pickle (the default
serializer).

* A new data-intensive example, Walk Analyzer, contributed by Matt Gardner.

* A maximum number of failures per task, configurable with the new
--mrs-max-failures option.

* A variety of bug fixes.
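For anyone curious about the idea behind the new --mrs-max-sort-size option: sorting data larger than RAM is typically done with an external merge sort. The sketch below is a generic illustration of that technique in plain Python (the function name and the spill-to-temp-file structure are my own; this is not Mrs's actual implementation):

```python
import heapq
import itertools
import tempfile

def external_sort(items, max_sort_size):
    """Sort an iterable that may not fit in memory.

    Sorts chunks of at most max_sort_size items in RAM, spills each
    sorted chunk (a "run") to a temporary file, then lazily merges
    the runs with a k-way heap merge.
    """
    runs = []
    it = iter(items)
    while True:
        chunk = sorted(itertools.islice(it, max_sort_size))
        if not chunk:
            break
        f = tempfile.TemporaryFile(mode='w+')
        f.writelines('%s\n' % item for item in chunk)
        f.seek(0)
        # Each run is read back lazily, one line at a time.
        runs.append(line.rstrip('\n') for line in f)
    return heapq.merge(*runs)

print(list(external_sort(['pear', 'apple', 'fig', 'plum', 'kiwi'], 2)))
```

With a cap of 2, only two items are ever sorted in memory at once; the rest sit on disk until the final merge streams them back in order.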

Thanks to everyone who contributed to this release.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

Matthew Gardner

Nov 14, 2012, 3:36:06 PM
to mrs-ma...@googlegroups.com
On Wed, Nov 14, 2012 at 3:32 PM, Andrew McNabb <amc...@mcnabbs.org> wrote:
> * A maximum number of failures per task, configurable with the new
> --mrs-max-failures option.

What does that do, exactly (in particular, in relation to previous behavior)?  I didn't see that when I read the documentation earlier today.  If a task fails that many times, the job fails?  Is that right?  And the previous max-failures was 1?

Andrew McNabb

Nov 14, 2012, 3:57:22 PM
to mrs-ma...@googlegroups.com
On Wed, Nov 14, 2012 at 03:36:06PM -0500, Matthew Gardner wrote:
>
> What does that do, exactly (in particular, in relation to previous
> behavior)? I didn't see that when I read the documentation earlier today.
> If a task fails that many times, the job fails? Is that right? And the
> previous max-failures was 1?

That's a good question. Previously it was ∞, meaning that it was the
eternal optimist. Thanks for pointing out that it's not in the
documentation--it needs to be added.

Matthew Gardner

Nov 14, 2012, 4:00:39 PM
to mrs-ma...@googlegroups.com
On Wed, Nov 14, 2012 at 3:57 PM, Andrew McNabb <amc...@mcnabbs.org> wrote:
> On Wed, Nov 14, 2012 at 03:36:06PM -0500, Matthew Gardner wrote:
> >
> > What does that do, exactly (in particular, in relation to previous
> > behavior)?  I didn't see that when I read the documentation earlier today.
> >  If a task fails that many times, the job fails?  Is that right?  And the
> > previous max-failures was 1?
>
> That's a good question.  Previously it was ∞, meaning that it was the
> eternal optimist.  Thanks for pointing out that it's not in the
> documentation--it needs to be added.

It seems also that with that implemented, it should be pretty easy to also have this switch: instead of the job failing, that task is ignored and the job moves on without it.  This would be nice, for example, if you have lots and lots of data, some of which might be messy, and you just want to get the best estimate of things out of it that you can, and you're somewhat tolerant of shards getting skipped.  I know I've brought this up before; it just seems really easy to put it in, now that I know about this behavior.

Andrew McNabb

Nov 14, 2012, 4:05:30 PM
to mrs-ma...@googlegroups.com
On Wed, Nov 14, 2012 at 04:00:39PM -0500, Matthew Gardner wrote:
>
> It seems also that with that implemented, it should be pretty easy to also
> have this switch: instead of the job failing, that task is ignored and the
> job moves on without it. This would be nice, for example, if you have lots
> and lots of data, some of which might be messy, and you just want to get
> the best estimate of things out of it that you can, and you're somewhat
> tolerant of shards getting skipped. I know I've brought this up before; it
> just seems really easy to put it in, now that I know about this behavior.

Yes, this behavior would be quite easy to implement, particularly now
that the max failures option is in.
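To make the discussion concrete, here is a rough sketch of the scheduling logic being described: retry each task up to a failure cap, and either fail the job (the 0.9 behavior) or, with the proposed switch, skip the task and move on. All names here are hypothetical, not Mrs's actual internals:

```python
# Hypothetical sketch of the retry logic under discussion.
# Names and structure are illustrative only, not Mrs's actual code.

class JobFailed(Exception):
    """Raised when a task exhausts its retries and skipping is off."""

def run_tasks(tasks, run, max_failures, skip_failed_tasks=False):
    """Run each task, retrying up to max_failures times.

    When a task exhausts its retries: raise JobFailed (the current
    behavior), or, if skip_failed_tasks is set (the proposed switch),
    silently drop that task and continue with the rest.
    """
    results = {}
    for task in tasks:
        failures = 0
        while True:
            try:
                results[task] = run(task)
                break
            except Exception:
                failures += 1
                if failures >= max_failures:
                    if skip_failed_tasks:
                        break  # give up on this shard, keep the job going
                    raise JobFailed('task %r failed %d times'
                                    % (task, failures))
    return results
```

With skip_failed_tasks enabled, a messy shard simply ends up absent from the results, which matches the "best estimate from lots of data" use case above.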