On Wed, Oct 31, 2012 at 12:18:31PM -0400, Matthew Gardner wrote:
>
> In the hadoop code I've been running, the counters are frequently in the
> hundreds of millions, sometimes in the billions. Would you suggest logging
> output that large and using grep to process it? Is there some bottleneck
> in mrs (or MapReduce generally) that makes the two equally efficient?
> And even if they are, having the counters as objects in the code lets
> you monitor them while the job is running, so you can, for instance,
> get an early warning that something is going wrong rather than waiting
> for a long job to finish before learning that something needs fixing.
I had something in mind that made sense at the time I wrote the email,
but it makes less sense now that I'm thinking about it in more detail.
In short, there are occasional circumstances where grep is adequate,
but you're right that this isn't always the case.
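
For concreteness, here is a minimal sketch of the two approaches under
Hadoop Streaming (Mrs's own counter API may differ, and the field
layout and counter names below are made up for illustration). Streaming
lets a mapper update a live counter by writing a specially formatted
line to stderr, which shows up in the job's web UI while the job runs;
a plain log line, by contrast, can only be tallied with grep after the
fact.

    #!/usr/bin/env python
    import sys

    def increment_counter(group, counter, amount=1):
        # Hadoop Streaming scans stderr for lines of this form and adds
        # `amount` to the named counter, visible in the job UI mid-run.
        sys.stderr.write("reporter:counter:%s,%s,%d\n"
                         % (group, counter, amount))

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Counter approach: the running total is visible live.
            increment_counter("Quality", "malformed_records")
            # Logging approach: count after the job finishes, e.g.
            #   grep -c 'malformed record' <task stderr logs>
            sys.stderr.write("malformed record: %r\n" % line)
            continue
        sys.stdout.write("\t".join(fields[:2]) + "\n")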