On Wed, Oct 31, 2012 at 12:18:31PM -0400, Matthew Gardner wrote:
>
> In the hadoop code I've been running, the counters are frequently in the
> hundreds of millions, sometimes in the billions. Would you suggest logging
> output that large and using grep to process it? Is there some bottleneck
> in mrs (or MapReduce generally) that makes the two equally efficient?
> And even if they are, having the counters as objects in the code lets
> you monitor them while the job is running, so you can, for instance,
> get an early warning that something is going wrong rather than waiting
> for a long job to finish before learning that something needs fixing.
I had something in mind that made sense at the time I wrote the email,
but it makes less sense now that I'm thinking about it in more detail.
In short, there are occasional circumstances where grep is adequate,
but you're right that this isn't always the case.
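
For concreteness, here is a minimal sketch of the two approaches under
Hadoop Streaming (Mrs's own counter API may differ, and the field
layout and counter names below are made up for illustration). Streaming
lets a mapper update a live counter by writing a specially formatted
line to stderr, which shows up in the job's web UI while the job runs;
a plain log line, by contrast, can only be tallied with grep after the
fact.

    #!/usr/bin/env python
    import sys

    def increment_counter(group, counter, amount=1):
        # Hadoop Streaming scans stderr for lines of this form and adds
        # `amount` to the named counter, visible in the job UI mid-run.
        sys.stderr.write("reporter:counter:%s,%s,%d\n"
                         % (group, counter, amount))

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Counter approach: the running total is visible live.
            increment_counter("Quality", "malformed_records")
            # Logging approach: count after the job finishes, e.g.
            #   grep -c 'malformed record' <task stderr logs>
            sys.stderr.write("malformed record: %r\n" % line)
            continue
        sys.stdout.write("\t".join(fields[:2]) + "\n")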