Task timeout and progress report


Mateusz Fedoryszak

Feb 27, 2013, 5:37:41 AM
to scoobi...@googlegroups.com
My Scoobi task times out (Task attempt_201212191044_0180_m_002304_0 failed to report status for 600 seconds. Killing!). Increasing the timeout is not an option for me. The low-level Hadoop way to deal with that would be to use counters. Is there any way of accessing them in Scoobi? I was only able to find a GitHub ticket... Can you think of any other ways of reporting task status?

Eric Springer

Feb 27, 2013, 10:26:12 AM
to scoobi...@googlegroups.com
Is there a reason that 10 minutes elapse without a single record being output? If so, instead of using a counter, you could make a class `Heartbeat`, have your code output an `Either[Heartbeat, T]`, and then filter out the heartbeats.
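
A rough sketch of the `Either[Heartbeat, T]` idea, using a plain Scala `Seq` in place of a Scoobi `DList` so it runs without Hadoop. The names `slowProcess` and `dropHeartbeats` are made up for illustration and are not Scoobi API; in a real job the heartbeats would have to be emitted by the long-running mapper or reducer and filtered out in a later stage.

```scala
// Plain-Scala sketch: a Seq stands in for a Scoobi DList.
object EitherHeartbeat {
  // Marker emitted only to show the framework that the task is alive.
  case object Heartbeat

  // Hypothetical slow step: handles one input in `parts` chunks, emitting
  // a heartbeat after each chunk and the real result at the end.
  def slowProcess(input: Int, parts: Int): Seq[Either[Heartbeat.type, Int]] = {
    val beats = (1 to parts).map { _ =>
      // ... one chunk of the expensive work would run here ...
      Left(Heartbeat): Either[Heartbeat.type, Int]
    }
    beats :+ Right(input * 2)
  }

  // Downstream step: drop the heartbeats and keep only the real values.
  def dropHeartbeats[T](xs: Seq[Either[Heartbeat.type, T]]): Seq[T] =
    xs.collect { case Right(value) => value }
}
```

Usage would look like `EitherHeartbeat.dropHeartbeats(EitherHeartbeat.slowProcess(21, 3))`, which keeps only the single real result.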

If there's nowhere that 10 minutes should elapse without progress, there's a possibility that you're hitting a Hadoop bug with the combiners. I forget the exact version of Hadoop it's in, but you can verify this by replacing all the .combine() calls in your Scoobi code with a flatMap that does the same thing.
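
To make the combiner check concrete, here is a minimal sketch of what "a flatMap that does the same thing" could compute, again on plain Scala collections rather than a `DList`. It assumes the combine step is applied to grouped data of the shape `(K, Iterable[V])` with an associative function `(V, V) => V`; `reducePerKey` is a hypothetical helper name, and a plain `map` is used here since each key yields exactly one output.

```scala
object CombineAsMap {
  // Per-key reduction written as an ordinary map over grouped data, so
  // that no combiner is involved. Assumes every key has at least one value.
  def reducePerKey[K, V](grouped: Seq[(K, Iterable[V])])(f: (V, V) => V): Seq[(K, V)] =
    grouped.map { case (key, values) => (key, values.reduce(f)) }
}
```

If the timeout disappears once the per-key reduction is done this way, that would point at the combiner rather than at the work itself.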


--
You received this message because you are subscribed to the Google Groups "scoobi-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scoobi-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Jesse Bridgewater

Mar 8, 2013, 2:49:46 PM
to scoobi...@googlegroups.com
This is an interesting question for me as well, since I have been playing with heavy-duty processing in reducers. Being able to easily send heartbeats from my long-running reducers without making the Scoobi code too ugly would be great.

Can you expand on your idea of using Either?

Ben Lever

Mar 12, 2013, 9:40:38 AM
to scoobi...@googlegroups.com
Hi Jesse,

I'll let Eric reply on his idea of using Either, but one experiment that shouldn't be too hard to try would be to add the generation of frequent heartbeats to Scoobi's Mapper and Reducer classes (MscrMapper.scala, MscrReducer.scala). If this worked, we could think about a configuration option to turn heartbeat generation on and off (off by default, I guess).

I don't know off the top of my head what APIs to use to generate the heartbeat, but Google should help :)

Cheers,
Ben.

Jesse Bridgewater

Mar 12, 2013, 11:19:31 AM
to scoobi...@googlegroups.com
Thanks, Ben! I think this is a very pragmatic approach and I like that. I am going to give that a try.

I am also exploring some ideas for how to do this using Scalding (Oscar Boykin is thinking about some designs).  If that thread generates any good ideas for how to provide a clean interface then I'll share it here as well.

Eric Springer

Mar 12, 2013, 12:08:43 PM
to scoobi...@googlegroups.com
My idea was just to periodically throw a dummy value through to keep Hadoop happy, e.g. using Either or Option or something, and then filter it out later. Something of this style:


x: DList[T]
x.parallelDo {
  new DoFn[T, Option[V]] {
    override def process(...) {
      (1 to 100).foreach { i =>
        doWorkPart(i)
        emit(None)
      }

      emit(Some(realResults))
    }
  }
}.map_reduce_boundary.
flatMap(x => x)  // get rid of the Nones


So as far as Hadoop is concerned, data is continually going through. If you can break up your work like I did in my example, nice; otherwise, with a lot of care, you can use a thread to emit every 5 minutes or something.
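
For the thread route, a minimal sketch of the scheduling part might look like the following. It only shows how to run a periodic callback alongside a blocking piece of work; `withHeartbeat` and `beat` are hypothetical names, not Scoobi or Hadoop API. In a real task, `beat` would have to do something Hadoop counts as progress (for example calling `progress()` on the task context, or emitting a dummy record), and whatever it touches must be safe to use from a second thread, which is where the "lot of care" comes in.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}

object ThreadHeartbeat {
  // Runs `work` while a background daemon thread calls `beat` every
  // `periodMillis` milliseconds. `beat` is a hypothetical callback and
  // must be safe to call from another thread.
  def withHeartbeat[A](periodMillis: Long)(beat: () => Unit)(work: => A): A = {
    val factory = new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "heartbeat")
        t.setDaemon(true) // never keep the JVM alive just for heartbeats
        t
      }
    }
    val scheduler = Executors.newSingleThreadScheduledExecutor(factory)
    val task = new Runnable { def run(): Unit = beat() }
    scheduler.scheduleAtFixedRate(task, periodMillis, periodMillis, TimeUnit.MILLISECONDS)
    try work
    finally scheduler.shutdownNow() // stop beating as soon as the work ends
  }
}
```

A call would wrap the long-running step, e.g. `ThreadHeartbeat.withHeartbeat(300000L)(() => reportAlive()) { expensiveStep(record) }`, where `reportAlive` and `expensiveStep` are placeholders for your own code.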


But a question for the core Scoobi developers: I assume it's necessary for this data to make it all the way to the end of the mapper or reducer (e.g. if you immediately filtered it out, it would do nothing). Is there a good way of doing this? Maybe DList.groupBarrier has this property?

Jesse Bridgewater

Mar 12, 2013, 1:59:41 PM
to scoobi...@googlegroups.com
This is a really clever idea. For my use-case I think I will need to go the thread route though. The question is conceptually simple, but the challenge is to not junk up the API. Thanks again for the thoughts on this.



Eric Springer

Mar 12, 2013, 3:11:03 PM
to scoobi...@googlegroups.com
Massive pseudo-code, untested and uncompiled warning:

https://gist.github.com/espringe/5145955

and in the next map-reduce job, you would use that .flatMap trick.