Jobs fail due to 'resource temporarily unavailable'

eric.fredine

Oct 23, 2009, 1:54:54 PM
to Disco-development
In disco-worker a job fails with IOError: [Errno 35] Resource
temporarily unavailable.

I'm running a single (local) node on Mac OS X with 8 cores. I have 8
map tasks, each of which is given an equal-length input chunk
(about 14M records per chunk). The map tasks generate large
partitions (10M+ records - kind of pathological, I know). All of the
map tasks finish at about the same time, and one of them
inevitably fails with the above IOError. (I'm speculating that
this is because it tries to access some file at the same time as
another process, or something of that sort.)

I can do a really ugly hack/work-around by staggering the start time
of the tasks in disco-worker, putting something like the following
at the start of disco-worker (each task waits 30 seconds longer than
the previous one):

# Stagger start-up: task N sleeps N * 30 seconds before doing any work.
delay = int(part) * 30
time.sleep(delay)
util.msg("Waited %i seconds -- starting now" % delay)

Bleech!

But with this hack the job runs to completion. With smaller partitions
(more rational map jobs, a combiner function, etc.) the failure doesn't
happen, at least not consistently (it still seems to happen sometimes).
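
A less ugly alternative to the fixed stagger might be to retry only the call that fails. Errno 35 on Mac OS X is EAGAIN, which signals a transient condition, so a retry with a short growing delay may be enough. This is a minimal sketch, not something Disco provides: `retry_on_eagain`, `attempts`, and `delay` are hypothetical names, and it assumes the failing I/O in disco-worker can be wrapped in a callable.

```python
import errno
import time


def retry_on_eagain(func, attempts=5, delay=0.5):
    """Call func(), retrying with a growing delay if it raises EAGAIN.

    Hypothetical helper for illustration; not part of Disco.
    """
    for attempt in range(attempts):
        try:
            return func()
        except (IOError, OSError) as exc:
            # Re-raise anything that is not EAGAIN, and give up
            # once the last attempt has failed.
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
            # Wait a little longer after each failed attempt.
            time.sleep(delay * (attempt + 1))
```

Unlike the sleep at start-up, this only costs time when the error actually occurs, but it does not explain which resource is being exhausted.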

Thanks,
Eric