We've hit this problem before, and the common symptom was that Azkaban appeared to have 'run out of memory'. That turned out to be a red herring: the real cause was that our machines had the process ulimit (max user processes, ulimit -u) set really low. On our Linux machines, with this low limit, we'd constantly use up all the available threads, and the default error the JVM reports when it can't create a new thread is an OOM error.
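If you want to check whether you're in the same situation, here is a rough sketch (not something from Azkaban itself) that compares the per-user process limit against the number of threads the current user already has. It assumes Linux with /proc mounted and should be run as the same user that runs Azkaban; the function names are made up for illustration.

```python
import glob
import os
import resource


def nproc_limits():
    """Return (soft, hard) RLIMIT_NPROC for this process; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    unlimited = resource.RLIM_INFINITY
    return (None if soft == unlimited else soft,
            None if hard == unlimited else hard)


def count_user_threads(uid=None):
    """Sum the 'Threads' field of every /proc/<pid>/status owned by uid."""
    uid = os.getuid() if uid is None else uid
    total = 0
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
        except OSError:
            continue  # process exited or is not readable; skip it
        # The first number on the Uid line is the real UID.
        if int(fields.get("Uid", "-1").split()[0]) == uid:
            total += int(fields.get("Threads", "0").strip())
    return total


def headroom(used, soft_limit):
    """Threads left before the soft limit is hit; None if unlimited."""
    if soft_limit is None:
        return None
    return max(soft_limit - used, 0)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    used = count_user_threads()
    print("soft limit:", soft, "hard limit:", hard)
    print("threads in use by this user:", used)
    print("headroom:", headroom(used, soft))
```

If the headroom is close to zero while Azkaban is busy, the 'out of memory' errors are probably thread exhaustion rather than a heap problem, and raising the limit is the thing to try.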
There's another issue, related to the scheduler, that I'm looking into, though I haven't dug into it very deeply yet.