Hey,
Is anyone heavily using/testing master?
I assume so, but after updating our local Spark build today (previously we were on an old 0.7-ish snapshot), I'm seeing a lot of "too many open files" exceptions.
There was a post about this back in January where Matei suggested setting ulimit -n 16000 in spark-env.sh, which I've done, although I believe that is actually lower than the default limit on the machine, which was 32000.
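For reference, here's roughly what I added to conf/spark-env.sh (just the descriptor-limit line; the comment reflects my understanding, and the check below is only how I confirmed the machine's existing hard limit):

    # conf/spark-env.sh (excerpt)
    # Raise the per-process open-file limit before the Spark daemons start,
    # per the suggestion in the January thread.
    ulimit -n 16000

    # Checked the existing hard limit on the box with:
    #   ulimit -Hn   -> reports 32000 here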
I noticed a few large-ish changes in Spark; could they be leaking files/connections? Any hints about where to poke around? Or should I try looking somewhere else?
This seems to be happening fairly quickly, within the first 20 minutes of the job running.
Thanks!
- Stephen