Hi Erik,
It's been a while since you wrote your response, but I'm posting an update on what I did in case it helps anyone else.
The combiner approach wasn't a solution, because the operation I'm performing is essentially a huge partitioning of the data with no combining or reduction.
I was already using partitions. I use a pipeline worker rather than a classic worker, but I presume that doesn't make a big difference here.
The problem I ran into with partitions is that each partition requires a new file, so I was quickly hitting the "too many open files" error. I could probably have worked around this by raising the file-descriptor limits on each of the nodes. Instead I decided to simply divide the work into multiple passes over the same dataset, with each pass writing only a subset of the partitions. The job(s) are now working just fine.
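For anyone curious what the multi-pass trick looks like in general, here is a minimal standalone sketch (not my actual job, and the names `partition_in_passes`, `key_fn`, etc. are just illustrative). It re-reads the input once per pass so that only a bounded number of partition files are ever open at the same time, trading extra I/O for a fixed file-descriptor budget:

```python
import os

def partition_in_passes(records, num_partitions, key_fn, max_open_files, out_dir):
    """Partition records into num_partitions output files while never
    holding more than max_open_files file handles open at once.

    `records` is a callable returning an iterable, because the input
    is re-read from the start on every pass."""
    for start in range(0, num_partitions, max_open_files):
        end = min(start + max_open_files, num_partitions)
        # Open only the partition files this pass is responsible for.
        files = {p: open(os.path.join(out_dir, f"part-{p:05d}"), "w")
                 for p in range(start, end)}
        try:
            for rec in records():
                p = key_fn(rec) % num_partitions
                if start <= p < end:  # skip records belonging to other passes
                    files[p].write(rec + "\n")
        finally:
            for f in files.values():
                f.close()
```

With 1,000 partitions and a limit of 100 open files, this makes 10 passes over the data, which was an acceptable trade for me compared to fiddling with ulimits on every node.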
Cheers,
Giles