I'm not sure what a "large number" is in this case :) My specific job
is generating thousands of keys, each with thousands of values. Neither
the number of keys nor the number of values per key should exceed 10,000
in my job, and the total number of key-value pairs may approach but
should not exceed 100,000,000. Does that seem large? :)
I've successfully re-run my job with less than 5% of the original data
(10 input files rather than 225).
In all likelihood my data size is just too large for a single, small
instance.
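
For a rough sense of scale, here is a back-of-the-envelope sketch in
Python. The bounds come from the numbers above; the 100 bytes per pair
is purely an assumed placeholder (I have not measured it), and the
function names are made up for illustration.

```python
# Back-of-the-envelope sizing for the job described above.
MAX_KEYS = 10_000             # upper bound on distinct keys (from the job)
MAX_VALUES_PER_KEY = 10_000   # upper bound on values per key (from the job)
ASSUMED_BYTES_PER_PAIR = 100  # ASSUMPTION: placeholder, not a measured value


def worst_case_pairs(max_keys: int, max_values_per_key: int) -> int:
    """Largest possible number of key-value pairs given both bounds."""
    return max_keys * max_values_per_key


def estimated_gib(pairs: int, bytes_per_pair: int) -> float:
    """Raw payload size in GiB, ignoring any container overhead."""
    return pairs * bytes_per_pair / 2**30


pairs = worst_case_pairs(MAX_KEYS, MAX_VALUES_PER_KEY)
sample_fraction = 10 / 225  # input files in the successful re-run vs. the full job

print(f"worst-case pairs: {pairs:,}")
print(f"estimated size at {ASSUMED_BYTES_PER_PAIR} B/pair: "
      f"{estimated_gib(pairs, ASSUMED_BYTES_PER_PAIR):.1f} GiB")
print(f"re-run used {sample_fraction:.1%} of the input files")
```

Swapping in a measured bytes-per-pair figure would make the estimate
meaningful; as written it only shows how the bounds multiply out.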
I'm currently spinning up a second worker role instance... I'll keep
tinkering with this.
-Mike