Hi Chathura, Supun and all,
Another thing I noticed with these tests is that when I use HDFS to save intermediate data, Twister2 runs out of memory easily.
I stopped persisting the delete tweetIDs and cached them instead. They are small, 1K per worker.
I reduced the number of tweetID-date pairs per worker to 10K.
I used only 4 workers.
But I still got java.lang.OutOfMemoryError from all workers.
I attached the logs and put the Java heap dumps on the login node of victor:
/scratch_hdd/auyar/heap-dump/java_pid141124.hprof
/scratch_hdd/auyar/heap-dump/java_pid193304.hprof
/scratch_hdd/auyar/heap-dump/java_pid194512.hprof
/scratch_hdd/auyar/heap-dump/java_pid195638.hprof
Can this be related to the other problem?
Ahmet