exception when processing 200M tweets per worker


Ahmet Uyar

Jul 28, 2020, 8:46:12 AM
to Twister2
Hi guys,

I am running scalability tests with Twister2.
I am running 240 workers on 10 nodes, with 24 workers on each node.

I first generated 200M tweetID-date pairs for each worker.
In total, we have 200M x 240 = 48 billion tweetID-date pairs.

I am starting all workers with 4096 MB of memory.
They are supposed to read the input files, partition the data, and persist it.
Workers read the first 10M tweetID-date pairs without any issues.
However, before they finish reading 20M tweetID-date pairs, they throw an out-of-memory exception.

I am attaching the logs of one of the workers: 
The program that I run is at: 

thanks,

Ahmet
  
worker-214.log.0

Supun Kamburugamuve

Jul 28, 2020, 11:09:21 PM
to Ahmet Uyar, Twister2
Hi Ahmet,

Could you take a heap dump when it goes OOM? There is a flag that we can pass to the JVM command to do this.
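
For reference, the HotSpot flag commonly used for this is `-XX:+HeapDumpOnOutOfMemoryError`, optionally together with `-XX:HeapDumpPath=<dir>`. How these options reach the Twister2 worker JVMs is not covered in this thread, so the sketch below only shows one way to confirm from inside a running JVM that the flag took effect. The class name `HeapDumpCheck` is hypothetical and not part of Twister2; the sketch assumes a HotSpot-based JDK.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Twister2: reports whether the JVM
// was started with the heap-dump-on-OOM option enabled.
public class HeapDumpCheck {

    // Returns the current value of a HotSpot VM option as a string,
    // e.g. "true" or "false" for boolean flags.
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    // Maximum heap the JVM will attempt to use, in megabytes.
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = "
            + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
        System.out.println("Max heap (MB) = " + maxHeapMb());
    }
}
```

Running it as `java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp HeapDumpCheck` should report the flag as enabled; calling `vmOption` once at worker startup and logging the result would serve the same purpose.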

Best,
Supun..


--
Supun Kamburugamuve, PhD
Digital Science Center, Indiana University
Member, Apache Software Foundation; http://www.apache.org
E-mail: supun@apache.org;  Mobile: +1 812 219 2563


Ahmet Uyar

Jul 29, 2020, 6:23:08 AM
to Supun Kamburugamuve, Twister2
Hi Supun,

I ran it with the heap dump turned on.
I put the generated heap-dump files under /scratch_hdd/auyar/heap-dump/ on the login machine of victor.
The files are fairly large.
I also put the log files under the same directory.
The OOM exception was thrown by worker-175.

thanks,

Ahmet
