Out of memory on CIPRES


Brian D.

Feb 24, 2017, 6:47:42 PM
to beast-users
Hi all,
I've received the error below from each of four replicate runs of *beast2 on CIPRES.
Am I correct that CIPRES only allocates 5MB of memory to these analyses? This seems very low for a starbeast run. If so, does anyone know how to increase the memory? If not, any suggestions why this is happening? I've been running another replicate on my machine for 264M generations with no problem. It is currently using ~437MB.

Thanks!
Brian D.

slurmstepd: Step 7825789.0 exceeded memory limit (5250280 > 5242880), being killed
srun: Job step aborted: Waiting up to 302 seconds for job step to finish.
slurmstepd: *** STEP 7825789.0 ON comet-06-11 CANCELLED AT 2017-02-23T18:31:12 ***
/projects/ps-ngbt/home/cipres/ngbw/contrib/tools/bin/beast2wrapper_2.4.4: line 67: 29954 Killed $cmdline
slurmstepd: Exceeded job memory limit at some point.
srun: error: comet-06-11: task 0: Killed


Miller, Mark

Feb 24, 2017, 9:48:07 PM
to beast...@googlegroups.com

Hi Brian,

We are new to Starbeast at CIPRES, so we would benefit from some benchmarking; we just have not had a chance to do it yet.

5 GB is typically the maximum for one-core jobs on Comet. I can adjust the interface to accommodate this issue once I have more information. In the meantime, you can try tricking the interface into giving you a multicore run, which will bring more memory. I’m not sure the configuration will work out, since I don’t know Starbeast well yet. Try telling the interface you have 1 partition and 6,000 patterns. That should kick your run off on 6 cores, which will give you more memory.

Also, can you please send me the _jobinfo.txt file for the run that ran out of memory? I can use it to recover the configuration you used, and in benchmarking experiments.

Mark


Brian D.

Feb 25, 2017, 2:22:57 PM
to beast-users
Hi Mark,
Thanks for the reply. I will try the trick you suggest. Below is the output from one of the _jobinfo.txt files. All four runs terminated in the same way. Let me know if you need all the info files. By the way, despite any bugs/issues, it's great to have the CIPRES resource, and thanks for all the help!

Cheers
Brian D.


Task\ label=noArg1strictEstSubs.Run1
Task\ ID=1113536
Tool=BEAST2_XSEDE
created\ on=2017-02-23 12:22:03.0
JobHandle=NGBW-JOB-BEAST2_XSEDE-E48A8B0E062B432791B373A1CEDF9D78
resource=comet
User\ ID=6812
User\ Name=bdorsey


Output=(all_results,*,UNKNOWN,UNKNOWN,UNKNOWN)
ChargeFactor=1.000000
cores=1
JOBID=7825794
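
For anyone who wants to compare several of these files, here is a minimal sketch of reading one into a dictionary. It assumes the file is a plain list of key=value lines in which a backslash escapes a space inside a key, as in the excerpt above; the function name parse_jobinfo is hypothetical and not part of any CIPRES tool.

```python
def parse_jobinfo(text):
    """Read 'key=value' lines into a dict.

    Assumption: a backslash before a space in the key is an escape,
    as in 'Task\\ label=...'. Blank lines and lines without '=' are skipped.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, value = line.split("=", 1)
        info[key.replace("\\ ", " ")] = value
    return info


# Excerpt copied from the post above.
sample = r"""Task\ label=noArg1strictEstSubs.Run1
Task\ ID=1113536
Tool=BEAST2_XSEDE
resource=comet

Output=(all_results,*,UNKNOWN,UNKNOWN,UNKNOWN)
ChargeFactor=1.000000
cores=1
JOBID=7825794
"""

info = parse_jobinfo(sample)
print(info["Task label"], info["resource"], info["cores"])
```

The cores value is the one that matters for this thread: it records that the run was scheduled on a single core.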

Brian D.

Feb 25, 2017, 5:12:45 PM
to beast-users
Hi again Mark,
I just read your post again and realized you wrote 5GB, not 5MB. My analyses are only using ~500MB-1GB on my local machine. Did I miss something, or are the memory limits perhaps incorrect? Are the numbers in the error message below in bytes or kilobytes (i.e., 5242880 bytes)?

Thanks again,
Brian D.
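
On the units question, a short sketch of the arithmetic may help. It pulls the two numbers out of the slurmstepd line quoted in the first post and shows what the limit comes to under each reading. That Slurm reports these values in kilobytes is an assumption here, not something the message itself states; it is, however, the reading that matches the 5GB per-core figure mentioned earlier in the thread.

```python
import re

# First line of the error message from the original post.
LINE = ("slurmstepd: Step 7825789.0 exceeded memory limit "
        "(5250280 > 5242880), being killed")


def parse_limit(line):
    """Return (used, limit) from an 'exceeded memory limit (A > B)' message."""
    match = re.search(r"exceeded memory limit \((\d+) > (\d+)\)", line)
    if match is None:
        raise ValueError("no memory-limit message found")
    return int(match.group(1)), int(match.group(2))


used, limit = parse_limit(LINE)

# 5242880 is exactly 5 * 1024 * 1024, so the same ratio gives
# 5 MiB if the unit is bytes and 5 GiB if the unit is KiB.
ratio = limit / 1024**2
overrun = used - limit

print(f"limit if bytes: {ratio:.1f} MiB")
print(f"limit if KiB:   {ratio:.1f} GiB")
print(f"overrun: {overrun} (in the same unit as the message)")
```

Under the kilobyte reading, the job was killed for going about 7 MiB over a 5 GiB limit.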




Miller, Mark

Feb 25, 2017, 6:00:29 PM
to beast...@googlegroups.com

Fair enough, my eye turned the M into a G. The Comet machine has 128 GB per 24 cores. Each core gets its own share of that, which should be somewhere around 5 GB. If you are running in the shared queue, which you are, there is a possibility that one of the other jobs running alongside yours will compete for your memory and cause such an issue. But not on all 4 jobs, I wouldn’t think.

There are some experiments we can try to figure this out. It will take a couple of days.
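
As a rough illustration of the per-core share described above, the sketch below divides the node memory evenly across cores. The even split is an assumption; the limit actually enforced for a job may be set somewhat lower, and the function name memory_share_gb is hypothetical.

```python
# Figures quoted in this post: 128 GB of memory per 24-core node.
NODE_MEMORY_GB = 128
CORES_PER_NODE = 24


def memory_share_gb(cores):
    """Memory for a job on `cores` cores, assuming an even per-core split."""
    return NODE_MEMORY_GB * cores / CORES_PER_NODE


for cores in (1, 6):
    print(f"{cores} core(s): about {memory_share_gb(cores):.1f} GB")
```

On this assumption a 6-core run would have roughly six times the memory of a single-core run, which is the reason for suggesting the multicore workaround.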

 


Brian D.

Mar 7, 2017, 4:55:45 PM
to beast-users
Just checking whether there is any solution for the odd memory limit. My poor machine is grinding away slowly, so I would love to be able to use CIPRES for starbeast2 whenever it's possible.

Thanks again,
Brian D.

Mark Miller

Mar 7, 2017, 6:01:28 PM
to beast-users
Hey Brian,

The run I made setting it for 6 cores completed successfully. Did you have a different experience?

Mark



Brian D.

Mar 8, 2017, 5:02:21 PM
to beast-users
Hi Mark,
Well, I feel a bit dumb. I misunderstood your last post to mean that you needed to run some tests and that perhaps the 6-core strategy wouldn't work. My mistake. Glad to hear it worked for you, though. I have started another set of runs using your suggestion. Fingers crossed.

Cheers
Brian

Mark Miller

Mar 8, 2017, 5:06:22 PM
to beast-users
No worries, I'll send you a link to my results.

