Unexpected termination of RAxML at CIPRES


이승현

Jan 8, 2018, 3:36:48 AM
to raxml
Dear Members,

I have a problem using RAxML on the CIPRES portal.

I had been using RAxML-HPC2 on XSEDE without any problems until recently.

However, since yesterday some ML analyses on CIPRES have been terminating unexpectedly.

In RAxML_info.result, the log messages look like this:


Overall Time for 102 Rapid Bootstraps 345.735704 seconds
Average Time per Rapid Bootstrap 3.389566 seconds

Starting ML Search ...

Fast ML optimization finished

Fast ML search on Process 0: Time 416.030319 seconds

Fast ML search on Process 3: Time 501.814788 seconds

Fast ML search on Process 1: Time 503.374405 seconds

Fast ML search on Process 2: Time 518.009467 seconds

Fast ML search on Process 5: Time 559.679391 seconds

This seems very strange, because normally when the fast ML search ends, RAxML goes directly into the next step, the slow ML search.

Here, the task terminates at that point without any message.

And, of course, the bipartition.result file is missing from the outputs.
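For anyone hitting the same symptom, one quick way to see where a run stopped is to scan the info log for stage markers. The Python sketch below is hypothetical (it is not part of RAxML or CIPRES): the marker strings are assumptions taken only from the excerpt above plus the "Slow ML search" stage mentioned in this post, and other RAxML versions may word their log lines differently.

```python
import re

# Hypothetical stage markers, in the order the stages run. Taken from the
# RAxML_info excerpt in this thread; not an exhaustive or official list.
STAGES = ["Rapid Bootstrap", "Fast ML search", "Slow ML search"]

# Matches lines such as "Fast ML search on Process 3: Time 501.814788 seconds".
FAST_ML_RE = re.compile(r"Fast ML search on Process (\d+): Time ([\d.]+) seconds")


def last_stage_reached(info_text):
    """Return the last stage marker found in the log text, or None if none match."""
    reached = None
    for line in info_text.splitlines():
        for stage in STAGES:
            if stage in line:
                reached = stage
    return reached


def fast_ml_times(info_text):
    """Map each MPI process id to the time (seconds) reported for its fast ML search."""
    return {int(m.group(1)): float(m.group(2)) for m in FAST_ML_RE.finditer(info_text)}
```

Fed the log above, `last_stage_reached` would report "Fast ML search" with no later stage, which matches a run that was stopped before the slow ML search finished writing anything.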


Is it because of the data size? 

I reran the analysis with the same parameters and the same version of RAxML, but with a smaller amount of data (shorter sequences and fewer taxa), and it completed without problems.


Please let me know why this happens and how I can fix it.




Thank you all in advance.

Mark Miller

Jan 8, 2018, 11:04:29 AM
to raxml
Hi, thanks for reporting that. For a run at CIPRES that fails, it is usually best to report it to me directly or via the bug tracker on the CIPRES site. If you send me the _jobinfo.txt file, I can track it down and see what I can learn.

Best,
Mark

이승현

Jan 8, 2018, 7:11:08 PM
to raxml
Dear Mark.

Thank you for the immediate reply.

I have attached two files.

One is from the job that succeeded, the other from the job that failed.

Sincerely, Seunghyun.

On Tuesday, January 9, 2018 at 1:04:29 AM UTC+9, Mark Miller wrote:
Success_JOBINFO_wng.TXT
Failed_JOBINFO_bycids.TXT

Mark Miller

Jan 9, 2018, 9:46:06 AM
to raxml
Thanks for sending that.
Ah, I see. The failed job takes a little longer than the one that succeeded, and it ran up against the time limit. By default, each RAxML job submitted through this interface is set to run for a maximum of 15 minutes.
This setting used to help jobs start more quickly, but it now has less impact.
You can open the parameter pane and increase the maximum run time to any value up to 168 hours.

You can tell that a job reached the time limit by looking at the file scheduler_stderr.txt, where you will see a message like this:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     3    0     3    0     0     10      0 --:--:-- --:--:-- --:--:--   272
slurmstepd: *** JOB 13756334 ON comet-02-08 CANCELLED AT 2018-01-08T00:01:52 DUE TO TIME LIMIT ***
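If you check many jobs, the same test can be scripted. This is a minimal Python sketch, assuming you have downloaded scheduler_stderr.txt from the job's output; the regular expression is modeled only on the single slurmstepd line quoted above, so other schedulers or SLURM versions may phrase the message differently. The function name is hypothetical.

```python
import re

# Modeled on the cancellation line quoted above, e.g.
# "slurmstepd: *** JOB 13756334 ON comet-02-08 CANCELLED AT 2018-01-08T00:01:52 DUE TO TIME LIMIT ***"
TIME_LIMIT_RE = re.compile(
    r"\*\*\* JOB (\d+) ON (\S+) CANCELLED AT (\S+) DUE TO TIME LIMIT \*\*\*"
)


def find_time_limit_cancellation(stderr_text):
    """Return (job_id, node, timestamp) if the job was killed for its time limit, else None."""
    match = TIME_LIMIT_RE.search(stderr_text)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)
```

For example, `find_time_limit_cancellation(open("scheduler_stderr.txt").read())` returns the job id, node, and cancellation time when the limit was hit. The remedy is then the one described above: raise the maximum run time in the parameter pane and resubmit.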

Let me know if I can help further.

Mark

On Monday, January 8, 2018 at 12:36:48 AM UTC-8, 이승현 wrote:

이승현

Jan 10, 2018, 12:22:40 AM
to raxml
Dear Mark.

It works. 

Thank you for answering!!!

Have a nice day:)

On Tuesday, January 9, 2018 at 11:46:06 PM UTC+9, Mark Miller wrote: