pll dppdiv stuck or really slow?

Teofil Nakov

Dec 19, 2013, 3:48:55 PM
to dppdiv...@googlegroups.com
Hi Tracy and everyone,

I'm trying to run dppdiv on a dataset of 208 taxa and 9.3 kb of nucleotide data, with 24 calibration points, to estimate divergence times. I've compiled the pll pthreads and MPI versions (from https://github.com/ddarriba/pll-dppdiv).

I call dppdiv like so:

~/bin/dppdiv-pthreads-sse3 -T 12 -in 7g.phy -tre RAxML_bestTree.combined_opt464 -out test -exhp -dphp 10 -ghp -pm 5 -npr 3 -cal diatom.cal

When run with calibrations the program seems to get stuck early on. The last couple of lines of output after 24 hours using 12 processors are:

...
Rate Group Elements: (0.321836) -> 84 189 294 387
Rate Group Elements: (0.223646) -> 305

When run without the calibration points (all exponential) the program runs (takes 15 minutes for 600 iterations).

Not sure if there is a problem with my calibration file (attached) or something else? Also, is there possibly a way to restart the MCMC from the last iteration or a checkpoint (really nice feature when running remotely)?

Appreciate your help!
Teofil
diatom.cal
RAxML_bestTree.combined_opt464
7g.phy

Tracy Heath

Dec 19, 2013, 5:34:35 PM
to dppdiv...@googlegroups.com
Hi Teofil,

It seems like there may be a conflicting calibration, or something similar, that causes the program to stall or crash early when the calibrations are included. I will download the files and try to figure out what the issue is.

With regard to speed, the DPP model on branch rates can be somewhat slow. That is due to the proposal mechanism on the branch-rate partitioning structure. The speed of the MCMC is dependent on the dataset size and the degree of variation present in the data. If there's a lot of branch-rate variation, then this can lead to longer mixing times. 

Cheers!
Tracy


--
You received this message because you are subscribed to the Google Groups "dppdiv-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dppdiv-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Tomas Flouri

Dec 19, 2013, 5:32:35 PM
to dppdiv...@googlegroups.com, Teofil Nakov
Hi Teofil,

I don't have any good news but I'll try to answer your questions.

> ~/bin/dppdiv-pthreads-sse3 -T 12 -in 7g.phy -tre
> RAxML_bestTree.combined_opt464 -out test -exhp -dphp 10 -ghp -pm 5
> -npr 3 -cal diatom.cal
>
> When run with calibrations the program seems to get stuck early on.
> The last couple of lines of output after 24 hours using 12 processors
> are:

The problem you are facing is most probably due to problematic
calibration points. DPPDiv does not validate the input calibration
values, and this may lead to invalid age ranges or zero-length branches.
DPPDiv will then keep proposing values within the invalid interval
indefinitely and so gets stuck. Therefore, I'd suggest that you check
your calibration points manually. A good way to find the problematic
calibration points quickly is binary search: remove half of the
calibration points, check whether DPPDiv proceeds, and continue by
halving the suspect set until you find the problematic one.
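
The halving strategy can be sketched in Python as below. This is only an illustration, not part of DPPDiv: `find_bad_calibration` and `runs_ok` are hypothetical names. `runs_ok` stands for whatever check you use to decide that a run got going (for example, writing the kept calibrations to a file, launching DPPDiv on it, and seeing whether it gets past the first iterations within a time limit). The sketch assumes that the calibration entries are distinct and that exactly one of them is responsible for the hang.

```python
from typing import Callable, List, Sequence


def find_bad_calibration(
    cals: Sequence[str],
    runs_ok: Callable[[List[str]], bool],
) -> str:
    """Return the single calibration that prevents the run from proceeding.

    Assumption: exactly one entry in `cals` is the culprit, and
    `runs_ok(kept)` is True when a run using only `kept` proceeds.
    """
    suspects = list(cals)
    while len(suspects) > 1:
        half = len(suspects) // 2
        removed = suspects[:half]
        # Run with everything except the removed half of the suspects.
        kept = [c for c in cals if c not in removed]
        if runs_ok(kept):
            # The run proceeds without them, so the culprit was removed.
            suspects = removed
        else:
            # Still stuck, so the culprit is among the remaining suspects.
            suspects = suspects[half:]
    return suspects[0]


# Toy check with a stand-in predicate (no DPPDiv involved):
cals = [f"cal_{i}" for i in range(24)]
culprit = find_bad_calibration(cals, lambda kept: "cal_17" not in kept)
print(culprit)  # prints cal_17
```

With 24 calibrations this needs at most 5 trial runs instead of up to 24 one-at-a-time removals. If two calibrations only conflict with each other, the single-culprit assumption does not hold and the result needs to be checked by hand.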

> When run without the calibration points (all exponential) the program
> runs (takes 15 minutes for 600 iterations).

I have used DPPDiv on a dataset of similar size (~150 taxa, ~10 kb and
~20 calibration points), and my run-time estimate was around 15 days on
a 48-core machine.

> Not sure if there is a problem with my calibration file (attached) or
> something else? Also, is there possibly a way to restart the MCMC from
> the last iteration or a checkpoint (really nice feature when running
> remotely)?

Unfortunately, DPPDiv does not currently support checkpointing. We are
planning to develop a new version of DPPDiv that will support
checkpointing, but I do not know when this will happen.

Tomas

Teofil Nakov

Dec 19, 2013, 5:57:55 PM
to Tomas Flouri, dppdiv...@googlegroups.com
Thanks Tomas, Tracy.
--
Teofil Nakov
Plant Biology Graduate Program
University of Texas at Austin
1 University Station A6700
Austin, Tx, 78712
ph: 512-471-4997

Tracy Heath

Dec 19, 2013, 6:02:01 PM
to dppdiv...@googlegroups.com
Hi Teofil,

I had a look at the runs and it seems like this is the problematic calibration:

-E Actinoptychus_undulatum_CA_HK261 Actinocyclus_subtilis_HK168 85

Without it, things seem to run fine.
Cheers!
Tracy


Teofil Nakov

Dec 19, 2013, 6:06:22 PM
to dppdiv...@googlegroups.com
Yep, I noticed that too. Seems an unnecessary ping to the group. Happy holidays!