questions about a few command lines


mmi...@sdsc.edu

Jan 24, 2013, 6:36:43 PM
to dppdiv...@googlegroups.com
Hi,
 
We have got the code almost ready to run in the cipres interface. I wonder if there is anything written down about the format for the -cal file.
 
Mark

Tomas Flouri

Jan 24, 2013, 6:43:29 PM
to dppdiv...@googlegroups.com
Hi Mark,

running dppdiv with the -hf switch doesn't help?

Tomas

Miller, Mark

Jan 25, 2013, 12:20:52 AM
to dppdiv...@googlegroups.com
Thanks Tomas,

Yes, it does help, of course, thanks. I would like to write up a description for users, so if there was a written description, that would save me some time.
The format seems straightforward from the model file in -hf, if I assume there is no rule about how many blank spaces separate the values in a line, all the numerical values are float (except line 1), and the order of the node specifiers doesn’t matter. (is that right?). Are there tools that create these files, or are they created by hand?

The code seems to accept relaxed phylip format rather than only strict phylip, since introducing white space between the taxa and the characters did not crash the program. (right?)

-npr: what is the "prompt" for the -npr command ( a few words that describe it)?
Is there a default selection for -npr, or is it off by default?
what do cbd and cbd fix stand for?

do these commands apply only when a calibration file is uploaded?
Turn on soft bounds on calibrated nodes (-soft)
All calibrated nodes are offset exponential (-exhp)
All calibrated nodes have a DPM hyperprior (-dphp)
Hyperprior on calibrations from a gamma- (-ghp)

I got seg faults with 4 commands, I am sure I have misconfigured them, perhaps they require a modifier, or have a precondition? I would appreciate any guidance.
dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -rnp
dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -bdr
dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -bda
dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -hsc

Thanks for any help,
Mark


*************************************************************************

Alexandros Stamatakis

Jan 25, 2013, 12:29:44 PM
to dppdiv...@googlegroups.com, Tracy Heath
Hi Tracy,

I believe that it would be a great thing if you could write a short
DppDIV manual.

Cheers,

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Tomas Flouri

Jan 25, 2013, 1:51:52 PM
to dppdiv...@googlegroups.com
Hi Mark,

> Yes, it does help, of course, thanks. I would like to write up a description for users, so if there was a written description, that would save me some time.
> The format seems straightforward from the model file in -hf, if I assume there is no rule about how many blank spaces separate the values in a line, all the numerical values are float (except line 1), and the order of the node specifiers doesn’t matter. (is that right?). Are there tools that create these files, or are they created by hand?
>
I cannot give reliable info on this topic as I'm not the author of the
original DPPDiv, but having a quick look at the source code I would say
that yes, the first line in the calibration file must be an integer
stating the number of entries, and the number of spaces between node
(taxa) specifiers (which can be arbitrarily many) does not matter.
The other numerical values are floats.
I don't think that tools creating such files actually exist. They must be
created by hand.
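
A minimal pre-flight check for the layout described above, sketched in Python. The name check_cal_text is hypothetical and not part of DPPDiv; the function only verifies that the first line is an integer and that it matches the number of non-blank entry lines, splitting each entry on arbitrary whitespace. Whether DPPDiv itself enforces the count is an assumption that has not been confirmed here.

```python
def check_cal_text(text):
    """Return calibration entries as lists of whitespace-separated tokens.

    Hypothetical helper (not part of DPPDiv). Raises ValueError when the
    first line is not an integer or does not match the number of entries.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("calibration file is empty")
    try:
        declared = int(lines[0].strip())
    except ValueError:
        raise ValueError("first line must be an integer entry count") from None
    # str.split() with no argument treats any run of spaces or tabs as one separator.
    entries = [ln.split() for ln in lines[1:]]
    if len(entries) != declared:
        raise ValueError(f"header says {declared} entries, found {len(entries)}")
    return entries
```

The tokens are returned as strings so that a caller can decide how to interpret each field.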

> The code seems to accept relaxed phylip format rather than only strict phylip, since introducing white space between the taxa and the characters did not crash the program. (right?)
>
On this I can answer with certainty. We have replaced the parsing
procedure in the original DPPDiv with a faster and stable parsing
routine which accepts relaxed (not strict) phylip formats. The parser
can read both sequential and interleaved phylip, but interleaved is
currently disabled in this release. I will enable interleaved format as
soon as I get some time to update the code.
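
To illustrate what "relaxed" means here, a short Python sketch of a reader for relaxed sequential PHYLIP follows. It is not the parser used in DPPDiv; read_relaxed_phylip is a hypothetical name, and the sketch assumes one taxon per line, with the name separated from the characters by whitespace rather than padded to exactly 10 columns as strict PHYLIP requires.

```python
def read_relaxed_phylip(text):
    """Parse relaxed sequential PHYLIP text into a {name: sequence} dict.

    Illustrative only; assumes one taxon per line and no interleaving.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    alignment = {}
    for ln in lines[1:]:
        # The name is the first whitespace-delimited token; any further
        # whitespace inside the sequence is dropped.
        name, *chunks = ln.split()
        seq = "".join(chunks)
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, got {len(seq)}")
        alignment[name] = seq
    if len(alignment) != ntax:
        raise ValueError(f"header says {ntax} taxa, found {len(alignment)}")
    return alignment
```
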
> -npr: what is the "prompt" for the -npr command ( a few words that describe it)?
> Is there a default selection for -npr, or is it off by default?
> what do cbd and cbd fix stand for?
>
Again, I'm also not familiar with these terms, but by inspecting the
source code I can say that:
-npr defines the speciation model. The default value is 1 (uniform) in
case it's not specified by the user.

> do these commands apply only when a calibration file is uploaded?
> Turn on soft bounds on calibrated nodes (-soft)
> All calibrated nodes are offset exponential (-exhp)
> All calibrated nodes have a DPM hyperprior (-dphp)
> Hyperprior on calibrations from a gamma- (-ghp)
>
From the source code I can see that the variables affected by those
switches are only used if a calibration file is provided. So I suppose
they have no effect if calibration is not used.

> I got seg faults with 4 commands, I am sure I have misconfigured them, perhaps they require a modifier, or have a precondition? I would appreciate any guidance.
> dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -rnp
> dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -bdr
> dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -bda
> dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -hsc
>
>
I will have a look at the code. Would you be able to provide me the
input files you have used?

I hope my answers have helped a bit, but I guess Tracy will give some
better explanation.
Concerning documentation I guess a wiki page could be helpful. I'll ask
Tracy if she can host one.

Cheers,
Tomas

Miller, Mark

Jan 25, 2013, 7:41:44 PM
to dppdiv...@googlegroups.com
Thanks so much Tomas, that was extremely helpful. I appreciate your willingness to look into the code; that's outside of my abilities right now.

Invoking the command-line flags that pertain to cal files when no cal file is specified seems to cause a seg fault rather than being ignored silently. In fact, most command-line errors seem to cause a seg fault. If the code is to be widely adopted, it will save users much heartache, and save the contributors to this list much time answering questions, if Tracy can add graceful, informative error handling for command-line flag errors.
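
For a front end such as CIPRES, one way to avoid this class of crash is to screen the argument list before the program is launched. The Python sketch below is a hypothetical wrapper check, not part of DPPDiv: it rejects the four calibration-related flags named in this thread (-soft, -exhp, -dphp, -ghp) when no -cal file is given.

```python
CAL_ONLY_FLAGS = ("-soft", "-exhp", "-dphp", "-ghp")

def calibration_flag_errors(args):
    """Return human-readable problems found in a dppdiv argument list.

    Hypothetical pre-flight check; `args` excludes the program name.
    """
    if "-cal" in args:
        return []
    return [
        f"{flag} requires a calibration file (-cal)"
        for flag in CAL_ONLY_FLAGS
        if flag in args
    ]
```

An empty list means the calibration flags are consistent; anything else can be shown to the user instead of submitting the job.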

Before sending you a file, I will wait to see if Tracy comments on the command line flags I mentioned. I think it is the most efficient way to handle it. I just won't expose these options until they are better understood.

Best,
Mark
--


Tracy Heath

Jan 26, 2013, 4:19:07 AM
to dppdiv...@googlegroups.com
Hi All, I'll add some comments below:


> > Yes, it does help, of course, thanks. I would like to write up a description for users, so if there was a written description, that would save me some time.
> > The format seems straightforward from the model file in -hf, if I assume there is no rule about how many blank spaces separate the values in a line, all the numerical values are float (except line 1), and the order of the node specifiers doesn’t matter. (is that right?). Are there tools that create these files, or are they created by hand?
>
> I cannot give reliable info on this topic as I'm not the author of the original DPPDiv, but having a quick look at the source code I would say that yes, the first line in the calibration file must be an integer stating the number of entries, and the number of spaces between node (taxa) specifiers (which can be arbitrarily many) does not matter. The other numerical values are floats.
> I don't think that tools creating such files actually exist. They must be created by hand.

Yes, the first line in the file must state the number of entries. These files must be created by hand, but usually people don't have so many fossils that this should be a problem. There are more details about the calibration file that I will add to another email shortly...



> > The code seems to accept relaxed phylip format rather than only strict phylip, since introducing white space between the taxa and the characters did not crash the program. (right?)
>
> On this I can answer with certainty. We have replaced the parsing procedure in the original DPPDiv with a faster and stable parsing routine which accepts relaxed (not strict) phylip formats. The parser can read both sequential and interleaved phylip, but interleaved is currently disabled in this release. I will enable interleaved format as soon as I get some time to update the code.

Cool! 

> > -npr: what is the "prompt" for the -npr command ( a few words that describe it)?
> > Is there a default selection for -npr, or is it off by default?
> > what do cbd and cbd fix stand for?
>
> Again, I'm also not familiar with these terms, but by inspecting the source code I can say that:
> -npr defines the speciation model. The default value is 1 (uniform) in case it's not specified by the user.

This argument sets the tree prior. The most reasonable one to use would probably be "3" for most datasets; this is the reconstructed birth-death model from Gernhard, 2008.
 


> > do these commands apply only when a calibration file is uploaded?
> > Turn on soft bounds on calibrated nodes (-soft)
> > All calibrated nodes are offset exponential (-exhp)
> > All calibrated nodes have a DPM hyperprior (-dphp)
> > Hyperprior on calibrations from a gamma- (-ghp)
>
> From the source code I can see that the variables affected by those switches are only used if a calibration file is provided. So I suppose they have no effect if calibration is not used.

Correct. "-soft" is only for the calibrations specified with uniform densities. The other three are all for the hierarchical model in: http://sysbio.oxfordjournals.org/content/61/5/793 


> > I got seg faults with 4 commands, I am sure I have misconfigured them, perhaps they require a modifier, or have a precondition? I would appreciate any guidance.
> > dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -rnp
> > dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -bdr
> > dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -bda
> > dppdiv-pthreads -in infile.txt -tre tre2.tre -out out -n 10000 -hsc
>
> I will have a look at the code. Would you be able to provide me the input files you have used?

I'm not sure why there was a seg fault for the first one (with -rnp), but each of the last three flags (-bdr, -bda, and -hsc) expects a decimal value that sets the starting value of the diversification rate (-bdr), the relative extinction rate (-bda), or the scale parameter of the gamma hyperprior on the DPP concentration parameter (-hsc).
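
Based on this description, a wrapper can check that -bdr, -bda and -hsc are each followed by a number before the job is submitted. The sketch below is hypothetical Python, not DPPDiv code; it does not judge whether a given value is sensible, only that one is present.

```python
VALUE_FLAGS = ("-bdr", "-bda", "-hsc")

def missing_value_flags(args):
    """Return the flags from VALUE_FLAGS that are not followed by a number."""
    missing = []
    for i, token in enumerate(args):
        if token not in VALUE_FLAGS:
            continue
        nxt = args[i + 1] if i + 1 < len(args) else None
        try:
            float(nxt)  # TypeError if nothing follows, ValueError if not numeric
        except (TypeError, ValueError):
            missing.append(token)
    return missing
```

Applied to the failing command lines quoted above, where the flag is the last argument, the function reports that flag as missing its value.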



Tracy Heath

Jan 26, 2013, 4:37:05 AM
to dppdiv...@googlegroups.com
Hi All,

I am planning on writing a detailed manual and tutorial in the next month or so. Now that people are actually using the code, I've got the motivation. ;) Currently, it is pretty cryptic (I am realizing) and not very user-friendly.  In the meantime, here's some information about the -cal file...

The -cal file references internal nodes by any 2 tip names that descend from the node's left and right daughter lineages. There are different flags that you can set before the specification of the tip names -- "-U" indicates a uniform calibration and "-E" specifies an exponential. The exponential can take just a single value:

-E   T1   T5   34.5

In this case you need to choose the option -exhp (and also -dphp with a prior mean on the number of fossil categories). This is only possible if you have more than 2 fossils, and it applies the model from the paper: http://sysbio.oxfordjournals.org/content/61/5/793

If you do not use the hyperprior method, then you have to add a mean (in real space) value for the exponential calibration prior:

-E   T1   T5   34.5  -m 46.7

This actually sets the calibration to an exponential with a mean (1/rate) equal to 12.2 (i.e., 46.7 - 34.5), or an exponential rate parameter equal to 1/12.2.
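
The arithmetic in this example can be checked with a few lines of Python. The function name offset_exponential is hypothetical; the only assumption is the relationship stated above, namely that the value after -m is the mean in real time and the minimum age is the offset.

```python
def offset_exponential(min_age, mean_age):
    """Return (mean, rate) of the exponential on the time beyond min_age."""
    if mean_age <= min_age:
        raise ValueError("the -m value must be older than the minimum age")
    mean = mean_age - min_age   # expected time past the offset
    return mean, 1.0 / mean     # the rate is the reciprocal of the mean

mean, rate = offset_exponential(34.5, 46.7)
print(round(mean, 1), round(rate, 4))
```

For the entry above this gives a mean of 12.2 and a rate of roughly 0.082.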

If you set a -U for the calibration density, you must provide a minimum and maximum age:

-U   T1   T5   34.5   67.2

When you have -U calibrations, you can set the -soft flag to allow for calibrations with soft bounds as in the paper by Yang and Rannala (MBE 2006).

For calibrating using the models described in my paper (http://sysbio.oxfordjournals.org/content/61/5/793), the options are "-exhp", "-dphp", and "-ghp". This approach can only be applied if you have multiple fossils (more than 2) where at least 3 have exponential calibration densities. The "-exhp" flag turns on the exponential hyper-prior model. "-dphp" requires you to follow that argument with an integer value indicating how many fossil-rate categories you believe exist; this value must be less than or equal to the number of calibrations in your analysis. "-ghp" sets the base distribution of the hyperprior on the calibrations to a gamma distribution (this is recommended); the default is an exponential. (In the future, I think I'll set this by default.) So, if I have an analysis with 6 fossil calibrations, I would specify "... -exhp -dphp 3 -ghp ..."
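
The preconditions in this paragraph can be expressed as a small check. The Python below is a sketch with hypothetical names (hyperprior_args, n_exponential, n_categories); it encodes only the rules stated above, plus the assumption that the -dphp value is at least 1, and it is not taken from the DPPDiv source.

```python
def hyperprior_args(n_calibrations, n_exponential, n_categories, gamma_base=True):
    """Build the hyperprior flags, or raise ValueError if the stated rules are violated."""
    if n_calibrations <= 2 or n_exponential < 3:
        raise ValueError("need more than 2 fossils, at least 3 of them exponential")
    # The lower bound of 1 is an assumption; the upper bound is stated in the thread.
    if not 1 <= n_categories <= n_calibrations:
        raise ValueError("-dphp value must be between 1 and the number of calibrations")
    args = ["-exhp", "-dphp", str(n_categories)]
    if gamma_base:
        args.append("-ghp")   # gamma base distribution; the default is exponential
    return args
```

With six calibrations, all assumed to be exponential, hyperprior_args(6, 6, 3) returns the same three options as the example above.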

Hopefully this helps. I think that ultimately some of the options for execution need to be hidden since they are not useful to most people. Also, I agree that  better warning/error messaging should be applied. I have been working on a new calibration model and will be cleaning up the code soon and we can add these features. 

Thanks to Tomas and Alexis for all the support!
Cheers!
Tracy

Tracy Heath

Jan 26, 2013, 4:42:15 AM
to dppdiv...@googlegroups.com
Sorry for the flood of emails (I'm teaching at a workshop and just got some down-time), but I just tried to run something similar to your command "-in infile.txt -tre tre2.tre -out out -n 10000 -rnp" with one of my test datasets and did not get a seg fault. The -rnp flag just runs the analysis under the prior. So everything should work as long as the input files are formatted correctly. When -rnp is set, it just tells the likelihood function to return a constant--so that all parameters are only sampled in proportion to their priors. Though, I did try this in my current working version and not the release version. 
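
The effect described here can be illustrated with a toy Metropolis sampler in Python. This is not DPPDiv's code: the exponential prior, the normal likelihood, and every name below are assumptions chosen for illustration. With under_prior=True the log-likelihood is a constant, so it cancels in the acceptance ratio and the chain samples from the prior alone.

```python
import math
import random

def log_prior(theta):
    """Exponential(1) prior on a positive parameter (illustrative choice)."""
    return -theta if theta > 0 else -math.inf

def log_likelihood(theta, data, under_prior=False):
    """Toy normal likelihood; returns a constant when running under the prior."""
    if under_prior:
        return 0.0
    return -0.5 * sum((x - theta) ** 2 for x in data)

def metropolis(data, n_steps, under_prior=False, seed=1):
    """Random-walk Metropolis sampler with a symmetric uniform proposal."""
    rng = random.Random(seed)
    theta, samples = 1.0, []
    current = log_prior(theta) + log_likelihood(theta, data, under_prior)
    for _ in range(n_steps):
        proposal = theta + rng.uniform(-1.0, 1.0)
        candidate = log_prior(proposal) + log_likelihood(proposal, data, under_prior)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples
```

Comparing metropolis(data, n, under_prior=True) with metropolis(data, n) for the same data shows the data pulling the estimate away from the prior only when the likelihood is actually evaluated.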

Cheers!
Tracy


alexandre pedro

Feb 1, 2013, 1:59:06 PM
to dppdiv...@googlegroups.com
Dear Tracy,

How can I calculate the values (e.g. the mean) of the Exponential calibrations? I was able to get 16 fossil calibrations, is that manageable for adopting hyperpriors for them? 

Here are two examples, to facilitate any explanations:

Fossil A = 55-52 Ma
Fossil B = 16-7 Ma

Please, tell me also if just uniform priors with soft bounds can provide equally reliable results.

Thanks enormously,

Alex

Tracy Heath

Feb 7, 2013, 11:29:07 AM
to dppdiv...@googlegroups.com
Hi Alexandre,

Sorry it took so long to reply to this message; it got buried in my mailbox. Anyway, with 16 calibrations, I recommend that you use the hyperprior on calibration densities. Thus, you do not have to specify the exponential rate parameters. You do have to specify a prior mean number of hyperprior categories; since you have 16 calibrations, you can just specify 10. I have found that the posterior samples are quite robust to this parameter.

For your calibration file, you only need to specify that the prior is exponential and provide a minimum age:

2
-E   T1   T5   52
-E   T1   T3   7
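
Since these files are written by hand, a short script can help keep the entry count on the first line in sync with the entries. The Python sketch below uses a hypothetical name (format_cal_file) and reproduces the two-entry file above; it covers only exponential calibrations with a minimum age, as used with the hyperprior.

```python
def format_cal_file(calibrations):
    """Format (tip_a, tip_b, min_age) tuples as exponential calibration entries.

    The first line is the number of entries, as the format requires.
    """
    lines = [str(len(calibrations))]
    for tip_a, tip_b, min_age in calibrations:
        # ":g" prints 52 as "52" and 34.5 as "34.5".
        lines.append(f"-E   {tip_a}   {tip_b}   {min_age:g}")
    return "\n".join(lines) + "\n"

print(format_cal_file([("T1", "T5", 52), ("T1", "T3", 7)]), end="")
```

With 16 fossils the same call takes a list of 16 tuples, and the header line follows automatically.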

Regarding calibration priors and reliable results: this is entirely dependent on the reliability of your calibrations. Calibrating divergence times is a unique statistical problem, where the prior is providing the information to scale the analysis. So if you have good minimum and maximum ages, then uniform calibrations will lead to reliable estimates of node ages. So, ultimately, the choice of these priors is up to you and which priors you believe best characterize your understanding of the fossils used to calibrate your analysis.  

Cheers,
Tracy

Mark Derbyshire

Mar 12, 2020, 3:49:19 AM
to dppdiv-users
Hi Tracy,

Have you put together that manual yet? It's 7 years since this thread was started. Anyhow, I was wondering what it means to run the analysis under the prior? The analysis is very quick when I set -rnp but very slow when I do not.

Cheers,
Mark