Time-stratified binary character mapping?


Robin van Velzen

Jul 6, 2017, 1:32:50 AM
to BioGeoBEARS
Dear Nick, 

I have a quick question I hope you can answer. I would like to map a binary presence/absence character on a phylogenetic tree. But I suspect that the rates of gain and loss may depend on the geological time period. So I immediately thought of the time-stratified analyses that BioGeoBEARS can do. 

Do you know if any other character mapping program (e.g. simmap, phytools, diversitree) implements a similar time-stratification? Or would you recommend (ab)using BioGeoBEARS for such non-geographic analysis? 

Any hint or recommendation would be of great help to me. 

Thanks and with best wishes, 

Robin

Nick Matzke

Jul 6, 2017, 1:51:26 AM
to bioge...@googlegroups.com
Hi!

Thanks for posting to the list!  Here's a short answer, we could get more specific if needed.

A binary character model is easy(ish) to set up in BioGeoBEARS; I think the code for this is online somewhere on PhyloWiki.  Basically it's:

- the BAYAREALIKE model
- with 1 area max
- null range off
- d and e fixed to 0
- "a" set to "free". 

Sometimes I call this "BAYAREALIKE+a", but that's a little oversimplified.
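
For intuition only, here is a minimal Python sketch of what a single-rate binary model implies. It is not BioGeoBEARS code. It assumes the setup above reduces to a symmetric two-state continuous-time Markov chain with one switching rate "a"; the function name switch_probability and the example numbers are made up for illustration.

```python
import math

def switch_probability(a, t):
    """Probability that a lineage starting in one state is in the other
    state after time t, under a symmetric two-state Markov model with
    switching rate a in both directions (an assumed simplification)."""
    return 0.5 * (1.0 - math.exp(-2.0 * a * t))

# Hypothetical rate of 0.05 switches per unit time, over three branch lengths.
for t in (1.0, 10.0, 100.0):
    print(t, round(switch_probability(0.05, t), 4))
```

The probability starts at 0 and saturates at 0.5 on long branches, which is why a single "a" cannot by itself express different gain and loss rates.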

Once you have this set up, you could do ML inference on this model like normal, either with a default non-time stratified model, or with a time-stratified model.  

You could get different "rates" for either forwards-vs-backwards, and/or for different time-slices, by semi-creative use of the manual dispersal multipliers, distance matrices, etc.

Biogeographical Stochastic Mapping, or I guess in this case, "non-biogeographical Biogeographical Stochastic Mapping" should then just work on any of these models.  Normally I would only bother running it on your best-fit model, or other key models of interest for your specific scientific question.

Cheers!
Nick


--
You received this message because you are subscribed to the Google Groups "BioGeoBEARS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biogeobears+unsubscribe@googlegroups.com.
To post to this group, send email to bioge...@googlegroups.com.
Visit this group at https://groups.google.com/group/biogeobears.
To view this discussion on the web visit https://groups.google.com/d/msgid/biogeobears/602ed45a-067b-4ef8-9e49-d57a791170e7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robin van Velzen

Jul 6, 2017, 2:55:16 AM
to BioGeoBEARS, mat...@nimbios.org
Hi Nick, 

Thanks for the reply and recommendations! I will try implementing the model according to your suggestions and come back here if any issues arise. 

Robin



Robin van Velzen

Jul 6, 2017, 9:10:24 AM
to BioGeoBEARS, mat...@nimbios.org
Hi Nick, 

I tried implementing the model as per your suggestions. I found some code on the PhyloWiki here: http://phylo.wikidot.com/biogeobears-validation#toc17 and that seems to work OK for setting up the binary model (using only two states). 

So now I am at the stage of "semi-creative use of the manual dispersal multipliers, distance matrices, etc." and I am a bit lost. The model consists of only the range-switching parameter (a) but I would like to estimate separate rates of gains and losses (in biogeographic context I guess those would be the directional rates of range switches). So my ideal model would have two freely estimated parameters instead of one (analogous to the Kimura 1980 versus Jukes Cantor model for DNA sequence evolution). Is there a way to achieve this? 

If there is no way around the single-parameter model, I imagine that it would still be possible to specify the rate of loss to be much higher than the rate of gains by using dispersal multipliers. But I am not entirely sure how to do this. My model has two states: N and S. What should the dispersal multiplier matrix look like if I want S->N to have a much higher rate than N->S? 

(You also mention distance matrices but I am not sure how those could be used in this context). 

Finally, when I set up time-stratification I still get a single estimate for a - not for each time stratum separately. Is there a way to get separate rate switching estimates? 

Any further advice would be very much appreciated.

Thanks! 

Robin




Nick Matzke

Jul 6, 2017, 11:53:06 PM
to bioge...@googlegroups.com
Hi!  Glad you've made progress...

Yeah, my semi-creative thing was a bit opaque, I didn't have a lot of time.

Basically, here is what I was thinking:

1. The way that the manual dispersal multipliers and distance matrices work is that they take the base dispersal rate (d, or a) (or the weight j, for j events) and, for a given pair of areas, multiply by the value in the multiplier matrix.  Then likelihoods etc. are calculated conditional on that.

2. The traditional way to do this (e.g. in Ree & Smith 2008) was to just use user-specified dispersal-probability multipliers.  E.g., maybe you would say that 

- dispersal between North and South America had a multiplier of 1 (no change from default)

- dispersal between South America and Africa had a multiplier of 0.1

- dispersal between South America and India had a multiplier of 0.01


3. These were -- and everyone would admit this -- basically made up numbers. But, they did express a subjective sense of the relative probability of dispersal between close and far regions.  And, by using a multiplier of 0, you could completely disallow dispersal between regions if desired (e.g. if only allowing dispersal between land-connected regions). So they constitute valid hypotheses/models, at least.


4. In Van Dam & Matzke (2016, J. Biogeog), we proposed using a distance matrix in a similar way:

d_actual = d_base * distance^x
a_actual = a_base * distance^x
j_actual = j_base * distance^x

This is the "+x" model.  By default, x=0, which means the distance matrix (if specified at all) has no effect.  If you set x to be "free", and thus estimated with the other parameters, then:

x<0 means that dispersal rate increases as distance increases (which seems likely!)

x>0 would mean that dispersal rate increases as distance increases (which seems less likely, I've never seen a case, but perhaps competition or some such could produce this if e.g. areas are adjacent and within-continent or something)

(I recommend people use relative distances rather than literal meters or whatever, just to avoid scaling issues during ML estimation from huge numbers in the multipliers)



5. In Dupin et al. (2017, J. Biogeog), we propose a modification to the manual dispersal multipliers, the +w model, where 

d_actual = d_base * (manual_dispersal_multiplier)^w
...etc...

By default w=1, which means you want to use the manual dispersal multipliers exactly as the user specifies them. (If no manual dispersal multiplier matrix is specified, then the multipliers are just 1, and 1 to any power equals 1.)


6. There is also an "environmental distance" matrix option, this works identically to the +x model, except the parameter n is used. I have used this only in talks at the moment, but people are free to use it.  Obviously this second distance doesn't have to actually be "environmental distance", it can be any distance you think might explain dispersal.


7. Any of 4, 5, and 6 can be time-stratified.

8. It is fairly trivial to add yet more distance/multiplier matrices, this would take a bit of custom code from me however.
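
The multiplier formulas in items 4 and 5 can be checked numerically with a small Python sketch. This is plain arithmetic on the equations as written above, not BioGeoBEARS code; the function names and the base rate of 0.1 are invented for illustration. It also shows the direction of the "+x" effect: with x<0 the rate falls as distance grows, which matches the correction given later in this thread.

```python
def rate_with_distance(base_rate, distance, x=0.0):
    """+x model as written above: actual = base * distance^x."""
    return base_rate * distance ** x

def rate_with_multiplier(base_rate, multiplier, w=1.0):
    """+w model as written above: actual = base * multiplier^w."""
    return base_rate * multiplier ** w

base = 0.1  # hypothetical base rate
for distance in (1.0, 2.0, 4.0):
    print(distance,
          rate_with_distance(base, distance, x=-1.0),  # falls with distance
          rate_with_distance(base, distance, x=1.0))   # rises with distance

# Defaults: x=0 switches the distance matrix off, w=1 uses multipliers as given.
print(rate_with_distance(base, 4.0), rate_with_multiplier(base, 0.1))
```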



So that's all background...

My "semi-creative" idea was, if you want a different rate for timeslice 2 compared to timeslice 1, all you do is set up a time-stratified multiplier matrix:

=========
A B
1 1
1 1

A B
1 0.1
0.1 1

END
=========

(see the help page at PhyloWiki for the exact file formats)

...and then make the parameter "w" free.  "w" will have no effect in the top timeslice, since all of the multipliers are 1:

a_actual = a_base * 1^w

...which obviously means

a_actual = a_base

But in the second time slice, the rate will be

a_actual = a_base * 0.1^w

...and you estimate "w" as another free parameter.  Then to get the inferred rate for timeslice 2, you just use the equation above, you use the ML estimates of "a_base" and "w" to calculate a_actual for timeslice 2.
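
As a worked example of that last step, here is a minimal Python sketch. The ML estimates in it are invented numbers, not results, and the function name timeslice_rates is hypothetical; only the equation a_actual = a_base * multiplier^w comes from the explanation above.

```python
def timeslice_rates(a_base, w, multipliers):
    """Return a_actual = a_base * multiplier^w for each time slice."""
    return [a_base * m ** w for m in multipliers]

# Off-diagonal multipliers from the two matrices above:
# 1 in the first time slice, 0.1 in the second.
a_base_hat, w_hat = 0.02, 0.5  # made-up "ML estimates" for illustration
rates = timeslice_rates(a_base_hat, w_hat, [1.0, 0.1])
print(rates)
```

Note that a negative estimate of w would mean the rate in timeslice 2 is higher than in timeslice 1, since 0.1^w is greater than 1 when w<0. So the 0.1 in the matrix does not force the second rate to be lower.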


If you need more timeslices with different rates, or you want to model asymmetric rates, just use more distance matrices, except instead of distances just put multipliers of 0.1 for the rate you want to modify.
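
For the asymmetric case, the same arithmetic applies per direction. The Python sketch below uses the two states N and S from the question earlier in the thread. It keys the multipliers by explicit (from, to) pairs because this thread does not say which of rows or columns is the "from" state in the multiplier file; check the PhyloWiki file-format help before writing the actual matrix. All names and numbers are hypothetical.

```python
def directional_rates(a_base, w, multipliers):
    """Map each (from_state, to_state) pair to a_base * multiplier^w."""
    return {pair: a_base * m ** w for pair, m in multipliers.items()}

# Leave S->N at the base rate and put the 0.1 multiplier on N->S only.
multipliers = {("S", "N"): 1.0, ("N", "S"): 0.1}
rates = directional_rates(a_base=0.02, w=1.0, multipliers=multipliers)
print(rates)
```

With w fixed at 1 the ratio between the two directions is exactly the one specified by hand; with w free, the data decide how strong (or in which direction) the asymmetry is.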

I suppose all of this is slightly more onerous than a custom system, but it takes a lot less programming.

Also, all of these model variants create nested pairs (w free vs w=1, w free vs. w=0, etc.) that make it easy to make a table of pairwise likelihood ratio tests, or a table of AICc comparisons to compare all the proposed models at once.  
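
A minimal Python sketch of one such comparison follows. It uses the standard formulas for the likelihood ratio test and AICc and is not BioGeoBEARS output; the log-likelihoods are invented, and treating the number of tips as the sample size n is an assumption of this sketch.

```python
import math

def lrt_p_value(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by ONE free
    parameter; the chi-square (df=1) tail equals erfc(sqrt(D/2))."""
    d = 2.0 * (lnl_alt - lnl_null)
    return d, math.erfc(math.sqrt(max(d, 0.0) / 2.0))

def aicc(lnl, k, n):
    """Small-sample corrected AIC for k free parameters and n data points."""
    return 2.0 * k - 2.0 * lnl + (2.0 * k * (k + 1)) / (n - k - 1)

# Invented log-likelihoods: w fixed at 1 (k=1, only "a" free) versus
# w free (k=2), on a hypothetical tree with 30 tips.
d, p = lrt_p_value(lnl_null=-43.0, lnl_alt=-40.0)
print(d, p)
print(aicc(-43.0, 1, 30), aicc(-40.0, 2, 30))
```

The closed-form p-value only holds for one extra free parameter; comparing models that differ by more parameters needs the chi-square distribution with the matching degrees of freedom.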

(I recommend reading the books by Burnham and Anderson (2002) or especially Anderson (2008) to get a sense of the philosophy of model choice before creating hundreds/thousands of model variants -- you want to have a reasonably moderate number of scientifically interesting models, not every possible combination of model variants justified by nothing more than uncertainty.  Especially worrisome would be having more model variants than species/data points.)


Hope this helps -- if this is mystifying and/or you need something more that can't be covered by this approach, I'd be open to discussing further by email.  Ideas that need new programming from me are more in the "if you want a coauthor" category, which may or may not be needed/desired by you/your lab etc.

Cheers!
Nick







Robin van Velzen

Jul 7, 2017, 3:26:46 AM
to bioge...@googlegroups.com
Hi Nick, 

Many thanks for your detailed explanations. They make perfect sense!

[Just one minor thing: when you write "dispersal rate increases as distance increases (which seems likely!)"  I guess you mean the other way around: dispersal rate is likely to DEcrease as distance increases? Just to make sure I am not confused... ]

I think I should be able to implement meaningful hypotheses and models (for my specific purpose) based on the available options. If not, I would be more than happy to discuss a possible collaboration. In any case I will let you know on email how it goes. 

Thanks again & cheers, 

Robin



Nick Matzke

Jul 9, 2017, 1:31:33 AM
to bioge...@googlegroups.com
Hi!  Yeah, I meant to say, if x is <0, then dispersal rate DEcreases as distance increases.  There is a small figure in Van Dam & Matzke 2016 illustrating this dropoff.

Cheers!
Nick

 