Guidance on the "w" Parameter for Estimating Dispersal Multipliers


Aniket Vaibhav Ranjangaonkar

Jan 21, 2025, 11:10:57 PM
to bioge...@googlegroups.com

Dear BioGeoBEARS Users,

I hope this email finds you well.

As mentioned on the PhyloWiki website, an additional option in BioGeoBEARS allows users to set the "w" parameter to be free, enabling the inference of the optimal dispersal_multiplier matrix.

I would like to ask:

  1. What are the potential issues or limitations with this approach?
  2. What changes should we make to the code to implement this, starting with:
    BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["w","type"] = "free"
    
    Could someone kindly share the complete code for this implementation?

Additionally, how do you typically decide the values for a manual dispersal multiplier matrix? For example, should the base dispersal multipliers be set as 1, 0.5, and 0.1 (for easy, medium, and hard dispersal), or as 1, 0.5, and 0.001? I would appreciate insights on the rationale for choosing such values.

Thank you in advance for your guidance.

Best regards,
Aniket Vaibhav Ranjangaonkar
Biodiversity, Biogeography, and Systematics Lab
Fifth Year (Integrated MSc.)
School of Biological Sciences

Nick Matzke

Jan 22, 2025, 4:29:42 PM
to bioge...@googlegroups.com
Hi! Brief answers...


On Wed, Jan 22, 2025 at 5:10 PM 'Aniket Vaibhav Ranjangaonkar' via BioGeoBEARS <bioge...@googlegroups.com> wrote:

As mentioned on the PhyloWiki website, an additional option in BioGeoBEARS allows users to set the "w" parameter to be free, enabling the inference of the optimal dispersal_multiplier matrix.

I would like to ask:

  1. What are the potential issues or limitations with this approach?
  2. What changes should we make to the code to implement this, starting with:
    BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["w","type"] = "free"


Yes, that's right.  The other change is to un-comment the line that specifies the dispersal multiplier matrix.
 
    Could someone kindly share the complete code for this implementation?

Additionally, how do you typically decide the values for a manual dispersal multiplier matrix? For example, should the base dispersal multipliers be set as 1, 0.5, and 0.1 (for easy, medium, and hard dispersal), or as 1, 0.5, and 0.001? I would appreciate insights on the rationale for choosing such values.


The "w" parameter is intended as a solution, or partial solution, to that problem.  The dispersal-multipliers approach goes back to the original Lagrange program and its DEC model.  The weakness has always been that there was no objective way to set those multipliers.  The method was "pull them out of your kiester."

(I guess if you were specifying a model that declared dispersal impossible, i.e. multiplier 0, between 2 areas, based on something like connectivity, that would be objective.)

With the +w model variant, the actual multiplier on dispersal rate works like this:

d_actual = d_base * (dispersal_multiplier)^w

By default, w=1, which means that the dispersal multiplier between each pair of areas is read literally.  But if w is free, then it can vary.  Once w=0, any (dispersal_multiplier)^0 = 1.0, which means the dispersal multiplier has no effect.

So, by varying w, the model can infer what the best-fitting value of w is, and is therefore inferring what the best dispersal multiplier is.
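As a rough numerical illustration of the formula above (a plain Python sketch, not BioGeoBEARS code; the function name, base rate, and multipliers are made-up values for illustration):

```python
def actual_dispersal_rate(d_base, multiplier, w=1.0):
    """d_actual = d_base * multiplier**w, following the formula above."""
    return d_base * multiplier ** w

d_base = 0.1                      # hypothetical base dispersal rate
multipliers = (1.0, 0.5, 0.1)     # hypothetical easy / medium / hard categories

for w in (1.0, 0.5, 0.0):
    rates = [actual_dispersal_rate(d_base, m, w) for m in multipliers]
    print(f"w={w}: {rates}")
```

With w=1 the multipliers are applied literally; as w shrinks toward 0 the three rates converge on d_base, which is the "multipliers have no effect" case.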

The +x and +n model variants work exactly the same way; their input matrices are just called the distances matrix and the environmental distances matrix.  If you use some measure of actual geographic distance (or average-mean-temperature-difference or something), then those are objective, and by varying w or x or n you are discovering what effect those distances have on dispersal.

That said, if you are just setting manual dispersal multipliers, then:

* if you have just 2 multipliers, 1.0 and 0.5 (or whatever, the fraction doesn't matter), then optimizing by w is effectively just inferring 2 different rates, d and d*(0.5^w), for your 2 different categories.  Same for 1.0, 0.0, and 0.5 (or whatever).

* But, if you have 3 different dispersal multipliers, 1.0, 0.5, and 0.1, then you still have the question of where the difference between 0.5 and 0.1 came from, and what that is based on.  If it is based on distance or something else measurable, great.  If it came from your kiester, fine, and you can still test whether your kiester's guess is explanatorily useful by optimizing w, but it is an extra subjective thing to explain to reviewers.  Regardless, my view is it is far better than just making up dispersal multipliers by intuition and not testing them at all.
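To see why the fraction "doesn't matter" in the two-category case, here is a small Python sketch (illustrative only; the helper name and the value of w are hypothetical). For any two fractions strictly between 0 and 1, there is a rescaled exponent that gives exactly the same second rate, so only the product of "fraction and w" is really being estimated:

```python
import math

def rescaled_w(w, old_fraction, new_fraction):
    """Return w2 such that new_fraction**w2 == old_fraction**w.

    Both fractions must be strictly between 0 and 1.
    """
    return w * math.log(old_fraction) / math.log(new_fraction)

w = 1.7                              # hypothetical fitted exponent using multiplier 0.5
w2 = rescaled_w(w, 0.5, 0.1)         # equivalent exponent if 0.1 had been used instead

print(0.5 ** w, 0.1 ** w2)           # the two effective multipliers agree
```

With three categories this no longer holds: one exponent cannot absorb an arbitrary change in both 0.5 and 0.1 at once, which is why the spacing between them becomes a substantive choice.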

Cheers,
Nick

 


Aniket Vaibhav Ranjangaonkar

Jan 23, 2025, 5:17:19 AM
to bioge...@googlegroups.com
Dear Dr. Nick Matzke,  

Thank you very much for your detailed explanation; I truly appreciate the time you took to clarify these concepts. However, I have a few follow-up questions I would like to ask.  

1. If we want to use a distance matrix, will setting the parameter "w" as free, while keeping "x" and "n" fixed, still allow the model to estimate the effect of distance on dispersal?  
2. Could you elaborate on the different uses of the "x" and "n" parameters? If this is covered in the BioGeoBEARS documentation, I would be grateful if you could point me to the relevant sections.  
3. Lastly, if I use a distance matrix to specify the distances between areas, does that eliminate the need for a manual dispersal matrix?  

Thank you again for your insights, and I look forward to your response.  

Best regards,  
Aniket



Aniket Vaibhav Ranjangaonkar

Jan 23, 2025, 5:23:33 AM
to bioge...@googlegroups.com

I have a couple of additional questions I’d like to clarify:

  1. When you mention, "There is no particular reason that a power function (distance^x) is going to be the best dispersal-distance function. Others should be tried (but would take new programming)," could you elaborate on what other functions might better explain the relationship between distance and dispersal? Additionally, what would be the potential approaches to integrate such a function into the BioGeoBEARS framework?

  2. I am also curious to understand more about the inner workings of BioGeoBEARS. Specifically, how the parameters influence the ancestral range. At the moment, the process feels somewhat like a black box—where I provide the phylogeny and various matrices, and the framework produces the taxon-area cladograms. Have you described these mechanics in any of your publications? If so, I would greatly appreciate it if you could point me to the relevant papers.

Nick Matzke

Jan 23, 2025, 4:07:37 PM
to bioge...@googlegroups.com
On Thu, Jan 23, 2025 at 11:17 PM 'Aniket Vaibhav Ranjangaonkar' via BioGeoBEARS <bioge...@googlegroups.com> wrote:
1. If we want to use a distance matrix, will setting the parameter "w" as free, while keeping "x" and "n" fixed, still allow the model to estimate the effect of distance on dispersal?  


If you have distances (or ideally, relative distances, i.e. measure distances in km, then divide everything by the maximum distance, so the maximum relative distance is 1.0) in the manual dispersal multipliers matrix, then yes, you are using the manual dispersal multipliers and the w parameter to estimate the effect of distance on dispersal.
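The relative-distance rescaling described above might look like this in Python (a sketch only; the function name and the kilometre values are invented, and how the diagonal is handled in a real input file is not addressed here):

```python
def relative_distances(dist_km):
    """Divide every entry by the largest distance, so the maximum becomes 1.0."""
    max_d = max(max(row) for row in dist_km)
    return [[d / max_d for d in row] for row in dist_km]

# Hypothetical pairwise distances (km) between three areas.
dist_km = [
    [0.0,    500.0, 2000.0],
    [500.0,    0.0, 1500.0],
    [2000.0, 1500.0,   0.0],
]

for row in relative_distances(dist_km):
    print(row)
```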

Similarly,

* If you have distances (or ideally, relative distances) in the distances matrix, and the x parameter free, then you are using the distances matrix and the x parameter to estimate the effect of distance on dispersal.

* If you have distances (or ideally, relative distances) in the envdistances matrix, and the n parameter free, then you are using the envdistances matrix and the n parameter to estimate the effect of distance on dispersal.

All 3 work identically; it doesn't fundamentally matter which you use. They were just named after the 3 most obvious uses I could think of.

If you want, you could use all 3 at once, and have
* w used to multiply d * (differences in mean precipitation)^w
* x used to multiply d * (differences in percentage of Spanish speakers)^x
* n used to multiply d * (differences in the number of McDonalds restaurants)^n

The result would be that d_actual = d_base *  (differences in mean precipitation)^w * (differences in percentage of Spanish speakers)^x * (differences in the number of McDonalds restaurants)^n

If parameters w, x, and n are all estimated at 0.0, then you have discovered that none of those distance matrices is explanatory for dispersal rates.  
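A minimal Python sketch of the combined formula above (not BioGeoBEARS code; the function name and all numbers are hypothetical), showing that exponents of 0.0 switch every matrix off:

```python
def combined_rate(d_base, m_w, m_x, m_n, w, x, n):
    """d_actual = d_base * m_w**w * m_x**x * m_n**n for one pair of areas."""
    return d_base * (m_w ** w) * (m_x ** x) * (m_n ** n)

d_base = 0.1                     # hypothetical base dispersal rate
m_w, m_x, m_n = 0.3, 0.7, 0.2    # hypothetical entries from the three matrices

print(combined_rate(d_base, m_w, m_x, m_n, 1.0, 1.0, 1.0))  # all three applied literally
print(combined_rate(d_base, m_w, m_x, m_n, 0.0, 0.0, 0.0))  # all three ignored: equals d_base
```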

Obviously, this wouldn't make much biological sense, and as a result you would expect such "distances"/multipliers to not improve the statistical fit, but they are computationally possible things to do. 

Everything is just models, statistical model comparison lets you see which models fit better on a given dataset.  The recommended procedure is to think about the *biology* and geography, and devise some plausible hypotheses, implement those hypotheses as models, then statistically compare those models via maximum likelihood and AIC.
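The AIC comparison step can be sketched as follows in Python, using the standard definition AIC = 2k - 2 lnL. The log-likelihoods are invented purely for illustration; the parameter counts assume DEC has 2 free parameters (d and e) and DEC+w adds one more:

```python
def aic(log_likelihood, n_free_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_free_params - 2 * log_likelihood

# (log-likelihood, number of free parameters); the lnL values are hypothetical.
models = {
    "DEC":   (-50.0, 2),
    "DEC+w": (-46.5, 3),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, score)
print("lowest AIC:", best)
```

In this made-up case the extra parameter improves the log-likelihood by more than the AIC penalty, so DEC+w is preferred; with a smaller improvement the simpler model would win.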

 
2. Could you elaborate on the different uses of the "x" and "n" parameters? If this is covered in the BioGeoBEARS documentation, I would be grateful if you could point me to the relevant sections.  

See above
 
3. Lastly, if I use a distance matrix to specify the distances between areas, does that eliminate the need for a manual dispersal matrix?  
 
Yep -- cheers, Nick

 

Nick Matzke

Jan 23, 2025, 4:20:01 PM
to bioge...@googlegroups.com
On Thu, Jan 23, 2025 at 11:23 PM 'Aniket Vaibhav Ranjangaonkar' via BioGeoBEARS <bioge...@googlegroups.com> wrote:

  1. When you mention, "There is no particular reason that a power function (distance^x) is going to be the best dispersal-distance function. Others should be tried (but would take new programming)," could you elaborate on what other functions might better explain the relationship between distance and dispersal? Additionally, what would be the potential approaches to integrate such a function into the BioGeoBEARS framework?



Hi -- I just meant that using the power function is an obvious starting point, but one could imagine other functions of distance. 

The power function works like this -- 

Van Dam, Matthew; Matzke, Nicholas J. (2016). Evaluating the influence of connectivity and distance on biogeographic patterns in the south-western deserts of North America. Journal of Biogeography. 43(8):1514–1532.

[Attached image (image.png) not preserved in this copy of the thread.]

I.e., if x=-1.0, then doubling the distance halves the dispersal rate 

If x=-2.0, then doubling the distance quarters the dispersal rate 
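A quick Python check of those two statements (a sketch; the function name is made up). Under a power function distance^x, doubling the distance multiplies the rate by 2^x regardless of the starting distance:

```python
def rate_ratio_when_distance_doubles(x):
    """(2*dist)**x / dist**x simplifies to 2**x for any positive distance."""
    return 2.0 ** x

for x in (0.0, -1.0, -2.0):
    print(f"x={x}: rate is multiplied by {rate_ratio_when_distance_doubles(x)}")
```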

This is pretty plausible.  But, for all we know, maybe there is a clade of bats in a region where successful dispersal & colonisation has a low probability within 500 km due to competition, then a much higher probability out to 1000 km, and then actually 0.0 probability past 1000 km due to the flight limitations of those bats.  More simply, perhaps a lognormal function or something would be a better fit.
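For concreteness, the hypothetical bat scenario could be written as a step function of distance, as in this Python sketch. This is not something BioGeoBEARS currently supports; the function name, the two weights, and the treatment of the exact 500 km and 1000 km boundaries are all assumptions:

```python
def bat_dispersal_weight(distance_km, near_weight=0.1, far_weight=1.0):
    """Hypothetical step-function multiplier on dispersal rate.

    Low within 500 km (competition), higher out to 1000 km,
    and zero beyond 1000 km (flight limit).
    """
    if distance_km <= 500:
        return near_weight
    if distance_km <= 1000:
        return far_weight
    return 0.0

for dist in (200, 800, 1500):
    print(dist, bat_dispersal_weight(dist))
```

Note that even this simple shape needs two thresholds and two weights, which illustrates how quickly alternative functions add parameters.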

These kinds of ideas could be programmed with some effort, but I am dubious about whether or not it would be worth it.  Biogeography datasets are typically small (dozens to 100s of tips), and therefore can't support many free parameters to describe complex functions, so my instinct is to keep models simple and try to capture the 1st approximation of what one thinks the many influences on dispersal are. 


  2. I am also curious to understand more about the inner workings of BioGeoBEARS. Specifically, how the parameters influence the ancestral range. At the moment, the process feels somewhat like a black box, where I provide the phylogeny and various matrices, and the framework produces the taxon-area cladograms. Have you described these mechanics in any of your publications? If so, I would greatly appreciate it if you could point me to the relevant papers.


The phrase "taxon-area cladograms" goes back to cladistic biogeography, which is a pretty different framework than the model comparison one.  The models in BioGeoBEARS and other programs are really just modified versions of the models used for DNA or other discrete characters -- there is a transition matrix, Q, describing how probabilities change along branches, and the special thing is a cladogenetic range-change-at-speciation table, describing the probabilities of different ancestor->left,right range changes at speciation.  The BioGeoBEARS core code for likelihood calculation and maximum-likelihood estimation was literally modified (a ton) from what was originally the "ace" (ancestral character estimation) function in the APE package. 
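To make the "transition matrix Q" idea concrete, here is a toy Python sketch (not the BioGeoBEARS implementation). It assumes a deliberately simplified two-area system with ranges A, B, and AB, range expansion at rate d and contraction at rate e; the null range and the cladogenesis table are left out, and the rates and branch length are invented. Probabilities along a branch of length t are P(t) = exp(Qt), computed here with a basic scaling-and-squaring Taylor series:

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) by a Taylor series on q*t/2**squarings, then repeated squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

d, e = 0.1, 0.05          # hypothetical expansion and contraction rates
# Rows/columns are ranges A, B, AB; each row of Q sums to zero.
Q = [
    [-d,  0.0, d],        # A  -> AB by expansion
    [0.0, -d,  d],        # B  -> AB by expansion
    [e,   e,  -2 * e],    # AB -> A or B by contraction
]

P = mat_exp(Q, t=1.0)     # transition probabilities along a branch of length 1
for row in P:
    print([round(p, 4) for p in row])
```

Each row of P gives the probabilities of ending in A, B, or AB given the starting range, and sums to 1.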

The best explainer is probably:

Matzke, Nicholas J. (2022). Statistical comparison of DEC and DEC+J is identical to comparison of two ClaSSE submodels, and is therefore valid. Journal of Biogeography, 49(10), 1805-1824. doi: 10.1111/jbi.14346 OSF doi: 10.31219/osf.io/vqm7r

Cheers!
Nick



 

Aniket Vaibhav Ranjangaonkar

Jan 23, 2025, 8:49:02 PM
to bioge...@googlegroups.com
Dear Dr. Nick Matzke,

Thank you for your detailed explanation. I appreciate your time and the clarity you provided—it has been very helpful.

Best regards,
Aniket



