Phylogenetic correction of two-block PLS


Jasmine Croghan

Dec 21, 2016, 11:40:51 AM
to geomorph R package
Hello all,
I would like some help figuring out how to phylogenetically correct a two-block partial least squares analysis in geomorph. It has been done before by these authors:

Klaczko, J., E. Sherratt, and E. Z. F. Setz. 2016. Are diet preferences associated to skulls shape diversification in xenodontine snakes? PLoS ONE 11:1–12.

Apparently, they altered the normal PLS procedure by using the "evolutionary covariance matrix" rather than the "overall trait covariance matrix"; it was my understanding that the command two.b.pls produced the overall trait covariance matrix and then performed a singular value decomposition on it.  Is there some secret command that allows you to add a phylogeny into the mix?

As far as I can tell, it appears they altered the input matrices of the PLS and then ran the PLS as normal.  I just don't know how.  

My current input matrices are made up of (A) proportional diet data, and (B) 3D landmark coordinates; I also have a non-ultrametric tree in .tre format.

Help, and/or code from someone who has done this before?

Thanks,
~Jasmine

P.S. As an aside, does anyone know why, for a PLS, geomorph reports correlation values rather than covariation values like MorphoJ? 

Mike Collyer

Dec 21, 2016, 12:52:31 PM
to Jasmine Croghan, geomorph R package
Jasmine,

Please use the function, phylo.integration, which does exactly what you hope to do.  This function allows two sets of data to be input, along with a phylogeny. Even though it appears to be used for two modules of the same landmark configuration, it will work with two matrices of different data types. 

Cheers
Mike 

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To post to this group, send email to geomorph-...@googlegroups.com.
Visit this group at https://groups.google.com/group/geomorph-r-package.
To view this discussion on the web, visit https://groups.google.com/d/msgid/geomorph-r-package/ce4a70e1-b4b4-4a83-bd36-cba7fad042fe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jasmine Croghan

Dec 28, 2016, 11:07:45 PM
to geomorph R package, jasmine...@gmail.com
Hi Mike and those following this thread,

Thank you for your very prompt reply! A week later, the good news is that I have gotten the code to work! The bad news is that my results seem to make the opposite of sense (so it goes). Whereas the shapes at the extremes of the line were visibly different from one another in the non-phylogenetically corrected PLS, they are not visibly different in the pPLS, even with 5x exaggeration. This is a confusing result, especially since my k-mult was 0.6, indicating that I had relatively little phylogenetic signal in the shape data set. I would have expected most of the shape differences among the taxa to be preserved in such a case. I am worried that I am missing something!

This brings me to a related question:  does the phylo.integration function phylogenetically correct the second data matrix (A2) as well as the first?  Perhaps this is a more theoretical problem, and I apologize if it is naive, but should proportional diet data even be phylogenetically corrected?

I understand that these questions/musings are less about the help with geomorph and more about help with morphometrics and PLS analyses, but I figured this is probably one of the best forums in which to pose such questions.  Please do not feel obligated to give me a GMM lesson!

Thank you, kindly, for any and all replies,
~Jasmine



Adams, Dean [EEOBS]

Dec 29, 2016, 6:20:01 PM
to Jasmine Croghan, geomorph R package

Jasmine,

Yes, phylo.integration accounts for the phylogeny in both datasets (see geomorph:::phylo.pls for the underlying code).

As for the Kmult, it is a bit unintuitive to go from a particular Kmult value to expectations on whether or not there should be a large effect on downstream analyses. Yes, your K value is less than 1.0, but the question is how much it differs from the expectation under random associations of the data and the phylogeny. If the value is highly significant, then there is evidence of phylogenetic signal, which will alter downstream analyses that account for the phylogeny.

Without knowing your particular dataset, all I can say is that there may very well be stark differences between PLS analyses with and without taking the phylogeny into account; indeed, that is the very point of phylogenetic comparative analyses!

Hope this helps.

Dean

Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
www.public.iastate.edu/~dcadams/
phone: 515-294-3834


Mike Collyer

Dec 30, 2016, 9:10:01 AM
to Jasmine Croghan, geomorph R package
Hi Jasmine,

Dean covered Kmult and mentioned that, in phylo.integration, both the “x” and “y” data sets account for phylogeny. I’ll attempt to answer why both x and y need to be phylogenetically corrected. I’ll do this while trying not to get too technical, but will gladly offer a technical version for anyone who wishes to view it.

To be precise, in such analyses, one is not phylogenetically correcting data; one is adjusting the association of the two data sets based on the hypothesized non-independence of the observations due to phylogeny. There are multiple ways to do this. One is to solve the PGLS estimation of regression coefficients, which can be multiplied in a specific way by sums of squares and cross-products matrices for both data sets to get a matrix that can be decomposed by singular value decomposition (SVD) as part of the PLS procedure. Another is to use eigen-analysis to calculate a transformation matrix that can be used to transform both data sets, and then perform SVD on the cross-products between the two transformed data sets. The third is to first find phylogenetically independent contrasts (PICs) for each data set and then do SVD on the cross-products between the PICs. Each of these will produce exactly the same results.

The second example is what we use in geomorph, and it might give the impression that one data set could be transformed and the other not. Keep in mind, though, that this transformation is only a bit of computational finagling: the first example is the precise definition of PGLS but is computationally expensive to do, say, 1,000 times as part of a permutation procedure. Here is another way to look at it. If one uses PICs, there are n-1 PICs for n species. There is no way to perform PICs on one data set and not the other. Likewise, the D-PGLS procedure (second example) should not be viewed as something that can be manipulated to perform a transformation on only one data set. There is no partial phylogenetic transformation.

To summarize, with phylo.integration, you are phylogenetically adjusting the integration, not the data used for integration, even if the analytical steps suggest that is the case.
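For readers who want to see the point numerically, here is a minimal sketch in plain Python (not geomorph code). It assumes a hypothetical 4-taxon phylogenetic covariance matrix C and two small made-up data blocks X and Y that are treated as already centered. It uses a Cholesky factor as the transformation matrix, swapped in for the eigen-analysis mentioned above because it is easier to write without a linear algebra library; either choice gives a matrix T with T'T = C^-1. All function and variable names are illustrative.

```python
import math

def cholesky(C):
    """Lower-triangular L with L L' = C (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def forward_solve(L, B):
    """Return L^-1 B by forward substitution."""
    n, m = len(B), len(B[0])
    Z = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for i in range(n):
            s = sum(L[i][k] * Z[k][c] for k in range(i))
            Z[i][c] = (B[i][c] - s) / L[i][i]
    return Z

def back_solve(L, B):
    """Return (L')^-1 B by back substitution."""
    n, m = len(B), len(B[0])
    Z = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for i in reversed(range(n)):
            s = sum(L[k][i] * Z[k][c] for k in range(i + 1, n))
            Z[i][c] = (B[i][c] - s) / L[i][i]
    return Z

def crossprod(A, B):
    """Return A' B."""
    return [[sum(A[i][a] * B[i][b] for i in range(len(A)))
             for b in range(len(B[0]))] for a in range(len(A[0]))]

def max_abs_diff(P, Q):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

# Hypothetical covariance for the tree ((A,B),(C,D)); values are made up.
C = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
# Made-up data blocks (rows = taxa), assumed already centered.
X = [[0.5, -0.2], [0.3, 0.1], [-0.4, 0.3], [-0.4, -0.2]]
Y = [[1.0, 0.2], [0.6, -0.1], [-0.9, 0.4], [-0.7, -0.5]]

L = cholesky(C)
# The phylogeny sits between the blocks: X' C^-1 Y.
direct = crossprod(X, back_solve(L, forward_solve(L, Y)))
# Same matrix, obtained by transforming BOTH blocks with T = L^-1.
TX, TY = forward_solve(L, X), forward_solve(L, Y)
both = crossprod(TX, TY)
# Transforming only one block with T gives a different matrix.
one_only = crossprod(TX, Y)

print("direct vs both blocks transformed:", max_abs_diff(direct, both))
print("direct vs one block transformed:  ", max_abs_diff(direct, one_only))
```

The first difference is zero up to rounding and the second is not, which is one way to see that the adjustment belongs to the association between the blocks rather than to either block alone. The matrix that would then be passed to SVD is `direct` (equivalently `both`).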

Hope that helps!
Mike



Jasmine Croghan

Jan 2, 2017, 10:41:12 AM
to geomorph R package
Thank you both, so, so much!  This is such a wealth of information and helps my understanding a great deal!

Two questions for clarification:

Dean, if I read you correctly, you said that even though my k-mult is less than 1, it does not necessarily mean there is low phylogenetic signal. P=0.001 for that particular analysis, so there is 'evidence of phylogenetic signal in the data set'. To interpret further: the particular k-mult value does not necessarily indicate the level of phylogenetic signal in the data set; rather, whether or not the k-mult value is significant indicates whether phylogenetic signal is present. Is this correct? What, then, does the k-mult value indicate?

Mike, so how does this change the interpretation of the association between the matrices? For example, for my diet data I would normally interpret that, as you travel further toward the positive extreme of the line, larger positive values mean that column is increasing while larger negative values mean that column is decreasing.  Does that change because there is a phylogenetic adjustment of the points that form that line?  In other words, when I call the vectors from the pPLS, do they mean the same thing that they did when I input the data?

I'm sorry if any of this is not clear.  Thank you, sincerely, for any and all replies!
~Jasmine

Adams, Dean [EEOBS]

Jan 2, 2017, 11:08:24 AM
to Jasmine Croghan, geomorph R package

Jasmine,

The Kappa statistic, in both its univariate and multivariate forms, has long been misinterpreted. Under Brownian motion (BM), the expected value of K and Kmult is 1.0. Values larger than this mean there is greater phylogenetic signal than one might expect under BM, and values less than 1.0 mean there is less phylogenetic signal than expected under BM. However, the significance testing is not versus a value of 1.0, but rather relative to random associations of data and phylogeny. In other words, one shuffles the data on the tips of the phylogeny and compares the observed K (or Kmult) to that distribution. That distribution of permuted values is not guaranteed to be centered on 1.0, as it depends entirely on the data and the phylogeny. Thus, the significance testing (as described originally for the univariate case in Blomberg et al. 2003 and also by me for multivariate Kmult) tests whether the observed K (or Kmult) is greater than one expects given the dataset and the phylogeny.

 

So it is entirely possible that one has significantly greater phylogenetic signal than expected by chance (where chance is defined as the association of data to tips), but still be less than 1.0: the expected value under BM. Of course, one could devise some way of testing the observed K or Kmult against 1.0; much like one could specify a specific value against which a regression coefficient is compared, but to my knowledge no one has devised and tested such a procedure for phylogenetic signal measures (though I’ve had numerous discussions with some individuals on this topic).
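The tip-shuffling test described above can be sketched in a few lines of plain Python. This is only an illustration for the univariate K of Blomberg et al. 2003, not the geomorph implementation of Kmult. The 6-taxon covariance matrix, the trait values, the seed, and the helper names `invert` and `blomberg_k` are all assumptions made up for the example.

```python
import random

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def blomberg_k(x, C, Cinv):
    """Univariate K: observed MSE0/MSE divided by its expectation under BM."""
    n = len(x)
    total = sum(sum(row) for row in Cinv)                  # 1' C^-1 1
    root = sum(Cinv[i][j] * x[j]
               for i in range(n) for j in range(n)) / total  # phylogenetic mean
    r = [xi - root for xi in x]
    mse0 = sum(ri * ri for ri in r)                        # ignores the tree
    mse = sum(r[i] * Cinv[i][j] * r[j]
              for i in range(n) for j in range(n))         # accounts for the tree
    expected = (sum(C[i][i] for i in range(n)) - n / total) / (n - 1)
    return (mse0 / mse) / expected

# Hypothetical covariance for the tree (((A,B),C),((D,E),F)), tip height 1.
C = [[1.0, 0.8, 0.4, 0.0, 0.0, 0.0],
     [0.8, 1.0, 0.4, 0.0, 0.0, 0.0],
     [0.4, 0.4, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, 0.8, 0.4],
     [0.0, 0.0, 0.0, 0.8, 1.0, 0.4],
     [0.0, 0.0, 0.0, 0.4, 0.4, 1.0]]
Cinv = invert(C)
trait = [1.0, 1.2, 1.5, 3.0, 3.1, 2.6]   # made-up tip values

rng = random.Random(1)                    # fixed seed so the run is repeatable
k_obs = blomberg_k(trait, C, Cinv)
null_ks = []
for _ in range(999):
    shuffled = trait[:]
    rng.shuffle(shuffled)                 # break the data-to-tip association
    null_ks.append(blomberg_k(shuffled, C, Cinv))

# The observed value counts as one member of the reference distribution.
p_value = (sum(k >= k_obs for k in null_ks) + 1) / (len(null_ks) + 1)

print("observed K:", k_obs)
print("mean of permuted K:", sum(null_ks) / len(null_ks))
print("P-value:", p_value)
```

Comparing the printed mean of the permuted values with 1.0 shows where the reference distribution actually sits for a given tree and dataset; the P-value is computed against that distribution, not against 1.0.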

 

As to your second question, the logic of the interpretation is basically identical. With PLS we are describing the association of X and Y. With PPLS, we are describing the association of X and Y while accounting for non-independence due to phylogeny. Where you must be careful, however, is in assuming that positive values on a PLS axis mean the underlying variable scores are larger, and negative values mean the underlying variable scores are smaller. That is not always the case, either with PLS or PPLS (or PCA or some other types of summary axes, for that matter). The reason is that the positive and negative ‘sides’ of summary axes based on eigen-decomposition (or SVD) are completely arbitrary.

The simplest way to think about this is with PCA. Do a PCA of some data. PC1 defines the direction of greatest variation in the dataset. There is no natural positive or negative to this: it is just a direction vector. Further, for different computer algorithms for finding PC1, using the same input data you could find the same exact PC1, but the +/- side could be flipped between algorithms. This is not a failing of the math, but rather just a recognition that the sign is arbitrary. Same thing for PLS axes. So what that means is that one must actually look at the data to determine whether the ‘+’ side of a PCA or PLS vector associates with larger or smaller values of the original variables; one cannot assume it is the case for any dataset.

 

For the case of PLS and PPLS, the important step you should probably include is this last one: take the time to go back and look at your original variables relative to PLS 1 and PPLS 1. This will tell you how to interpret changes along the axis.
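As a small illustration of both points (the arbitrary sign and the check against the original variables), here is a sketch in plain Python using PCA, as in the example above. It finds PC1 by power iteration, which is just one of many possible algorithms, starting from two opposite vectors. The data and the helper names `first_axis` and `pearson` are made up for the example.

```python
import math

def first_axis(data, start):
    """PC1 direction and scores, found by power iteration from `start`."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = list(start)
    for _ in range(200):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

# Made-up data: five specimens, two variables.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8]]

# Same data, same algorithm, opposite starting vectors.
v_a, scores_a = first_axis(data, [1.0, 0.0])
v_b, scores_b = first_axis(data, [-1.0, 0.0])

# Going back to an original variable tells you what '+' means on each axis.
var0 = [row[0] for row in data]
r_a = pearson(scores_a, var0)
r_b = pearson(scores_b, var0)

print("axis from start A:", v_a, " correlation with variable 1:", r_a)
print("axis from start B:", v_b, " correlation with variable 1:", r_b)
```

The two runs describe the same axis with opposite signs, so the first variable increases toward the positive end in one run and toward the negative end in the other. The same check applies to PLS and PPLS scores: correlate (or plot) each original variable against the scores before reading a direction into the axis.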

 

Best,

 

Dean

 
