"compare.multi.evol.rates" function - need a way to convert data in Excel format into necessary 3D array

153 views
Skip to first unread message

Derek Woller

unread,
Jul 25, 2017, 1:20:27 AM7/25/17
to geomorph R package
Hello again! I am attempting to run the function "compare.multi.evol.rates" using the latest guide (p.17-19): https://cran.r-project.org/web/packages/geomorph/geomorph.pdf. I am able to get it to work just fine using the example data "plethspecies" and I think I understand how to do it with my much larger dataset. However, I am getting stuck on converting my data correctly into the required array format and I'm sure there's a nifty way to do it. I have already run GPA on 6 distinct shapes and calculated the means of each for the 31 specimens I am analyzing (10 specimens/species). To make things easier for me to keep track of, I ran each analysis separately in RStudio (on Windows 10) and exported the means data for each into Excel format. I now want to construct a super matrix for all of the data and then convert it into an array within RStudio, so it can be properly used for the function I am working on. 

Using the plethspecies example to guide me, I decided to export "plethspecies$land" into .xlsx format (ATTACHED) (I also tried using .csv format) in order to reverse-engineer the file for my purposes. So far, so good, but when I try to re-import this file into RStudio and use it in the function "compare.multi.evol.rates" it no longer works, presumably because it is no longer a 3D array or an array at all and I can't seem to find a way to put it back into its proper format. I'm fully aware of the ways to convert 2D arrays to 3D and back, but I just can't get the data into a functioning array despite trying multiple permutations of the function "array". 

According to the geomorph guide, a matrix WILL work, but, so far, I have not been able to be clever enough to make this option work either: "The shape data may be input as either a 3D array (p x k x n) containing GPA-aligned coordinates for a set of species, or as a matrix (n x [p x k]) whose rows correspond to each species. In both cases, species names must be provided as rownames (for a matrix) or as the names of the third dimension of the array. Landmark groups for each trait are then specified by a factor array designating which landmark belongs to which trait".

Any guidance is welcome and if I can be shown how to turn the attached data example into a functioning array, I can probably take things from there.

Many thanks!

-Derek
plethspecies_test.xlsx

Emma Sherratt

unread,
Jul 26, 2017, 1:31:24 AM7/26/17
to geomorph R package
Dear Derek,

You are making life very difficult for yourself by moving in and out of R; all of this is simply done within the R environment. Furthermore, by moving in and out of R what you've probably encountered is an issue of reading in tables and making sure that the data are numerical prior to analysis. I suspect this to be one of your issues.

But without knowing what you are doing exactly it is hard to troubleshoot.

Emma

Derek Woller

unread,
Jul 26, 2017, 1:47:41 AM7/26/17
to geomorph R package
Hello, Emma! I understand and I probably just need more practice with R. As I said, though, I already had all of the analyses and data completed and exported in Excel form, so, to me, it seemed the most logical and expedient to combine the data in Excel, import it to R, and manipulate it into an array. Otherwise, I will have to go through the entire GPA process again for all 6 shapes and then find a clever way (which I am sure there is) to concatenate the 6 sets of mean shape data into a single 3D array.

As far as I can tell, the data are numerical. I.e., I can read the table just fine and see the values. Is there a test for whether or not data is numerical? 

If you wish to try to troubleshoot further, I am more than happy to prove any more details you need, although I am not sure what else I can say that wasn't in my original post. To reiterate, I just want to find a way to transform the Excel data I attached (based on "plethspecies$land") previously and transform it into a 3D array that can be read for argument "A" in the function "compare.multi.evol.rates(A, gp, phy, Subset = TRUE, iter = 999, print.progress = TRUE)". If I can figure it out for the example data, I can do it for my own data.

Thank you kindly in advance, and thank you for the geomorph package!

-Derek

Adams, Dean [EEOBS]

unread,
Jul 26, 2017, 11:57:09 AM7/26/17
to Derek Woller, geomorph R package

Emma is correct. You are making this far more difficult than it needs to be, and your pipeline of going in and out of R is very prone to error (both I/O and user-based). A few comments on what should be done in geomorph:

 

First, all specimens should be subjected to a single GPA, not separate ones (as your post implied). Otherwise, the data are in incommensurate rotations in shape space.

 

Second, finding the mean shape for each group may be accomplished in myriad ways in R.  Some are more ‘blunt stick’ approaches while others are more statistically elegant. Some simple examples include using mshape() on each set of aligned specimens, and some sort of loop or tapply function.  The more general and elegant approach is to use a linear function of shape ~ groups and extract the means. This is done as:

 

means <- aggregate(two.d.array(Y.gpa$coords) ~ group, FUN=mean)

 

where Y.gpa$coords is the GPA-aligned shape data and group is the vector designating to which group each specimen belongs. Here the function two.d.array() is used to turn the 3D array into a 2D matrix.  Then, if one wishes to ‘put it back’ into a 3D array, the arrayspecs() function will do this.

 

Again I note that all of this is described in the geomorph manual, and does not require going in and out of R, which is prone to errors. 

 

Dean

 

Dr. Dean C. Adams

Professor

Department of Ecology, Evolution, and Organismal Biology

       Department of Statistics

Iowa State University

www.public.iastate.edu/~dcadams/

phone: 515-294-3834

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To post to this group, send email to geomorph-...@googlegroups.com.
Visit this group at https://groups.google.com/group/geomorph-r-package.
To view this discussion on the web, visit https://groups.google.com/d/msgid/geomorph-r-package/d9888a55-fb1a-4259-a728-c4c86761e9c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Derek Woller

unread,
Jul 26, 2017, 12:46:54 PM7/26/17
to geomorph R package, asi...@gmail.com
Dean, thank you for your reply. I apologize for not being as clear as I hoped to be in my first two posts. To clarify, I have 6 sets of data based on 6 different shapes. Each data set contains landmark coordinates for around 300 specimens (some vary slightly), representing ~10 specimens each for 31 species. I ran a GPA on each shape separately because I figured running it as a single dataset would be prone to errors since the TPS input files are separate and the scale bars are unique for each.

Are you saying it's possible, even preferable, to combine these into a single TPS and input it into R, thus creating a combined set of landmark coordinates for all 6 shapes? Then, I would run a GPA analysis on all simultaneously? If so, is there an elegant way to calculate means for the 6 shape sets within the data block? I have already previously followed your guide to calculate means within R for the 6 shapes separately, using the same "aggregate" suggestion you included in your post here. I only use the export-to-Excel function to keep track of all of the results from all analyses I'm performing.

I sincerely appreciate your assistance.

To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsub...@googlegroups.com.
To post to this group, send email to geomorph...@googlegroups.com.

Mike Collyer

unread,
Jul 26, 2017, 1:14:36 PM7/26/17
to Dean Adams, Derek Woller, geomorph R package
Derek,

For finding group means, there is an example in the coords.subset help guide to do that without having to force the coordinates into a 2D matrix (keeps them in a 3D array), if that is of interest.  This is a new example, so you would have to make sure you recently updated geomoprh.  

Mike

Derek Woller

unread,
Jul 26, 2017, 1:16:43 PM7/26/17
to Mike Collyer, geomorph R package, Adams, Dean [EEOBS]
Okay, I'll try to take a look tonight. I'll make sure I have the latest version as well.

Thank you, Mike!

To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsub...@googlegroups.com.
To post to this group, send email to geomorph-r-package@googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsub...@googlegroups.com.
To post to this group, send email to geomorph-r-package@googlegroups.com.

Adams, Dean [EEOBS]

unread,
Jul 26, 2017, 1:30:49 PM7/26/17
to Derek Woller, geomorph R package
Derek, 

Separate tips files is not an issue. My comment on combining data for gpa  was for the same shapes (e.g. all faces). One would not combine heads and arms in a gpa.

Dean



Sent from my Verizon, Samsung Galaxy smartphone
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To post to this group, send email to geomorph-...@googlegroups.com.

Mike Collyer

unread,
Jul 26, 2017, 1:31:04 PM7/26/17
to Derek Woller, geomorph R package
If the six landmark configurations are the same (same number of landmarks in same dimensions), one can combine landmarks easily as follows:

LM.all <- array(c(LM1, LM2, LM3, LM4, LM5, LM6), c(p, k, n))

where p is the number of landmarks, k is the number of dimensions, and n is the number of specimens, equal to n1 + n2 + n3 + n4 + n5 + n6, and LMi refers to the landmarks of set i.

Mike


To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To post to this group, send email to geomorph-...@googlegroups.com.

Derek Woller

unread,
Jul 26, 2017, 1:36:03 PM7/26/17
to Adams, Dean [EEOBS], geomorph R package
No worries, Dean, I figured as much, BUT I have been known to be dense when it comes to R, so I thought I might actually be going about this the wrong way from the start.

To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsubscribe@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsub...@googlegroups.com.
To post to this group, send email to geomorph-r-package@googlegroups.com.

Derek Woller

unread,
Jul 26, 2017, 1:41:17 PM7/26/17
to Mike Collyer, geomorph R package
Mike, a good idea, but I should have added that the # of landmarks for each shape differ, so I suppose this won't work? However, there must be a way to do it since that's how the "plethspecies$land" data set seems to be arranged - 5 landmarks for 1st set stacked on top of 6 landmarks for second set unless I'm entirely misunderstanding how to build the input array for the argument "gp" in the function "compare.multi.evol.rates(A, gp, phy, Subset = TRUE, iter = 999, print.progress = TRUE)", which is entirely possible.

To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsubscribe@googlegroups.com.


--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsub...@googlegroups.com.
To post to this group, send email to geomorph-r-package@googlegroups.com.

Mike Collyer

unread,
Jul 26, 2017, 2:10:27 PM7/26/17
to Derek Woller, geomorph R package
GPA has to be performed on the same number of landmarks (and plethspecies$land has the same number for each specimen).

You can run GPA on each “shape”, use coords.subset + mshape to get means (see coords.subset example), and then use combine.subsets, provided the means are in the same order, to concatenate the six shapes into one set.  This function can also scale the six shapes to be proportional in size, in case one shape is small and another is large, for example.  Then when you use compare.multi.evol.rates, you just have to be careful to make sure that the 6 landmark number designations are accurate.

One caveat, you should also get mean centroid sizes to use in combine.subsets, if you want to appropriately scale configurations.  As you are going from 10 shapes each to 1 per species, the typical workflow is different than if you are working at the individual level.  (In that case, combine.subsets would be easier, as it can perform GPA.  That is not an option for you.)  Hopefully the help files are each helpful to point you in the right direction.

Mike

Derek Woller

unread,
Jul 26, 2017, 2:30:23 PM7/26/17
to Mike Collyer, geomorph R package
Mike, maybe I'm using the wrong terminology to explain what I'm thinking - regarding "plethspecies$land", I understand that the same # of landmarks (based on coordinate sets) are present for each species, but I was referring to my perception that out of the 11 landmarks present, #1-5 correspond with the mandible subset while #6-11 correspond with the cranium subset, right? 

Otherwise, I am not understanding the example given on p.19:

"data(plethspecies)
Y.gpa<-gpagen(plethspecies$land) #GPA-alignment
land.gp<-c("A","A","A","A","A","B","B","B","B","B","B") #mandible and cranium subsets
EMR<-compare.multi.evol.rates(A=Y.gpa$coords,gp=land.gp, Subset=TRUE, phy= plethspecies$phy,iter=999)"

And if I'm not getting it, then that might be the source of my original confusion.

If I am indeed on the right track, then your suggestion to explore the subset functions may be what I'm looking for and I'm eager to try them out.


Adams, Dean [EEOBS]

unread,
Jul 26, 2017, 2:36:16 PM7/26/17
to Derek Woller, Mike Collyer, geomorph R package

Derek,

 

The function compare.multi.evol.rates() is used to compare the rate of one multi-dimensional trait against another multi-dimensional trait (or more of them).  Thus, one either has multiple traits to compare “the rate of evolution in arm variation versus leg variation” or one compares the rates of evolution in subsets of landmarks from within the same shape or landmark configuration.

 

The example was for a case of the latter. Here, the entire head was a shape, and the groups of landmarks designated two components: the cranium and the mandible, for which rates were quantified and compared.  Mike’s comment on cords.subset() function was for how to deal with means for such traits if these are desired: which they would be in your case since you have multiple individuals per species and thus require the mean for each species for the subsequent rate analysis.

 

Alternatively, if one had two different shape components (e.g., heads and arms), these would be aligned separately, the means for each species found separately, and then the rate comparison would be performed.

To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.

To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.

To post to this group, send email to geomorph-...@googlegroups.com.

 

 

 

 

--

You received this message because you are subscribed to the Google Groups "geomorph R package" group.

To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To post to this group, send email to geomorph-...@googlegroups.com.

Mike Collyer

unread,
Jul 26, 2017, 2:37:16 PM7/26/17
to Derek Woller, geomorph R package
Okay, but in your case you have six separate landmark configurations digitized - separately - on 10 specimens each for 31 species, correct?  For the salamander heads, one needs to merely assign a landmark to a subset, as they are all in the same configuration.  As Dean hinted, if he wanted to add salamander tails and took separate photos of tails, he would have separate data sets but the function assumes there is one.  So you have to trick the function.  You can do this with combine.subsets.  You just then have to make a grouping variable that matches the numbers and sequences according to the way you combined them.

Mike

Mike Collyer

unread,
Jul 26, 2017, 2:49:01 PM7/26/17
to Derek Woller, geomorph R package
I should add that this is one way to do it.  The other is to make the 3D arrays into 2D matrices and concatenate the different matrices, and use Subset = FALSE as an argument.  You can do this with two.d.array() followed by the cbind function (binding matrices by columns).  The way I previously recommended was to turn multiple shapes into one large configuration (with each potentially scaled to relative size); this other way keeps the shapes separate but does not allow scaling to relative size.  I’m not sure that the relative size matters for estimating evolutionary rates. 

Regardless, you have to concatenate data and getting back to your original premise, doing it outside of R is unneeded and perhaps error inducing.

Mike

Derek Woller

unread,
Jul 26, 2017, 2:51:32 PM7/26/17
to Mike Collyer, geomorph R package
Dean, Mike, the light is slowly dawning. In my mind, I was equating the salamander head subsets with my different shapes as being subsets and I see now that that is technically incorrect. While some of mine could be classified as subsets of morphological traits because they are physically, I have treated all as independent shapes because landmarks were acquired from photographs unique to each shape.

I'll keep fiddling and come back and beg for more guidance if I hit the wall again.

As always, your guidance is invaluable and it will be acknowledged.

Derek Woller

unread,
Jul 26, 2017, 2:54:14 PM7/26/17
to Mike Collyer, geomorph R package
I'll fiddle with both approaches. 

I agree that going outside R is not ideal, I just thought it might save time as this point given that all the data needed is already in Excel form, but it will not take too long to run all the GPA again and that sounds like easier way. I still think, though, that there must be a clever way to turn the Excel input into a 3D array since it's not in a complex format whatsoever, but I'll try the all-R way first.

Derek Woller

unread,
Jul 28, 2017, 9:41:22 PM7/28/17
to Mike Collyer, geomorph R package
Mike, I cracked it! Your assistance was particularly instrumental - many, many thanks!. I've only done it with 2 shapes so far, but combining the other 4 with them should be a snap now. Basically, I ran a GPA on each shape again, calculated the means using the aggregate function as I did previously, converted each of those results to 2D arrays, combined both of these arrays, converted the combo array back into 3D, and then called it in to the multivariate evolutionary rate function. Not too bad once the pathway has been deciphered.

Initially, though, I received the error "Error in C[rownames(x), rownames(x)] : subscript out of bounds" and found a previous thread that you and Emma assisted with: https://github.com/geomorphR/geomorph/issues/10

I had a hunch my names were not matching either (in my tree and combined data array), but I was unable to get the match function to do much for me and I'm wondering I am just using it wrong. I used it with my data, but to bring it back to an earlier example, I tried it with the plethspecies dataset and got the same result as with mine. I set it up like: 

match(plethspecies$land, plethspecies$phy, nomatch = NA_integer_, incomparables = NULL)

and it returns a series of "NA". I may not need this function again for this current task, but my curiosity is piqued.

Derek A. Woller, Ph.D. Candidate

Song Laboratory of Insect Systematics and Evolution, Texas A&M University

http://schistocerca.org/SongLab/

Entomologist, Rangeland Grasshopper and Mormon Cricket Management Team,
USDA: APHIS, PPQ, CPHST - Phoenix Lab
3645 East Wier Ave., Phoenix, AZ, 85040
Office: 602-431-3246
Cell: 480-490-6454
Fax: 602-431-3258

Reply all
Reply to author
Forward
0 new messages