merge(by=...) for geomorph.data.frame


Bob LePaul

Apr 24, 2021, 12:30:02 PM
to geomorph R package

Dear colleagues,

for downstream analyses I would like to merge data from a geomorph.data.frame and an ordinary data.frame. However, merge() does not work on this kind of object, so I am looking for a suitable alternative. The geomorph documentation does not mention merging with a "by" variable.

Specifically, I have a dataset that I run gpagen() on, and I combine the resulting coordinates into a geomorph.data.frame that also contains an "id" variable. Each "id" refers to a unique animal.

Now I have another dataset that contains the same ids, but every id is present more than once (the same individuals were used multiple times). I would like the same coordinates and centroid sizes assigned to every row where a given id appears. Usually, I would use merge(Data1, Data2, by="id"), but this does not work for geomorph.data.frame objects. Hence, I would be grateful if you could point me to a suitable alternative.

Thank you for your time!
Bob

Adams, Dean [EEOB]

Apr 24, 2021, 12:31:28 PM
to geomorph-...@googlegroups.com

If you first convert your 3D array of coordinates into a 2D array, this should be possible.
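
A minimal sketch of that conversion, written in base R so that it runs without geomorph installed. The array `shape`, its dimensions, and the id values are made up for illustration; in practice the flattening step is what two.d.array() from geomorph does.

```r
# Hypothetical toy data: 3 landmarks (p), 2 dimensions (k), 4 specimens (n)
p <- 3; k <- 2; n <- 4
shape <- array(seq_len(p * k * n), dim = c(p, k, n))
ids <- c("a1", "a2", "a3", "a4")   # assumed specimen ids, in the same order as the array

# Flatten each p x k specimen into one row: x1, y1, x2, y2, ...
flat <- t(apply(shape, 3, function(spec) as.vector(t(spec))))
colnames(flat) <- paste0(rep(seq_len(p), each = k), ".", rep(c("X", "Y"), times = p))

# A plain data.frame keeps each id next to its coordinates, so merge() can use it
shape_df <- data.frame(id = ids, flat, check.names = FALSE)
```

Each row of `shape_df` is then one specimen, which is the layout merge() expects.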


Dean

 

Dr. Dean C. Adams

Director of Graduate Education, EEB Program

Professor

Department of Ecology, Evolution, and Organismal Biology

Iowa State University

https://www.eeob.iastate.edu/faculty/adams/

phone: 515-294-3834


Bob LePaul

Apr 24, 2021, 12:45:15 PM
to geomorph R package
Dear Dean,

that was what I thought and did, but it did not help. Maybe I have been overlooking something?

par_id <- read.table("ids.txt", header=TRUE, stringsAsFactors=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
id_fam <- read.table("id_fam.txt", header=TRUE, stringsAsFactors=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
par_data <- readland.tps("data.tps",  specID = c("ID"),  negNA = FALSE,  warnmsg = TRUE)

par.shape <- gpagen(par_data)$coords
par.csize <- gpagen(par_data)$Csize
par.dat <- two.d.array(par.shape)
par.dat.full <- geomorph.data.frame(coords =par.dat, csize = par.csize, id=par_id$ID)
par.dat.full2 <- merge(par.dat.full, id_fam, by="ID", all.x=TRUE)

ERROR: cannot coerce class '"geomorph.data.frame"' to a data.frame

Adams, Dean [EEOB]

Apr 24, 2021, 12:46:54 PM
to geomorph-...@googlegroups.com

Taking a 2D matrix and then putting it back into a geomorph data frame is unnecessary.
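
One way to read this: keep the flattened coordinates in an ordinary data.frame and call merge() on that instead. A sketch with made-up ids and values (the object and column names below are hypothetical, not taken from the files in this thread):

```r
# Hypothetical per-animal table: one row per id, flattened coords plus centroid size
animals <- data.frame(
  ID    = c("a1", "a2", "a3"),
  X1    = c(0.10, 0.20, 0.30),
  Y1    = c(0.50, 0.60, 0.70),
  Csize = c(10.1, 11.4, 9.8)
)

# Hypothetical second table in which ids repeat
id_fam <- data.frame(
  ID  = c("a2", "a1", "a2", "a3", "a1"),
  fam = c("f1", "f1", "f2", "f3", "f4")
)

# Both inputs are plain data.frames, so merge() works; every row of id_fam
# receives the coordinates and centroid size of its animal
merged <- merge(id_fam, animals, by = "ID", all.x = TRUE)
```

By default merge() sorts the result on the key, so `merged` does not keep the row order of `id_fam`.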

Mike Collyer

Apr 24, 2021, 1:28:41 PM
to geomorph R package
There is no merge option for geomorph.data.frame.  You should not expect that geomorph.data.frame objects and data.frame objects are commensurate objects.  

However, merge is a pretty unnecessary function, anyway.  If you want to append objects to a geomorph.data.frame object, like a data.frame object, it is best to remember these are different types of organized lists.  Therefore, you can just treat them like lists.  So, if you have A containing $habitat and B containing $coords and $Csize, you can add $habitat to B, as

B$habitat <- A$habitat

Hope that helps.

Mike

Bob LePaul

Apr 24, 2021, 1:44:52 PM
to geomorph R package
Dear Mike,

the issue is the overlooked "by" argument of merge(). Let's say I have coords for ID1 in Dataset1 and ID1 appears multiple times in Dataset2. The "by" argument of merge() allows me to match the IDs and insert the coords and csizes multiple times, wherever the appropriate ID is present. Your method unfortunately only allows appending lists whose elements are already in the same order.

Cheers
Bob

Bob LePaul

Apr 24, 2021, 1:57:18 PM
to geomorph R package
Without putting back the 2D matrix into a geomorph data frame, I am unsure how I would append another dataset containing the correct IDs for each coordinate in the correct order (the IDs derived from the tps file are unsuitable for the matching that I am looking for here).

Bryan H. Juarez

Apr 24, 2021, 2:34:54 PM
to geomorph-...@googlegroups.com
Hi all,

Perhaps the easiest solution is to do all the necessary data manipulation before using geomorph.data.frame(). One can use methods of their choosing to organize data after obtaining the two.d.array(), and then use geomorph.data.frame() as the last step. 

geomorph.data.frame() serves to format data in a consistent structure before fitting a model within geomorph/RRPP, and trying to manipulate it further may result in unintended changes to the structure, limiting its ability to perform its original task. It seems to me that a geomorph.data.frame() object should not generally be manipulated if we want it to interact with e.g. procD.pgls() in the intended manner.
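
A rough sketch of that order of operations, with made-up values and hypothetical object names. All matching is done on plain objects first; the last step assumes geomorph is installed and is skipped otherwise.

```r
# Hypothetical inputs: flattened coords (one row per animal, 2 landmarks in 2D)
coords2d <- matrix(1:12 / 10, nrow = 3,
                   dimnames = list(c("a1", "a2", "a3"),
                                   c("1.X", "1.Y", "2.X", "2.Y")))
csize <- c(10.1, 11.4, 9.8)                      # centroid sizes, same order as rows
id_fam <- data.frame(ID  = c("a2", "a1", "a2", "a3"),
                     fam = c("f1", "f1", "f2", "f3"))

# Step 1: do the matching on plain objects; repeated ids repeat the same rows
idx <- match(id_fam$ID, rownames(coords2d))
coords_rep <- coords2d[idx, , drop = FALSE]
csize_rep <- csize[idx]

# Step 2: build the geomorph.data.frame last (only if geomorph is available)
if (requireNamespace("geomorph", quietly = TRUE)) {
  gdf <- geomorph::geomorph.data.frame(
    coords = geomorph::arrayspecs(coords_rep, p = 2, k = 2),  # back to a 3D array
    Csize  = csize_rep,
    ID     = factor(id_fam$ID),
    fam    = factor(id_fam$fam)
  )
}
```

Because the geomorph.data.frame is created once, from objects that already have the right number of rows in the right order, nothing needs to be changed inside it afterwards.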

Best,
Bryan 



--
Bryan H. Juarez (he/him/his)
NSF Fellow
PhD Candidate
Adams Lab
EEOB Dept.
Iowa State University
Twitter: @bhjuarez


Mike Collyer

Apr 24, 2021, 2:37:44 PM
to geomorph-...@googlegroups.com
Bob,

You just need to be a bit more resourceful.  

B$habitat <- A$habitat[match(B$ID1, A$ID1)]

You should get comfortable with functions like match, order, sort, and %in%.  These give you far more control over your data.  The merge function is a convenience function that does stuff like the above for you.  However, it only works with objects that can be coerced into data.frame objects.  By comparison, vectors, matrices, arrays, and lists can all be sorted, ordered, partitioned, and appended, if one is comfortable with base syntax.
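
For illustration, a small base R sketch of these functions on two lists. The lists `A` and `B` and their contents are hypothetical stand-ins for the two datasets in this thread.

```r
# Hypothetical lists standing in for the two datasets
A <- list(ID1   = c("a1", "a2", "a3"),          # one entry per animal
          Csize = c(10.1, 11.4, 9.8))
B <- list(ID1 = c("a2", "a1", "a2", "a4"),      # ids repeat; "a4" has no shape data
          fam = c("f1", "f1", "f2", "f3"))

# %in% flags ids in B that have no counterpart in A
missing_ids <- unique(B$ID1[!(B$ID1 %in% A$ID1)])

# match() returns, for each id in B, its position in A (NA if absent),
# so indexing with it repeats values wherever an id repeats
B$Csize <- A$Csize[match(B$ID1, A$ID1)]

# order() gives the permutation that sorts B by id; apply it to every element
o <- order(B$ID1)
B_sorted <- lapply(B, function(x) x[o])
```

The same index from match() can be used on the rows of a 2D coordinate matrix or on the third dimension of a 3D coordinate array. Ids without a match come back as NA, so it is worth checking `missing_ids` before indexing coordinates.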

Mike
