Error : Independent variables are missing from either the data frame or global environment


Noé Barthelemy

Aug 18, 2021, 5:08:55 AM
to geomorph R package
Hi,

I'm new to the group so first I'd like to thank you all for your future help!
I recently spent a couple of days trying to fix this issue:

When trying to create a model based on a geomorph dataframe, I get the error message indicated below:
fit.common_Multipop <- procD.lm(f1 = coords ~ Length + Lake, data = gdf.Multipop, RRPP = TRUE)
> Error : Independent variables are missing from either the data frame or global environment

My gdf used to work in the past, so maybe something has changed in geomorph? (by the way all my packages are up to date).
Also, creating a model works for me only when I use the variables "coords" and "Csize", as they are created using the gpagen function.

However, the variables "Lake" and "Length" are incorporated into the gdf as below:

gdf.Multipop <- geomorph.data.frame(coords = proc_MultiPop$coords, size = proc_MultiPop$Csize, Lake = Classifier_Multipop$Lake,  Length = Classifier_Multipop$length_cm)

When inspecting the gdf in RStudio, the variables Lake and Length are of type "character" and "double", respectively. They are properly ordered (the first observation matches the first specimen), but the names of the individuals are not associated with each observation as they are for Csize, for example.

I hope that this is clear, I'll be happy to provide more details if necessary!
Thanks again for your help, it is very much appreciated.

Best regards,
Noé Barthelemy



Mike Collyer

Aug 18, 2021, 8:09:53 AM
to geomorph-...@googlegroups.com
Dear Noé,

Although geomorph has not changed recently, R has (versions 4.1 and 4.1.1, since last geomorph update).  It is possible that CRAN made some changes that caused a previously working function to no longer work.  However, I tried to simulate your conditions both before and after updating R and had no issues using a character vector (rather than a factor) in either case.

The error you have triggered results from an internal function attempting to use the lm function with the variables you have (if the lm function throws an error, then lm.rrpp — the workhorse function for procD.lm — arrests the analysis, since there will be problems down the line).  I did not see any lm updates so I do not think that is it.

You could try to coerce gdf$Lake to be a factor and see if that helps.  The problem is definitely in the Lake or Length variables, as you have ascertained, but it is not clear exactly what could be the issue.  I did try to simulate a typo like using something like Lake = Classifier_Multipop$LAke or Classifier_Multipop$lake instead of Lake = Classifier_Multipop$Lake.  This produced the same error you received.  If you are checking the type/class/structure of Classifier_Multipop$Lake instead of gdf$Lake, you might not discover that gdf$Lake is NULL rather than “character” type.  This is a shot in the dark, based on limited information.  Regardless, it appears your geomorph data frame is different from what is intended, pertaining to the Lake or Length variables.
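A minimal sketch of the check described above, using a plain named list as a stand-in for the geomorph data frame (an assumption for illustration; the object and column names here are hypothetical, and a real object would come from geomorph.data.frame). It shows how a misspelled column name silently yields NULL, and how inspecting the stored components, rather than the source table, reveals it:

```r
# Hypothetical classifier table with one missing length value.
classifier <- data.frame(Lake = c("A", "B", "A"),
                         length_cm = c(10.2, NA, 9.8))

# Stand-in for the geomorph data frame (a plain list, by assumption).
gdf <- list(
  Lake   = classifier$Lake,
  Length = classifier$LEngth_cm  # deliberate typo: `$` returns NULL, no error
)

# Inspect what was actually stored, component by component.
check <- data.frame(
  variable  = names(gdf),
  is_null   = vapply(gdf, is.null, logical(1)),
  n         = vapply(gdf, length, integer(1)),
  n_missing = vapply(gdf, function(x) sum(is.na(x)), integer(1))
)
print(check)
```

A component that is NULL, has the wrong length, or contains missing values is a likely candidate for the source of the error.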

Hope something in this helps!
Mike



Noé Barthelemy

Aug 18, 2021, 10:39:19 AM
to geomorph R package

Hi Mike, thanks a lot for your quick answer!

Unfortunately, changing the class to factor did not change anything... Let's look elsewhere!
A line like this works: fit.common_Multipop <- procD.lm(f1 = gdf.Multipop$shape ~ gdf.Multipop$size, data = gdf.Multipop, RRPP = TRUE)
As you can see in the first screenshot, the variable "size" has the little blue arrow + names (names of the specimens), while "Length" and "Lake" do not, as you can see in the second screenshot.

This is the main difference of format I can see. If I change "Size" to a factor the model above still works.
If you have an idea of how to code a new format for my variables I'll be very happy to hear it.

Best,
Noé
Second_screenshot.png
First_screenshot.png

Mike Collyer

Aug 18, 2021, 10:59:56 AM
to geomorph R package
Noé,

Size, as a continuous variable, should not be coerced into factor.

I cannot tell why your analysis does not work but I do see you are using R 3.6.3, which has not been current in 1.5 years.  I recommend updating R, RStudio, geomorph, and all dependencies before investigating further.  If you have an issue, make sure it is a current issue, and not one caused by trying to make different outdated versions of software work together.

Cheers!
Mike


Noé Barthelemy

Aug 30, 2021, 12:30:52 PM
to geomorph R package
Hi Mike,

Thanks again for your answer, I very much appreciate the value of your time, and I should have thought about updating everything before asking, sorry for that.
However, after updating all my packages (which took a while!) as well as R (now 4.1.1), the issue stays the same.
I should clarify that it doesn't seem to be a package conflict, as the issue still appears when only geomorph is loaded.

I still suspect that it stems from the format the variables are in, but I have no clue about how to test it.
I'm trying a few things now and will keep you updated.

Cheers!
Noé

Noé Barthelemy

Aug 30, 2021, 1:05:20 PM
to geomorph R package
By the way, I found this copy of a Stack Overflow thread (which doesn't seem to exist on Stack Overflow anymore?!)
This user seemed to have a very similar issue, in case that helps!

Cheers!
Noé

Mike Collyer

Aug 30, 2021, 1:33:01 PM
to geomorph R package
Noé,

The error produced is because in an attempt to create a linear model with the variables provided, some variables are not found.  But this does not mean you did not attempt to provide them.  It could be because in the function, somehow in formatting, arranging, or manipulating the variables, something went wrong but created a “silent” error (did not stop the analysis) so the error that is eventually revealed is not associated with the real problem.

If you could send me your data and script causing the problem outside of the geomorph R package google group, I can try to diagnose the problem, but I am afraid I cannot do it based on the information you have already provided.  There appears to be an issue but it is not obvious.

Best,
Mike

Mike Collyer

Sep 1, 2021, 10:05:27 AM
to geomorph R package
Dear Noé and everyone,

I was able to diagnose the problem (bug) in Noé’s analyses and have made a couple of important updates (which can be obtained currently on github — see below).  The issue was not associated with geomorph, but was associated with the support code for the lm.rrpp function in the RRPP package, on which procD.lm relies.

The problem results from a change in programming a while back to rely on core R functions rather than re-invent wheels, where possible.  One context is missing data.  When using the lm function with a data.frame, there is the argument, “na.action”, one can use to decide how to handle missing values.  The default is to delete them.  So, rather than having excessive code to deal with missing values, it is easier to ask R to try to create a linear model with the input variables.  If it works, great!  If not, stop and alert the user.  This is how Noé got the error.  The reason for the error, however, was an unanticipated way of R dealing with the missing values (choosing to retain them).
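As a rough illustration of how na.action changes what lm does with missing values, here is a sketch in base R only. The toy data are hypothetical, and this is not the lm.rrpp support code itself:

```r
# Toy data with one missing covariate value (hypothetical values).
dat <- data.frame(
  y      = c(1.0, 2.1, 2.9, 4.2, 5.1),
  Length = c(10, 11, NA, 13, 14)
)

# na.omit (normally the default via getOption("na.action")) drops the
# incomplete row before fitting.
fit_omit <- lm(y ~ Length, data = dat, na.action = na.omit)
n_used <- nobs(fit_omit)

# na.pass retains the incomplete row in the model frame.
mf_pass <- model.frame(y ~ Length, data = dat, na.action = na.pass)

# na.fail stops with an error instead of silently dropping rows.
fit_fail <- tryCatch(
  lm(y ~ Length, data = dat, na.action = na.fail),
  error = function(e) conditionMessage(e)
)

cat("rows used with na.omit:", n_used, "\n")
cat("rows kept with na.pass:", nrow(mf_pass), "\n")
cat("na.fail result:", fit_fail, "\n")
```

Comparing the number of rows used against the number of specimens is a quick way to see whether rows were dropped or retained.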

After some consideration, I realized that RRPP does not make sense if there are no residuals to randomize, so I updated the code to force removal of missing values.  This was not a trivial bit of coding for a trivial task, but I think it works now (at least it did with Noé’s data, when I tested it).

However, I wish to add one bit of advice.  There are several functions in R that have ways to handle missing data.  The best way to handle missing data though is to actively handle it before analysis.  When working with data in R, I try to not ask R to fix things within a function that could be addressed outside of a function.  To do so means to trust it was fixed correctly, and I might prefer to verify the correction rather than trust functions that could have programming bugs.

To make this easier, I will soon try to have an na.omit.geomorph.data.frame function to use in geomorph; i.e., 

updated.gdf <- na.omit(gdf) # gdf is a geomorph.data.frame

However, this is a tricky function to create.  Whereas na.omit.data.frame simply has to omit any row with an NA value in a data.frame object (which looks like a matrix), geomorph data frames might have phylo trees, covariance matrices, and certainly data in an array rather than a matrix.  Therefore, the updates for lm.rrpp and procD.lm are ready to go, but additional functions might take some time to develop.  
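To show why the array matters, here is a sketch of removing incomplete specimens by hand, in base R. The objects are hypothetical stand-ins (a small landmarks x dimensions x specimens array plus two covariates), and this is not the geomorph implementation:

```r
# Hypothetical data: 3 landmarks x 2 dimensions x 4 specimens.
coords <- array(seq_len(24), dim = c(3, 2, 4),
                dimnames = list(NULL, c("x", "y"), paste0("spec", 1:4)))
Lake   <- c("A", NA, "B", "B")
Length <- c(10.2, 11.5, NA, 9.8)

# Specimens whose covariates are all present.
keep <- complete.cases(data.frame(Lake, Length))

# Specimens sit in the third dimension of the array, so the array is
# subset there, while each covariate vector is subset directly.
clean <- list(
  coords = coords[, , keep, drop = FALSE],
  Lake   = Lake[keep],
  Length = Length[keep]
)

print(dim(clean$coords))
print(dimnames(clean$coords)[[3]])
```

Doing this outside the model function makes it possible to verify which specimens were dropped before any analysis is run.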

As a reminder, to install from github:

devtools::install_github("mlcollyer/RRPP", build_vignettes = TRUE)
devtools::install_github("geomorphR/RRPP", ref = "Stable", build_vignettes = TRUE)

cheers!
Mike

Mike Collyer

Sep 1, 2021, 11:51:48 AM
to geomorph R package
It appears that I over-estimated the difficulty of making an na.omit type function for geomorph data frames.  There is now such a function in the geomorph package (with help file for its use), on Github.

Please be aware that while performing these updates, I discovered some issues with rendering vignettes.  This usually means something changed in R (like with the 4.0 update), and now some of our functions do not work as they did before.  It also means that it is time to perform scans of all functions to find bugs.

Please be patient, as this might take a few days.

Cheers!
Mike

Mike Collyer

Sep 1, 2021, 12:42:03 PM
to geomorph R package
Again, another over-estimation!  The failed vignette rendering was caused by a problem I just inadvertently created in updating geomorph data frames (actually because of the process of converting them to a usable form within the lm.rrpp support code), which left the model matrix null (cascading as a problem to other functions).  I believe everything is operational now (on Github), with multiple ways to handle missing data.  If anyone finds other issues, please let me know.  These simple fixes often open Pandora’s box a crack.

We will look to update geomorph on CRAN soon, but that process is always more complicated than updates on Github.

Mike

Noé Barthelemy

Sep 7, 2021, 6:49:40 AM
to geomorph R package
Hi Mike,

I was away for a few days, and very happy to see this resolved on my return!!!
I did not expect that my error would lead to so much work on your end, so thank you for that. I also learned a lot from your detailed explanations!
I tried the fix, and it works perfectly so far!

However, when trying to install RRPP from the geomorphR repo I get:
Error : Failed to install 'RRPP' from GitHub: HTTP error 404. Not Found Did you spell the repo owner (`geomorphR`) and repo name (`RRPP`) correctly? - If spelling is correct, check that you have the required permissions to access the repo.

(Using: devtools::install_github("geomorphR/RRPP", ref = "Stable", build_vignettes = TRUE) )
Maybe this is something on my end, and it doesn't matter that much to me since I can go forward in my analyses, but I thought I should report that to you anyway.

Many many thanks again Mike!

Cheers,
Noé

Mike Collyer

Sep 7, 2021, 7:28:56 AM
to geomorph-...@googlegroups.com
Hi Noé

“mlcollyer/RRPP” is the source for RRPP, not “geomorphR/RRPP”. 

Cheers!
Mike
