mydata$GENDER = relevel(mydata$GENDER, ref='M')
mydata.h2o = as.h2o(mydata)
The first line of code successfully changes the reference level to "M" in the "mydata" R data frame. Yet after running the second line to convert it to an H2O frame, the new reference level is lost. Can anyone explain why?
~ Li
Li,
Check out h2o.setLevels(), it will allow you to change the order of your levels (and hence redefine your reference level).
-Erin
-- Erin LeDell Ph.D. Statistician & Machine Learning Scientist | H2O.ai
Hi Erin,
Thanks for the suggestion, and in fact that's exactly what I'm doing as a workaround. But it would be nice if the "as.h2o" function could automatically inherit or at least provide the option of inheriting the reference level from the R data so that I won't need to use the "h2o.setLevels()" function. Perhaps that could be a new feature?
Best,
~ Li
Li,
Ok, I'm glad you know about h2o.setLevels().
H2O's parser always orders factor levels alphabetically, so any
custom ordering set in R is dropped on conversion. I agree that
it would make sense to inherit factor level information from
an R data.frame or a pandas DataFrame in Python. I will have to
look into how to make that work across all our different APIs and
whether or how changing the expected behavior of our parsing would
affect the rest of the data processing workflow.
Thanks for the suggestion.
-Erin
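To make the alphabetical-ordering point concrete, here is a minimal base R sketch. It does not call H2O at all: the sort() call only imitates the ordering described above, and the toy vector stands in for mydata$GENDER.

```r
# Toy stand-in for mydata$GENDER (illustrative data, not from the thread)
gender <- factor(c("F", "M", "M", "F"))

# Same call as in the original question: make "M" the reference level
gender <- relevel(gender, ref = "M")

# Level order that R keeps after relevel()
r_levels <- levels(gender)

# Alphabetical order, imitating what the parser is described as doing
parsed_levels <- sort(r_levels)

# TRUE when alphabetical ordering would change the reference level
reference_lost <- !identical(r_levels, parsed_levels)

print(r_levels)
print(parsed_levels)
print(reference_lost)
```

If reference_lost is TRUE for your own column, the reference level has to be set again on the H2O side after conversion.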
I'll also add that most of our users don't convert an R data.frame with as.h2o(), but rather use h2o.importFile() to move the data from disk into the H2O cluster directly (skipping the step where the data is duplicated in R memory).
Unless you need to do your data munging in R, it is best
to load the data straight into H2O memory and use the rest of the H2O
utilities to munge the data directly in H2O. (That way you also
wouldn't have to set the levels twice.)
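A rough sketch of that workflow follows, under a few assumptions: a local H2O cluster can be started with h2o.init(), the CSV is a toy file written to a temporary path purely for illustration, and the installed h2o package provides h2o.relevel(), which is not mentioned in this thread and may be missing from older releases.

```r
library(h2o)
h2o.init()  # start or connect to a local H2O cluster

# Toy data written to a temporary CSV so the sketch is self-contained;
# in practice this would be your own file on disk.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(GENDER = c("F", "M", "M", "F"), AGE = c(31, 45, 28, 52)),
  csv_path,
  row.names = FALSE
)

# Load straight into the H2O cluster, without building an R data.frame first
mydata.h2o <- h2o.importFile(csv_path)

# Make sure the column is categorical, then inspect the parsed level order
mydata.h2o$GENDER <- h2o.asfactor(mydata.h2o$GENDER)
print(h2o.levels(mydata.h2o$GENDER))

# Assumption: h2o.relevel() is available in your h2o version.
# It moves the chosen level to the front, similar to relevel() in base R.
mydata.h2o$GENDER <- h2o.relevel(mydata.h2o$GENDER, "M")
print(h2o.levels(mydata.h2o$GENDER))
```

Doing the releveling on the H2O side means the reference level is set once, after parsing, rather than once in R and again after as.h2o().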