Remove all rows which contain NA values (in a H2OFrame)


hata...@gmail.com

Jul 22, 2015, 3:46:24 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Is it possible to remove every row of an H2OFrame that has an NA value in any column? For example, if I had:

  a  b  c  d  e
1 0 NA NA NA NA
2 0  2  2  2  2
3 0 NA NA NA NA
4 0 NA NA  1  2
5 0 NA NA NA NA
6 0  1  2  3  2

I want to remove rows 1, 3, 4, 5 to get:

  a b c d e
2 0 2 2 2 2
6 0 1 2 3 2

Spencer Aiello

Jul 22, 2015, 5:16:33 PM
to hata...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream
Typically, you'd do this:

        na.omit(myFrame)

But this method is currently missing from H2O -- I've just added it and it will show up in master soon (and the next nightly build).

So very soon you should be able to use the na.omit on an H2OFrame.
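
For reference, here is a minimal sketch of how na.omit behaves on an ordinary base-R data.frame built from the example in the question. It only illustrates the intended semantics; nothing here touches an H2OFrame or needs a running H2O cluster, and the variable names are made up for illustration.

```r
# Base R only: a plain data.frame mirroring the example from the question.
df <- data.frame(
  a = c(0, 0, 0, 0, 0, 0),
  b = c(NA, 2, NA, NA, NA, 1),
  c = c(NA, 2, NA, NA, NA, 2),
  d = c(NA, 2, NA, 1, NA, 3),
  e = c(NA, 2, NA, 2, NA, 2)
)

# na.omit() drops every row that has an NA in any column.
clean <- na.omit(df)

# complete.cases() selects the same rows via a logical mask.
same <- df[complete.cases(df), ]

print(clean)
```

Rows 2 and 6 are the only ones without an NA, so they are the ones that remain.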


Many operations take an additional "na.rm" parameter (e.g. mean, min, max, sum, etc.) that can be passed to omit NAs from computations.
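
As a small illustration of what "na.rm" does, here is a base-R sketch on a plain numeric vector (not an H2OFrame column; the vector is made up for the example):

```r
# Base R only: a numeric vector with one missing value.
x <- c(1, NA, 3)

mean(x)                # NA, because the missing value propagates
mean(x, na.rm = TRUE)  # 2, computed from the non-missing values 1 and 3
sum(x, na.rm = TRUE)   # 4
```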

hata...@gmail.com

Jul 22, 2015, 5:18:07 PM
to H2O Open Source Scalable Machine Learning - h2ostream
You hero! I'll get the nightly tomorrow and will give it a try - thank you!

hata...@gmail.com

Jul 22, 2015, 8:37:14 PM
to H2O Open Source Scalable Machine Learning - h2ostream
A follow-up question, if I may. I just figured out that I need to replace NA values with the integer 0 in two specific columns before dropping rows. I have tried it the way I would do it for a regular data.frame:

h2o_frame$newcol[is.na(h2o_frame$newcol)] <- 0

That, however, gives me:

Error in `[<-`(`*tmp*`, is.na(transactions$country_code), value = 0) :
`i` must be missing or a numeric vector

I found out that there is a function called h2o.sub(pattern, replacement, x). However, calling it gives me:

h2o.sub(NA, 0, transactions$country_code)

ERROR: Unexpected HTTP Status code: 412 Precondition Failed

Any idea how to do this?

Spencer Aiello

Jul 23, 2015, 3:21:19 AM
to Sebastian Hätälä, H2O Open Source Scalable Machine Learning - h2ostream
try this:

   h2o_frame[is.na(h2o_frame$newcol), "newcol"] <- 0



hata...@gmail.com

Jul 23, 2015, 9:38:52 AM
to H2O Open Source Scalable Machine Learning - h2ostream, spe...@h2o.ai
On Thursday, 23 July 2015 08:21:19 UTC+1, Spencer Aiello wrote:
> try this:
>
>
>    h2o_frame[is.na(h2o_frame$newcol), "newcol"] <- 0

Gives me the same error, unfortunately.

> h2o_frame[is.na(h2o_frame$newcol), "newcol"] <- 0
Error in `[<-`(`*tmp*`, is.na(h2o_frame$newcol), "newcol", :
`i` must be missing or a numeric vector
> h2o_frame[is.na(h2o_frame$newcol), 'newcol'] <- 0
Error in `[<-`(`*tmp*`, is.na(h2o_frame$newcol), "newcol", :
`i` must be missing or a numeric vector
> h2o_frame[is.na(h2o_frame$newcol), h2o_frame$newcol] <- 0
Error in !missingJ && is.na(j) : invalid 'y' type in 'x && y'
> h2o_frame$newcol[is.na(h2o_frame$newcol), h2o_frame$newcol] <- 0
Error in !missingJ && is.na(j) : invalid 'y' type in 'x && y'
> h2o_frame$newcol[is.na(h2o_frame$newcol), "newcol"] <- 0
Error in `[<-`(`*tmp*`, is.na(h2o_frame$newcol), "newcol", :
`i` must be missing or a numeric vector
> h2o_frame[is.na(h2o_frame$newcol), "newcol"] <- 0
Error in `[<-`(`*tmp*`, is.na(h2o_frame$newcol), "newcol", :
`i` must be missing or a numeric vector
> h2o_frame[is.na(h2o_frame$newcol), 4] <- 0
Error in `[<-`(`*tmp*`, is.na(h2o_frame$newcol), 4, value = 0) :

Spencer Aiello

Jul 23, 2015, 1:12:00 PM
to Sebastian Hätälä, H2O Open Source Scalable Machine Learning - h2ostream
see if this example works for you:

     df <- data.frame(a=c(NA,2,2,2))
     dfh <- as.h2o(df)
     dfh$newcol <- dfh$a
     dfh[is.na(dfh$newcol), "newcol"] <- 0
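
For comparison, the same replacement on an ordinary data.frame looks like the sketch below. This is base R only, with no H2O involved, so it just shows what result to expect from the H2OFrame version.

```r
# Base R only: same data as above, but kept as a plain data.frame.
df <- data.frame(a = c(NA, 2, 2, 2))
df$newcol <- df$a

# Rows where newcol is NA are selected by the logical mask and set to 0.
df[is.na(df$newcol), "newcol"] <- 0

print(df)
```

The original column a keeps its NA; only newcol is changed.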




hata...@gmail.com

Jul 23, 2015, 1:28:57 PM
to H2O Open Source Scalable Machine Learning - h2ostream, spe...@h2o.ai
> df <- data.frame(a=c(NA,2,2,2))
> dfh <- as.h2o(df)
100%

> dfh$newcol <- dfh$a
> dfh[is.na(dfh$newcol), "newcol"] <- 0
Error in `[<-`(`*tmp*`, is.na(dfh$newcol), "newcol", value = 0) :
`i` must be missing or a numeric vector

I also reinstalled H2O just to verify that it is not my installation, but still no luck :-( Does the code above work for you?

Spencer Aiello

Jul 23, 2015, 1:31:44 PM
to hata...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream
yes this works -- if you're using RStudio, try punting the session and reinstalling the h2o package. It might also be interesting to try in a vanilla R console.

Do you load any other packages?

hata...@gmail.com

Jul 23, 2015, 6:50:40 PM
to H2O Open Source Scalable Machine Learning - h2ostream, spe...@h2o.ai
I tried it on a different machine with the vanilla R console and it worked... Thanks very much; it must be that my R installation is broken :-)

d...@wellter.com

Apr 27, 2016, 12:10:14 AM
to H2O Open Source Scalable Machine Learning - h2ostream, spe...@h2o.ai, hata...@gmail.com
On Thursday, July 23, 2015 at 5:50:40 PM UTC-5, hata...@gmail.com wrote:
> I tried it on a different machine with the vanilla R console and it worked... Thanks very much; it must be that my R installation is broken :-)


I tried na_omit() from Python today with what I believe is the latest version (3.8.2.3) on both the client and the server, and it failed with this message: Name lookup of 'na.omit' failed. Most everything else seems to work OK. I'm fairly new to H2O, so maybe I'm missing something, but are there any suggestions on what might be wrong?

Lauren DiPerna

Apr 27, 2016, 7:13:39 PM
to d...@wellter.com, H2O Open Source Scalable Machine Learning - h2ostream, spe...@h2o.ai, hata...@gmail.com
Thanks for pointing out this bug! We've added a JIRA; you can follow the progress here: https://0xdata.atlassian.net/browse/PUBDEV-2880

