excluding groups based on their frequencies in the data frame


Renanel Pickholtz

Mar 11, 2015, 5:28:51 AM
to israel-r-...@googlegroups.com
I KNOW this should be an easy one, but I've been chasing my own tail for way too long, so here it is:

I have a data frame (nrow > 10K) with IDs and coordinates ("NAME", "LAT", "LON"). I'm running several functions that require an ID to appear at least 10 times in order to compute their value (density contours, to be specific).
How do I exclude all IDs based on their frequency in the data frame (<= 9, in my example)? Something along the lines of: subset all rows whose "NAME" has a total count in the data frame >= 10.
I've tried using data.table and dplyr, but all the solutions I found required manually constructing a character vector of the IDs and then removing them. However, I have hundreds of unique IDs, hence my question.

Thanks,
Renanel

Liad Shekel

Mar 11, 2015, 5:48:59 AM
to Israel R User Group

library(dplyr)

filtered <- DATA %>%
  group_by(ID) %>%
  summarise( n=n() ) %>%
  filter( n>=10 ) %>%
  left_join(DATA, by = "ID") %>%
  select(-n)



--
You received this message because you are subscribed to the Google Groups "Israel R User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to israel-r-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
:-)

Renanel Pickholtz

Mar 11, 2015, 6:08:49 AM
to israel-r-...@googlegroups.com
Hello,
Following your example, I tried the following:

DATA = DF
ID = as.factor(DF$NAME)

I also ran the following, without as.factor():

DATA = DF
ID = DF$NAME


filtered <- DATA %>%
  group_by(ID) %>%
  summarise( n=n() ) %>%
  filter( n>=10 ) %>%
  left_join(DATA, by = "ID") %>%
  select(-n)

but I keep getting: Error: unknown column 'ID'

What am I missing?


Liad Shekel

Mar 11, 2015, 6:12:18 AM
to Israel R User Group
ID should be a column in the data frame.
You can use your original data frame (DF?) with the column NAME instead of ID. Try this:

filtered <- DF %>% 
  group_by(NAME) %>% 
  summarise( n=n() ) %>%
  filter( n>=10 ) %>% 
  left_join(DATA, by = "NAME") %>% 
  select(-n)
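
To illustrate the point about ID needing to be a column: the `by = "ID"` join (and the grouping it follows) needs a column with that name inside the data frame itself, and a separate vector called ID in the workspace does not add one. The sketch below uses a small made-up DF (the values are hypothetical, not the poster's data) and shows two ways around it: group by the existing NAME column, or add ID as a real column first.

```r
library(dplyr)

# Hypothetical toy data standing in for the poster's DF
DF <- data.frame(
  NAME = c("a", "a", "a", "b"),
  LAT  = c(29.50, 29.51, 29.52, 29.60),
  LON  = c(34.90, 34.91, 34.92, 35.00)
)

# A vector in the workspace is not a column of DF
ID <- DF$NAME
has_id_column <- "ID" %in% names(DF)

# Option 1: group by the column that already exists
counts_by_name <- DF %>%
  group_by(NAME) %>%
  summarise(n = n())

# Option 2: add ID as a real column, then group by it
DF_with_id <- DF %>% mutate(ID = NAME)
counts_by_id <- DF_with_id %>%
  group_by(ID) %>%
  summarise(n = n())
```

Either option gives the same per-group counts; option 1 simply avoids carrying a duplicate column.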





Liad Shekel

Mar 11, 2015, 6:27:57 AM
to Israel R User Group

I forgot to change the second DATA to DF.
Try now:

filtered <- DF %>% 
  group_by(NAME) %>% 
  summarise( n=n() ) %>%
  filter( n>=10 ) %>% 
  left_join(DF, by = "NAME") %>% 
  select(-n)
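
For anyone who wants to try this without the original data, here is a minimal sketch on a made-up data frame. It is not the method from the thread but two possible alternatives that skip the join: a grouped filter() using n(), and a base R version using ave(). The toy values and the `min_count` variable are assumptions for illustration; the thread's threshold is 10.

```r
library(dplyr)

# Hypothetical toy data: "a" appears 3 times, "b" once, "c" twice
DF <- data.frame(
  NAME = c("a", "b", "a", "c", "a", "c"),
  LAT  = c(29.50, 29.60, 29.51, 29.70, 29.52, 29.71),
  LON  = c(34.90, 35.00, 34.91, 35.10, 34.92, 35.11)
)
min_count <- 3   # the thread uses 10; 3 keeps the toy data short

# Grouped filter: inside a grouped data frame, n() is the size of the current group
kept_dplyr <- DF %>%
  group_by(NAME) %>%
  filter(n() >= min_count) %>%
  ungroup()

# Base R: ave() returns each row's group size, aligned with the rows of DF
group_size <- ave(seq_along(DF$NAME), DF$NAME, FUN = length)
kept_base <- DF[group_size >= min_count, ]
```

Both versions keep only the rows whose NAME occurs at least `min_count` times and preserve the original row order.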

On 11 March 2015 at 12:11, "Liad Shekel" <liad....@gmail.com> wrote: