[R] Count number of consecutive zeros by group

Carlos Nasher

Oct 31, 2013, 7:20:44 AM
to r-h...@r-project.org
Dear R-helpers,

I need to count the maximum number of consecutive zero values of a variable
in a dataframe by different groups. My dataframe looks like this:

ID <- c(1,1,1,2,2,3,3,3,3)
x <- c(1,0,0,0,0,1,1,0,1)
df <- data.frame(ID=ID,x=x)
rm(ID,x)

So I want to get the max number of consecutive zeros of variable x for each
ID. I found rle() to be helpful for this task; so I did:

FUN <- function(x) {
rles <- rle(x == 0)
}
consec <- lapply(split(df[,2],df[,1]), FUN)

consec is now a list of rle objects, one for each ID. Each contains
$lengths (int), the length of every run of consecutive values, and $values
(logi), indicating whether the values in that run are zero or not.

Unfortunately I'm not very experienced with lists. Could you help me how to
extract the max number of consec zeros for each ID and return the result as
a dataframe containing ID and max number of consecutive zeros?

Different approaches are also welcome. Since the real dataframe is quite
large, a fast solution is appreciated.

Best regards,
Carlos


--
-----------------------------------------------------------------
Carlos Nasher
Buchenstr. 12
22299 Hamburg

tel: +49 (0)40 67952962
mobil: +49 (0)175 9386725
mail: carlos...@gmail.com


S Ellison

Oct 31, 2013, 7:34:32 AM
to Carlos Nasher, r-h...@r-project.org


> -----Original Message-----
> So I want to get the max number of consecutive zeros of variable x for each
> ID. I found rle() to be helpful for this task; so I did:
>
> FUN <- function(x) {
> rles <- rle(x == 0)
> }
> consec <- lapply(split(df[,2],df[,1]), FUN)

You're probably better off with tapply and a function that returns what you want. You're probably also better off with a data frame name that isn't a function name, so I'll use dfr instead of df...

dfr<- data.frame(x=rpois(500, 1.5), ID=gl(5,100)) #5 ID groups numbered 1-5, equal size but that doesn't matter for tapply

f2 <- function(x) {
max( rle(x == 0)$lengths )
}
with(dfr, tapply(x, ID, f2))


S Ellison


*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Carlos Nasher

Oct 31, 2013, 1:45:54 PM
to S Ellison, r-h...@r-project.org
If I apply your function to my test data:

ID <- c(1,1,1,2,2,3,3,3,3)
x <- c(1,0,0,0,0,1,1,0,1)
data <- data.frame(ID=ID,x=x)
rm(ID,x)

f2 <- function(x) {
max( rle(x == 0)$lengths )
}
with(data, tapply(x, ID, f2))

the result is
1 2 3
2 2 2

which is not what I'm aiming for. It should be
1 2 3
2 2 1

I think f2 does not return the max number of consecutive zeros, but the max
length of any run of identical values... Any idea how to fix this?
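
The difference can be checked directly on group 3 of the toy data. This is a minimal base R sketch; x3 and runs are illustrative names only:

```r
# Group 3 from the toy data: the x values where ID == 3
x3 <- c(1, 1, 0, 1)

runs <- rle(x3 == 0)
runs$lengths  # 2 1 1
runs$values   # FALSE TRUE FALSE

# f2 takes the max over all runs, including the run of two non-zeros
max(runs$lengths)               # 2

# Keeping only runs whose value is TRUE (x == 0) gives the wanted answer
max(runs$lengths[runs$values])  # 1
```

So the run of length 2 that f2 reports for ID 3 is the pair of leading ones, not a run of zeros.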



arun

Oct 31, 2013, 9:20:25 AM
to R help, Carlos Nasher
Hi,
Maybe this helps:
fun1 <- function(dat){
 lst1 <- lapply(split(dat,dat$ID),function(y){
 rl <- rle(y$x)
 data.frame(ID=unique(y$ID),MAXZero=max(rl$lengths[rl$values==0]))
 })
 do.call(rbind,lst1)
 }

fun1(df)
#  ID MAXZero
#1  1       2
#2  2       2
#3  3       1

A.K.

S Ellison

Oct 31, 2013, 2:26:56 PM
to Carlos Nasher, r-h...@r-project.org
> If I apply your function to my test data:
>
....
> the result is
> 1 2 3
> 2 2 2
>
...
> I think f2 does not return the max of consecutive zeros, but the max of any
> consecutive number... Any idea how to fix this?

The toy example of tapply using f2 does indeed return the maximum run length irrespective of the value repeated.
If you want to select runs of a particular value, you can select them using the $values element of the rle object, again inside the function.
Modifying to accommodate that (and again avoiding a data frame name that is the same as a base R function name - you managed it again!):

dfr <- data.frame(ID = c(1,1,1,2,2,3,3,3,3), x = c(1,0,0,0,0,1,1,0,1))

f3 <- function(x) {
runs <- rle(x == 0L) #Often wise to be careful with == and numbers ... see FAQ 7.31
with(runs, max(lengths[values]))
#This works because in this case the values in
#$values are TRUE for x==0 and FALSE otherwise; see ?'[' for why those work
}
with(dfr, tapply(x, ID, f3))

or, more or less equivalently but a shade more generally

f4 <- function(x, select=0L) {
runs <- rle(x )
with(runs, max(lengths[values == select]))
}
with(dfr, tapply(x, ID, f4))

None of this checks that runs of zero exist in a group; if they don't, you'll get warnings and -Inf in the output as max takes maxima of nothing. You can add extra checks inside the function if that bothers you.
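
One way such an extra check might look is sketched below. The name f3_checked, the test data dfr2, and the choice of NA as the result for a group without zeros are all assumptions made for illustration, not part of the functions above:

```r
# Hypothetical variant of f3 with an explicit check for "no zeros at all"
f3_checked <- function(x) {
  runs <- rle(x == 0)
  # If no run has value TRUE, the group contains no zeros: return NA
  if (!any(runs$values)) return(NA_integer_)
  max(runs$lengths[runs$values])
}

# Illustrative data: group 3 has no zeros
dfr2 <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3),
                   x  = c(1, 0, 0, 0, 0, 1, 1))

res <- with(dfr2, tapply(x, ID, f3_checked))
res
```

Groups 1 and 2 give 2, and group 3 gives NA without any warning. Returning 0 instead of NA would be equally defensible, depending on how "no zeros" should be reported.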


Hervé Pagès

Oct 31, 2013, 2:54:55 PM
to Carlos Nasher, r-h...@r-project.org
Hi Carlos,

With Bioconductor, this can simply be done with:

library(IRanges)
ID <- Rle(1:3, c(3,2,4))
x <- Rle(c(1,0,0,0,0,1,1,0,1))
groups <- split(x, ID)
idx <- groups == 0

Then:

> max(runLength(idx)[runValue(idx)])
1 2 3
2 2 1

Should be fast even with hundreds of thousands of groups (should take
< 10 sec).

HTH,
H.



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319

William Dunlap

Oct 31, 2013, 3:07:25 PM
to S Ellison, Carlos Nasher, r-h...@r-project.org
> None of this checks that runs of zero exist in a group; if they don't, you'll get warnings
> and -Inf in the output as max takes maxima of nothing. You can add extra checks inside
> the function if that bothers you.

Just adding a second argument, 0, to the call to max() will take care of that.
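
Applied to the f3 function quoted earlier in the thread, that suggestion might look like the sketch below. The names f3_zero, res and out are hypothetical; the MAXZero column name is borrowed from arun's reply:

```r
# f3 with 0 passed as a second argument to max(), so that a group
# without zeros yields 0 instead of -Inf and a warning
f3_zero <- function(x) {
  runs <- rle(x == 0)
  max(runs$lengths[runs$values], 0)
}

f3_zero(c(1, 1, 2))     # no zeros: 0, no warning
f3_zero(c(1, 0, 0, 1))  # 2

# Toy data from the thread, with the result returned as a data frame
dfr <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  x  = c(1, 0, 0, 0, 0, 1, 1, 0, 1))
res <- with(dfr, tapply(x, ID, f3_zero))
out <- data.frame(ID = names(res), MAXZero = as.vector(res))
out
```

Note that names(res) is a character vector, so the ID column of out is character; wrap it in as.numeric() if numeric IDs are needed.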

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

PIKAL Petr

Nov 1, 2013, 6:01:52 AM
to Carlos Nasher, r-h...@r-project.org
Hi

Another option is sapply/split/sum construction

with(data, sapply(split(x, ID), function(x) sum(x==0)))

Regards
Petr



arun

Nov 1, 2013, 9:17:19 AM
to R help, Carlos Nasher
I think this gives a different result than the one OP asked for:

df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), x = c(1, 0,
0, 1, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)), .Names = c("ID",
"x"), row.names = c(NA, -22L), class = "data.frame")

with(df1, sapply(split(x, ID), function(x) sum(x==0)))

with(df1,tapply(x,list(ID),function(y) {rl <- rle(!y); max(c(0,rl$lengths[rl$values]))}))


A.K.

PIKAL Petr

Nov 1, 2013, 12:24:36 PM
to arun, R help, Carlos Nasher
Hi

Yes, you are right. This gives the total number of zeros, not the max number of consecutive zeros.
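
The two quantities happen to coincide on the original toy data, which makes the difference easy to miss. A small base R sketch on an illustrative vector (not taken from the thread) shows them diverging:

```r
# Illustrative vector with zeros spread over several separate runs
x <- c(0, 1, 0, 0, 1, 0)

# Total number of zeros
sum(x == 0)  # 4

# Longest run of consecutive zeros
runs <- rle(x == 0)
max(runs$lengths[runs$values], 0)  # 2
```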