batch processing using apply and/or dplyr and seq()

Corey Clatterbuck

Jul 9, 2018, 7:51:31 PM
to Davis R Users' Group
Hi all,

I have a data frame with individual id "Band" and dates/times "DateTime" in POSIXct format. I want to create a regularized time interval of 150 seconds between DateTimes, starting with the first DateTime within each individual and ending with the final time. (If you think I'm heading towards interpolation, you'd be correct!) However, batch processing this is getting to be a big PITA.


If I subset the df by individual ID "bird553", using seq() works:

ti <- seq(bird553$DateTime[1], bird553$DateTime[length(bird553$DateTime)],
          by = 150)
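A minimal sketch of the single-bird case, using a made-up `bird553` with five irregular fixes (the times are hypothetical, not Corey's data, and assumed sorted). For POSIXct input, seq() treats a numeric `by` as seconds, and the grid stops at the last step that does not pass the final fix, so it can end a little before it.

```r
# Hypothetical irregular fixes for one bird (values made up for illustration).
bird553 <- data.frame(
  DateTime = as.POSIXct("2018-07-09 08:00:00", tz = "UTC") + c(0, 95, 310, 480, 700)
)

# With POSIXct endpoints, by = 150 means 150 seconds.
ti <- seq(bird553$DateTime[1], bird553$DateTime[length(bird553$DateTime)],
          by = 150)

ti                                    # 08:00:00, 08:02:30, ..., 08:10:00
as.numeric(diff(ti), units = "secs")  # every step is 150 s
```

Here the last grid point is 08:10:00 (600 s) while the last fix is at 700 s, because 750 s would overshoot.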


Here are my attempts to batch process:

ti <- all %>%
  group_by(Band) %>%
  mutate(seq(all$DateTime[1], all$DateTime[length(all$DateTime)],
      by = 150))
Error in mutate_impl(.data, dots) : 
  Column `seq(all$date[1], all$date[length(all$date)], by = 158)` must be length 2538 (the group size) or one, not 205888

ti <- tapply(all$DateTime, all$Band, seq(all$date[1], all$date[length(all$date)], by = 150))
Error in match.fun(FUN) : 'seq(all$date[1], all$date[length(all$date)], by = 150)' is not a function, character or symbol


I've tried different variations on the code above, but essentially I don't think I'm referencing the correct elements in the df. I'd like to figure out how to loop this code through subsets of individual IDs in the df. I know this might generally be easier using adehabitatLT, but for the moment I don't want to convert lat/long (LL) to UTM. Thanks all!

Corey
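On the tapply() attempt: its third argument has to be a function, not an already-evaluated seq() call, and that function should work on the times it is handed for each bird (`x`) rather than on `all$...`. A minimal sketch, assuming a small made-up data frame `birds` standing in for `all` (bird IDs and times are hypothetical):

```r
# Hypothetical stand-in for Corey's `all`: two birds with made-up fix times.
t0 <- as.POSIXct("2018-07-09 08:00:00", tz = "UTC")
birds <- data.frame(
  Band = rep(c("bird553", "bird554"), times = c(4, 3)),
  DateTime = t0 + c(0, 120, 400, 610, 50, 260, 500)
)

# tapply() needs a function as FUN; each call receives one bird's times as x.
ti <- tapply(birds$DateTime, birds$Band,
             function(x) seq(min(x), max(x), by = 150))

# The same thing as a plain named list (one POSIXct vector per bird).
ti_list <- lapply(split(birds$DateTime, birds$Band),
                  function(x) seq(min(x), max(x), by = 150))

lengths(ti_list)
```

Using min()/max() instead of the first and last element means the rows do not need to be sorted by time first.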

Brandon Hurr

Jul 9, 2018, 8:21:16 PM
to davi...@googlegroups.com
I'm unsure if this is the only issue, but inside dplyr verbs you shouldn't put all$ before the variable name: all$DateTime always refers to the whole column, so the grouping is ignored.

 all %>%
  group_by(Band) %>%
  mutate(sequence = seq(first(DateTime), last(DateTime), by = 150))

Hard to say for sure, though. It would help if you sent at least one bird's full data, or ideally several birds', so we can make sure it works across groupings.

B



--
Check out our R resources at http://d-rug.github.io/
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
Visit this group at https://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/d/optout.

Evan Eskew

Jul 9, 2018, 10:36:33 PM
to davi...@googlegroups.com
As Brandon said, an example dataset would definitely help for testing. But at a quick glance, I think there's another issue with how you're trying to use 'group_by()'. If I understand your problem correctly, you want a DateTime sequence for every individual bird ID (each "Band"), but this sequence may be of a different length for each bird.

dplyr is meant to work naturally with dataframes, and after 'group_by()' each verb expects a particular length back: 'mutate()' expects one value per row of the group (or a single value to recycle), which is what your error means by "must be length 2538 (the group size) or one", and 'summarise()' expects one value per bird. What you actually want is a sequence of values for each bird, whose length matches neither. Your variable referencing issues aside (i.e., the $ stuff Brandon brought up, which is presumably also why seq() ran over the whole dataset and produced 205888 values), you're asking dplyr to squeeze what you successfully generated with your 'seq()' implementation into a column of the wrong length. And it's putting up a fuss.

I think you're going to need to either record the sequence for each bird in a list, or generate a dataframe with a column for 1) "Band", 2) the DateTime sequence element number (ranging from 1 to whatever the sequence length should be for each bird), and 3) the DateTime sequence values themselves. With this second strategy, your dataframe will have many more rows than you have birds, with a variable number of rows for each individual "Band", as I believe you want.

Hope that helps!
Evan
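A minimal base R sketch of the second strategy described above, assuming the same kind of made-up two-bird data (hypothetical values): each bird's sequence becomes its own block of rows with Band, the element number (`SeqNo`, a column name chosen here for illustration) and the time.

```r
# Hypothetical two-bird data with irregular fix times.
t0 <- as.POSIXct("2018-07-09 08:00:00", tz = "UTC")
birds <- data.frame(
  Band = rep(c("bird553", "bird554"), times = c(4, 3)),
  DateTime = t0 + c(0, 120, 400, 610, 50, 260, 500)
)

# One data frame per bird: Band, element number within the sequence, time.
per_bird <- lapply(split(birds, birds$Band), function(d) {
  times <- seq(min(d$DateTime), max(d$DateTime), by = 150)
  data.frame(Band = d$Band[1], SeqNo = seq_along(times), DateTime = times)
})

# Stack them: more rows than birds, and a different number of rows per Band.
regular <- do.call(rbind, per_bird)
rownames(regular) <- NULL
regular
```

Keeping everything in one long data frame makes later steps (joining back to the raw fixes, interpolating) ordinary grouped operations.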

Noam Ross

Jul 10, 2018, 11:03:26 AM
to davi...@googlegroups.com

This is where we enter the tidyverse’s diabolical world of nested data frames through tidyr:

library(dplyr)
library(tidyr)

#Make fake data, birds where each one has a time sequence of randomly
#spaced times
# tibble() replaces data_frame(), which is deprecated in current dplyr
the_birds <- tibble(Band = rep(1:10, each = sample.int(20, 1, TRUE))) %>% 
  group_by(Band) %>% 
  mutate(DateTime = Sys.time() + cumsum(sample.int(200, n(), TRUE))) %>% 
  ungroup()

# Make a regularly-spaced sequence for each bird.
# mutate() expects a result as long as each bird's group, and summarise()
# wants one value per group, so we wrap each sequence in a list (one element
# per bird), making a list-column, then expand it with unnest()
regular_birds <- the_birds %>% 
  group_by(Band) %>% 
  summarise(TimeSeq = list(seq(min(DateTime), max(DateTime), by = 150))) %>% 
  unnest(TimeSeq)

# To do this while escaping the tidyprisonwithnowalls, you can split the data
# by bird and lapply across each data frame to generate sequences
base_regular_birds <-
  do.call(rbind,
          lapply(
            split(the_birds, the_birds$Band), function(x) {
              data.frame(Band = x$Band[1],
                         TimeSeq = seq(min(x$DateTime), max(x$DateTime), by = 150))
            })
  )

# For the sake of testing: base R messes around with rownames, and the
# tidyverse adds additional tibble classes.
rownames(base_regular_birds) <- NULL 
identical(base_regular_birds, as.data.frame(regular_birds)) # TRUE
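Since the stated goal was interpolation, one possible next step, sketched under assumptions: the data are assumed to have hypothetical `Lat`/`Lon` columns (values made up here), and positions are linearly interpolated onto each bird's 150 s grid with stats::approx(). Linear interpolation in degrees is a simplification that is probably only reasonable for short gaps.

```r
# Hypothetical fixes with made-up coordinates for two birds.
t0 <- as.POSIXct("2018-07-09 08:00:00", tz = "UTC")
birds <- data.frame(
  Band = rep(c("bird553", "bird554"), times = c(4, 3)),
  DateTime = t0 + c(0, 120, 400, 610, 50, 260, 500),
  Lat = c(38.50, 38.51, 38.53, 38.52, 38.40, 38.42, 38.45),
  Lon = c(-121.70, -121.71, -121.73, -121.74, -121.60, -121.61, -121.63)
)

# Interpolate one bird's Lat/Lon onto a regular grid of `step` seconds.
interp_bird <- function(d, step = 150) {
  d <- d[order(d$DateTime), ]
  tgrid <- seq(min(d$DateTime), max(d$DateTime), by = step)
  tt <- as.numeric(d$DateTime)   # approx() works on plain numbers
  data.frame(
    Band = d$Band[1],
    DateTime = tgrid,
    Lat = approx(tt, d$Lat, xout = as.numeric(tgrid))$y,
    Lon = approx(tt, d$Lon, xout = as.numeric(tgrid))$y
  )
}

regular <- do.call(rbind, lapply(split(birds, birds$Band), interp_bird))
rownames(regular) <- NULL
regular
```

approx() collapses duplicated times (with a warning), so exact duplicate fixes would need handling first.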

DAYOU Olivier

Jul 10, 2018, 11:19:14 AM
to davi...@googlegroups.com
Hi everyone,

Does anyone know how to add significance letters (a, b, c) from Tukey test output to a box plot and a heatmap?

I'd like the box plot with error bars to show letters a, b, c marking the significant differences between treatments, and the same on a heatmap.

Thanks

Brandon Hurr

Jul 10, 2018, 11:44:09 AM
to davi...@googlegroups.com
Olivier, 

It's best to start a new thread for a new subject. 

Here is an example for doing this with ggplot2:

HTH, 
B

DAYOU Olivier

Jul 10, 2018, 12:16:17 PM
to davi...@googlegroups.com
Hi Brandon,

Thank you for your quick reply.

Is there a way to do this on a heatmap as well?
Best Regards,

Olivier DAYOU
M.Sc. Plant Biotechnology
Department of Biochemistry and Biotechnology
Kenyatta University, Nairobi, Kenya
E-mail: olivie...@gmail.com/ dayouol...@students.ku.ac.ke
Skype: Olivier DAYOU
Phone: +254 0797517504
                                   
''Everything you do or do not affects the world''



Corey Clatterbuck

Jul 10, 2018, 1:30:17 PM
to davi...@googlegroups.com
Thanks all, you rock! There were issues both with my code and with how I was setting up the data, and you addressed both. I'll get started on applying this to the dataset & reply if any other issues arise.

Corey

--
Check out our R resources at http://d-rug.github.io/
---
You received this message because you are subscribed to a topic in the Google Groups "Davis R Users' Group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/davis-rug/cAsFu2TfMjo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to davis-rug+unsubscribe@googlegroups.com.

DAYOU Olivier

Jul 10, 2018, 2:15:00 PM
to davi...@googlegroups.com
I can't figure out what is wrong with my script. I would like to plot the Tukey test output and display the significance letters, but I am getting this error:

Error in strsplit(x, sep) : non-character argument

Here is my script:

library(plyr)
library(ggplot2)
library(multcompView)

fly<-read.csv("D:/R/fruitfly.csv") #Import fruitfly
fly
tHSD <- TukeyHSD(aov1,"group", ordered = FALSE, conf.level = 0.95)

generate_label_df <- function(HSD, fgroup){
  # Extract labels and factor levels from Tukey post-hoc 
  Tukey.levels <- HSD[[fgroup]][,4]
  Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
  plot.labels <- names(Tukey.labels[['Letters']])
  
  # Get highest quantile for Tukey's 5 number summary and add a bit of space to buffer between    
  # upper quantile and label placement
  boxplot.df <- ddply(fly, fgroup, function (x) max(fivenum(x$y)) + 0.2)
  
  # Create a data frame out of the factor levels and Tukey's homogenous group letters
  plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],
                            stringsAsFactors = FALSE)

  # Merge it with the labels
  labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = fgroup, sort = FALSE)
  
  return(labels.df)
}
p_base <- ggplot(fly, aes(x=fly$group, y=fly$longevity)) + geom_boxplot() +
  geom_text(data = generate_label_df(tHSD, 'group'), aes(x = plot.labels, y = V1, label = labels))

fruitfly.csv
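A minimal sketch of one way the script above could be made to run, using R's built-in PlantGrowth data in place of fruitfly.csv, so `weight` stands in for `longevity`. It changes several things, any of which may or may not be behind the strsplit() error: aov1 is actually fitted, `group` is a factor, the p-values passed to multcompLetters() keep their comparison names, and label heights use a real response column instead of `x$y` (which presumably isn't a column in fruitfly.csv). The helper keeps the original name but its extra `data`/`response` arguments are additions for this sketch.

```r
library(multcompView)

# PlantGrowth stands in for fruitfly.csv. With the real file, something like
#   fly <- read.csv("D:/R/fruitfly.csv"); fly$group <- factor(fly$group)
# would replace the next line.
fly <- PlantGrowth

# Fit the ANOVA that TukeyHSD() needs (aov1 was never created in the script).
aov1 <- aov(weight ~ group, data = fly)
tHSD <- TukeyHSD(aov1, "group", conf.level = 0.95)

generate_label_df <- function(HSD, fgroup, data, response) {
  tab <- HSD[[fgroup]]
  # Rebuild the names explicitly: with a single comparison, tab[, 4] can come
  # back unnamed, and multcompLetters() needs names like "trt2-trt1".
  p_adj <- setNames(tab[, "p adj"], rownames(tab))
  group_letters <- multcompLetters(p_adj)$Letters

  # Put each letter a little above the highest observation in its group
  # (max(fivenum(x)) in the original equals max(x)).
  y_top <- tapply(data[[response]], data[[fgroup]], max)
  data.frame(
    level = factor(names(group_letters), levels = levels(data[[fgroup]])),
    y = as.vector(y_top[names(group_letters)]) + 0.2,
    label = unname(group_letters),
    stringsAsFactors = FALSE
  )
}

labels_df <- generate_label_df(tHSD, "group", fly, "weight")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Map bare column names inside aes(), not fly$group / fly$weight.
  p_base <- ggplot(fly, aes(x = group, y = weight)) +
    geom_boxplot() +
    geom_text(data = labels_df, aes(x = level, y = y, label = label))
}
```

Calling print(p_base) draws the boxes with their letters; groups sharing a letter are not significantly different at the 0.05 level multcompLetters() uses by default.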

Brandon Hurr

Jul 10, 2018, 2:32:47 PM
to davi...@googlegroups.com
Could you either share your real data or make up some data similar to it that we could work with? 

I'm certain you can do something like that. Below is some ***very*** old code of mine (mostly borrowed from SO posts) that plots a correlation matrix. With geom_tile you can use the fill as your "heat" and then use geom_text to place your groupings. 

Hopefully you can see how to apply something like this to your situation?

#Correlation matrix plot
library(tidyverse)
raw.data <- mtcars
colstart <- 1
colend <- length(names(raw.data))

# Named cor.mat rather than c so base::c() is not masked
cor.mat <- cor(raw.data[colstart:colend], use = "pairwise.complete.obs")

#Correlation p-value tests
cor.pvalues <- function(X){
  nc <- ncol(X)
  res <- matrix(0, nc, nc)
  for (i in 2:nc){
    for (j in 1:(i - 1)){
      res[i, j] <- res[j, i] <- cor.test(X[, i], X[, j])$p.value
    }
  }
  res
}

p <- cor.pvalues(raw.data[colstart:colend])

#convert p-values into stars for significance level
stars <- as.character(symnum(p, cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                             symbols = c('***', '**', '*', ''),
                             legend = FALSE))

#Combine into data.frame for plotting
molten.p <- reshape2::melt(p)
molten.c <- reshape2::melt(cor.mat)
molten.stars <- reshape2::melt(stars)
molten.raw.data <- bind_cols(molten.c, molten.p[3], molten.stars[1])

#rename columns to avoid the same name
names(molten.raw.data) <- c("M1", "M2", "corr", "pvalue", "stars")

#define each triangle of the plot matrix and the diagonal (mi.ids)
mi.ids <- subset(molten.raw.data, M1 == M2)
mi.lower <- subset(molten.raw.data[lower.tri(cor.mat), ], M1 != M2)
mi.upper <- subset(molten.raw.data[upper.tri(cor.mat), ], M1 != M2)
stars.lower <- subset(molten.raw.data[lower.tri(cor.mat), ], M1 != M2)
meas <- as.character(unique(molten.raw.data$M2))

ggplot(molten.raw.data, aes(M1, M2, fill = corr)) +
  theme_bw() +
  geom_tile(data = mi.lower) +
  geom_text(data = mi.lower, aes(label = stars), size = 1) +
  geom_text(data = mi.ids, aes(label = M2), hjust = 0, colour = "grey40", size = 1) +
  scale_colour_identity() +
  scale_fill_gradientn(colours = c("darkblue", "lightblue", "white", "pink", "darkred")) +
  scale_x_discrete("", limits = meas[length(meas):1]) + #flip the x axis
  scale_y_discrete("", limits = meas) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  guides(fill = "none", colour = "none")


Again, super old code, but just for an idea. 

B
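Following the geom_tile() + geom_text() idea above, a minimal sketch of Tukey letters on a heatmap, using the built-in iris data as a stand-in: each column is one measurement, each row a species, the fill is the species mean standardised within that measurement, and the text is the letter display computed per measurement with multcompLetters(). The layout and the z-score choice are assumptions for illustration, not the only way to do it.

```r
library(multcompView)
library(ggplot2)

# iris stands in for the real data: columns = measurements, rows = species.
measures <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

cells <- do.call(rbind, lapply(measures, function(m) {
  # One-way ANOVA and Tukey HSD for this measurement.
  fit <- aov(reformulate("Species", response = m), data = iris)
  tab <- TukeyHSD(fit, "Species")$Species
  # Compact letter display; the names look like "versicolor-setosa".
  lets <- multcompLetters(setNames(tab[, "p adj"], rownames(tab)))$Letters
  means <- tapply(iris[[m]], iris$Species, mean)
  data.frame(
    measure = m,
    Species = names(means),
    mean = as.vector(means),
    # Standardise within the measurement so one colour scale fits all columns.
    z = as.vector(scale(as.vector(means))),
    label = unname(lets[names(means)]),
    stringsAsFactors = FALSE
  )
}))

# geom_tile() draws the "heat", geom_text() writes the letters on top.
p_heat <- ggplot(cells, aes(x = measure, y = Species, fill = z)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = label)) +
  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred") +
  labs(x = NULL, y = NULL, fill = "z-score\nof mean") +
  theme_bw()
```

Letters are only comparable within a column here, since each measurement gets its own Tukey test.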