Aggregating daily rainfall data


Alyssa DeVincentis

Jun 30, 2016, 11:21:11 AM
to Davis R Users' Group
Hi, I am brand new to R, so forgive my inexperience. I have daily rainfall data for 45 years in the form below (value = rainfall amount):

   Ano Mes Dia value       date
1 1967   8   1    NA 1967-08-01
2 1967   9   1  11.0 1967-09-01
3 1967  10   1   0.0 1967-10-01
4 1968   5   1  16.2 1968-05-01
5 1968   6   1   0.1 1968-06-01
6 1968   7   1  12.1 1968-07-01


I need to do two things:
1) aggregate the data and sum the values for April-November only, for every year
2) calculate the number of days that contribute 50% of the total rainfall, for every year

Any and all insight, feedback, recommendations for what packages to use would be greatly appreciated!

Alyssa

Brandon Hurr

Jun 30, 2016, 11:28:02 AM
to davi...@googlegroups.com
I would load lubridate and dplyr. 

Use mutate to create a year and month column from your existing date column. 
library(lubridate)
library(dplyr)

mutate(dataframename, month = month(ymd(date)), year = year(ymd(date))) %>%
  filter(month > 3, month < 12) %>%
  group_by(year) %>%
  summarise(rainfallAMT = sum(value, na.rm = TRUE))

Something like that... 

Send a dput() sample of your real dataset and we can know for sure if that works. 
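
For anyone unsure what a dput() sample looks like, here is a minimal sketch. The data frame name rain is a stand-in, filled with the six rows shown in the question; with the real data you would call dput(head(yourdata, 20)) or similar and paste the printed text into your reply.

```r
# Hypothetical stand-in for the real data: the six rows from the question.
rain <- data.frame(
  Ano   = c(1967, 1967, 1967, 1968, 1968, 1968),
  Mes   = c(8, 9, 10, 5, 6, 7),
  Dia   = 1,
  value = c(NA, 11.0, 0.0, 16.2, 0.1, 12.1),
  date  = c("1967-08-01", "1967-09-01", "1967-10-01",
            "1968-05-01", "1968-06-01", "1968-07-01"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, column types included.
dput(head(rain))

# Anyone on the list can rebuild the sample by evaluating that printed text.
txt <- capture.output(dput(head(rain)))
rebuilt <- eval(parse(text = txt))
```

Unlike a pasted printout, this keeps the column types, so others can see whether date is character, factor or Date.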

B

--
Check out our R resources at http://d-rug.github.io/
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
Visit this group at https://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/d/optout.

Duncan Temple Lang

Jun 30, 2016, 11:44:05 AM
to davi...@googlegroups.com
Hi Alyssa

Several of the D-RUG people are at the useR! conference so are probably busy.

1) There are many ways and packages to help you do this. You may want to use packages such as dplyr.
However, the core R facilities are good to master, also.
Since you already have month as a separate column from the date, you don't have to deal with the
date column.

# read the data into a data frame.
d = read.table("Alyssa.dat", stringsAsFactors = FALSE)

# Subset the rows corresponding to April-November inclusive.
tmp = d[ d$Mes >= 4 & d$Mes <= 11, ]

# Group the value observations in tmp by year and for each of these
# call sum() to add up all the elements.
tapply(tmp$value, tmp$Ano, sum)
1967 1968
  NA 28.4

This gives a missing value for 1967 because one of that year's values is NA.
To omit missing values when performing the sum(), pass na.rm = TRUE:

tapply(tmp$value, tmp$Ano, sum, na.rm = TRUE)
1967 1968
11.0 28.4
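
To try these steps without the file, here is a self-contained sketch that reads the six sample rows from inline text. The name sample_text is made up for illustration. The aggregate() call at the end is an alternative I am adding, not part of the steps above; it returns a data frame, and its formula interface drops rows with NA by default.

```r
# The six sample rows from the question, as inline text instead of a file.
sample_text <- "Ano Mes Dia value date
1 1967 8 1 NA 1967-08-01
2 1967 9 1 11.0 1967-09-01
3 1967 10 1 0.0 1967-10-01
4 1968 5 1 16.2 1968-05-01
5 1968 6 1 0.1 1968-06-01
6 1968 7 1 12.1 1968-07-01"

# The header line has one fewer field than the data lines, so read.table()
# uses the first field of each data line as the row name.
d <- read.table(text = sample_text, header = TRUE, stringsAsFactors = FALSE)

# Keep April-November inclusive.
tmp <- d[d$Mes >= 4 & d$Mes <= 11, ]

# Sum rainfall per year, ignoring missing values.
totals <- tapply(tmp$value, tmp$Ano, sum, na.rm = TRUE)
print(totals)

# Alternative returning a data frame with columns Ano and value.
agg <- aggregate(value ~ Ano, data = tmp, FUN = sum)
print(agg)
```

Both results give 11.0 for 1967 and 28.4 for 1968 on this sample.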

Please let me know if I've made a mistake or misunderstood the task.
And I see Brandon has just given you the dplyr version.

D.

--
Director, Data Sciences Initiative, UC Davis
Professor, Dept. of Statistics, UC Davis

http://datascience.ucdavis.edu
http://www.stat.ucdavis.edu/~duncan

Duncan Temple Lang

Jun 30, 2016, 11:54:00 AM
to davi...@googlegroups.com
Hi Alyssa again

As for question 2), the following should be "close" to what you want.

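numDays50 =
function(values)
{
    # Drop missing days; otherwise sum(values) is NA for any year with an NA.
    values = values[!is.na(values)]
    # Sort ascending and scale so the values sum to 1.
    sv = sort(values)/sum(values)
    # Count the largest days from the point where the running share reaches 50%.
    sum( cumsum(sv) >= .5 )
}

tapply(d$value, d$Ano, numDays50)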
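
Question 2 can also be read as asking for the smallest number of wettest days whose rainfall adds up to at least half of the yearly total. That reading is an assumption on my part. Under it, a minimal sketch sorts in decreasing order and takes the first position where the running total reaches half. The function name minDays50 and the data frame rain (the six rows from the question) are made up for illustration.

```r
# Smallest number of wettest days whose rainfall is at least half the total.
minDays50 <- function(values) {
  values <- values[!is.na(values)]       # ignore missing days
  total <- sum(values)
  if (total <= 0) return(NA_integer_)    # no rain recorded: not defined
  running <- cumsum(sort(values, decreasing = TRUE))
  which(running >= total / 2)[1]         # first day count reaching half
}

minDays50(c(4, 3, 2, 1))          # 2 (4 + 3 = 7 is at least half of 10)
minDays50(c(10, 5, 3, 1, 1, NA))  # 1 (the wettest day alone is half of 20)

# Applied per year to the six sample rows from the question.
rain <- data.frame(
  Ano   = c(1967, 1967, 1967, 1968, 1968, 1968),
  value = c(NA, 11.0, 0.0, 16.2, 0.1, 12.1)
)
perYear <- tapply(rain$value, rain$Ano, minDays50)
print(perYear)
```

The second call is a case where the count depends on the sort direction: when a running total lands exactly on 50%, counting from the ascending side includes one extra day.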


D.
