calculating average time of day using circular statistics

1,061 views
Skip to first unread message

Bonnie Dixon

unread,
Oct 6, 2013, 8:15:06 PM10/6/13
to davi...@googlegroups.com
Fellow R users;

Is anyone familiar with circular statistics?  

I have some data on the time of day when people went to bed over several nights of sleep (in decimal hours on a 24 hour clock).  Here is an example:

> sleep <- data.frame(person.id = c(117, 117, 117, 208, 265, 265, 327, 327, 327, 327), bedtime = c(23.2, NA, 21.5, NA, 0.7, 1.3, 23.6, 0.5, 22.9, 1.1))
> sleep
   person.id bedtime
1        117    23.2
2        117      NA
3        117    21.5
4        208      NA
5        265     0.7
6        265     1.3
7        327    23.6
8        327     0.5
9        327    22.9
10       327     1.1

I would like to calculate each person's average bedtime.  Doing this the same way that I would normally calculate an average does not produce meaningful results because time of day is a circular scale (similar to angles or compass bearings), and sometimes people go to bed before midnight, other times, after.

> tapply(sleep$bedtime, sleep$person.id, mean, na.rm = TRUE)
   117    208    265    327 
22.350    NaN  1.000 12.025 

Does anyone know how to calculate an average time of day?  (I found this webpage, which addresses the topic, but does not provide an example of how to do it in R. I also got the package, circular, which provides functions for circular statistics, but I'm struggling with figuring out how to use it for this operation.)  Thanks for any ideas.

Bonnie

Michael Hannon

unread,
Oct 6, 2013, 9:46:06 PM10/6/13
to davi...@googlegroups.com
Hi, Bonnie.  Can you show me (us) what you would LIKE the output in your example to be?  It appears that, for instance, your person No. 327 DID go to bed at slightly after midnight, on the average, just as your tapply, correctly interpreted, says.

-- Mike

Bonnie Dixon

unread,
Oct 6, 2013, 10:17:40 PM10/6/13
to davi...@googlegroups.com
Hi Mike.  Thanks for the reply.  I am using 24 hour clock times in decimal hours for this example, so the output I was expecting would be something like:

22.350     NA     1.000     0.025        

Bonnie


On Sun, Oct 6, 2013 at 6:46 PM, Michael Hannon <jmhannon...@gmail.com> wrote:
Hi, Bonnie.  Can you show me (us) what you would LIKE the output in your example to be?  It appears that, for instance, your person No. 327 DID go to bed at slightly after midnight, on the average, just as your tapply, correctly interpreted, says.

-- Mike

--
Check out our R resources at http://www.noamross.net/davis-r-users-group.html
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
Visit this group at http://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/groups/opt_out.

Bonnie Dixon

unread,
Oct 6, 2013, 11:15:08 PM10/6/13
to davi...@googlegroups.com
I found a way to do it.  The package, psych, has a function, circadian.mean() for this purpose:

> mean.bedtimes <- tapply(sleep$bedtime, sleep$person.id, function(x) {circadian.mean(as.numeric(na.omit(x)))})
> mean.bedtimes
        117         208         265         327 
22.35000000         NaN  1.00000000  0.02543867 
> mean.bedtimes[which(is.nan(mean.bedtimes))] <- NA
> mean.bedtimes
        117         208         265         327 
22.35000000          NA  1.00000000  0.02543867 

Now I just need a way to calculate the standard deviation, or some measure of spread.

Bonnie

Bonnie Dixon

unread,
Oct 7, 2013, 1:16:14 PM10/7/13
to davi...@googlegroups.com
And here is a way of calculating the standard deviation, which I eventually found using the package, circular.

> require(circular)
> bedtime.circ <- circular(sleep$bedtime, units = "hours", template = "clock24")
> tapply(bedtime.circ, sleep$person.id, sd, na.rm = TRUE)
       117        208        265        327 
0.22345815         NA 0.07858025 0.22077765 


Bonnie

Michael Hannon

unread,
Oct 7, 2013, 6:38:54 PM10/7/13
to davi...@googlegroups.com
Hi, Bonnie.  I have a couple of comments about this.

First comment: if I assume that a bedtime less than 12 really means a late bedtime, I can get your average values just by adding in 24 at the appropriate places and modding it out at the end:

> zzz <- sleep
> zzz$bedtime <- ifelse(sleep$bedtime <= 12, sleep$bedtime + 24, sleep$bedtime)
> zzz

person.id bedtime
1 117 23.2
2 117 NA
3 117 21.5
4 208 NA
5 265 24.7
6 265 25.3
7 327 23.6
8 327 24.5
9 327 22.9
10 327 25.1
> zzzAve <- by(zzz$bedtime, zzz$person.id, mean, na.rm=TRUE) %% 24
> zzzAve
zzz$person.id: 117
[1] 22.35
------------------------------------------------------------------------
zzz$person.id: 208
[1] NaN
------------------------------------------------------------------------
zzz$person.id: 265
[1] 1
------------------------------------------------------------------------
zzz$person.id: 327
[1] 0.025

Even if this is true, it probably doesn't help you, as you seem to have solved the problem, but I thought it was interesting.

Second comment: I don't understand your sd values.  For person 117, for instance, you have:

    sd = 0.22345815

But the bedtimes for person 117 are:

    23.2
    21.5

(with the NA removed).  The mean for these two values is:

    22.35

If I calculate the sample sd from what I understand to be the definition, I get:

> sqrt((23.2 - 22.35)^2 + (21.5 - 22.35)^2 / (2 - 1))
[1] 1.202082

I am evidently still confused about the procedure.  What am I missing?

-- Mike

Bonnie Dixon

unread,
Oct 8, 2013, 3:41:57 PM10/8/13
to davi...@googlegroups.com
Hi Mike.

Thanks for pointing out that my standard deviation calculations appear to be incorrect.  I looked at the documentation for the "circular" package, which I used to calculate those, and the formula it uses for standard deviation is:
Inline image 2
where R bar is the mean resultant length of the data.  The wikipedia page on directional statistics provides some more explanation, but I am still not clear on what "resultant length" is or why this formula is used, and I'm not sure whether the results are supposed to be the same or different from the results of a regular calculation of standard deviation, like the one you show.  Maybe, for now, I will not calculate a standard deviation for variables that are times of day, since I do not have a way of calculating it that I am confident in.

As for your alternative approach to calculating the mean time of day, that works for bedtimes around midnight, but it works by shifting the "cut-off time" to noon.  So that way of calculating it would not work for averaging times around noon.  Here is an example in which I add person 380 to represent a shift worker who goes to bed in the middle of the day.

> sleep <- data.frame(person.id = c(117, 117, 117, 208, 265, 265, 327, 327, 327, 327, 380, 380), bedtime = c(23.2, NA, 21.5, NA, 0.7, 1.3, 23.6, 0.5, 22.9, 1.1, 11.3, 12.3))
> sleep
   person.id bedtime
1        117    23.2
2        117      NA
3        117    21.5
4        208      NA
5        265     0.7
6        265     1.3
7        327    23.6
8        327     0.5
9        327    22.9
10       327     1.1
11       380    11.3
12       380    12.3
> sleep$bedtime24 <- ifelse(sleep$bedtime <= 12, sleep$bedtime + 24, sleep$bedtime)
> sleep
   person.id bedtime bedtime24
1        117    23.2      23.2
2        117      NA        NA
3        117    21.5      21.5
4        208      NA        NA
5        265     0.7      24.7
6        265     1.3      25.3
7        327    23.6      23.6
8        327     0.5      24.5
9        327    22.9      22.9
10       327     1.1      25.1
11       380    11.3      35.3
12       380    12.3      12.3
> tapply(sleep$bedtime24, sleep$person.id, mean, na.rm = TRUE)%%24
   117    208    265    327    380 
22.350    NaN  1.000  0.025 23.800 

I'm glad I found the circadian.mean() function in the package, "psych", since it seems to calculate the mean time of day correctly in all cases.

Bonnie
image.png
Reply all
Reply to author
Forward
0 new messages