part5_segments - some inconsistencies in the output


marcuslop...@gmail.com

Oct 10, 2023, 2:40:39 PM
to R package GGIR
Hello GGIR Team,

Thank you for including the segmented output in part 5. That's a great feature.

I was inspecting the part5_segments output and found some inconsistencies to share with you. The attached figure is a screenshot of the part5_daysummary_segments output for a single participant. The config file is attached.

Screenshot_14.jpg

The output for segment 10 provides the results I was expecting to see. The sum of all dur_day_total_X variables is equal to dur_day_min (i.e., the segment length in this case). However, this was not observed for several other segments as follows:

1. Segment 11 should have 55 minutes of data. But the dur_day_min and dur_spt_min variables are fixed at 0.083 and zero, respectively. This was observed for segments with other start_end_windows (i.e., 13:30:00-14:24:54; 15:20:00-15:39:54; 15:39:59-16:34:54; 07:30:00-08:24:54; 09:20:00-09:39:54, 12:30:00-12:29:55).

2. Segment 12 should also have 55 minutes of data. But the dur_day_min value is higher than the segment length. This was observed for segments with other start_end_windows (i.e., 14:24:59-15:19:55; 16:34:59-17:29:55; 08:24:59-09:19:55, 09:39:59-10:34:55).

I hope this information may be useful to improve the feature.
Thank you for your support.

Best,
Marcus

config.csv

Jairo Hidalgo Migueles

Oct 11, 2023, 2:03:47 AM
to R package GGIR
Hi Marcus,

Thanks for your question. I suspect the reason you don't get the expected output is that your windows are defined at a different resolution than your epoch size. You used windowsizes = c(5, 900, 3600), i.e., a 5-second epoch size, so the timestamps in your time series have a 5-second resolution (e.g., 10:00:00, 10:00:05, 10:00:10, ...). However, some of your window boundaries are not multiples of 5 seconds. For example, the 14:24:54 timestamp does not exist in your time series, and therefore GGIR struggles to define the window.

Can you revise your qwindow argument so that it matches with your current epoch length and check if that resolves your issue?
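
A quick way to spot such boundaries is to convert each one to seconds and test whether it is divisible by the epoch length. The sketch below is a minimal base R illustration, not GGIR code: the helper to_seconds() and the variable names are hypothetical, and the boundaries are taken from the windows reported above.

```r
# Hypothetical helper (not part of GGIR): convert "HH:MM:SS" to seconds
# since midnight.
to_seconds <- function(hms) {
  parts <- as.numeric(strsplit(hms, ":", fixed = TRUE)[[1]])
  sum(parts * c(3600, 60, 1))
}

epoch_s <- 5  # short epoch length, first element of windowsizes = c(5, 900, 3600)

# Window boundaries reported in the first message of this thread
boundaries <- c("13:30:00", "14:24:54", "14:24:59", "15:19:55")
secs <- vapply(boundaries, to_seconds, numeric(1))

# A boundary is on the epoch grid only if it is a multiple of the epoch length
on_grid <- secs %% epoch_s == 0
print(data.frame(boundary = boundaries, seconds = secs, on_grid = on_grid,
                 row.names = NULL))
```

Boundaries flagged with on_grid = FALSE (here 14:24:54 and 14:24:59) have no matching timestamp in a 5-second time series.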

Best,
Jairo 

marcuslop...@gmail.com

Oct 11, 2023, 9:02:05 AM
to R package GGIR
Hi Jairo,

Thank you for your answer.
All the segments that I include should have a 5-sec resolution as you can see in the following vector:

library(lubridate)  # provides hm() and period_to_seconds() used below

seg_timestamp <- c(
    "07:00", "08:00", "08:55", "09:50",
    "10:10", "11:05", "12:00", "12:30",
    "13:00", "13:30", "14:25", "15:20",
    "15:40", "16:35", "17:30", "18:30",
    "23:00")


Then I converted the timestamps to an hourly scale using the following function: 

seg_hours <- period_to_seconds(hm(seg_timestamp))/3600

The problem may be that some timestamps become repeating decimals, so the rounded value may not fall on an epoch boundary (e.g., 15:40 = 15.666667). This kind of problem would be frequent for segments defined in multiples of 5 minutes. Do you have any suggestions?

Best,
Marcus

marcuslop...@gmail.com

Oct 11, 2023, 9:32:16 AM
to R package GGIR
Adding some information.
This issue with the segmentation is not observed in the part 2 output. The estimates there are as expected.

Vincent van Hees

Oct 11, 2023, 9:51:52 AM
to marcuslop...@gmail.com, R package GGIR
I see you left argument segmentDAYSPTcrit.part5 at its default value of c(0,0)

This is completely meaningless and will bias your results.
You need to look up the documentation and change the value based on your research interests.

Vincent


Jairo Hidalgo Migueles

Oct 11, 2023, 10:17:08 AM
to R package GGIR
Thanks Marcus and Vincent,

Apart from Vincent's input, I think you still have an issue with the time resolution. Defining qwindow as decimal numbers when you need high precision in the minutes causes this problem.

For example, using your method and converting back the data to timestamps, you can see we don't get exactly the timestamps we expect:

library(lubridate)

seg_timestamp <-  c(
  "07:00", "08:00", "08:55", "09:50",
  "10:10", "11:05", "12:00", "12:30",
  "13:00", "13:30", "14:25", "15:20",
  "15:40", "16:35", "17:30", "18:30",
  "23:00")
seg_hours = period_to_seconds(hm(seg_timestamp))/3600
print(seg_hours)
back_conversion = seconds_to_period(seg_hours*3600)
print(back_conversion)
 [1] "7H 0M 0S"                  "8H 0M 0S"
 [3] "8H 54M 59.9999999999964S"  "9H 50M 0S"
 [5] "10H 10M 0S"                "11H 5M 0S"
 [7] "12H 0M 0S"                 "12H 30M 0S"
 [9] "13H 0M 0S"                 "13H 30M 0S"
[11] "14H 25M 0S"                "15H 20M 0S"
[13] "15H 40M 0S"                "16H 34M 59.9999999999927S"
[15] "17H 30M 0S"                "18H 30M 0S"
[17] "23H 0M 0S"

The reason it works in part 2 is that we previously implemented a check for the time resolution there: when the resolution of the epoch length and qwindow do not match, we use the closest epoch. However, I think this is not yet implemented in part 5, as this functionality is much more recent.
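
The "closest epoch" idea can be sketched as follows. This is a minimal base R illustration under stated assumptions, not the code GGIR uses in part 2: the function name snap_to_epoch() is hypothetical, and it simply rounds each decimal-hour boundary to the nearest multiple of the epoch length, returned as whole seconds since midnight.

```r
# Hypothetical helper, not GGIR code: snap decimal-hour boundaries to the
# nearest epoch and return integer seconds since midnight, so floating-point
# noise such as 8H 54M 59.99S cannot shift a window boundary.
snap_to_epoch <- function(hours, epoch_s = 5) {
  as.integer(round(hours * 3600 / epoch_s) * epoch_s)
}

# 08:55 and 16:35 expressed as decimal hours, the two values that did not
# convert back cleanly in the example above
seg_hours <- c(8 + 55 / 60, 16 + 35 / 60)
snapped <- snap_to_epoch(seg_hours)

# Format the snapped seconds as HH:MM:SS for inspection
labels <- sprintf("%02d:%02d:%02d",
                  snapped %/% 3600L,
                  (snapped %% 3600L) %/% 60L,
                  snapped %% 60L)
print(labels)
# [1] "08:55:00" "16:35:00"
```

Working in whole seconds rather than decimal hours avoids the back-conversion error entirely, which is also why an activity log with explicit timestamps is the safer input.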

As a quick solution for you, I would quickly build up an activity log and use the path to the activity log as input for the qwindow argument, I think GGIR will then be able to handle those timestamps. The activity log should look like this:

ID             t1                t2                t3                t4                t5                t6                 ......
ID01        07:00:00    08:00:00    08:55:00    09:50:00    10:10:00    11:05:00    
ID02        07:00:00    08:00:00    08:55:00    09:50:00    10:10:00    11:05:00    
ID03        07:00:00    08:00:00    08:55:00    09:50:00    10:10:00    11:05:00    
ID04        07:00:00    08:00:00    08:55:00    09:50:00    10:10:00    11:05:00    
.....

Best,
Jairo

marcuslop...@gmail.com

Oct 11, 2023, 10:49:42 AM
to R package GGIR
Hi Vincent,

Thank you for your answer.

I left segmentDAYSPTcrit.part5 at its default on purpose to check whether the segmented output would include the fraction of the sleep time window or whether SPT would be classified as awake IN. This is not clear to me in the documentation.

Thus, assume that the segment 6:00-12:00 includes 3 hours of IN for a given day, with 2h of these being part of the sleep time window and 1h part of the awake period. How exactly would the default segmentDAYSPTcrit.part5 handle this? I was expecting to see 120 minutes of dur_spt_min and 60 minutes of dur_day_total_IN_min, which makes sense to me as unbiased estimates. But since you have warned about the bias in your answer and in the documentation, this is probably not how GGIR is doing the classification. I would appreciate it if you could clarify what is being performed.

Regards,
Marcus

marcuslop...@gmail.com

Oct 11, 2023, 11:35:28 AM
to R package GGIR
Hi Jairo,

I thought this check was implemented in part 5 as well. Thank you for sharing.

I will try using an activity log as suggested. It may be a basic R question, but is there a way to specify a single line of segments in the activity log to be applied to any/all IDs?

Vincent van Hees

Oct 11, 2023, 11:59:28 AM
to marcuslop...@gmail.com, R package GGIR
"I left the segmentDAYSPTcrit.part5 at its default on purpose to check if the segmented output would include the fraction of the sleep time window or if SPT would be classified as awake IN. This is not clear to me in the documentation."

I am aware now that this is indeed not clearly documented. I plan to revise documentation and default value before the next release.

Vincent


Jairo Hidalgo Migueles

Oct 12, 2023, 4:31:30 AM
to R package GGIR
Here is a basic script to generate an activity log with the characteristics you need. I hope this helps.

# create vector containing your IDs
IDs = paste0("ID0", 1:9)

# qwindow splits

seg_timestamp <-  c(
  "07:00", "08:00", "08:55", "09:50",
  "10:10", "11:05", "12:00", "12:30",
  "13:00", "13:30", "14:25", "15:20",
  "15:40", "16:35", "17:30", "18:30",
  "23:00")

# activitylog structure
activitylog = as.data.frame (matrix(data = NA, nrow = length(IDs),
                                    ncol = length(seg_timestamp) + 1))
colnames(activitylog) = c("ID", paste0("t", 1:length(seg_timestamp)))

# Fill activity log
activitylog$ID = IDs
activitylog[, 2:ncol(activitylog)] = rep(seg_timestamp, each = nrow(activitylog))
print(activitylog)

#         ID       t1        t2        t3       t4       t5       t6        t7       t8        t9     t10      t11     t12     t13     t14     t15     t16     t17
# 1 ID01 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00
# 2 ID02 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00
# 3 ID03 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00
# 4 ID04 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00
# 5 ID05 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00
# 6 ID06 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00
# 7 ID07 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00
# 8 ID08 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00
# 9 ID09 07:00 08:00 08:55 09:50 10:10 11:05 12:00 12:30 13:00 13:30 14:25 15:20 15:40 16:35 17:30 18:30 23:00

# Store it for later use in GGIR (adapt the path as you need)
write.csv(activitylog, "C:/myactivitylog.csv", row.names = FALSE)
Message has been deleted

marcuslop...@gmail.com

Oct 16, 2023, 10:34:02 AM
to R package GGIR
Hi Jairo,

Thank you for providing the sample code! I will try it.

Best,
Marcus

marcuslop...@gmail.com

Oct 16, 2023, 10:36:41 AM
to R package GGIR
Hi Vincent,

Could you provide a brief explanation of how GGIR does the part 5 segment classification described in the example from my previous comment?

"I left segmentDAYSPTcrit.part5 at its default on purpose to check whether the segmented output would include the fraction of the sleep time window or whether SPT would be classified as awake IN. This is not clear to me in the documentation.

Thus, assume that the segment 6:00-12:00 includes 3 hours of IN for a given day, with 2h of these being part of the sleep time window and 1h part of the awake period. How exactly would the default segmentDAYSPTcrit.part5 handle this? I was expecting to see 120 minutes of dur_spt_min and 60 minutes of dur_day_total_IN_min, which makes sense to me as unbiased estimates. But since you have warned about the bias in your answer and in the documentation, this is probably not how GGIR is doing the classification. I would appreciate it if you could clarify what is being performed."

I am aware that you will revise the documentation, but I would appreciate it if you could provide a brief explanation.

Regards,
Marcus

Vincent van Hees

Oct 22, 2023, 2:56:08 PM
to marcuslop...@gmail.com, R package GGIR
See updated documentation for argument "segmentDAYSPTcrit.part5" in https://cran.r-project.org/web/packages/GGIR/vignettes/GGIRParameters.html

Best, Vincent

------- Original Message -------

On Monday, October 16th, 2023 at 4:22 PM, marcuslop...@gmail.com <marcuslop...@gmail.com> wrote:

Hi Jairo,

Thank you for providing the sample code! I will try it.

Could you or Vincent provide a brief explanation of how GGIR does this part 5 classification?


Regards,
Marcus

marcuslop...@gmail.com

Nov 1, 2023, 3:44:22 PM
to R package GGIR
Hi Vincent,

Thank you for the update. I still have a question.

The updated vignette includes:
Setting both to zero would be problematic and is not allowed as that would introduce bias in behavioural estimates for the following reason: A complete segment would be averaged with an incomplete segments (someone going to bed or waking up in the middle of a segment) by which it is no longer clear whether the person is less active or sleeps more during that segment.

Is sleep not distinguished from wake inactivity in the segmented output? I assume that the sum of dur_spt_min and dur_day_min for a given day should equal the segment length (e.g., 4 hours, from 8 AM to 12 PM). If this is true, I'm curious as to why configuring the segmentDAYSPTcrit.part5 argument as "c(0,0)" would introduce bias. For instance, if an individual was asleep between 8 AM and 10 AM in a segment defined from 8 AM to 12 PM, bias would be introduced only if this 2-hour interval is categorized as wake inactivity.

Regards,
Marcus

Vincent van Hees

Nov 3, 2023, 11:07:59 AM
to marcuslop...@gmail.com, R package GGIR
Hi Marcus,

"Is sleep not distinguished from wake inactivity in the segmented output?"
It is, but the problem is that you cannot compare time spent in inactivity based on a 4-hour period with time spent in inactivity based on a 2-hour period. As a result, you cannot conduct meaningful comparisons between days or between persons.

For example, imagine you want to assess whether boys spend more time in inactivity between 8am and 12pm than girls. If we included time windows where sleep occurred, it would no longer be clear whether boys truly spent more time in inactivity or whether girls simply woke up later and as a result had less time in inactivity. To address this, we first need to make sure that sleep cannot confound the comparisons, by excluding all days where the person woke up inside the 8-12 window.

I did this because I am worried that otherwise many publications will appear with 'comparisons' of time spent in MVPA or inactivity per day segment without considering the confounding role of sleep.
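
The kind of exclusion described here can be illustrated with a toy filter. This is only a sketch with invented numbers, and it assumes that the two values of segmentDAYSPTcrit.part5 act as minimum fractions of the segment classified as day (awake) and as spt; GGIR's internal logic may differ, so check the vignette before relying on it. The column names dur_day_min and dur_spt_min follow the part 5 output discussed in this thread.

```r
# Toy data, invented for illustration: the 08:00-12:00 segment (240 minutes)
# on four days for one person.
segments <- data.frame(
  day         = 1:4,
  dur_day_min = c(240, 240, 120, 228),  # minutes awake within the segment
  dur_spt_min = c(0, 0, 120, 12)        # minutes in the sleep period time window
)

# Assumed meaning of the criterion: c(min. fraction day, min. fraction spt)
crit <- c(0.9, 0)

seg_len  <- segments$dur_day_min + segments$dur_spt_min
frac_day <- segments$dur_day_min / seg_len
frac_spt <- segments$dur_spt_min / seg_len

# Keep a segment only if both fractions meet their criterion
segments$valid <- frac_day >= crit[1] & frac_spt >= crit[2]
print(segments)
```

With c(0.9, 0), day 3 (woke up at 10:00, only half of the segment awake) is dropped, so the remaining days are all based on nearly complete waking segments. With c(0, 0) every row would pass and the half-awake day would be compared with the full ones.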

If you know an alternative solution to this then I would be interested to hear.

Regards,
Vincent

Dr. Vincent van Hees | Independent consultant | https://accelting.com/

marcuslop...@gmail.com

Nov 16, 2023, 7:32:37 AM
to R package GGIR
Hi Vincent,

The concern is entirely valid when focusing on behaviors specific to either the wake period or the sleep window, but not both. While this restriction is valid, it may impede a crucial aspect of the part5 output.

From my perspective, a significant advantage of integrating the segment output into part5 is the ability to explore the time-use spectrum beyond the confines of a 24-hour period. For instance, consider a scenario where one aims to compare the morning time-use composition between students whose school starts at 7:30 and those starting at 8:00. In defining the morning segment as the time between sunrise (e.g., 6:00) and lunchtime (e.g., 12:00), both awake and sleep windows become relevant.

Assuming that, in most cases, users are primarily interested in the awake period, you could establish the default setting for segmentDAYSPTcrit.part5 as c(0.9, 0) but also allow users to specify c(0, 0). This would provide flexibility for users to focus exclusively on the awake period while acknowledging the potential importance of the sleep window in certain analyses.

Regards, 
Marcus

Vincent van Hees

Nov 23, 2023, 5:00:47 AM
to marcuslop...@gmail.com, R package GGIR
Hi Marcus,

Thanks for taking the time to explain. OK, I agree: maybe the best solution is to allow c(0,0), but keep the current default and keep the warnings in the documentation. I will now add this to the open issues to work on in the upcoming weeks.

Regards,

Vincent
