data_cleaning_file not removing nights

Kelsey Sewell

May 17, 2024, 9:17:36 AM

to R package GGIR

Hi Dr. van Hees & Team,

We are trying to use the data_cleaning_file parameter to exclude specific, manually identified problematic nights and days from the part 4 and part 5 outputs. We followed the instructions in the vignette for setting up the .csv file, i.e., columns “ID”, “day_part5”, and “night_part4”, with a separate row for each combination of ID and night. However, when we run this with our code (pasted below), the problematic nights remain in the night- and day-level output in “part4_nightsummary_sleep_cleaned” and “part5_daysummary”.
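A file with the layout described above (columns ID, day_part5, night_part4; one row per exclusion) can be written from R. This is a minimal sketch: the IDs, night/day numbers, and file location are hypothetical placeholders, and leaving a cell empty where nothing is to be excluded is an assumption rather than confirmed GGIR behaviour.

```r
# Hypothetical IDs and night/day numbers; replace with your own.
# Column names follow the layout described above (ID, day_part5, night_part4).
cleaning <- data.frame(
  ID = c("1001", "1001", "1002"),
  day_part5 = c(NA, 3, 5),
  night_part4 = c(2, NA, 4),
  stringsAsFactors = FALSE
)

# Placeholder location; save the file wherever suits your project.
cleaning_path <- file.path(tempdir(), "ID_nights.csv")

# row.names = FALSE keeps only the three named columns in the file, and
# na = "" writes empty cells rather than the literal text "NA"
# (treating an empty cell as "exclude nothing" is an assumption).
write.csv(cleaning, cleaning_path, row.names = FALSE, na = "")

# Read the file back to inspect exactly what was written.
print(read.csv(cleaning_path))
```

The parameter is then given the path as a character string, for example data_cleaning_file = cleaning_path, rather than a data frame.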

We would greatly appreciate it if you could point out where we might be going wrong.

Thank you so much,

Kelsey & Audrey.

GGIR(
mode=c(1:5),
#=====================
# General + Part 1
#=====================
datadir="",
outputdir="",
desiredtz = "America/New_York",
overwrite = TRUE,
print.filename = TRUE,
do.parallel = TRUE,
#=====================
# Part 2
#=====================
idloc = 2,
strategy = 1,
maxdur = 0,
ilevels = seq(0,600,by=25),
iglevels = c(seq(0,4000,by=25),8000),
qlevels = c(960/1440, 1320/1440, 1380/1440, 1410/1440, 1430/1440, 1435/1440, 1438/1440),
mvpathreshold = c(100),
printsummary = TRUE,
do.part2.pdf = TRUE,
epochvalues2csv = TRUE,
winhr = c(5,10),
cosinor = TRUE,
#=====================
# Part 3+4
#=====================
do.part3.pdf = TRUE,
outliers.only = FALSE,
#=====================
# Part 5
#=====================
threshold.lig = c(35), threshold.mod = c(100), threshold.vig = c(430),
boutcriter = 0.8, boutcriter.in = 0.9, boutcriter.lig = 0.8,
boutcriter.mvpa = 0.8, boutdur.in = c(10,20,30,60), boutdur.lig = c(1,5,10),
boutdur.mvpa = c(5,10),
includedaycrit.part5 = 2/3,
frag.metrics="all",
part5_agg2_60seconds = TRUE,
week_weekend_aggregate.part5 = TRUE,
#=====================
# Visual Report
#=====================
timewindow = c("MM", "WW"),
do.report=c(2,4,5),
visualreport = TRUE,
#=====================
# QC
#=====================
data_cleaning_file = "file path/ID_nights.csv"
)

Jairo Hidalgo Migueles

May 20, 2024, 3:32:04 AM

to R package GGIR

Hi,

I have just tried the data_cleaning_file functionality in the latest GGIR version on CRAN (version 3.1-0), and it works as expected. Can you double-check the following?

1. The ID in the data cleaning file exactly matches the ID extracted by GGIR. The ID extracted by GGIR is controlled with parameter idloc, and you can check it by opening, for example, a report from part 2 and looking at the ID column.
2. The night and day definitions: each should be a number matching the "night" column in the part4_nightsummary report and the "window_number" column in the part5_daysummary report, respectively.
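Both checks above can be scripted. The sketch below is not part of GGIR; it assumes you have already read the cleaning file and the relevant report (part4_nightsummary or part5_daysummary) into data frames with read.csv, and the function name check_cleaning_file is hypothetical. It returns the cleaning-file rows whose ID, or ID plus night/day number, has no match in the report.

```r
# Sketch (not part of GGIR): return the rows of a data cleaning file whose ID,
# or whose ID + night/day number, has no match in a GGIR summary report.
check_cleaning_file <- function(cleaning, summary, clean_col, summary_col) {
  # Compare IDs as trimmed text, so a numeric 1001 matches "1001" while
  # differences such as leading zeros or extra suffixes are still reported.
  clean_id <- trimws(as.character(cleaning$ID))
  summary_id <- trimws(as.character(summary$ID))
  id_ok <- clean_id %in% summary_id
  # A night/day number only counts as matched if that same ID has it.
  number <- cleaning[[clean_col]]
  pair_ok <- paste(clean_id, number) %in% paste(summary_id, summary[[summary_col]])
  # Empty cells are treated as "exclude nothing" here, so they are not flagged.
  number_ok <- is.na(number) | pair_ok
  cleaning[!(id_ok & number_ok), , drop = FALSE]
}

# Toy example: "1002" has no night 9 and "P1003" is not in the report.
cleaning <- data.frame(ID = c("1001", "1002", "P1003"), night_part4 = c(2, 9, 1))
part4 <- data.frame(ID = c(1001, 1001, 1002), night = c(1, 2, 1))
check_cleaning_file(cleaning, part4, "night_part4", "night")
```

For part 5, the same call with clean_col = "day_part5" and summary_col = "window_number" compares against the part5_daysummary report. An empty result means every listed exclusion points at an ID and number that exists in the report.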

Best,
Jairo

Reagan Moffit

May 21, 2024, 2:33:53 PM

to R package GGIR

Hello,

I have run into the same problem as the person above. In our data cleaning file, we are excluding both activity days and nights. We tested the data cleaning file on a smaller sample and it worked as expected. Interestingly, part 5 contains the correct days of physical activity data (i.e., we specifically wanted Days 2-8). However, we have run into issues with the part 4 sleep output: our goal was to keep nights 1-7 for sleep analysis, yet out of 833 gt3x files, 89 had more than 7 nights, which tells me that GGIR is not applying the cleaning file to some participants. I have double-checked the data cleaning file to ensure the IDs are correct and that the columns are named appropriately.
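A count like "89 files with more than 7 nights" can be reproduced from the part 4 night-level report. The sketch below assumes the report has already been read into a data frame with columns ID and night; the function name too_many_nights is hypothetical. It counts each ID/night pair once, in case a report contains more than one row per night.

```r
# Sketch: list participants with more distinct nights than expected in a
# part 4 night-level report already read into a data frame.
too_many_nights <- function(nightsummary, max_nights = 7) {
  # Count each ID/night pair once, in case a report has several rows per night.
  nights <- unique(data.frame(ID = as.character(nightsummary$ID),
                              night = nightsummary$night,
                              stringsAsFactors = FALSE))
  counts <- as.data.frame(table(ID = nights$ID), responseName = "n_nights",
                          stringsAsFactors = FALSE)
  result <- counts[counts$n_nights > max_nights, , drop = FALSE]
  rownames(result) <- NULL
  result
}

# Toy data: "A" has 8 nights and "B" has 7, so only "A" is listed.
example <- data.frame(ID = c(rep("A", 8), rep("B", 7)), night = c(1:8, 1:7))
too_many_nights(example)
```

The IDs it returns are the ones worth comparing against the cleaning file first.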

I used GGIR Version 3.0.10.

GGIR(mode = c(1:5),
datadir = "C:/Users/.../gt3x files/Baseline/Cleaned",
outputdir = "C:/Users/.....",

desiredtz = "America/New_York",
overwrite = TRUE,

chunksize=0.4, #Makes processing easier
print.filename = TRUE,
do.parallel = TRUE, #Makes processing easier

#=====================
# Part 2
#=====================

strategy = 1,
data_cleaning_file = "C:/Users/..../data_cleaning_file_final.csv",
includedaycrit = c(17, 17),
mvpathreshold = c(60),

printsummary = TRUE,
do.part2.pdf = TRUE,
epochvalues2csv = TRUE,

myfun = myfun, #Step count function

cosinor = TRUE,
#=====================
# Part 3 + 4
#=====================
do.part3.pdf = TRUE,

ignorenonwear = TRUE,
loglocation = "C:/Users/.....",
colid = 1,
coln1 = 2,
sleepwindowType = "TimeInBed",
HASIB.algo = "vanHees2015",
Sadeh_axis = "N",

#=====================
# Part 5
#=====================

part5_agg2_60seconds = TRUE,
frag.metrics = "all",
#includedaycrit.part5 = 10,
threshold.lig = c(18),
threshold.mod = c(60),
threshold.vig = c(400),
save_ms5rawlevels = TRUE,
save_ms5raw_format = "csv",
timewindow = "MM"
)

Vincent van Hees

Jun 4, 2024, 3:07:33 AM

to Reagan Moffit, R package GGIR

Hi Reagan,

I see. The night_part4 column is not working well: it seems to work only for numeric participant IDs and not for character IDs. I will have a look at it.

By the way, the data cleaning file is not designed to cap the number of nights in the part 4 output. I would instead use parameter max_calendar_dur = 8, as that will ensure that only the nights within the first 8 days are included.

Best,

Vincent

--
You received this message because you are subscribed to the Google Groups "R package GGIR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to RpackageGGIR...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/RpackageGGIR/4c1274e9-4da9-4324-8805-a2d13602fb08n%40googlegroups.com.

Vincent van Hees

Jun 4, 2024, 6:23:30 AM

to Reagan Moffit, R package GGIR

Hi Reagan,

This should be fixed now. To use the fix install GGIR via:

remotes::install_github("wadpac/GGIR")

The issue was not with the ID but was more general.

I will soon create a new CRAN release with this fix included.

Thanks, Vincent

Kelsey Sewell

Jun 6, 2024, 5:53:05 AM

to Vincent van Hees, R package GGIR

Dear Vincent and Jairo,

Thank you so much for your helpful feedback and for fixing the bug. We installed the updated version of GGIR and it ran as expected. However, we have already processed 7 timepoints of data (each with 7 days of data) for the same study, so we would prefer to keep using our current version of GGIR (2.10.1) rather than reprocess everything with the updated version (3.1.1). Is there any workaround for this issue while using GGIR version 2.10.1? I've included some additional information below summarising what we have already tried.

We see in the vignette that data_cleaning_file is specified as a character value; however, as Vincent mentions above, it only seems to work with numeric IDs. We tried reading in our .csv file first and converting the ID column to numeric, but data_cleaning_file then gave the following error: "cleaning argument data_cleaning_file is not character". Thus, there seems to be a mismatch between the type of data the parameter requires and the type of data that actually works. As mentioned above, we recognise that the issue is likely more general than the ID number.

Any help you can provide is greatly appreciated. We recognise that this is inherent in using open-source software, and we appreciate the effort that goes into the GGIR version updates!

Best wishes,

Kelsey & Audrey

Vincent van Hees

Jun 6, 2024, 10:52:42 AM

to Kelsey Sewell, R package GGIR

Hi Kelsey and Audrey,

> However, we have already processed 7 timepoints of data (each with 7 days of data) for the same study, so we would prefer to use our current version of GGIR (2.10.1) to avoid having to reprocess them all if we used the updated GGIR version (3.1.1)...

I see three options:

1. Keep the previously generated part 1 output and keep using 2.10.1 for future part 1 processing, but re-process parts 2 onward with the latest GGIR version.
2. Use the part 4 night-level full summary report (csv) produced by 2.10.1, remove the nights you do not want, including nights that do not meet your inclusion criteria, and then aggregate them per person yourself (e.g., in R or Stata).
3. Dive into the part 4 milestone files (RData), remove the nights you want to delete from those objects, and save the remaining data in the same file again.
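The second option above (cleaning the night-level report and aggregating it yourself) could look roughly like this in R. It is a sketch under stated assumptions: the report and the exclusion list are already data frames, exclusions are matched on ID plus night number, and the column "sleep_duration" in the toy data is a hypothetical stand-in for whichever report variable you want to average. The function name drop_and_aggregate is also hypothetical.

```r
# Sketch of the second option: drop unwanted nights from a part 4 night-level
# report, then average one variable per participant. `nights` and `exclude`
# are data frames already read from csv; `var` names the column to average.
drop_and_aggregate <- function(nights, exclude, var) {
  # Match on ID + night number, comparing IDs as text.
  night_key <- paste(trimws(as.character(nights$ID)), nights$night)
  drop_key <- paste(trimws(as.character(exclude$ID)), exclude$night_part4)
  kept <- nights[!(night_key %in% drop_key), , drop = FALSE]
  # kept[var] is a one-column data frame, so the output column keeps its name.
  aggregate(kept[var], by = list(ID = kept$ID), FUN = mean, na.rm = TRUE)
}

# Toy data with a hypothetical "sleep_duration" column; night 3 of "A" is dropped.
nights <- data.frame(ID = c("A", "A", "A", "B", "B"), night = c(1, 2, 3, 1, 2),
                     sleep_duration = c(7, 8, 6, 5, 9))
exclude <- data.frame(ID = "A", night_part4 = 3)
drop_and_aggregate(nights, exclude, "sleep_duration")
```

Nights that fail your own inclusion criteria can be removed with an additional subset of `kept` before the aggregate step.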

I imagine each of these has its own pros and cons, but hopefully one of them is feasible for you.
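For the third option (editing the part 4 milestone RData files), the general R pattern is to load the file into its own environment, filter the relevant object, and save every object back to the same path. The details here are assumptions not verified against GGIR internals: that each milestone file holds one recording, that the object to edit is named `nightsummary`, and that it has a `night` column. Inspect a file with `ls()` after loading it before relying on this, and back up the milestone folder first. The function name drop_nights_from_rdata is hypothetical.

```r
# Sketch of the third option: remove nights from an RData file in place.
# ASSUMPTIONS (not verified against GGIR): one recording per file, the object
# is called "nightsummary", and it has a "night" column. Back up files first.
drop_nights_from_rdata <- function(path, nights_to_drop,
                                   object = "nightsummary") {
  env <- new.env()
  load(path, envir = env)
  if (!exists(object, envir = env, inherits = FALSE)) {
    stop("object '", object, "' not found in ", path)
  }
  obj <- get(object, envir = env)
  keep <- !(obj$night %in% nights_to_drop)
  assign(object, obj[keep, , drop = FALSE], envir = env)
  # Re-save every object that was in the file, not only the edited one.
  save(list = ls(env, all.names = TRUE), envir = env, file = path)
  invisible(sum(!keep))
}

# Toy file standing in for a milestone file.
tmp <- file.path(tempdir(), "example_ms4.RData")
nightsummary <- data.frame(ID = "A", night = 1:4, value = 1:4)
other <- "kept as-is"
save(nightsummary, other, file = tmp)
removed <- drop_nights_from_rdata(tmp, nights_to_drop = c(2, 4))
```

Saving all objects from the loaded environment, rather than just the edited one, keeps anything else stored in the file intact.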

> We see on the vignette that the data_cleaning_file is coded as a character vector, however, as Vincent mentions above the function only seems to work with numeric IDs.

That was my first guess, but the actual problem was not related to the ID.

Best,

Vincent

Kelsey Sewell

Jun 12, 2024, 1:20:01 PM

to Vincent van Hees, R package GGIR

Hi Vincent,

Thank you so much for the helpful response. We trialled solution 3 and it worked! This is very helpful and has saved us a lot of time. Thanks again for your help.

Best,

Kelsey & Audrey

Reagan Moffit

Jun 13, 2024, 10:56:43 AM

to Kelsey Sewell, Vincent van Hees, R package GGIR

Hi Vincent,

I also wanted to let you know that my error was fixed as well. Thank you so much for helping with this! We really appreciate your commitment to GGIR - it is invaluable for our work!

Reagan
