Basic Sleep guider multiple sessions per subject bulk run


Zak Gilliam

Jul 1, 2025, 2:06:02 AM
to R package GGIR
Good morning,

My current study stores the subject ID in a BIDS-derived filename format, for example:
`sub-{ID}_ses-{session number}_accel.csv` 
These are raw ActiGraph files.

I am currently trying to add a sleep log by session, but the current `idloc` options will not let me include the session part in the ID.

I looked at the `extractID.R` function and saw that `idloc=4` uses an `hvar` list of subject numbers. Although it is deprecated, is it possible to use this with my current setup? I could use `idloc = 6` to extract the full filename, including the appended `_accel` suffix, but this would break our current rigid file-storage conventions for LSS.

**Is there any way I can extract the text preceding the *second '_'* from the filename?**

Below is my current full script so you can see how I am running these files in bulk. (It gets hacky since I have to write the output to a subdirectory of the raw directory, which GGIR is not a fan of.)

### Full GGIR Main
```
#!/usr/bin/env Rscript

# Usage: Rscript new_gg.R --project_dir "/Shared/vosslabhpc/Projects/BOOST/InterventionStudy/3-experiment/data/act-int-test/" --deriv_dir "derivatives/GGIR-3.2.6-test/"
library(optparse)
library(GGIR)

main <- function() {
  # Define the option list
  option_list <- list(
    make_option(c("-p", "--project_dir"), type = "character",
                default = "/mnt/nfs/lss/vosslabhpc/Projects/BOOST/InterventionStudy/3-Experiment/data/act-int-test/",
                help = "Path to the project directory", metavar = "character"),
    make_option(c("-d", "--deriv_dir"), type = "character",
                default = "/derivatives/GGIR-3.2.6-test/",
                help = "Path to the derivatives directory", metavar = "character")
  )

  # Parse the options
  opt_parser <- OptionParser(option_list = option_list)
  opt <- parse_args(opt_parser)

  # Assign variables
  ProjectDir <- opt$project_dir
  ProjectDerivDir <- opt$deriv_dir

  # Print values to verify
  print(paste("Project Directory:", ProjectDir))
  print(paste("Derivatives Directory:", ProjectDerivDir))

  # Helper functions
  SubjectGGIRDeriv <- function(x) {
    a <- dirname(x)
    paste0(ProjectDir, ProjectDerivDir, a)
  }

  datadirname <- function(x) {
    b <- dirname(x)
    paste0(ProjectDir, b)
  }

  # Gather subject directories
  directories <- list.dirs(ProjectDir, recursive = FALSE)
  # Keep only folders whose name starts with "sub-" ("sub-*" would also match "sub" alone)
  subdirs <- directories[grepl("^sub-", basename(directories))]
  print(paste("subdirs: ", subdirs))

  # Create project-specific derivatives GGIR folder if it doesn't exist
  if (!dir.exists(paste0(ProjectDir, ProjectDerivDir))) {
    dir.create(paste0(ProjectDir, ProjectDerivDir), recursive = TRUE)
  }

  # List accel.csv files (list.files() expects a regular expression, not a glob)
  filepattern <- "accel\\.csv$"
  GGIRfiles <- list.files(subdirs, pattern = filepattern, recursive = TRUE,
                          include.dirs = TRUE, full.names = TRUE, no.. = TRUE)
  print(paste("GGIR Files before splitting: ", GGIRfiles))

  # Adjust path formatting
  GGIRfiles <- sapply(strsplit(GGIRfiles, "//", fixed = TRUE), function(x) paste(x[2]))
  print(paste("GGIR Files after splitting: ", GGIRfiles))

  # Ensure directory structure exists
  for (i in GGIRfiles) {
    if (!dir.exists(SubjectGGIRDeriv(i))) {
      dir.create(SubjectGGIRDeriv(i), recursive = TRUE)
    }
  }

  # Run GGIR loop
  for (r in GGIRfiles) {
    if (dir.exists(paste0(SubjectGGIRDeriv(r), "/output_beh"))) {
      next
    } else {
      datadir <- normalizePath(datadirname(r), mustWork = FALSE)
      outputdir <- SubjectGGIRDeriv(r)
      print(paste("datadir: ", datadir))
      print(paste("outputdir: ", outputdir))
      if (!dir.exists(datadir)) {
        stop(paste("Error: datadir does not exist ->", datadir))
      }

      assign("datadir", datadir, envir = .GlobalEnv)
      assign("outputdir", outputdir, envir = .GlobalEnv)

      try({
        GGIR(
          # ==== Initialization ====
          mode = 1:6,
          datadir = datadir,
          outputdir = outputdir,
          studyname = "boost",
          overwrite = FALSE,
          desiredtz = "America/Chicago",
          print.filename = TRUE,
          idloc = 2,

          # ==== Part 1: Data loading and basic signal processing ====
          do.report = c(2, 4, 5, 6),
          epochvalues2csv = TRUE,
          do.ENMO = TRUE,
          acc.metric = "ENMO",
          windowsizes = c(5, 900, 3600),

          # ==== Part 2: Non-wear detection ====
          ignorenonwear = TRUE,

          # ==== Part 3: Sleep detection ====
          # Uncomment the below if using external sleep log:
          # loglocation = "/mnt/nfs/lss/vosslabhpc/Projects/BOOST/InterventionStudy/3-experiment/data/act-int-test/sleep.csv",
          # colid = 1,
          # coln1 = 2,
          # sleepwindowType = "SPT",

          # ==== Part 4: Physical activity summaries ====
          timewindow = c("WW", "MM", "OO"),

          # ==== Part 5: Day-level summaries ====
          hrs.del.start = 4,
          hrs.del.end = 3,
          maxdur = 9,
          threshold.lig = 44.8,
          threshold.mod = 100.6,
          threshold.vig = 428.8,

          # ==== Part 6: CR and other metrics ====
          part6CR = TRUE,
          visualreport = TRUE,
          old_visualreport = FALSE
        )
      })
    }
  }
}

# Run main if executed as script
if (!interactive()) {
  main()
}
```
### Further examples of subject IDs
- `sub-8001_ses-1_accel.csv`, `sub-8001_ses-2_accel.csv`
- `sub-7241_ses-1_accel.csv`
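
For filenames like the ones listed above, the text before the second `_` can be pulled out in base R with a single regular expression. This is only a minimal sketch: `id_before_second_underscore` is a hypothetical helper written for this example, it is not a GGIR function or `idloc` option, and it assumes the `sub-{ID}_ses-{session number}_accel.csv` convention described earlier.

```r
# Hypothetical helper (not part of GGIR): keep everything before the second "_".
id_before_second_underscore <- function(path) {
  fname <- basename(path)
  # Capture "<field1>_<field2>" and drop the rest. Names with fewer than two
  # underscores do not match the pattern and are returned unchanged.
  sub("^([^_]+_[^_]+)_.*$", "\\1", fname)
}

files <- c("sub-8001_ses-1_accel.csv",
           "sub-8001_ses-2_accel.csv",
           "sub-7241_ses-1_accel.csv")
ids <- id_before_second_underscore(files)
print(ids)
# "sub-8001_ses-1" "sub-8001_ses-2" "sub-7241_ses-1"
```

The same expression could be used to build the ID column of a per-session sleep log outside GGIR.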


Any help would be awesome, thanks for all the hard work on this package!!

Best,
Zak

Vincent van Hees

Jul 8, 2025, 3:06:44 AM
to Zak Gilliam, R package GGIR
Hi Zak,

The options to extract the ID from the filename are indeed limited to splitting the filename on a single separator character and treating whatever comes before the first occurrence of that character as the ID. In theory, we could modify GGIR to support more filename patterns, but if you want me to implement this then I would have to ask you to pay for my time to do the work.
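
To make the limitation concrete, here is a small illustration of "text before the first occurrence of a separator" applied to one of the filenames from this thread. It only mimics the behaviour described above and is not GGIR's actual `extractID()` code. `before_first` is a hypothetical helper, and the mapping of `_` to `idloc = 2` and `.` to `idloc = 6` is taken from this thread and should be checked against the GGIR documentation for your version.

```r
# Illustration only: mimics "text before the first occurrence of a separator".
# This is NOT GGIR's extractID() implementation.
before_first <- function(fname, sep) {
  strsplit(fname, sep, fixed = TRUE)[[1]][1]
}

fname <- "sub-8001_ses-1_accel.csv"
id_underscore <- before_first(fname, "_")  # separator assumed for idloc = 2
id_dot <- before_first(fname, ".")         # separator assumed for idloc = 6
print(c(id_underscore, id_dot))
# "sub-8001" "sub-8001_ses-1_accel"
```

Neither result is `sub-8001_ses-1`: the first drops the session and the second keeps the `_accel` suffix.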

An alternative solution may be:
  • Using idloc = 1 such that ID equals the complete filename
  • Update the sleeplog ID column to also be the accelerometer filename.
  • Once GGIR has generated all the output, convert the ID columns in the output files, which will now be identical to the filename columns, into a new column that contains the ID and session number in a format your pipeline accepts.
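
The last step of this workaround could look roughly like the sketch below. It is a hedged example, not a tested GGIR post-processing routine: the `report` data frame is a hypothetical stand-in for one GGIR output table, the column names `ID` and `filename` are assumptions, and the new column names `subject`, `session`, and `subject_session` are invented for illustration. Whether the ID keeps the `.csv` extension is also an assumption; the pattern works either way.

```r
# Hypothetical stand-in for one GGIR output table; real reports have many more
# columns, and the column names used here are assumptions.
report <- data.frame(
  ID = c("sub-8001_ses-1_accel.csv", "sub-8001_ses-2_accel.csv"),
  filename = c("sub-8001_ses-1_accel.csv", "sub-8001_ses-2_accel.csv"),
  stringsAsFactors = FALSE
)

# Split the BIDS-style name into its subject and session parts.
pattern <- "^(sub-[^_]+)_(ses-[^_]+)_.*$"
report$subject <- sub(pattern, "\\1", report$ID)
report$session <- sub(pattern, "\\2", report$ID)

# Combined key in the "sub-{ID}_ses-{session number}" form used by the study.
report$subject_session <- paste(report$subject, report$session, sep = "_")
print(report[, c("subject", "session", "subject_session")])
```

In practice each output CSV would be read with `read.csv()`, transformed like this, and written back with `write.csv()`.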

Could that be a solution for you?

Best, Vincent

Dr. Vincent van Hees | Independent consultant | https://accelting.com/