Help with empty methylKit object for 5hmC analysis from OxRRBS data

Julia Verheul

unread,

Oct 24, 2025, 7:27:37 AMOct 24

to methylkit_discussion

Hello everyone,

I'm hoping to get some guidance with an issue I'm encountering in methylKit. I'm particularly interested in studying 5-hydroxymethylcytosine (5hmC) using OxRRBS data. Here's my workflow and the specific problem:

Background Context:
I have OxRRBS data that I've analyzed with Bismark/Bowtie2 for both BS and OxBS treatments

Following the subtraction approach from Booth et al. (as implemented in https://github.com/VerheulJ/5hmc/blob/main/subtraction, I inferred 5hmC levels by subtraction

I obtained a table of significant hydroxymethylation calls and manually created text files mimicking Bismark cytosine report format to use with methylKit

My file creation approach is detailed here: https://github.com/VerheulJ/5hmc/blob/main/create%20methylraw

Current Problem:
I can run methRead() without errors, but the resulting methylRaw object is completely empty:

# This runs without errors:
tg_path <- "Galeano/methyl_filtered_TG.txt"
wt_path <- "Galeano/methyl_filtered_WT.txt"
sample.id <- list("Tg","WT")
treatment <- c(1, 0)
file.list <- list(tg_path, wt_path)

myobj <- methRead(
location = file.list,
sample.id = sample.id,
assembly = "rn6",
treatment = treatment,
context = "CpG",
pipeline = "bismarkCytosineReport",
mincov = 1) # Even with mincov=1

# But the object is empty:
myobj[[1]][["chr"]]
character(0)

What I've Verified:
Files are formatted with 6 columns: chr, position, strand, count_methylated, count_unmethylated, percentage

No header lines, tab-separated
Coverage values are all ≥10 (I pre-filtered)

About 8,600 positions overlap between samples

Questions:
What could cause methRead() to complete without errors but return empty objects?

Are there common formatting issues with custom-created Bismark-style files that might cause this?
Is there a way to debug what methylKit is actually reading from the files?
Has anyone successfully used methylKit with 5hmC data from subtraction approaches?

I'd be extremely grateful for any insights or suggestions. Thank you so much for your time and expertise!

Best regards,
Julia

Alexander Blume

unread,

Oct 24, 2025, 12:41:05 PMOct 24

to methylkit_...@googlegroups.com

Hi Julia,

I am not particularly familiar with the analysis of 5hmC data, but getting an empty object without error sounds weird.

I would suggest trying to manually follow the steps that methread is taking (journey starts here: https://github.com/al2na/methylKit/blob/master/R%2Fbackbone.R#L703).

Maybe there are some intermediate steps that fail silently and cause your problems.

Best,

Alex

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/methylkit_discussion/63b32ef3-2239-4f33-b907-a42209eab801n%40googlegroups.com.

Julia Verheul

unread,

Oct 30, 2025, 6:01:28 AMOct 30

to methylkit_discussion

Hi Alex,

I solved the issue – it turned out to be something very simple. I was generating an input file by mimicking the structure of the object produced by methRead, just copying the columns I saw when exploring that object. The problem is that methRead actually transforms the original cytosine report (6 columns) into a 7-column structure. Once I generated an input resembling the Bismark cytosine report, I was able to complete the analysis pipeline.

However, the number of annotations and DhMRs I obtained was extremely low, even after relaxing the thresholds. Because of that, I decided to implement a different strategy, specifically the subtraction approach described in Booth et al., 2012 (DOI: 10.1186/gb-2012-13-8-r69). In this work, they analyze different hydroxymethylation contexts and, to improve statistical power, aggregate by CpG islands (CGIs).

At this point I wonder: is there any way to run a methylKit analysis directly at the CGI level? My 5hmC data has the following structure:

"CGI" "Sample" "Chr" "Start" "End" "n_CpGs" "total_coverage_BS" "total_coverage_OxBS" "CGI_5mC" "CGI_5hmC"

Would it be possible to build some kind of methylKit-compatible input from this table?
And more conceptually: to what extent can a CGI be understood as a "tile" in the methylKit framework?

Thanks a lot for your help!
Julia

Alexander Blume

unread,

Nov 9, 2025, 7:56:02 AMNov 9

to methylkit_...@googlegroups.com

Hi Julia,

After loading the data, you can summarise your object at the CGI level using the regionCounts function (https://www.bioconductor.org/packages/devel/bioc/vignettes/methylKit/inst/doc/methylKit.html#41_Regional_analysis). Conceptionally, you can treat those regions as "tiles", even though they would not cover the whole genome, but this would allow you to perform differential analysis at the CGI level.

Best

To view this discussion visit https://groups.google.com/d/msgid/methylkit_discussion/4b183cbb-6d86-4d23-846f-d123fb3831d5n%40googlegroups.com.

Reply all

Reply to author

Forward