Empty tabix files from regionCounts

82 views
Skip to first unread message

David Fisher

unread,
Jul 12, 2023, 9:41:07 AM7/12/23
to methylkit_discussion
Hi all,

I have conducted some differential methylation analysis at a base level, and now am looking to do the same at a gene level by aggregating counts of methylated and unmethylated Cs per gene. I did this successfully in the CpG context with the "regionCounts" function, generating one "_regions" tabix file per sequence which I could then filter and unite before using "calculateDiffMeth" etc.

However, when I tried the same with my sequences from the CHG context, I received an error when running regionCounts that one of the tabix file was empty, terminating the function (messages: "compressing the file with bgzip...  making tabix index...  ERROR : the tabix file seems to be empty. stopping here."). I got round this by a) loading each sequence separately b) combining them into a normal list c) writing a function with regionCounts and tryCatch that keeps working if an error arises d) lapply-ing the new function to my list. This revealed that 3 out of 10 sequences created empty tabix files. I could then read in the remaining 7 and continue with filter, unite, and calculateDiffMeth etc, but I do not understand why those 3 samples did not work; they do not appear different in any particular way and worked fine for the base level analysis.

Then, I tried the same with the CHH context sequences, and all 10 of 10 returned empty tabix files. Any idea what could be going on?

Cheers,

David

Alexander Blume

unread,
Jul 12, 2023, 10:10:30 AM7/12/23
to methylkit_discussion

Hi David, 

Do you get any errors when not saving the output of regionCounts as tabix files but as in-memory objects (settings "save.db = FALSE")? It could be possible that some of your samples don't cover the regions that you are querying, these should return empty data.frames. 

Would you be able to create and share a reproducible code example with a small subset of your data?

Best,
Alex 

David Fisher

unread,
Jul 13, 2023, 10:59:09 AM7/13/23
to methylkit_discussion
Hi Alex,

I have tried not saving as tabix file with a single sequence and get a new error:

Uncompressing file.
Reading file.

 *** caught segfault ***
address (nil), cause 'unknown'
*** recursive gc invocation [this line repeated many times]

Traceback:
 1: data.table::fread(file = filepath, ...)
 2: fread.gzipped(tbxFile, nrow = nrow, stringsAsFactors = FALSE,     data.table = returnDt, skip = length(Rsamtools::headerTabix(tbxFile)$header))
 3: headTabix(x@dbpath, nrow = max(i), return.type = "data.frame")
 4: select(x, i)
 5: select(x, i)
 6: .local(x, i, j, ...)
 7: object[]
 8: object[]
 9: regionCounts(chh_db_62, dum_genes_granges, save.db = "FALSE")
10: regionCounts(chh_db_62, dum_genes_granges, save.db = "FALSE")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Same error with a couple of different sequences.

I do not think it can be that the samples do not overlap with the regions of interest at all as I don't see why all 10 of the CpG sequences overlap but none of the CHH sequences do?

Cheers,

David

Alexander Blume

unread,
Oct 9, 2023, 7:38:28 AM10/9/23
to methylkit_discussion
Hi David,

The segfault error indicates we are running out of memory. The traceback shows that the function failed during the file-loading. But anyway, it worked (even though it errored) before with the tabix files.

Concerning the difference between CpG and CHH coverage, I think it greatly depends on the organism/tissue that you are studying. In mammals, cytosines CpG context are known to cluster in CpG-islands on promoter regions of the gene, while non-CpG cytosines occur on repetitive DNA sequences or transposable elements. So maybe your CHH sites are located in intergenic regions? You could check this with genomation (https://bioconductor.org/packages/devel/bioc/vignettes/methylKit/inst/doc/methylKit.html#4_Annotating_differentially_methylated_bases_or_regions)  or annotatr (https://bioconductor.org/packages/release/bioc/html/annotatr.html)

Best,
Alex
Reply all
Reply to author
Forward
0 new messages