error when trying to do a PCA (from uniting them)

91 views
Skip to first unread message

Loryn Smith

unread,
Aug 14, 2023, 5:50:25 PM8/14/23
to methylkit_discussion
When trying to do a PCA on my samples I am getting an error message: 
Error in svd(x, nu = 0, nv = k) : a dimension is zero

I think it has something to do with the NAs when I unite the samples. I have tried to include a screenshot of a portion of the table when I use the View function. 

It is difficult to include a reproducible example due to my dataset being so large. Each file is over 1.5 GB. I am using RRBS data and I used Bismark prior to putting my samples into R. Could this have something to do with it?

Here is my code:

myobj_E_NE=methRead(file.list_E_NE,
               sample.id=list("2022_M_3744",
                              "2022_M_3738",
                              "2022_M_3740",
                              "2022_M_3734",
                              "2022_M_3735",
                              "2022_M_3737",
                              "2022_L_3744",
                              "2022_L_3738",
                              "2022_L_3740",
                              "2022_L_3734",
                              "2022_L_3735",
                              "2022_L_3737",
                              "2022_M_3744_NE",
                              "2022_M_3738_NE",
                              "2022_M_3740_NE",
                              "2022_M_3734_NE",
                              "2022_M_3735_NE",
                              "2022_M_3737_NE",
                              "2022_L_3744_NE",
                              "2022_L_3738_NE",
                              "2022_L_3740_NE",
                              "2022_L_3734_NE",
                              "2022_L_3735_NE",
                              "2022_L_3737_NE"),
               assembly="pleucopus",
               pipeline="bismarkCytosineReport",
               header=FALSE,
               treatment=c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0),
               context="CpG",
               mincov = 10
)
meth.min_E_NE=unite(filtered.myobj_E_NE,min.per.group=1L,destrand=FALSE)

PCASamples(meth.min_E_NE, screeplot=TRUE)

PCASamples(meth.min_E_NE)
NAs_in_methylKit.png

Loryn Smith

unread,
Aug 14, 2023, 5:52:31 PM8/14/23
to methylkit_discussion
I forgot to include the code where I filtered my data before uniting them.

filtered.myobj_E_NE=filterByCoverage(myobj_E_NE,lo.count=10,lo.perc=NULL,
                                hi.count=NULL,hi.perc=99.9)

Alexander Blume

unread,
Aug 15, 2023, 5:10:20 AM8/15/23
to methylkit_discussion
Hi Loryn,

Actually, NA containing rows are removed before generating the PCA (https://github.com/al2na/methylKit/blob/master/R/clusterSamples.R#L409).
It could be possible that the majority of your sites are omitted after the filtering, thus causing the issues.
See here for some reasons this PCA-related error could occour: https://support.bioconductor.org/p/86110/

Coming back to your setup:
How do the coverage distributions of your samples look like? 
Is it possible that you have rather low coverage and should better adjust the minimum coverage threshold of methRead?

Did you try uniting with default min.per.group (keeping only sites where all samples have coverage)?
Are there any sites are left using this approach?

Best
Alex

Manoswini Dash

unread,
Jan 9, 2024, 11:28:11 AM1/9/24
to methylkit_discussion
Hi
I am recently having the similar issue. 
I have data with more NA values (when I run with min.per.group=1L) , but I checked there are no columns ONLY with NA values. 

I got the same error while running PCASamples() as mentioned by Loryn, and when I run ClusterSamples(), got following error:
Error in hclust(d, HCLUST.METHODS[hclust.method]) : NA/NaN/Inf in foreign function call (arg 10)

Below is my code:
T2T_cov5 <- methRead(as.list(CpGFiles ),
                       sample.id=as.list(sampleNames),
                       pipeline=list(fraction=FALSE, chr.col=1, start.col=2, end.col=2,
                                     coverage.col=5, strand.col=4, freqC.col=6),
                       assembly="T2T",
                       treatment=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
                       context="CpG",
                       mincov=5)
T2T_cov5.filt <- filterByCoverage(T2T_cov5, lo.count=5, lo.perc=NULL, hi.count=NULL, hi.perc=99.9)
T2T_cov5.filt.norm <- normalizeCoverage(T2T_cov5.filt)
T2T_merge = unite(T2T_cov5.filt.norm,min.per.group=1L, destrand=TRUE)
PCASamples(T2T_merge)
clusterSamples(T2T_merge, dist="correlation", method="ward.D2", plot=TRUE)
Reply all
Reply to author
Forward
0 new messages