Problem running champ.RunCombat (ChAMP) and comBat (SVA)

1,609 views
Skip to first unread message

natalia...@gmail.com

unread,
Nov 21, 2017, 4:28:51 AM11/21/17
to Epigenomics forum
Dear all,

I am trying to correct the batch effects in my EWAS. Using ChAMP package, wih champ.SVD, I observed the higher association is for Sentrix_Position, but when I try to correct with champ.RunCombat or comBat, I get the following error:

With champ.RunCombat:

Error in tcrossprod(t(design), as.matrix(dat)) :
 non-conformable arguments


With comBat:

Error in tcrossprod(t(design), as.matrix(dat)) :
  non-conformable arguments



I also tried to correct for Sentrix_ID, cause in another EWAS I was able to run combat with this batch variable. However, now, with this new sample, I am not able. I get this error:


With champ.RunCombat:

Error in champ.runCombat(beta = myNorm$beta, pd = myLoad$pd,  :
  Sentrix_ID factors is not valid to run Combat, please recheck your dataset.

With comBat:

Error in tcrossprod(t(design), as.matrix(dat)) :
  non-conformable arguments



I will appreciate if someone can help me to overcome this error

Thank you!

Yuan Tian

unread,
Nov 23, 2017, 11:18:06 PM11/23/17
to Epigenomics forum
Hello Natalia:

Thanks for using ChAMP. Combat could run with two conditions, one is for each group in your batch variable, there must be more than 1 sample in it. For example, if your Sentrix_ID contains one value XXX, then only one sample belongs to XXX, it would not work. Another condition is your batch is not confounded with your variable, could you check if all samples from one phenotype happen to be one group in batch? I wrote this in ChAMP vignette, you may find more explanation there.

Best
Yuan Tian

natalia...@gmail.com

unread,
Nov 27, 2017, 6:03:04 AM11/27/17
to Epigenomics forum
Thank you Yuan,

I think the problem is that I have some phenotype which only happens in one batch. So, I suppose I can't run ChAMP.Combat.


Best, 

Natalia

El divendres, 24 novembre de 2017 5:18:06 UTC+1, Yuan Tian va escriure:

RTM

unread,
Apr 16, 2019, 1:49:00 AM4/16/19
to Epigenomics forum
Hello Yuan Tian,

I am analyzing 450K data from TCGA and ran into the same issue. I wanted to use "Sentrix_ID" as my batchname variable but there is 2 Sentrix_ID that have only one sample associated with them. Is there any workaround or should I just remove those 2 samples?

Here is how I ran the combat:

beta.combat <- champ.runCombat(beta=beta,
                               pd
=rnb.set.norm@pheno,
                               batchname
=c("Sentrix_ID"),
                               variablename
="TUMOR_OR_NORMAL")


Thanks a lot!!

Yuan Tian

unread,
Apr 16, 2019, 9:19:06 AM4/16/19
to Epigenomics forum
Hi

I think to remove them is the correct way. Because you don't have any other information/sample to help you to correct the batch effect from these two Sentrix_ID. Or maybe if SVD plot shows Sentrix_ID is actually not that influential for analysis result, you may leave them and don't do Combat correction at all.

Best
Yuan Tian

--
You received this message because you are subscribed to the Google Groups "Epigenomics forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to epigenomicsfor...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

RTM

unread,
Apr 16, 2019, 9:37:06 AM4/16/19
to Epigenomics forum
Hello Yuan Tian,

Thanks a lot for your reply and suggestions!

Best wishes.
To unsubscribe from this group and stop receiving emails from it, send an email to epigenom...@googlegroups.com.

MRM

unread,
Apr 8, 2020, 2:58:53 AM4/8/20
to Epigenomics forum
Dear Tian:

I am new to the group. I am also facing a similar situation when using ChAMP to analyze EPICarray data, where there is only one sample for one of the "Slide" values. I have a total of only 17 samples, so, excluding one sample may not be a good idea. SVD showed "Slide" (batchname) a p-value between 0.01 and 0.05. p-value for the Sample_Group is < 0.01. Is a batch correction necessary here. Thanks for your advice.

MRM
To unsubscribe from this group and stop receiving emails from it, send an email to epigenom...@googlegroups.com.

Yuan Tian

unread,
Apr 8, 2020, 4:36:42 PM4/8/20
to Epigenomics forum
Hi MRM:

Sorry if I did not reply in time recently.

I think the Slide indeed show strong effect, so in theory it should be corrected. However,r since you have only one sample there, if Combat is not working, I think maybe one another way could be that you can use linear regression model to identify DMPs, but add these batch information as covariates, this should be a way as well.

Note that champ.DMP() can NOT do this (I really should made it support this…). So you may need to use Lima package, but it should not be complicate to implement.

Recently I need received many emails about how to deal with situation that Combat is not applicable. However this is really not my previous research topic, if I have time in eastern, I will try to see if there are any better batch correction now exist to make ChAMP’s batch correction step better.

Best
Tian
To unsubscribe from this group and stop receiving emails from it, send an email to epigenomicsfor...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/epigenomicsforum/b15e93c7-87f5-474a-a80b-0ddc144e66b1%40googlegroups.com.

MRM

unread,
Apr 8, 2020, 6:23:24 PM4/8/20
to Epigenomics forum
Dear Tian:

Thank you very much. I will try to use limma. Thanks.

MRM

On Wednesday, April 8, 2020 at 1:36:42 PM UTC-7, Yuan Tian wrote:
Hi MRM:

Sorry if I did not reply in time recently.

I think the Slide indeed show strong effect, so in theory it should be corrected. However,r since you have only one sample there, if Combat is not working, I think maybe one another way could be that you can use linear regression model to identify DMPs, but add these batch information as covariates, this should be a way as well.

Note that champ.DMP() can NOT do this (I really should made it support this…). So you may need to use Lima package, but it should not be complicate to implement.

Recently I need received many emails about how to deal with situation that Combat is not applicable. However this is really not my previous research topic, if I have time in eastern, I will try to see if there are any better batch correction now exist to make ChAMP’s batch correction step better.

Best
Tian
To unsubscribe from this group and stop receiving emails from it, send an email to epigenom...@googlegroups.com.

Ying Zhao

unread,
Apr 13, 2020, 5:24:36 PM4/13/20
to Epigenomics forum
Hi Yuantian,

When I ran champ.runCombat, it gave me error.

> champ.runCombat(beta.c=myNorm$beta, pd=myLoad1$pd, logitTrans=TRUE)
Error in champ.runCombat(beta.c = myNorm$beta, pd = myLoad1$pd, logitTrans = TRUE) : 
  unused argument (beta.c = myNorm$beta)

And then I remove beta.c=myNorm$beta in the champ.runCombat function as below.

> champ.runCombat(pd=myLoad1$pd, logitTrans=TRUE)        
it seems that there's at least one covariate is confounded with batch, how can I check the confounded covariate(s)?  Thanks in advance. 

Maomao

[===========================]
[<< CHAMP.RUNCOMBAT START >>]
-----------------------------
<< Preparing files for ComBat >>
[Combat correction will be proceed with 734008 probes and 108 samples.]

<< Following Factors in your pd(sample_sheet.csv) could be applied to Combat: >>
<Sample_Plate>(factor):UNC_1634_Plate_2, UNC_1634_Plate_1
<Slide>(factor):203869580078, 203869580083, 203869580088, 203869580089, 203869580090, 203769710097, 203769830097, 203869580106, 203755080107, 203755080108, 203869580108, 203769830109, 203869580060, 203869580067
<Array>(factor):R01C01, R02C01, R03C01, R04C01, R05C01, R06C01, R07C01, R08C01
[champ.runCombat have automatically select ALL factors contain at least two different values from your pd(sample_sheet.csv).]

<< Following Factors in your pd(sample_sheet.csv) can not be corrected: >>
<Sample_Name>
<Sample_Well>
<Sample_Group>
<Pool_ID>
[Factors are ignored because they are conflict with variablename, or they contain ONLY ONE value across all Samples, or some phenotype contains less than 2 Samples.]
As your assigned in batchname parameter: Slide will be corrected by Combat function.

<< Checking confounded status between Slide and Sample_Group >>
--------------------------
Model for Correction is: 
~Sample_Group
<environment: 0x000001ed39bfed58>
Combat can adjust for  107  covariate(s) or covariate level(s)
--------------------------
<< Rank Check Complete, you data is good to proceed. >> ^_^

<< Start Correcting Slide >>
~Sample_Group
<environment: 0x000001ed39bfed58>
Generate mod success. Started to run ComBat, which is quite slow...
Found14batches
Adjusting for107covariate(s) or covariate level(s)
Error in ComBat(dat = beta, batch = batch, mod = mod, par.prior = TRUE) : 
  At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat

Maomao

unread,
Apr 13, 2020, 5:24:36 PM4/13/20
to Epigenomics forum
Hi Yuan Tian,

I use champ.runCombat, but it does not work. 
>champ.runCombat(beta.c=myNorm$beta, pd=myLoad1$pd, logitTrans=TRUE)

Error in champ.runCombat(beta.c = myNorm$beta, pd = myLoad1$pd, logitTrans = TRUE) : 
  unused argument (beta.c = myNorm$beta)


And then I removed beta.c=myNorm$beta in champ.runCombat function. 

> champ.runCombat(pd=myLoad1$pd, logitTrans=TRUE)

It seems that there's at lease one covariate is confounded with batch, is anyway to check which one? 

Thanks in advance.

MAOMAO

To unsubscribe from this group and stop receiving emails from it, send an email to epigenom...@googlegroups.com.

Yuan Tian

unread,
Apr 13, 2020, 6:24:34 PM4/13/20
to epigenom...@googlegroups.com
Hi:

You did not specify the variablename and batch name for your champ.runCombat() function. So it would defaultly take “Slide” as Batch, and “Sample_Group” as Variable.

So you can directly table(pd$Slide, pd$Sample_Group) to see the confounding effect.

Best
Tian
--

You received this message because you are subscribed to the Google Groups "Epigenomics forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to epigenomicsfor...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/epigenomicsforum/b6d15262-ff8c-47f8-bf49-6923e1d63eea%40googlegroups.com.

MAOMAO

unread,
Apr 21, 2020, 2:48:48 PM4/21/20
to Epigenomics forum
Thanks a lot Yuan Tian.

Here are my partial table(pd$Slide, pd$Sample_Group) output, how could I judge? The value equal to 1 would be the sample with the sentrix_ID in the sample sheet.
           L1251 L1321 L1331 L1417 L1151 L1619 L1701 L1714 L1716 L1910 L2017 L2119 L2216 L2327 L2410 L2466 L2618
  203755080107    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  203755080108    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  203769710097    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
  203769830097    0    0    0    0    0    0    0    0    0    1    0    0    0    1    1    1    0
  203769830109    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  203869580060    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
  203869580067    0    0    0    0    0    0    1    1    0    0    0    0    1    0    0    0    0
  203869580078    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  203869580083    0    0    0    0    0    1    0    0    0    0    1    0    0    0    0    0    0


And another question is how could I interpret the SVD plot in the attachment. Thanks a lot.

MAOMAO
To unsubscribe from this group and stop receiving emails from it, send an email to epigenom...@googlegroups.com.
svd.png

MRM

unread,
Apr 28, 2020, 8:28:53 PM4/28/20
to Epigenomics forum
Dear Tian:

I want to extract an average beta value at the gene level for each sample. This could be geometric or arithmetic mean of beta values of all (or only differentially methylated) probes for a gene. Does any of the existing functions such as champ.DMP, DMP.GUI or champ.GSEA/ebGSEA do such a calculation or similar calculation from where I can pull out this intermediate variable. I downloaded the source code of these functions and am looking through those, but I will appreciate any help/hint. The purpose is to be able to plot a heatmap of (normalized) beta values for the samples at the gene level (for several genes together), or to do some correlation between methylation data and gene expression data. I am not sure if existing functions do this.

Also, looking through the source code champ.DMP.R, I am a bit confused re if the comparison is "Group 2 vs Group 1" or "Group 1 vs Group 2". At one place I see:

                tmpname <- paste(Compare[[i]][1],Compare[[i]][2],sep="_to_")
which I interpret as "Group 1 vs Group 2". In building the actual data.frame DMPs[[i]], I see:

            avg <-  cbind(rowMeans(beta[com.idx,which(pheno==Compare[[i]][1])]),rowMeans(beta[com.idx,which(pheno==Compare[[i]][2])]))
            avg <- cbind(avg,avg[,2]-avg[,1])
            colnames(avg) <- c(paste(Compare[[i]],"AVG",sep="_"),"deltaBeta")

Thus, deltaBeta is clearly "average for Group 2" - "average for  Group 1". Please let me know. Thanks

MRM
Reply all
Reply to author
Forward
0 new messages