Paired Statistics


Philipp Brunnbauer

Mar 18, 2025, 12:00:11 PM
to Omics Playground
Hello, 

I would like to ask whether BigOmics Playground can analyse paired data. I have not found an option to do so, although it seems like it should be doable. Maybe I am missing something? Also, how would I change the metadata to indicate the replicates/pairs?

If I analyse my dataset with unpaired statistics, comparing group means, basically all adjusted p-values are set to 1 after FDR correction, which is quite odd.

Best,
Philipp

Ivo Kwee

Mar 21, 2025, 12:01:17 PM
to Omics Playground
Hi Philipp,

Paired testing is similar (or equivalent) to introducing an extra subject covariate in the linear model. However, the way we prefer to do it in Omics Playground is to correct for the pairing effect using limma, ComBat or one of the unsupervised batch correction methods (NPM, SVA or RUV). Batch correction is closely related to adding a subject covariate in ANOVA. It also uses linear regression, but it explicitly subtracts the pair/subject-induced variation and gives you the residuals as a corrected matrix.
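
As a rough illustration of the point above, here is a minimal Python sketch on made-up numbers for a single gene. It is not how Omics Playground, limma or ComBat implement the correction; the data and the helper names (`unpaired_t`, `paired_t`, `remove_subject_effect`) are hypothetical. It only shows that subtracting each subject's own mean leaves the group difference intact while removing the between-subject spread that swamps an unpaired comparison.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy data: one gene, four patients, each measured in both conditions.
# Position i in both lists belongs to the same patient.
control = [5.0, 8.0, 11.0, 14.0]
treated = [6.0, 9.5, 12.0, 15.5]


def unpaired_t(a, b):
    """Welch two-sample t statistic: compares group means and ignores pairing."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(b) - mean(a)) / se


def paired_t(a, b):
    """Paired t statistic, computed on the within-subject differences."""
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def remove_subject_effect(a, b):
    """Subtract each subject's own mean, keeping only within-subject variation."""
    centers = [(x + y) / 2 for x, y in zip(a, b)]
    a_corr = [x - c for x, c in zip(a, centers)]
    b_corr = [y - c for y, c in zip(b, centers)]
    return a_corr, b_corr


t_unpaired = unpaired_t(control, treated)
t_paired = paired_t(control, treated)
control_corr, treated_corr = remove_subject_effect(control, treated)

print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
print("spread before correction:", stdev(control), "after:", stdev(control_corr))
```

Note that running a naive unpaired test on the corrected values would overstate the evidence, because the subject means have already been estimated from the same data; the sketch is meant to show where the variation goes, not to replace a proper paired model.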

To correct for the pairing effect using limma or ComBat, you need to add a column to your sample information file (samples.csv) that denotes the subject/patient ID. So if the first two samples come from patient A and samples 3 and 4 from patient B, you would add a column: c(A,A,B,B). Be sure that all conditions (e.g. treated and control) exist in all subjects. If you use NPM, SVA or RUV, they should 'automagically' remove the pairing effect for you. It is not 'odd' to get no significance if you neither correct for the pairing nor do explicit paired testing. In many real-world patient datasets there is considerable patient-specific variation even at baseline, and correction is needed.
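
The layout described above could look like the following sketch. The column names (`sample`, `condition`, `subject`) and sample labels are assumptions for illustration only; check the Omics Playground documentation for the exact header conventions it expects. The helper `subjects_missing_conditions` is hypothetical and simply checks that every subject appears in every condition before you upload.

```python
import csv
import io
from collections import defaultdict

# Hypothetical samples.csv content; column names are illustrative only.
SAMPLES_CSV = """sample,condition,subject
S1,control,A
S2,treated,A
S3,control,B
S4,treated,B
"""


def subjects_missing_conditions(csv_text, subject_col="subject", condition_col="condition"):
    """Return {subject: [missing conditions]} for subjects not seen in every condition."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    all_conditions = {row[condition_col] for row in rows}
    seen = defaultdict(set)
    for row in rows:
        seen[row[subject_col]].add(row[condition_col])
    return {
        subject: sorted(all_conditions - conditions)
        for subject, conditions in seen.items()
        if conditions != all_conditions
    }


missing = subjects_missing_conditions(SAMPLES_CSV)
if missing:
    print("Incomplete pairs:", missing)
else:
    print("Every subject has every condition.")
```

To check a real file, read its text first (for example with `open("samples.csv").read()`) and pass that string to the function.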

Our new paper on NPM has just been published. It introduces 'quasi pairing', which works even if you do not have the pairing information. Of course, it also works if you do have the pairing information. https://academic.oup.com/bioinformatics/article/41/3/btaf084/8042340

Best

Ivo

Philipp Brunnbauer

Apr 3, 2025, 7:34:24 AM
to Omics Playground
Hey Ivo,

many thanks for your answer. Just so I understand correctly:

I have now added a third column to the samples.csv file, which indicates the replicate (patient), so my file looks like this:

label, timepoint(condition/phenotype), replicate

Is this correct?

Further, I have now used the replicate factor for ComBat batch correction. Is that what you meant? I also have some other questions about the specifics of my dataset, but they would be confidential. Would it be possible to switch to a private chat?
