Adjusting the gene filtering values, DESeq2 parameters, etc.

Prem

May 3, 2021, 2:41:26 PM

to Omics Playground

This is a really well-built GUI for DE. I have tried others, and this one appears to offer greater depth and breadth of analytical outputs, and especially easier file upload and parsing.

A couple of questions, though. In many modules, such as the low-expression gene filtering, DESeq2, or limma, very few parameters are adjustable in the GUI.

1. For example, which module (.R file) handles the filtering of lowly expressed genes?

2. Is it possible to choose different normalization methods (TMM, etc.)?

3. Where are the .R files for DESeq2 and limma, and what parameters can be adjusted?

I understand that the reason is to be conservative in the unattended app, but I guess that when working with the source code, a little more flexibility would be helpful.

Thanks again for a great application,

-Prem

BigOmics Analytics

May 5, 2021, 12:21:04 AM

to Omics Playground

Hi Prem,

1. The filtering of lowly expressed genes is in compute2-genes.R in the function, around line 235. You will see that it filters out genes that have CPM>1 in fewer than 2 samples or in fewer than 1% of the total number of samples. At the moment the threshold of CPM=1 is fixed, but it can easily be adjusted there. Whether we should expose it in the GUI is something to think about.

2. Not yet, but some initial work is already under way in shiny/modules/normalizeCountsModule.R, where the user can choose between normalization methods through the GUI at upload. There are still some considerations because internally we use CPM+quantile for the limma- and t-test-based tests, whereas edgeR and DESeq2 still use their own normalization methods. So if we allow the user to choose the normalization, we need to skip the normalization in the edgeR and DESeq2 calls.

3. The code for all gene-level DE tests is in ngs-fit.R (yes, it's a strange filename...). If you want, you can adjust any DESeq2 parameters there. At the moment the only parameter is the choice between Wald and LRT statistics. You will also see that the code is fairly complicated because it also handles the case of an "overdetermined" design (too many DOF/contrasts in the design matrix), where we have to switch to iteratively solving for the contrasts one by one. What other parameters did you have in mind to adjust?
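To make points 1 and 2 more concrete, here is a rough sketch of the two ideas: a CPM-based low-expression filter and a plain quantile normalization. It is written in Python purely for illustration and is not the code in compute2-genes.R. The function names (cpm, filter_low_expressed, quantile_normalize), the parameter names, and the toy counts are all made up for this example. It also assumes one particular reading of the filter rule, namely that a gene is kept when it has CPM above the threshold in at least 2 samples and in at least 1% of all samples.

```python
def cpm(counts):
    """Convert raw counts ({gene: [count per sample]}) to counts per million."""
    n_samples = len(next(iter(counts.values())))
    # Library size of a sample is the sum of its counts over all genes.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib if lib > 0 else 0.0 for c, lib in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


def filter_low_expressed(counts, cpm_threshold=1.0, min_samples=2, min_fraction=0.01):
    """Keep genes whose CPM exceeds cpm_threshold in enough samples.

    Assumed reading of the rule: "enough" means at least min_samples samples
    AND at least min_fraction of all samples.
    """
    n_samples = len(next(iter(counts.values())))
    required = max(min_samples, min_fraction * n_samples)
    kept = {}
    for gene, values in cpm(counts).items():
        n_expressed = sum(v > cpm_threshold for v in values)
        if n_expressed >= required:
            kept[gene] = counts[gene]
    return kept


def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length per-sample vectors.

    Ties are broken by position, which is a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: 4 genes x 4 samples of raw counts.
counts = {
    "geneBulk": [2_000_000, 2_000_000, 2_000_000, 2_000_000],
    "geneA": [500, 600, 550, 700],
    "geneB": [0, 0, 1, 0],
    "geneC": [0, 0, 0, 50],
}
kept = filter_low_expressed(counts)
print(sorted(kept))
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

With the toy data, geneB never reaches CPM>1 and geneC does so in only one sample, so both are dropped under the default settings. Making cpm_threshold an argument is the kind of change that would be needed before exposing it in the GUI.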

We try to minimize the number of adjustments and instead stick to the defaults as much as possible. But I am open to suggestions if you think some parameters are really necessary.