Question Regarding Cell Annotations and Gene Expression Matrix Input to InferCNV

Alaine

Jul 30, 2020, 12:33:17 PM
to Trinity_CTAT_users

Hi there,

 

I had a question regarding generating and using annotation files in InferCNV. 


In our study, we are interested in using InferCNV to separate malignant from non-malignant cells. We do not know the cell types in our sample a priori, and we are not interested in identifying the different cell types within the dataset; we simply want to classify cells as malignant or non-malignant based on inferred CNV patterns. However, in order to use InferCNV, we need to generate an annotation file. How do you recommend we generate these cell annotations in an unbiased way?


Additionally, I had a question regarding the gene expression matrix input for InferCNV. It says that we should use a matrix of raw counts; however, some of my datasets are in CPM or TPM. Can I apply InferCNV to datasets in TPM or CPM?

 

Thank you,

Alaine

caroline hochheuser

Oct 1, 2020, 6:38:50 AM
to Trinity_CTAT_users
Hi everyone,

I am in the same situation: I cannot generate such an annotation file because we are aiming to validate our gating strategy. I therefore have one plate that we sorted in an unbiased way, so I don't have any information about which cell type is in which well (except that they are all non-hematopoietic (CD45-) cells). From this plate, we want to identify which wells contain malignant cells.

@Alaine, if you have received an answer or figured out a solution, I'd be interested in hearing it ;-) 

Thanks!

All the best
Caroline 

cgeo...@broadinstitute.org

Oct 5, 2020, 2:25:22 PM
to Trinity_CTAT_users
Hi,

The main issue in the cases you are describing is not really about defining cell types, but about defining a set of cells you know are non-malignant and can be used as references. Infercnv uses the reference cells to define the baseline expression level corresponding to 2 copies of each gene/chromosome (in different cell types if available), so that it can tell whether what is seen in the other cells (the observation set) is a normal expression level, a lower one (potential loss of copies), or a higher one (potential gain of copies).
You can run infercnv without defining references, in which case it will use the average of all available cells as the baseline expression level to compare to. However, that means that when you see signal, you can only tell that there is a difference between 2 (or more) groups of cells, not which one is non-malignant and which one is malignant (if there even are any non-malignant cells). The most common profile will be the closest to the average, so if you have more malignant cells than non-malignant cells, the signal will highlight the non-malignant cells. It also means that if all cells have the same CNV, it will not be detectable.
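
To make the effect of the baseline concrete, here is a toy sketch in plain Python. It is not infercnv's actual computation: the function name, the cell names, and the two-gene "log expression" values are all made up for illustration, and the only thing it models is subtracting a per-gene mean taken either over all cells or over a chosen reference set.

```python
from statistics import mean

def relative_to_baseline(expr, reference_cells=None):
    """Subtract a per-gene baseline from every cell.

    expr: dict mapping cell name -> list of per-gene values (toy log scale).
    reference_cells: cells used for the baseline; None means "all cells",
    mimicking a run without defined references.
    """
    cells = list(expr) if reference_cells is None else list(reference_cells)
    n_genes = len(next(iter(expr.values())))
    baseline = [mean(expr[c][g] for c in cells) for g in range(n_genes)]
    return {c: [v - b for v, b in zip(vals, baseline)] for c, vals in expr.items()}

# Hypothetical data: three malignant cells with a gain at gene 0, one normal cell.
expr = {
    "tumor_1": [3.0, 2.0],
    "tumor_2": [3.0, 2.0],
    "tumor_3": [3.0, 2.0],
    "normal_1": [2.0, 2.0],
}

no_ref = relative_to_baseline(expr)                  # baseline = mean of all cells
with_ref = relative_to_baseline(expr, ["normal_1"])  # baseline = known normal cell

print("no reference:  ", no_ref)
print("with reference:", with_ref)
```

Without a reference, the baseline at gene 0 is 2.75, so the tumor cells sit at +0.25 and the single normal cell at -0.75: the minority (normal) cell carries the strongest signal and looks like a loss. With the normal cell as reference, the tumor cells show +1.0 and the normal cell 0.0.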

If you want to cluster cells without specifically identifying their type prior to running infercnv, you can try Louvain clustering in scanpy, which seems to do a good job. Infercnv also has options to split references/observations into a given number of groups based on hierarchical clustering.
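
Once you have cluster labels from whatever method you choose, turning them into an annotation file is mostly bookkeeping. The sketch below assumes the annotation file is a headerless, tab-delimited file with the cell name in the first column and the group name in the second (please check this against the infercnv documentation for your version). The helper name, the well names, and the cluster labels are hypothetical.

```python
import csv
import io

def write_annotations(cluster_labels, handle, prefix="cluster_"):
    """Write one 'cell<TAB>group' row per cell, with no header line.

    cluster_labels: dict mapping cell name -> cluster label (e.g. from Louvain).
    handle: any writable text file object.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for cell, label in cluster_labels.items():
        writer.writerow([cell, f"{prefix}{label}"])

# Hypothetical labels for three wells; real labels would come from your clustering.
cluster_labels = {"well_A1": 0, "well_A2": 1, "well_A3": 0}

buffer = io.StringIO()  # swap for open("annotations.txt", "w", newline="") to write a file
write_annotations(cluster_labels, buffer)
annotations_text = buffer.getvalue()
print(annotations_text, end="")
```

Note that labels produced this way only group cells; they do not say which group is non-malignant, so the caveat about missing references still applies.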

You can apply infercnv to TPM/CPM, but raw counts are preferred, especially if you plan to run the HMM predictions, as the model uses the raw counts to simulate what different CNVs should look like.
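
As a rough illustration of what is lost when only CPM is available, the sketch below converts two made-up count vectors to counts per million. The function name and the numbers are assumptions for the example, and it says nothing about how infercnv's HMM works internally; it only shows that cells sequenced at very different depths can end up with identical CPM values, so the original counts cannot be recovered from them.

```python
def to_cpm(counts):
    """Scale raw counts so that each cell sums to one million."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]

# Hypothetical cells with the same gene proportions but a 100x difference in depth.
shallow = [1, 3, 6]        # 10 reads in total
deep = [100, 300, 600]     # 1000 reads in total

cpm_shallow = to_cpm(shallow)
cpm_deep = to_cpm(deep)

print(cpm_shallow)
print(cpm_deep)
print("identical after CPM:", cpm_shallow == cpm_deep)
```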

Regards,
Christophe.