results file

Lisa Wei

Apr 29, 2019, 8:35:21 PM

to Pyclone User Group

Hi Andy,

Can you clarify for me the meaning of each column in the cluster.tsv file in the output "tables" folder? What do "size" and "mean" indicate here exactly?

I'm asking because the means belonging to a single cluster_id don't add up to 1. Are they supposed to be proportions?

Thank you!

Lisa

Andrew

Apr 29, 2019, 11:40:43 PM

to Pyclone User Group

Hi Lisa,

1) This table reports summary information for each cluster in each sample. The "size" column reports how big the cluster is, i.e. the number of mutations in it. This is the same in all samples; it is just repeated because the table is in tidy format.

2) The "mean" column indicates the posterior mean of the cluster's cellular prevalence in a sample. These values definitely will not add to 1 across samples. For example, mutations which are clonal will be at approximately 1 in each sample. In general, the cellular prevalence values in PyClone don't sum to 1 across clusters within a sample either, because they represent the proportion of cells with a mutation. This is not the same thing as the size of clones within a sample, which would sum to 1. The cellular prevalence of a cluster is in fact the sum of the sizes of all clones that carry the mutations in that cluster.
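A minimal sketch of that last point, in Python. Everything in it is made up for illustration: the clone names, the clone sizes, and the linear tree A -> B -> C are assumptions, not PyClone output, and PyClone itself does not infer clone sizes or a tree. It only shows why clone sizes sum to 1 within a sample while cellular prevalences generally do not.

```python
# Hypothetical sample with three clones. The sizes are invented fractions
# of cells and, being a partition of the sample, they sum to 1.
clone_sizes = {"A": 0.5, "B": 0.3, "C": 0.2}

# Hypothetical linear phylogeny A -> B -> C: a descendant clone inherits
# every mutation of its ancestors. Each mutation cluster is mapped to the
# clones that carry its mutations.
cluster_carriers = {
    "cluster_0": ["A", "B", "C"],  # clonal: arose in A, present in all clones
    "cluster_1": ["B", "C"],       # arose in B, inherited by C
    "cluster_2": ["C"],            # arose in C only
}


def cellular_prevalence(carriers, sizes):
    """Sum the sizes of all clones that carry the cluster's mutations."""
    return sum(sizes[clone] for clone in carriers)


prevalence = {
    cluster: cellular_prevalence(carriers, clone_sizes)
    for cluster, carriers in cluster_carriers.items()
}

for cluster, value in prevalence.items():
    print(f"{cluster}: cellular prevalence = {value:.2f}")

print(f"sum of clone sizes          = {sum(clone_sizes.values()):.2f}")
print(f"sum of cellular prevalences = {sum(prevalence.values()):.2f}")
```

In this toy setup the clonal cluster sits at a prevalence of about 1, and the prevalences add up to more than 1 because nested clones are counted once for every cluster they carry.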

To get to clonal prevalence estimates you need to use different models such as BayClone or PhyloWGS. These use a different structure for the model, and impose different assumptions.

Cheers,

Andy
