Number of clusters (clones) linearly increases with number of mutations?


Pablo Eduardo García Nieto

Dec 12, 2019, 2:37:53 PM
to Pyclone User Group
Hello,

This is a rather simple question. When running PyClone on my data, I consistently observe that the number of individual clusters (clones) is positively and linearly associated with the total number of mutations in a sample.

It's worth noting that the VAF distributions across samples are very similar; it's just the number of mutations that changes.
Is this to be expected?

I'm having a hard time interpreting these results, as I know that the number of mutations in my samples also depends on the sequencing depth.

Thanks,
Pablo

Andrew

Dec 12, 2019, 4:45:22 PM
to Pyclone User Group
Hi Pablo,

There are two possible reasons for this to happen:

1) The expected behavior of the Dirichlet process is to have roughly log(n) clusters, where n is the number of mutations.

2) The sampler may not be mixing well. The default behavior of PyClone is to put all mutations in different clusters initially. However, if there are a large number of mutations the MCMC runs will need quite a while to merge the SNVs into the right number of clusters.
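To make point 1) concrete, here is a minimal sketch of the prior behavior of a Dirichlet process, written as a Chinese restaurant process. It is not PyClone code: the function names and the concentration value `alpha = 1.0` are illustrative assumptions, and the posterior number of clusters in a real analysis also depends on the data. The exact prior expectation is the sum of alpha / (alpha + i) for i = 0..n-1, which grows roughly like alpha * log(1 + n / alpha).

```python
import math
import random


def expected_clusters(n, alpha=1.0):
    """Exact prior expected number of clusters for n items under CRP(alpha)."""
    return sum(alpha / (alpha + i) for i in range(n))


def simulate_num_clusters(n, alpha=1.0, seed=0):
    """Draw one partition of n items from CRP(alpha) and count its clusters."""
    rng = random.Random(seed)
    assignments = []
    num_clusters = 0
    for i in range(n):
        # Item i opens a new cluster with probability alpha / (alpha + i).
        if rng.random() < alpha / (alpha + i):
            assignments.append(num_clusters)
            num_clusters += 1
        else:
            # Otherwise it copies the cluster of a uniformly chosen earlier
            # item, i.e. joins a cluster with probability proportional to size.
            assignments.append(assignments[rng.randrange(i)])
    return num_clusters


if __name__ == "__main__":
    alpha = 1.0  # illustrative value, not necessarily PyClone's setting
    for n in (100, 1000, 10000):
        exact = expected_clusters(n, alpha)
        approx = alpha * math.log(1 + n / alpha)
        draw = simulate_num_clusters(n, alpha)
        print(f"n={n}: exact={exact:.2f} approx={approx:.2f} one draw={draw}")
```

Under this prior, a tenfold increase in mutations adds only a few clusters, so a strictly linear relationship is more consistent with a sampler that has not yet merged its initial clusters.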

I suspect it is case 2) that is the problem here. Two solutions:

1) Add the `--init_method connected` flag when sampling. This initializes the sampler so all SNVs start in the same cluster. For large numbers of mutations this will speed things up, at the risk of getting stuck in a local optimum. In practice this does not seem to be a big problem though.

2) Increase the number of iterations of the sampler. The default of 10,000 is probably fine for ~100 SNVs, but you need much larger values for thousands of SNVs.
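The two solutions can be combined in a single run. The sketch below only assembles and prints a command line; it does not call PyClone. The `--init_method connected` flag is the one mentioned above. The `run_analysis_pipeline` subcommand and the `--in_files`, `--working_dir`, and `--num_iters` options are assumptions about the installed PyClone version, so check `PyClone run_analysis_pipeline --help` before relying on them. The file names and the value 100000 are placeholders.

```python
import shlex


def build_pyclone_command(in_files, working_dir,
                          num_iters=100000, init_method="connected"):
    """Assemble a PyClone command line as an argument list (hypothetical helper)."""
    return [
        "PyClone", "run_analysis_pipeline",    # assumed subcommand name
        "--in_files", *in_files,               # assumed option: one TSV per sample
        "--working_dir", working_dir,          # assumed option
        "--init_method", init_method,          # start all SNVs in one cluster
        "--num_iters", str(num_iters),         # assumed option: more MCMC iterations
    ]


cmd = build_pyclone_command(["sample_A.tsv", "sample_B.tsv"], "pyclone_analysis")
print(shlex.join(cmd))
```

The argument list can be handed to `subprocess.run(cmd, check=True)` once the options have been confirmed against the installed version.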

Best wishes,
Andy

Pablo Eduardo García Nieto

Dec 12, 2019, 5:14:25 PM
to Pyclone User Group
Thanks for the prompt response Andy! I'll implement those suggestions.

Best,
Pablo