Pyclone input file

lyanliu

May 12, 2017, 8:49:23 AM

to Pyclone User Group

Hi Andy,
I have recently been analyzing exome-sequencing data from 13 patients, each with two or three tumor samples. My method was as follows:
1. Use Mutect to call somatic SNVs.
2. Use Sequenza to calculate copy number and estimate tumour purity, then use the sequenza2PyClone module to generate PyClone input files.
3. Filter the input files based on the Mutect results and select the mutations shared by the two or three tumor samples.
In the end I only got 5-20 mutations per patient for PyClone. Is that normal or meaningful? Which steps do I need to change?
I also tried to analyze each sample using the file loci.tsv to draw a scatter plot, like the result in the attachment. Is it appropriate to compare the subclonal composition across the two or three samples of each patient?

Tanay Biswas

Jun 21, 2020, 10:53:34 PM

to Pyclone User Group

Hi Iyanliu

Could you please explain how you used the sequenza2PyClone module? I also want to do that, but I can't find any description of its arguments. Please help me use that module.

Thanks

Andrew

Sep 2, 2020, 5:35:41 PM

to Pyclone User Group

This is a common point of confusion. You should not use only the mutations in the intersection of the samples. As you discovered, this leaves few mutations, and they are likely all clonal.

The correct approach is to take the union of all mutations found across samples. You will then need to extract read count data from the BAM files for those positions. I usually use pysam for this, but there are lots of tools. You may find that in many samples there are no or very few alt reads for the mutation detected in another sample, but that is fine. It just means the clone with that mutation is either not present or very rare in the sample.
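A minimal sketch of that workflow, assuming the SNV calls for each sample are already available as (chromosome, 1-based position, ref, alt) tuples and that the BAM files are coordinate-sorted and indexed. The names `union_of_mutations` and `count_ref_alt` are made up for illustration; only `pysam.AlignmentFile` and its `count_coverage` method come from pysam. It is one possible way to do this, not a definitive script.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (chromosome, 1-based position, ref base, alt base)
Mutation = Tuple[str, int, str, str]


def union_of_mutations(calls_per_sample: Dict[str, Iterable[Mutation]]) -> List[Mutation]:
    """Return every SNV called in at least one sample, sorted for stable output."""
    union: Set[Mutation] = set()
    for calls in calls_per_sample.values():
        union.update(calls)
    return sorted(union)


def count_ref_alt(bam_path: str, mutation: Mutation, min_base_quality: int = 15) -> Tuple[int, int]:
    """Count reads supporting ref and alt at one SNV position in one BAM file."""
    import pysam  # imported here so the union helper works without pysam installed

    chrom, pos, ref, alt = mutation
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # count_coverage takes 0-based, half-open coordinates and returns
        # four arrays of per-position counts, in the order A, C, G, T.
        a, c, g, t = bam.count_coverage(
            chrom, pos - 1, pos, quality_threshold=min_base_quality
        )
    counts = {"A": a[0], "C": c[0], "G": g[0], "T": t[0]}
    return counts[ref], counts[alt]
```

Calling `count_ref_alt` for every mutation in the union and every sample's BAM gives the ref and alt counts for the PyClone input. A mutation that was only called in another sample will typically come back with zero or very few alt reads, which is the expected outcome described above.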

逢源吴

Oct 27, 2021, 11:17:57 PM

to Pyclone User Group

Hi Andrew

After the GATK filtering steps, mutations are annotated in the VCF files as "PASS", "contamination", "weak_evidence", or combinations of such filters. Could you please describe which filter values a mutation should have to be included in the union?

For instance, suppose only "PASS" mutations are used to take the union. A mutation is "PASS" in sample A, while in sample B it is annotated as "contamination", meaning that some of the reads supporting it come from cross-sample contamination. Would this be a problem, or is such a situation extremely rare?

Thank you!

Andrew

Oct 28, 2021, 3:12:47 PM

to Pyclone User Group

My usual strategy for multiple samples is to call SNVs in each sample and keep only the high-quality ones, i.e. those marked "PASS". Then I take the union of all good SNVs across the samples and extract allele-specific read counts with a custom script for all SNVs in the set across all samples.
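The filtering step could look roughly like the sketch below. It assumes uncompressed, tab-separated VCF text with the standard fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER) and keeps only single-base substitutions. The function name `pass_snvs` is hypothetical, and this is not the custom script mentioned above.

```python
from typing import Iterable, Set, Tuple

# (chromosome, 1-based position, ref base, alt base)
Mutation = Tuple[str, int, str, str]
BASES = {"A", "C", "G", "T"}


def pass_snvs(vcf_lines: Iterable[str]) -> Set[Mutation]:
    """Collect SNVs whose FILTER column is exactly PASS from one sample's VCF."""
    snvs: Set[Mutation] = set()
    for line in vcf_lines:
        # Skip meta-information lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        if filt != "PASS":
            continue
        # ALT may list several alleles separated by commas.
        for allele in alt.split(","):
            if ref in BASES and allele in BASES:
                snvs.add((chrom, int(pos), ref, allele))
    return snvs
```

The union is then the set union of the per-sample results, for example `pass_snvs(lines_a) | pass_snvs(lines_b)`. With this approach, a mutation that is "PASS" in sample A but filtered in sample B still enters the union through sample A, and its read counts are extracted from sample B like any other position.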