Hi Kwat,
First off, I have no experience with GATK, so I can't comment on the best steps for that workflow. Other users may be able to chime in there.
There is quite a bit of variability in data types, so it is hard to describe a single workflow that will be useful in every case. Below are a few tools and libraries I use; keep in mind you could likely choose any of a dozen others for each step and get very similar results.
Mutation calling: Strelka, MutationSeq
Deep sequence alignment: I use bwa with the `aln` and `sampe` commands; I've found the `mem` algorithm to be a bit aggressive for this. I align to the entire genome, not just the target amplicons.
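In case it helps, the two-step alignment boils down to something like this (driven from Python since that is what I script in; `genome.fa` and the read file names are placeholders):

```python
import subprocess

# Placeholder paths; the reference should be indexed once with `bwa index genome.fa`.
ref = "genome.fa"
fastqs = ["reads_1.fastq", "reads_2.fastq"]
sais = ["reads_1.sai", "reads_2.sai"]

# Step 1: `bwa aln` writes a .sai alignment per read end (-f sets the output file).
for fastq, sai in zip(fastqs, sais):
    subprocess.run(["bwa", "aln", "-f", sai, ref, fastq], check=True)

# Step 2: `bwa sampe` pairs the two ends and emits SAM.
subprocess.run(
    ["bwa", "sampe", "-f", "aligned.sam", ref, sais[0], sais[1], fastqs[0], fastqs[1]],
    check=True,
)
```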
Copy number array: I mostly use OncoSNP but have also used PICNIC and ASCAT to check the robustness of results.
Copy number WGSS: Currently I am using an in-house tool, but I have also used TITAN and OncoSNP-Seq.
Collating the data: I typically write custom Python scripts to do this. Python has good built-in support for CSV (and TSV, via the same `csv` module). The "pandas" library is invaluable for working with tabular data, and the "PyYAML" module is good for writing the `.yaml` config files.
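As a toy example of the pandas side (the file and column names here are made up; PyYAML shows up in the config sketch further down):

```python
import pandas as pd

# Hypothetical TSV of per-site allele counts; pandas handles headers and types for you.
counts = pd.read_csv("deepseq_counts.tsv", sep="\t")

# Filtering, grouping and summarising are one-liners.
deep_sites = counts[counts["ref_counts"] + counts["var_counts"] >= 1000]
print(deep_sites.groupby("sample_id").size())

deep_sites.to_csv("filtered_counts.tsv", sep="\t", index=False)
```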
Pipeline: I write my own pipelines using the "ruffus" library.
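A toy ruffus pipeline looks like this; the task bodies are stubs, the point being that ruffus chains tasks by their input/output file names and only reruns stale steps:

```python
from ruffus import transform, suffix, pipeline_run

# Map each .fastq to a .sam; in a real pipeline this would shell out to bwa.
@transform("sample.fastq", suffix(".fastq"), ".sam")
def align(input_file, output_file):
    open(output_file, "w").close()  # stub

# Downstream tasks take the upstream task object as their input.
@transform(align, suffix(".sam"), ".counts.tsv")
def count_alleles(input_file, output_file):
    open(output_file, "w").close()  # stub

pipeline_run([count_alleles])
```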
Our basic workflow is as follows:
- WGSS
- Call mutations
- Estimate copy number and tumour content (we may use an array if we did exome sequencing)
- Design PCR primers to deep sequence the mutations
- Use MiSeq to deepseq the mutations
- Align with BWA `aln` and `sampe` to the whole genome
- Extract counts using a custom script
- Perform a binomial exact test to determine whether the variant is actually present (see the sketch after this list)
- Remove any sites that turn out to be germline or wildtype
- Join the copy number data and deepseq counts with a custom Python script
- Autogenerate a `.yaml` config file with another custom Python script (both steps are sketched after this list)
- Run PyClone 3+ times with different random seeds to check convergence (see the loop after this list)
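For the binomial exact test step, the question at each site is whether the variant allele count is higher than sequencing error alone would produce. A minimal sketch with SciPy (the counts and error rate are made up; older SciPy versions spell the function `binom_test`):

```python
from scipy.stats import binomtest

# Made-up counts at one deep-sequenced site.
var_counts, depth = 30, 5000
error_rate = 0.005  # assumed per-base sequencing error rate

# One-sided test: is the variant allele fraction above the error rate?
result = binomtest(var_counts, depth, p=error_rate, alternative="greater")
if result.pvalue < 0.01:
    print("variant likely present")
```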
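The joining and config-generation scripts boil down to roughly this (the column names and config keys are illustrative; check the PyClone documentation for the exact input format and required keys):

```python
import pandas as pd
import yaml

# Merge deepseq counts with copy number calls on a shared mutation ID.
counts = pd.read_csv("filtered_counts.tsv", sep="\t")
cn = pd.read_csv("copy_number.tsv", sep="\t")
merged = pd.merge(counts, cn, on="mutation_id")
merged.to_csv("pyclone_input.tsv", sep="\t", index=False)

# Autogenerate the config; these keys are placeholders, not the definitive schema.
config = {
    "working_dir": "pyclone_run",
    "num_iters": 10000,
    "samples": {
        "tumour_1": {
            "mutations_file": "pyclone_input.tsv",
            "tumour_content": 0.7,
        }
    },
}
with open("config.yaml", "w") as fh:
    yaml.safe_dump(config, fh, default_flow_style=False)
```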
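Checking convergence is then just a loop over seeds. I am writing the invocation from memory, so treat the subcommand and `--seed` flag as assumptions and confirm them against `PyClone --help` on your install:

```python
import subprocess

# Assumed CLI; verify the subcommand and --seed flag for your PyClone version.
for seed in (1, 2, 3):
    subprocess.run(
        ["PyClone", "run_analysis", "--config_file", "config.yaml", "--seed", str(seed)],
        check=True,
    )
```

If the inferred clusters look the same across runs, the chains have probably converged.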
Cheers,
Andy