Re: Correspondence: Some important questions regarding CANOPY


Jiang, Yuchao


Nov 22, 2017, 11:27:35 AM

to Zhang, Nancy R, Aisha Yousaf, canopy_p...@googlegroups.com

I think I’ve replied to most of your questions via a different thread. See below again.

Yuchao

On Nov 22, 2017, at 9:32 AM, Zhang, Nancy R <n...@wharton.upenn.edu> wrote:

Dear Aisha,

I actually don’t remember these details, but Yuchao, who is the first author, might. Yuchao, please see AISHA’s email below.

Nancy

From: Aisha Yousaf [mailto:aisha...@live.com]
Sent: Monday, November 20, 2017 10:42 PM
To: Zhang, Nancy R <n...@wharton.upenn.edu>
Subject: Correspondence: Some important questions regarding CANOPY

Hi Nancy,

I hope you are doing well. I read your article entitled "Assessing intra-tumor heterogeneity and tracking longitudinal and spatial clonal evolution by next-generation sequencing". I am interested in applying that method ('CANOPY') to reproduce the results for the MDA-MB-231 example. I followed the entire protocol for SNV calling, but I am having trouble with CNA profiling for that particular example, which has tumor-only samples (no normal available). I also read the supplementary material, but some things are still not clear to me.

I have some questions regarding that:

How did you calculate BAF on the SCP samples, and in which file format? I couldn't find any standardized method for calculating BAF.

BAF stands for B allele frequency at heterozygous loci, which is used to infer copy number together with depth ratio. BAF can be calculated from VCF files. VCF can be obtained from SNP callers, such as GATK.
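As a rough illustration of the calculation described above (a minimal sketch, not the code used in the paper), BAF at a heterozygous SNP can be taken as the alternate-allele read count divided by the total allelic depth. The sketch assumes a VCF whose FORMAT column carries `GT` and `AD` fields, as GATK output typically does. The function name `baf_from_vcf_lines` and the `min_depth` cutoff are hypothetical choices made for this example.

```python
def baf_from_vcf_lines(lines, sample_index=0, min_depth=10):
    """Return (chrom, pos, baf) for heterozygous biallelic SNPs.

    Assumes each record has GT and AD in its FORMAT column.
    min_depth is an illustrative cutoff, not a value from the paper.
    """
    results = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Keep simple biallelic SNPs only.
        if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
            continue
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        sample = dict(zip(keys, values))
        genotype = sample.get("GT", "").replace("|", "/")
        if genotype not in ("0/1", "1/0"):
            continue  # heterozygous loci only
        try:
            ref_reads, alt_reads = (int(x) for x in sample.get("AD", "").split(","))
        except ValueError:
            continue  # AD missing or malformed
        depth = ref_reads + alt_reads
        if depth < min_depth:
            continue
        results.append((chrom, pos, alt_reads / depth))
    return results


example = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:12,8:20",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t1/1:0,20:20",
]
for chrom, pos, baf in baf_from_vcf_lines(example):
    print(chrom, pos, round(baf, 3))
```

Only the first record is reported here, since the second is homozygous for the alternate allele.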

How did you apply the HMM to the input BAFs? Does it involve any R package or software?

We implemented our own HMM to infer copy number states from BAFs. The emission and transition probabilities are in the supplementary tables. After segmentation using BAFs, the actual copy numbers are inferred/manually adjusted based on depth of coverage ratios. This whole analysis was needed because we don’t have paired normal samples to directly infer allele-specific copy numbers.
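To make the idea of segmenting by BAF concrete, here is a minimal Viterbi sketch in plain Python. It is not the authors' implementation: the state means, the Gaussian emission model on the mirrored BAF |BAF - 0.5|, the standard deviation and the stay probability are all placeholder assumptions, and the real emission and transition probabilities would come from the supplementary tables.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_baf(bafs, state_means, sd=0.05, stay_prob=0.99):
    """Most likely state path for a BAF sequence (needs at least two states).

    state_means are hypothetical expected values of |BAF - 0.5| per state;
    sd and stay_prob are illustrative, not values from the paper.
    """
    if not bafs:
        return []
    n_states = len(state_means)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1 - stay_prob) / (n_states - 1))
    obs = [abs(b - 0.5) for b in bafs]  # mirror BAF around 0.5

    # Initialise with a uniform prior over states.
    score = [math.log(1 / n_states) + gaussian_logpdf(obs[0], m, sd)
             for m in state_means]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for j, mean in enumerate(state_means):
            # Best predecessor state for landing in state j.
            candidates = [score[i] + (log_stay if i == j else log_switch)
                          for i in range(n_states)]
            best_i = max(range(n_states), key=lambda i: candidates[i])
            new_score.append(candidates[best_i] + gaussian_logpdf(x, mean, sd))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


bafs = [0.50, 0.48, 0.52, 0.51, 0.49, 0.33, 0.67, 0.35, 0.66, 0.32]
print(viterbi_baf(bafs, state_means=[0.0, 0.17]))
```

Runs of identical states in the returned path correspond to segments. As the reply notes, the actual copy numbers for each segment would then be assigned using depth-of-coverage ratios.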

In the case of SNVs, after following the entire filtering procedure described in your article, I am left with 216 variants associated with 4 different genes. How did you arrive at the variants associated with only 4 genes? Which process did you apply?

I’ve replied to this before: you need to functionally annotate these mutations, and we selected those that are highly deleterious.

Furthermore, two of the genes pinpointed in your study (i.e., BRAF and KRAS) are associated with dbSNP IDs. Don't we filter out variants that have a dbSNP ID, because they are usually thought to be germline mutations? Or am I missing some concept?

These mutations are highly deleterious. Furthermore, and more importantly, they have been validated by deep sequencing in a different study, which we cited. See Jacob et al., reference 37.

I would be really grateful for your response in this regard. I have been stuck on this problem for quite some time.

Looking forward to hearing from you soon.

Best wishes,

AISHA
