On Jul 10, 2018, at 2:00 PM, Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu> wrote:
Hey Yuchao,For FALCON-X WES het input variants is it proper to restrict to the exon targets? Im not sure how tolerant the method is to False Positive input. I could put the cohort through a robust variant filtering or just take the raw variants that pass minimum call quality.Quality over Quantity? Thanks.-BradFrom: Jiang, Yuchao <yuc...@email.unc.edu>
Sent: Monday, July 9, 2018 2:35 PM
To: Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu>
Cc: Gene Urrutia <gene.u...@gmail.com>
Subject: Re: Running CODEX2 Somatic WESThe reason I combine all chromosomes is because N can only be unbiased estimated on the genome wide scale (especially for cancer with abrupt copy number changes) and its a bit confusing to go back and forth between genome wide and chr specific. I don’t think getting coverage for the entire genome all at once is too slow although I haven’t checked too many WES datasets. Can you let me know what your experiences are? And yes please feel free to fork and play around.Yuchao
Sent from my iPhone
On Jul 9, 2018, at 2:00 PM, Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu> wrote:Ah thank you.What was the reasoning for removing the chr argument? I found value in being able generate coverage values for all chromosomes for all samples in parallel (and writing to file to reload). Big time saver. But if this caused issues, maybe Ill try and fork CODEX2 and play around with other parallel options.-BradFrom: Jiang, Yuchao <yuc...@email.unc.edu>
Sent: Monday, July 9, 2018 1:47 PM
To: Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu>
Cc: Gene Urrutia <gene.u...@gmail.com>
Subject: Re: Running CODEX2 Somatic WESCan you update from GitHub? The Chr argument is removed in the most recent version
Sent from my iPhone
On Jul 9, 2018, at 1:45 PM, Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu> wrote:Thanks!The documentation for getcoverage still shows it requiring a chr argument. Is this truly the case?-BradFrom: Jiang, Yuchao <yuc...@email.unc.edu>
Sent: Monday, July 9, 2018 1:21 PM
To: Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu>
Cc: Gene Urrutia <gene.u...@gmail.com>
Subject: Re: Running CODEX2 Somatic WESHi Brad,I made some updates for CODEX2 last week. As of now, for whole-exome sequencing, you should get coverage, perform qc, calculate N from the entire genome. For normalization, you can do that in a genome-wide fashion on iterate through each chromosome (the latter is faster). For segmentation, it should be ran on each chromosome separately. The most updated demo is here http://htmlpreview.github.io/?https://github.com/yuchaojiang/CODEX2/blob/master/demo/CODEX2.htmlAbove is for whole-exome sequencing, if you have targeted sequencing, everything is on the genome-wide scale except for segmentation, which is carried out within each gene separately.Hope this clarifies,Yuchao
On Jul 9, 2018, at 12:23 PM, Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu> wrote:Greetings,I am still making progress. I am back at the CODEX2 portion, working to break up the work as much as possible. At what point must I stop working individual chromosomes? I split up calculating coverage and running the qc, but trying to put together the individual qc results raises some flags, so perhaps Im doing things wrong.At first I figured I could just concatenate the Y_qc matrices, but its possible that sampleA could be removed for chr2 but not chr1 and I don’t know what to do about that so that so that it runs properly (If I executed qc() on all chr data, what would have happened?).Can I continue to run the rest of CODEX2 on individual chromosomes? It appears like I could calculate N, normalize, and make calls and segments a chromosome at a time? Or is it the opposite, can I only do the depth calculations and I need to qc and everything after using all chromosome data together? Thank you!-BradFrom: Gene Urrutia <gene.u...@gmail.com>
Sent: Monday, July 2, 2018 9:14 AM
To: Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu>
Cc: Jiang, Yuchao <yuc...@email.unc.edu>
Subject: Re: Running CODEX2 Somatic WESHi Brad, Glad you are having success with MARATHON.
Falcon-X needs across sample normalization and the input is allele-specific read counts at germline heterozygous loci, called across all samples. Therefore, the vcf should have the same entries (SNPs) for all samples.
HaplotypeCaller is fine since we are calling germline heterozygous loci to begin with.
Thanks, and let me know if you have any other questions,
GeneOn Thu, Jun 28, 2018 at 1:21 PM, Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu> wrote:Hey Yuchao,I got codex2 working and I am moving right into MARATHON (I can send you my pipeline code when I am finished if you would like).Would you be able to answer some usage questions about the Falcon-x portions? The example data you present is very straight forward, but I am trying to expand them to fit my current data.Ok so IF I take the snp locations from the paired vcf files, the sites called will not by uniform across all samples when I try an put them together. Now I can make a multisample vcf for all tumors and normals but the proper thing to do in that case would be to assign no calls, ./., to missing positions for each sample. And Im not sure how any of your tools and functions would handle presumably NAs. So then the other option would be to run HaplotypeCaller jointly on all of the tumors and normal together (or some variation where I try to generate a master list of postions/snps and genotype each sample individually). This appeared to be implied in the example. I do not consider HaplotypeCaller to be a very good capturer or somatic mutations, but it would probably do the trick and at least all sites would have a genotype. And perhaps the falcon-x input does not have to be very somatic focused?Thanks.-BradFrom: Jiang, Yuchao <yuc...@email.unc.edu>
Sent: Tuesday, May 8, 2018 9:58 AM
To: Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu>
Cc: Nathanson MD, Katherine L <knat...@upenn.edu>; gene.urrutia (gene.u...@gmail.com) <gene.u...@gmail.com>
Subject: RE: Running CODEX2 Somatic WESHi Brad,Hope all is well in Philly! For your experimental design, CODEX2 is not an ideal method because you only have 2 normal samples and the highest K will be 2. Also the total number of samples might not be sufficient either (usually >20). We have recently published a pipeline MARATHONhttps://github.com/yuchaojiang/MARATHON/, which tells user when to use which pipeline under different designs. I am cc’ing Gene here who can further assist you if you have questions about MARATHON. For your case, since CODEX2 cannot be applied, you probably can retain copy number signals by comparing tumor to normal after adjusting for sequencing depth.For CODEX2, I will have a final release soon after the paper is accepted. Currently, for depth of coverage, it is computed chr by chr and for normalization, you can either perform it genome-wide or chr by chr.YuchaoFrom: Wubbenhorst, Bradley <bw...@pennmedicine.upenn.edu>
Sent: Friday, May 4, 2018 10:30 AM
To: Jiang, Yuchao <yuc...@email.unc.edu>
Subject: Running CODEX2 Somatic WESHey Yuchao! I hope you and your family have been doing well!Im getting my hands back into CODEX2 work, trying to shape it into my somatic pipeline. I cant find any of my old Tumor-Normal Codex code so I just have a few questions on the new stuff.With a Tumor-Normal WES situation, do I process all of the data together or in pairs? In one project I have 16 Tumors, but they are matched to one of two normal ( I have 16 tumor bams and 2 normal bams); so Im not sure if things will work.And then I run the entire codex pipeline for each chromosome correct? Im just reading through all of the code and comments and see that the coverage needs to be calculated for each chromosome, but then I wanted to double check that Im not supposed to be like building an array of coverage objects or something. Thanks.-Brad
Hey Yuchao,
Could you explain to me a bit more about the usage of CODEX2 with negative control regions?
“empirically identify common CNV regions by a first-pass CODEX run.” Does this mean that I can always determine negative control regions in WES studies? Can I use this for tumor/normal WES too somehow or is this for germline studies?
Your example shows indices, but how can I determine them from my data? Or is this meant to be only for certain study designs?
Brad - I have a few deadlines this week. Will get back to you next week.
Thanks Yuchao!
To ask one more question too, I just successfully did some codex2 targeted. I followed your examples near exactly, but since they were germline should my segmentation mode still have been ‘fraction’? I left it as such, but I know that normally I would switch to ‘integer’.
Sorry, Im starting to ask you frivolous questions. I figured out everything with the fractional and integer. Segmentation is something I should know very well. What I do not yet grasp is how should I interpret the mBIC value in the results. My optimal K was 1 and Ive read that the AIC and BIC values should be informative, but what their values mean is not something I currently understand.
Is high or low mBIC bad?
Thanks. No Rush, I know you have deadlines.
-Brad
--
You received this message because you are subscribed to the Google Groups "canopy_phylogeny" group.
To unsubscribe from this group and stop receiving emails from it, send an email to canopy_phyloge...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/canopy_phylogeny/BN1PR04MB375F6C28872343764E9874A9F570%40BN1PR04MB375.namprd04.prod.outlook.com.
For more options, visit https://groups.google.com/d/optout.
Thank you. What Im still not clear on is under what circumstances would I use AIC over BIC? And for a given segment, is the mBIC value informative about anything? Is a higher value better?
-Brad
Hi Yuchao,
Can you provide me with a sanity check? Attached is a choiceofK plot, this was a tumore_normal study. All of the plots look like this, all optK(’s) were determined to be 1. Im not as familiar looking at plots of BIC, but it does not look correct. Thank you.
<tumor_normal_1_ns_choiceofK.pdf>