-------------------------
Hi Nick,
Thanks for your interest! This code has been used quite extensively, so there shouldn't be any major systematic bugs. I wonder whether it's due to some initialization inconsistency in your code. Also, have you sorted your BED file?
Can you use save.image(file='CODEX_debug.rda') after the mapp=getmapp() step and send the file to me? I will look into it. My email is yuc...@wharton.upenn.edu. If it is too large, you can share that with me via Dropbox: yj...@cornell.edu.
Cheers,
Yuchao
On Oct 28, 2015, at 4:33 AM, Jagiella, Nick <njag...@definiens.com> wrote:
Dear Yuchao Jiang,

I would like to use the CODEX R package to do some whole exon sequencing data analysis. The installation and toy data example worked flawlessly. But applying the CODEX pipeline to my own BAM and BED files always ends up with an error, and I can't really figure out how to solve it.

I already submitted a formal request to the Bioconductor support website:

The error seems to occur during the quality control step:

qcObj <- qc(Y, sampname, chr, ref, mapp, gc, cov_thresh = c(20, 4000), length_thresh = c(20, 2000), mapp_thresh = 0.9, gc_thresh = c(20, 80))
Excluded NA exons due to extreme coverage.
Excluded 0 exons due to extreme exonic length.
Excluded 0 exons due to extreme mappability.
Excluded 0 exons due to extreme GC content.
After taking union, excluded NA out of 8 exons in QC.
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
subscript contains NAs

I would be very grateful for any idea what could have caused the problem. The problem occurs in R-Studio (R version 3.2) as well as R (version 3.3) using either CODEX version 1.2 or 1.3.

Yours sincerely,
Nick Jagiella
Hi Nick,
CODEX is designed for whole-exome sequencing. You need to process the entire chromosome at once, unless yours is targeted sequencing (but your sequencing depth isn't that high). If it's targeted sequencing, you can use an adapted version of CODEX:
CODEX for targeted sequencing:
We've adapted CODEX for targeted sequencing. Refer to the code attached (you need to source segment_targeted.R for gene-based segmentation):
The error you saw occurs because CODEX filters out samples with < 2000 total reads per chromosome (for whole-exome sequencing). Yours have 383, 227, and 365 reads respectively, so CODEX treats these samples as capture failures. That's why you see the NA and thus the errors.
Also, do you only have 3 samples? CODEX adopts a normalization procedure that estimates the GC content bias, the exon amplification and targeting efficiency, and latent biases and artifacts across all samples. Three samples aren’t enough to estimate these biases. Normally we recommend at least ~20 samples as input for CODEX.
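The two points above (per-sample read totals and sample count) can be checked before calling qc(). The following is only a minimal sketch in base R, not part of CODEX: the matrix Y is made up so that its column totals match the 383, 227, and 365 reads quoted above, and the names min_reads, min_samples, and low_depth are introduced here purely for illustration.

```r
# Hypothetical read-count matrix: rows are exons, columns are samples.
# Values are invented so the column totals equal 383, 227, and 365.
Y <- matrix(c(50, 48, 48, 47, 48, 48, 48, 46,
              30, 28, 28, 28, 28, 28, 28, 29,
              46, 45, 46, 45, 46, 46, 45, 46),
            nrow = 8,
            dimnames = list(NULL, c("s1", "s2", "s3")))

min_reads   <- 2000  # per-chromosome threshold mentioned in the reply above
min_samples <- 20    # rough recommendation from the reply, not a hard limit

# Total reads per sample on this chromosome
total_reads <- colSums(Y)

# Samples that fall below the read threshold
low_depth <- names(total_reads)[total_reads < min_reads]

if (length(low_depth) > 0) {
  message("Samples below ", min_reads, " total reads: ",
          paste(low_depth, collapse = ", "))
}
if (ncol(Y) < min_samples) {
  message("Only ", ncol(Y), " samples; about ", min_samples,
          " or more are recommended.")
}
```

With real data, Y would be the coverage matrix already passed to qc(); if every sample ends up in low_depth, the NA counts in the QC messages are to be expected.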
Cheers,
Yuchao
From: Jagiella, Nick [mailto:njag...@definiens.com]
Sent: Wednesday, October 28, 2015 12:58 PM
To: Jiang, Yuchao <yuc...@wharton.upenn.edu>
Subject: RE: CODEX pipeline fails with Error: subscript contains NAs
Hi Yuchao,
Thank you for your fast answer! Attached you can find the CODEX_debug.rda file which I produced following your indications and the BED file just in case it could help somehow.
I didn’t sort the BED file. It is just a subset of the exons found in the following region:
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr9%3A5450503-5470567
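In case sorting turns out to matter, a BED file can be ordered by chromosome and start position in base R. This is just a sketch under assumptions: the coordinates below are invented (they only fall inside the chr9 region linked above), and the column names chrom, start, and end are chosen here for illustration.

```r
# Hypothetical unsorted BED records; the coordinates are made up
bed <- data.frame(
  chrom = c("chr9", "chr9", "chr9", "chr9"),
  start = c(5462800L, 5450600L, 5467800L, 5456100L),
  end   = c(5463000L, 5450800L, 5468000L, 5456300L),
  stringsAsFactors = FALSE
)

# Order by chromosome, then start, then end.
# Note: chromosome names sort alphabetically here ("chr10" before "chr9"),
# which is fine for a single chromosome but not a natural genome order.
bed_sorted <- bed[order(bed$chrom, bed$start, bed$end), ]
rownames(bed_sorted) <- NULL

# To write the result back out as a tab-separated BED file:
# write.table(bed_sorted, "sorted.bed", sep = "\t", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```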
One difference I noticed between the WES example and mine was that in the example the chromosome was indicated by
chr <- 22
while I needed to use
chr <- "chr9"
to make it run.
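The difference between 22 and "chr9" is presumably just a naming convention (bare numbers versus a "chr" prefix) in the underlying BAM and BED files; that is an assumption, not something confirmed in this thread. A small base R sketch for converting between the two styles, with strip_chr, add_chr, and bam_seqnames as hypothetical names:

```r
# Hypothetical helpers for switching between "22" and "chr22" style names
strip_chr <- function(x) sub("^chr", "", as.character(x))
add_chr   <- function(x) paste0("chr", strip_chr(x))

# Assumed sequence names as they might appear in a BAM header
bam_seqnames <- c("chr1", "chr9", "chr22")

chr <- 9

# Use the prefixed style only if the BAM header uses it
chr_for_bam <- if (any(grepl("^chr", bam_seqnames))) add_chr(chr) else strip_chr(chr)
```

The same check could be applied to the first column of the BED file so that both inputs agree on one style.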
If you have any idea where the issue could be, I would be very grateful!
Cheers,
Nick
Dear Yuchao Jiang,