Hi Ingrid.
On Wed, Mar 20, 2013 at 4:45 PM, Ingrid Lönnstedt
<
ingrid.l...@gmail.com> wrote:
>
> Hi!
>
> I have segmented 56 SNP 6 tumour samples with segmentByNonPairedPSCBS(),
> and like to call allelic balance segments with callAB() as required, I
> understand, before I can call neutral TCNs with callNTCN().
First, I just want to make it clear (to you and others), that calling
segments to have copy-neutral TCNs ("NTCN") or not is experimental and
under development and was not part of the Paired PSCBS (2011) paper
(which only focused on calling AB and LOH). I've updated the package
to clarified this in the vignette. There I also give a little bit
more explanation on what parameters are estimated in order to call
NTCN, 'kappa' and 'deltaCN', where 'kappa' is tricky to estimate
especially in tumors where nothing much goes on.
Second, likewise, Non-paired PSCBS should also be considered
experimental (although much less so than calling NTCN). It's only the
Paired PSCBS method that is well tested and published.
Third, some heads up (I know some of you are waiting for this): We're
improving the NTCN calling as part of our in-house projects, and some
potential improvements have already been discovered and may eventually
become the default. We'll make it clear in the vignette when we have
a "stable" solution. (*) Yes, it's 2013 and calling PSCNs in tumors
is still not straightforward - it's not easy to find an omnibus method
that works in all cases - many tried before us.
Fourth, you can install the latest devel version of PSCBS as (recommended):
source("
http://aroma-project.org/hbLite.R");
hbInstall("PSCBS");
> For the first
> tumour sample, both callAB() and callNTCN() seem to work fine, but then on
> the second sample I get errors with callAB(), see below.
See comment after the error.
> The plotTracks()
> image looks pretty much the same between sample one and two, and I have no
> reason to believe that sample two has a major problem in the segmentation. 4
> of the 56 samples get very noisy segmentations, with lots of short regions
> scattered (has anybody seen that before?)
Yes, over-segmentation is common in CN analysis. I'd argue that you
actually want a bit of over-segmentation and post-cleanup and/or
calling method to handle that. However, I assume you mean extreme
over-segmentation. There is no simple solution for that. The ideal
solution is to do better normalization of the locus-level data
("whatever method that would be") before you pass it on to the
segmentation method.
> but that is not the case with
> sample one or two. Any suggestions to how I might assess this error?
>
> Best regards,
> Ingrid
>
> dsNList <- exportTotalAndFracB(cesN, verbose=verbose);
> dataN = extractPSCNArray(dsNList$total)
>
> yT = dataN[,'total',samp]
> CT = yT/median(yT)
Using median(yT) is not correct and will result in unnecessary large
amount of noise in the TCN signals.
Dropping index 'T'. Consider a particular sample 'i' (e.g. your tumor
of interest) and SNP 'j'. When calculating the relative TCN for this
sample and tumor, C_{i,j}, you need to rescale the observed SNP signal
y_{i,j} relative to the (unknown) mean SNP signal for this SNP,
mu_{j}, i.e. C_{i,j} = 2 * y_{i,j}/mu_{j}. (*) Multiply by "2" for
diploid genomes (GWS6 = human).
When you have matched normal ('N'), you often get very good
signal-to-noise ratios if you use mu*_{j} = y_{N,j}. Thus, for the
tumor-normal pair (T,N) you would calculate the relative TCNs for the
tumor as C_{T,j} = 2 * y_{T,j}/y_{N,j}.
When you don't have matched normal, you need to use another estimate
for mu_{j}. It's common to use the robust average of all the samples
in your data set, i.e. mu*_{j} = median_{i} (y_{i,j}).
FYI, above you are using mu*_{j} = median_{j} (y_{T,j}) - note
averaging across SNPs in one sample and not across samples in one SNP
- that is a big difference!
Thus, in your case you need to use the latter. To calculate this
robust average in the aroma framework, do:
yR <- getAverageFile(dsNList$total);
and then do:
CT = 2 * yT/yR
> betaT = dataN[,'fracB',samp]
> ugp = getAromaUgpFile(dsNList$total)
> chromosome = ugp[,1,drop = T]
> x = ugp[, 2, drop = T]
> df = data.frame(chromosome = chromosome, x=x, CT=CT, betaT = betaT)
> df = dropSegmentationOutliers(df)
> #This gives warnings:
> #24: In DNAcopy::CNA(genomdat = yKK, chrom = chromosomeKK, ... :
> #array has repeated maploc positions
> gaps = findLargeGaps(df, minLength = 1e+06)
> knownSegments = gapsToSegments(gaps)
>
> fit = segmentByNonPairedPSCBS(df, knownSegments = knownSegments,
> avgDH="median",
> tauA = .33, seed =
> 48879, verbose = -10)
> #This gives warnings that there are multipe maploc positions
> fit = callAB(fit, verbose = -10)
>
> Calling segments of allelic balance from one-sided DH bootstrap confidence
> intervals...
> delta (offset adjusting for bias in DH): 0.211106073856354
> alpha (CI quantile; significance level): 0.05
> List of 7
> $ nbrOfDHs : int 70
> $ dhNbrOfLoci : int 70
> $ mu : num 0.183
> $ dhMean : num 0.203
> $ dMu : num -0.02
> $ abs(dMu) : num 0.02
> $ range(x[units]): num [1:2] 990417 2792801
> Error in bootstrapTCNandDHByRegion.PairedPSCBS(fit, statsFcn = statsFcn,
> :
> INTERNAL ERROR: Incorrect DH mean!
> Calling segments of allelic balance from one-sided DH bootstrap
> confidence intervals...done
This is a internal sanity check throwing an error. As a start, I
recommend to retry with (i) a proper CT calculate, and (ii) after
updated PSCBS.
Hope this helps
Henrik
>
>
> >
> > head(getSegments(fit))
> chromosome tcnId dhId tcnStart tcnEnd tcnNbrOfLoci tcnMean
> tcnNbrOfSNPs tcnNbrOfHets dhNbrOfLoci dhStart dhEnd
> 1 1 1 1 61736.0 811403.5 73 1.4735545
> 15 2 2 61736.0 811403.5
> 2 1 2 1 811403.5 2821394.0 527 0.9201369
> 183 70 70 811403.5 2821394.0
> 3 1 3 1 2821394.0 2827143.5 2 5.0273015
> 2 0 0 2821394.0 2827143.5
> 4 1 4 1 2827143.5 11694138.0 5531 0.9780667
> 2940 983
983 2827143.5 11694138.0
> 5 1 5 1 11694138.0 11709225.0 5 3.3949294
> 5 3 3 11694138.0 11709225.0
> 6 1 6 1 11709225.0 16923050.0 2918 1.0072282
> 1485 550 550 11709225.0 16923050.0
> dhMean c1Mean c2Mean
> 1 0.1695309 0.6118708 0.8616838
> 2 0.2033105 0.3665317 0.5536052
> 3 NA NA NA
> 4 0.2033807 0.3895734 0.5884933
> 5 0.1465555 1.4486918 1.9462376
> 6 0.2036166 0.4010699 0.6061583
>
>
> > traceback()
> 7: stop("INTERNAL ERROR: Incorrect DH mean!")
> 6: bootstrapTCNandDHByRegion.PairedPSCBS(fit, statsFcn = statsFcn,
> ..., verbose = less(verbose, 50))
> 5: bootstrapTCNandDHByRegion(fit, statsFcn = statsFcn, ..., verbose =
> less(verbose,
> 50))
> 4: callAllelicBalanceByDH.PairedPSCBS(fit, ...)
> 3: callAllelicBalanceByDH(fit, ...)
> 2: callAB.PairedPSCBS(fit, verbose = -10)
> 1: callAB(fit, verbose = -10)
>
>
>
>
> > sessionInfo()
>
> R version 2.15.2 Patched (2013-02-09 r61879)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C
> LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] Hmisc_3.10-1 survival_2.37-4 DNAcopy_1.32.0
> PSCBS_0.30.0 aroma.affymetrix_2.8.0
> [6] affxparser_1.30.2 aroma.apd_0.2.3 R.huge_0.4.1
> aroma.light_1.28.0 aroma.core_2.8.0
> [11] matrixStats_0.6.2 R.rsp_0.8.2 R.devices_2.1.3
> R.cache_0.6.5 R.filesets_2.0.0
> [16] crlmm_1.16.9 preprocessCore_1.20.0 oligoClasses_1.20.0
> BiocGenerics_0.4.0 R.utils_1.19.5
> [21] R.oo_1.11.7 R.methodsS3_1.4.2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.26.0 annotate_1.36.0 AnnotationDbi_1.20.5
> Biobase_2.18.0 BiocInstaller_1.8.3 Biostrings_2.26.3
> [7] bit_1.1-9 cluster_1.14.3 codetools_0.2-8
> DBI_0.2-5 digest_0.6.3 ellipse_0.3-7
> [13] ff_2.2-10 foreach_1.4.0 genefilter_1.40.0
> GenomicRanges_1.10.7 grid_2.15.2 IRanges_1.16.6
> [19] iterators_1.0.6 lattice_0.20-13 Matrix_1.0-11
> mvtnorm_0.9-9994 parallel_2.15.2 RcppEigen_0.3.1.2.1
> [25] RSQLite_0.11.2 stats4_2.15.2 tools_2.15.2
> XML_3.95-0.1 xtable_1.7-1 zlibbioc_1.4.0
>
>
> >
>
>
>
> --
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the
> latest version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website
http://www.aroma-project.org/.
> To post to this group, send email to
aroma-af...@googlegroups.com
> To unsubscribe and other options, go to
>
http://www.aroma-project.org/forum/
>
> ---
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
aroma-affymetr...@googlegroups.com.
> For more options, visit
https://groups.google.com/groups/opt_out.
>
>