Re: [aroma.affymetrix] callAB gives Error in bootstrapTCNandDHByRegion.PairedPSCBS

52 views
Skip to first unread message

Henrik Bengtsson

unread,
Mar 22, 2013, 2:03:50 PM3/22/13
to aroma-af...@googlegroups.com
Hi Ingrid.

On Wed, Mar 20, 2013 at 4:45 PM, Ingrid Lönnstedt
<ingrid.l...@gmail.com> wrote:
>
> Hi!
>
> I have segmented 56 SNP 6 tumour samples with segmentByNonPairedPSCBS(),
> and like to call allelic balance segments with callAB() as required, I
> understand, before I can call neutral TCNs with callNTCN().

First, I just want to make it clear (to you and others), that calling
segments to have copy-neutral TCNs ("NTCN") or not is experimental and
under development and was not part of the Paired PSCBS (2011) paper
(which only focused on calling AB and LOH). I've updated the package
to clarified this in the vignette. There I also give a little bit
more explanation on what parameters are estimated in order to call
NTCN, 'kappa' and 'deltaCN', where 'kappa' is tricky to estimate
especially in tumors where nothing much goes on.

Second, likewise, Non-paired PSCBS should also be considered
experimental (although much less so than calling NTCN). It's only the
Paired PSCBS method that is well tested and published.

Third, some heads up (I know some of you are waiting for this): We're
improving the NTCN calling as part of our in-house projects, and some
potential improvements have already been discovered and may eventually
become the default. We'll make it clear in the vignette when we have
a "stable" solution. (*) Yes, it's 2013 and calling PSCNs in tumors
is still not straightforward - it's not easy to find an omnibus method
that works in all cases - many tried before us.

Fourth, you can install the latest devel version of PSCBS as (recommended):

source("http://aroma-project.org/hbLite.R");
hbInstall("PSCBS");



> For the first
> tumour sample, both callAB() and callNTCN() seem to work fine, but then on
> the second sample I get errors with callAB(), see below.

See comment after the error.

> The plotTracks()
> image looks pretty much the same between sample one and two, and I have no
> reason to believe that sample two has a major problem in the segmentation. 4
> of the 56 samples get very noisy segmentations, with lots of short regions
> scattered (has anybody seen that before?)

Yes, over-segmentation is common in CN analysis. I'd argue that you
actually want a bit of over-segmentation and post-cleanup and/or
calling method to handle that. However, I assume you mean extreme
over-segmentation. There is no simple solution for that. The ideal
solution is to do better normalization of the locus-level data
("whatever method that would be") before you pass it on to the
segmentation method.

> but that is not the case with
> sample one or two. Any suggestions to how I might assess this error?
>
> Best regards,
> Ingrid
>
> dsNList <- exportTotalAndFracB(cesN, verbose=verbose);
> dataN = extractPSCNArray(dsNList$total)
>
> yT = dataN[,'total',samp]
> CT = yT/median(yT)

Using median(yT) is not correct and will result in unnecessary large
amount of noise in the TCN signals.

Dropping index 'T'. Consider a particular sample 'i' (e.g. your tumor
of interest) and SNP 'j'. When calculating the relative TCN for this
sample and tumor, C_{i,j}, you need to rescale the observed SNP signal
y_{i,j} relative to the (unknown) mean SNP signal for this SNP,
mu_{j}, i.e. C_{i,j} = 2 * y_{i,j}/mu_{j}. (*) Multiply by "2" for
diploid genomes (GWS6 = human).

When you have matched normal ('N'), you often get very good
signal-to-noise ratios if you use mu*_{j} = y_{N,j}. Thus, for the
tumor-normal pair (T,N) you would calculate the relative TCNs for the
tumor as C_{T,j} = 2 * y_{T,j}/y_{N,j}.

When you don't have matched normal, you need to use another estimate
for mu_{j}. It's common to use the robust average of all the samples
in your data set, i.e. mu*_{j} = median_{i} (y_{i,j}).

FYI, above you are using mu*_{j} = median_{j} (y_{T,j}) - note
averaging across SNPs in one sample and not across samples in one SNP
- that is a big difference!

Thus, in your case you need to use the latter. To calculate this
robust average in the aroma framework, do:

yR <- getAverageFile(dsNList$total);

and then do:

CT = 2 * yT/yR


> betaT = dataN[,'fracB',samp]
> ugp = getAromaUgpFile(dsNList$total)
> chromosome = ugp[,1,drop = T]
> x = ugp[, 2, drop = T]
> df = data.frame(chromosome = chromosome, x=x, CT=CT, betaT = betaT)
> df = dropSegmentationOutliers(df)
> #This gives warnings:
> #24: In DNAcopy::CNA(genomdat = yKK, chrom = chromosomeKK, ... :
> #array has repeated maploc positions
> gaps = findLargeGaps(df, minLength = 1e+06)
> knownSegments = gapsToSegments(gaps)
>
> fit = segmentByNonPairedPSCBS(df, knownSegments = knownSegments,
> avgDH="median",
> tauA = .33, seed =
> 48879, verbose = -10)
> #This gives warnings that there are multipe maploc positions
> fit = callAB(fit, verbose = -10)
>
> Calling segments of allelic balance from one-sided DH bootstrap confidence
> intervals...
> delta (offset adjusting for bias in DH): 0.211106073856354
> alpha (CI quantile; significance level): 0.05
> List of 7
> $ nbrOfDHs : int 70
> $ dhNbrOfLoci : int 70
> $ mu : num 0.183
> $ dhMean : num 0.203
> $ dMu : num -0.02
> $ abs(dMu) : num 0.02
> $ range(x[units]): num [1:2] 990417 2792801
> Error in bootstrapTCNandDHByRegion.PairedPSCBS(fit, statsFcn = statsFcn,
> :
> INTERNAL ERROR: Incorrect DH mean!
> Calling segments of allelic balance from one-sided DH bootstrap
> confidence intervals...done

This is a internal sanity check throwing an error. As a start, I
recommend to retry with (i) a proper CT calculate, and (ii) after
updated PSCBS.

Hope this helps

Henrik

>
>
> >
> > head(getSegments(fit))
> chromosome tcnId dhId tcnStart tcnEnd tcnNbrOfLoci tcnMean
> tcnNbrOfSNPs tcnNbrOfHets dhNbrOfLoci dhStart dhEnd
> 1 1 1 1 61736.0 811403.5 73 1.4735545
> 15 2 2 61736.0 811403.5
> 2 1 2 1 811403.5 2821394.0 527 0.9201369
> 183 70 70 811403.5 2821394.0
> 3 1 3 1 2821394.0 2827143.5 2 5.0273015
> 2 0 0 2821394.0 2827143.5
> 4 1 4 1 2827143.5 11694138.0 5531 0.9780667
> 2940 983 983 2827143.5 11694138.0
> 5 1 5 1 11694138.0 11709225.0 5 3.3949294
> 5 3 3 11694138.0 11709225.0
> 6 1 6 1 11709225.0 16923050.0 2918 1.0072282
> 1485 550 550 11709225.0 16923050.0
> dhMean c1Mean c2Mean
> 1 0.1695309 0.6118708 0.8616838
> 2 0.2033105 0.3665317 0.5536052
> 3 NA NA NA
> 4 0.2033807 0.3895734 0.5884933
> 5 0.1465555 1.4486918 1.9462376
> 6 0.2036166 0.4010699 0.6061583
>
>
> > traceback()
> 7: stop("INTERNAL ERROR: Incorrect DH mean!")
> 6: bootstrapTCNandDHByRegion.PairedPSCBS(fit, statsFcn = statsFcn,
> ..., verbose = less(verbose, 50))
> 5: bootstrapTCNandDHByRegion(fit, statsFcn = statsFcn, ..., verbose =
> less(verbose,
> 50))
> 4: callAllelicBalanceByDH.PairedPSCBS(fit, ...)
> 3: callAllelicBalanceByDH(fit, ...)
> 2: callAB.PairedPSCBS(fit, verbose = -10)
> 1: callAB(fit, verbose = -10)
>
>
>
>
> > sessionInfo()
>
> R version 2.15.2 Patched (2013-02-09 r61879)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C
> LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] Hmisc_3.10-1 survival_2.37-4 DNAcopy_1.32.0
> PSCBS_0.30.0 aroma.affymetrix_2.8.0
> [6] affxparser_1.30.2 aroma.apd_0.2.3 R.huge_0.4.1
> aroma.light_1.28.0 aroma.core_2.8.0
> [11] matrixStats_0.6.2 R.rsp_0.8.2 R.devices_2.1.3
> R.cache_0.6.5 R.filesets_2.0.0
> [16] crlmm_1.16.9 preprocessCore_1.20.0 oligoClasses_1.20.0
> BiocGenerics_0.4.0 R.utils_1.19.5
> [21] R.oo_1.11.7 R.methodsS3_1.4.2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.26.0 annotate_1.36.0 AnnotationDbi_1.20.5
> Biobase_2.18.0 BiocInstaller_1.8.3 Biostrings_2.26.3
> [7] bit_1.1-9 cluster_1.14.3 codetools_0.2-8
> DBI_0.2-5 digest_0.6.3 ellipse_0.3-7
> [13] ff_2.2-10 foreach_1.4.0 genefilter_1.40.0
> GenomicRanges_1.10.7 grid_2.15.2 IRanges_1.16.6
> [19] iterators_1.0.6 lattice_0.20-13 Matrix_1.0-11
> mvtnorm_0.9-9994 parallel_2.15.2 RcppEigen_0.3.1.2.1
> [25] RSQLite_0.11.2 stats4_2.15.2 tools_2.15.2
> XML_3.95-0.1 xtable_1.7-1 zlibbioc_1.4.0
>
>
> >
>
>
>
> --
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the
> latest version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-af...@googlegroups.com
> To unsubscribe and other options, go to
> http://www.aroma-project.org/forum/
>
> ---
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to aroma-affymetr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Ingrid Lönnstedt

unread,
Mar 27, 2013, 1:03:13 AM3/27/13
to aroma-af...@googlegroups.com, h...@biostat.ucsf.edu
Thank you Henrik!
OK, I followed your instructions, and the same problem (the error in callAB()) remains, or rather, it now occurs already in sample one. Also,

1. My Nuse plot after calmate calibration look different from before (no longer symmetric around 1, see below). What difference does it make to plm that I used RmaCnPlm instead of AvgCnPlm before
qam <- QualityAssessmentModel(plm); plotNuse(qam); ?
2. The SNP specific CN estimates get a lot less variable, but the segmentation gets more variable (more and shorter segments). I suppose the latter is a consequence of the less variable CNs, so we get a higher sensitivity in some sence. However, I do not know whether it was better with longer and fewer segments for calling AB.

So, at the moment I have no idea what to come up with for tomorrows deadline, I suppose it will be some quick fix and I have to go back to this later.
All the best,
Ingrid

Henrik Bengtsson

unread,
Mar 27, 2013, 3:16:07 AM3/27/13
to aroma-af...@googlegroups.com
On Tue, Mar 26, 2013 at 10:03 PM, Ingrid Lönnstedt
<ingrid.l...@gmail.com> wrote:
>
> Thank you Henrik!
> OK, I followed your instructions, and the same problem (the error in callAB()) remains, or rather, it now occurs already in sample one. Also,
>
> 1. My Nuse plot after calmate calibration look different from before (no longer symmetric around 1, see below).

Were you meant to attach something here?

> What difference does it make to plm that I used RmaCnPlm instead of AvgCnPlm before
> qam <- QualityAssessmentModel(plm); plotNuse(qam); ?

Shouldn't make a significant difference.

> 2. The SNP specific CN estimates get a lot less variable,

Hmm... "lot less" - that sounds like your comparing toward you old CT
= yT / median(yT) - are you? The difference in total CN shouldn't big
that big. If you compare to the proper CT = yT / median(<across
samples>) I'd expect you to be less excited at this step.

> but the segmentation gets more variable (more and shorter segments). I suppose the latter is a consequence of the less variable CNs, so we get a higher sensitivity in some sence. However, I do not know whether it was better with longer and fewer segments for calling AB.

With CalMaTe, did you see similar improvement as with the online
example [http://aroma-project.org/vignettes/CalMaTe]?

>
> So, at the moment I have no idea what to come up with for tomorrows deadline, I suppose it will be some quick fix and I have to go back to this later.

There is no magic solution (except having matched normals - nag
investigators about that over and over and over... until they do it),
but CRMAv2 + CalMaTe normally does a decent job of providing good
(TCN, BAF) signals.

Also, I don't remember if you already told me, but are you working
with fresh frozen tumors or FFPE ones? The latter are often messy and
I haven't seen any methods that rescue their signals very well.

/Henrik

> All the best,
> Ingrid

Ingrid Lönnstedt

unread,
Apr 21, 2013, 11:33:19 PM4/21/13
to aroma-af...@googlegroups.com, henrik.b...@aroma-project.org
Hi Henrik,
Thank you for your answers! You do a great job, indeed! I am now back to this. See a couple of remaining issues below.
/Ingrid



On Wednesday, 27 March 2013 18:16:07 UTC+11, Henrik Bengtsson wrote:
On Tue, Mar 26, 2013 at 10:03 PM, Ingrid Lönnstedt
<ingrid.l...@gmail.com> wrote:
>
> Thank you Henrik!
> OK, I followed your instructions, and the same problem (the error in callAB()) remains, or rather, it now occurs already in sample one. Also,
>
> 1. My Nuse plot after calmate calibration look different from before (no longer symmetric around 1, see below).

Were you meant to attach something here?

Yes, can you see it now? I felt that this nuse plot is unusual. Do you think it is anything to worry about?

 

> What difference does it make to plm that I used RmaCnPlm instead of AvgCnPlm before
> qam <- QualityAssessmentModel(plm); plotNuse(qam); ?

Shouldn't make a significant difference.

> 2. The SNP specific CN estimates get a lot less variable,

Hmm... "lot less" - that sounds like your comparing toward you old CT
= yT / median(yT) - are you?  The difference in total CN shouldn't big
that big.  If you compare to the proper CT = yT / median(<across
samples>) I'd expect you to be less excited at this step.

You are right, I compare to the old incorrect CT =  yT / median(yT). So, I am not worried about this anymore.

> but the segmentation gets more variable (more and shorter segments).  I suppose the latter is a consequence of the less variable CNs, so we get a higher sensitivity in some sence. However, I do not know whether it was better with longer and fewer segments for calling AB.

With CalMaTe, did you see similar improvement as with the online
example [http://aroma-project.org/vignettes/CalMaTe]?

Yes, there is a large improvement. Again, no remaining worries here.

>
> So, at the moment I have no idea what to come up with for tomorrows deadline, I suppose it will be some quick fix and I have to go back to this later.

There is no magic solution (except having matched normals - nag
investigators about that over and over and over... until they do it),
but CRMAv2 + CalMaTe normally does a decent job of providing good
(TCN, BAF) signals.

Also, I don't remember if you already told me, but are you working
with fresh frozen tumors or FFPE ones?  The latter are often messy and
I haven't seen any methods that rescue their signals very well.

Fresh, frozen tumours.

/Henrik

So, my main problem that remains is insanity check failures in callAB(). I have followed my sample 1 into bootstrapTCNandDHByRegion.PairedPSCBS with debug() and concluded that I have 592 segments, and the first failure occurs with the 233rd segment. I have printed the CBS info for segments 220 to 233, in case that would help, as well as the current error message and session Info. If there is a failure with segment 233, is there a reason not to report the callAB  outcomes of the other segments? What does the sanity check failure tell me, and could I do any fix to get round it or at least get something for the other segments?

Thanks,
Ingrid


Browse[3]> segs[220:233,]

    chromosome tcnId dhId  tcnStart    tcnEnd tcnNbrOfLoci   tcnMean
220          9    11    1  73187382 125406672        38303 2.1977136
221          9    12    1 125406672 141091395         8913 2.1506696
222         NA    NA   NA        NA        NA            0        NA
223         10     1    1     72760  17932882        15166 2.0003566
224         10     2    1  17932882  17956182            2 4.5480683
225         10     3    1  17956182  18179544           18 1.9775509
226         10     4    1  18179544  18202780            3 0.6127961
227         10     5    1  18202780  24376470         4403 2.0286301
228         10     6    1  24376470  24378482           11 0.6265435
229         10     7    1  24378482  39076222        10124 2.0268394
230         10     8    1  39076223  42433539            0       NaN
231         10     9    1  42433540  86411230        28982 2.0375534
232         10    10    1  86411230  89029244         1697 2.0558821
233         10    11    1  89029244  89115851           21 2.5505440
    tcnNbrOfSNPs tcnNbrOfHets dhNbrOfLoci   dhStart     dhEnd     dhMean
220        18928         5290        5290  73187382 125406672 0.07756713
221         4002         1101        1101 125406672 141091395 0.07943612
222           NA           NA           0        NA        NA         NA
223         8456         2087        2087     72760  17932882 0.07487577
224            0            0           0  17932882  17956182         NA
225            4            1           1  17956182  18179544 0.04524338
226            0            0           0  18179544  18202780         NA
227         2212          543         543  18202780  24376470 0.07859021
228            0            0           0  24376470  24378482         NA
229         4971         1086        1086  24378482  39076222 0.07712847
230            0            0           0  39076223  42433539         NA
231        13983         3576        3576  42433540  86411230 0.06990635
232          858          192         192  86411230  89029244 0.06439847
233            4            1           1  89029244  89115851 0.36474615
       c1Mean   c2Mean
220 1.0136216 1.184092
221 0.9899144 1.160755
222        NA       NA
223 0.9252892 1.075067
224        NA       NA
225 0.9440399 1.033511
226        NA       NA
227 0.9345998 1.094030
228        NA       NA
229 0.9352562 1.091583
230        NA      NaN
231 0.9475577 1.089996
232 0.9617432 1.094139
233 0.8101214 1.740423

deltaCN <- estimateDeltaCN(fit, scale=1) #0.2601609 for my sample
fit = callAB(fit, delta = deltaCN, verbose = -10)
List of 7
 $ nbrOfTCNs      : int 21
 $ tcnNbrOfLoci   : int 21
 $ mu             : num 2.58
 $ tcnMean        : num 2.55
 $ dMu            : num 0.0248
 $ abs(dMu)       : num 0.0248
 $ range(x[units]): num [1:2] 89029244 89114401

Error in bootstrapTCNandDHByRegion.PairedPSCBS(fit, statsFcn = statsFcn,  :
  INTERNAL ERROR: Incorrect TCN mean!


> sessionInfo()
R version 2.15.2 Patched (2013-02-09 r61879)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=C                 LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
 [1] DNAcopy_1.32.0         PSCBS_0.34.1           aroma.affymetrix_2.8.0
 [4] affxparser_1.30.2      aroma.apd_0.2.3        R.huge_0.4.1         
 [7] aroma.light_1.28.0     aroma.core_2.8.0       matrixStats_0.7.0    
[10] R.rsp_0.8.2            R.devices_2.1.3        R.cache_0.6.5        
[13] R.filesets_2.0.0       R.utils_1.23.2         R.oo_1.13.0          
[16] R.methodsS3_1.4.2    

loaded via a namespace (and not attached):
[1] calmate_0.10.0 digest_0.6.3   MASS_7.3-23  


Ingrid Lönnstedt

unread,
Apr 21, 2013, 11:37:19 PM4/21/13
to aroma-af...@googlegroups.com, henrik.b...@aroma-project.org
Yet one more try to attach figure./Ingrid


On Wednesday, 27 March 2013 18:16:07 UTC+11, Henrik Bengtsson wrote:
Nuse.png

Henrik Bengtsson

unread,
Apr 22, 2013, 9:00:39 PM4/22/13
to Ingrid Lönnstedt, aroma-af...@googlegroups.com
Hi Ingrid.

[cleaning up by dropping everything but the
bootstrapTCNandDHByRegion() error; will reply to the other stuff
separately.]

On Sun, Apr 21, 2013 at 8:33 PM, Ingrid Lönnstedt
<ingrid.l...@gmail.com> wrote:
>
> Hi Henrik,
> Thank you for your answers! You do a great job, indeed! I am now back to
> this. See a couple of remaining issues below.
> /Ingrid
>
>
>
> On Wednesday, 27 March 2013 18:16:07 UTC+11, Henrik Bengtsson wrote:
>>
>> On Tue, Mar 26, 2013 at 10:03 PM, Ingrid Lönnstedt
>> <ingrid.l...@gmail.com> wrote:
>> >
>> > Thank you Henrik!
>> > OK, I followed your instructions, and the same problem (the error in
>> > callAB()) remains, or rather, it now occurs already in sample one. Also,

[snip]
This is an internal error that the *implementation* of the PSCBS
algorithm is correct and handles all "edge cases"; apparently there is
still another case that was not considered. These "edge cases" are
for instance when two SNPs share the exact same location but the
internal CBS segmentation method wants to put a change point in
between them because that's what the data happens to say. In other
words, it's nothing you're doing wrong and probably nothing you should
try to work around either. (I'll think about this a bit more since it
happens once in a while and it would be nice to be able to keep going.
On the other hand, if it would be just a warning you would have never
told me and the error would never be fixed.)

First, update to the latest PSCBS v0.34.8 (on CRAN since yday) and
verify that the error remains. If it does, could you save the
segmentation result;

saveObject(fit, "problem.Rdata")

and make 'problem.Rdata' available to me for download somewhere? Then
I can troubleshoot.

/Henrik
[snip]

>> >>
>> >> source("http://aroma-project.org/hbLite.R");
>> >> hbInstall("PSCBS");
>> >>
>> >>
>> >>
>> >> > For the first
>> >> > tumour sample, both callAB() and callNTCN() seem to work fine, but
>> >> > then on
>> >> > the second sample I get errors with callAB(), see below.
>> >>
>> >> See comment after the error.
>> >>
>> >> > The plotTracks()
>> >> > image looks pretty much the same between sample one and two, and I
>> >> > have no
>> >> > reason to believe that sample two has a major problem in the
>> >> > segmentation. 4
>> >> > of the 56 samples get very noisy segmentations, with lots of short
>> >> > regions
>> >> > scattered (has anybody seen that before?)

[snip]

Henrik Bengtsson

unread,
Apr 23, 2013, 3:57:53 PM4/23/13
to Ingrid Lönnstedt, aroma-af...@googlegroups.com
Hi,

I could reproduce the error with the data you've sent me offline.
I've located the bug, which indeed was an edge-case bug, and I've
fixed it. Update to PSCBS v0.35.0:

source("http://aroma-project.org/hbLite.R");
hbInstall("PSCBS");

I've verified that it works with your data. Below is the script I used.

stopifnot(packageVersion("PSCBS") >= "0.35.0");

library("PSCBS");

library("R.devices");
devOptions("png", width=1024);
setOption("devEval/args/force", FALSE);

sampleName <- "ingrid";

## Load the data from a previously saved segmentation object.
pathname <- sprintf("%s.Rbin", sampleName);
fit <- loadObject(pathname);
data <- getLocusData(fit);
rm(fit);


## Locate centromeres and other large gaps in data
gaps <- findLargeGaps(data, minLength=1e6);
knownSegments <- gapsToSegments(gaps);
## => dim(knownSegments) == c(64, 4)

## 'betaT' is already normalized using TumorBoost => tbn=FALSE.
fit <- segmentByNonPairedPSCBS(data, knownSegments=knownSegments,
avgDH="median", seed=0xBEEF, verbose=-10);
printf("Number of segments: %d\n", nbrOfSegments(fit));
## Number of segments: 568

toPNG(sampleName, tags="tracks,avgDH=median", aspectRatio=0.6, {
plotTracks(fit);
});


## Estimate global background level in [0,1] (due to normal
contamination and more)
kappa <- estimateKappa(fit, verbose=-10);
printf("Kappa: %g\n", kappa);
## Kappa: 0.374814


## Call allelic balance
## (a) Estimate DH threshold for calling AB
deltaAB <- estimateDeltaAB(fit, kappa=kappa); # If skipped, will be
done internally
printf("Delta_AB: %g\n", deltaAB);
## Delta_AB: 0.146787

## (b) Call AB based on bootstrapped segment DH levels
fit <- callAB(fit, delta=deltaAB, verbose=-100);
printf("Number of AB segments: %d\n", sum(getSegments(fit)$abCall, na.rm=TRUE));
## Number of AB segments: 193

toPNG(sampleName, tags="tracks,avgDH=median,AB", aspectRatio=0.6, {
plotTracks(fit);
});


## Call loss of heterozygosity (LOH)
## (a) Estimate C_1 threshold for calling LOH
deltaLOH <- estimateDeltaLOH(fit); # If skipped, will be done internally
printf("Delta_LOH: %g\n", deltaLOH);
## Delta_LOH: 0.566545

## (b) Call LOH based on bootstrapped segment C_1 levels
fit <- callLOH(fit, delta=deltaLOH, verbose=-10);
printf("Number of LOH segments: %d\n", sum(getSegments(fit)$lohCall,
na.rm=TRUE));
## Number of LOH segments: 149

toPNG(sampleName, tags="tracks,avgDH=median,AB+LOH", aspectRatio=0.6, {
plotTracks(fit);
});


## Call NTCN
## (a) Estimate the threshold for calling neutral TCN segments
## By shrinking 'scale', more segments will be non-NTCN.
deltaCN <- estimateDeltaCN(fit, scale=1.0, kappa=kappa);
printf("Delta_CN: %g\n", deltaCN);
## Delta_CN: 0.312593

## (b) Call NTCN based on bootstrapped segment TCN levels
fit <- callNTCN(fit, delta=deltaCN, verbose=-20);
printf("Number of NTCN segments: %d\n", sum(getSegments(fit)$ntcnCall,
na.rm=TRUE));
## Number of NTCN segments: 303

toPNG(sampleName, tags="tracks,avgDH=median,AB+LOH+NTCN", aspectRatio=0.6, {
plotTracks(fit);
});

/Henrik
Reply all
Reply to author
Forward
0 new messages