Dear sequenza users,
I posted these questions on SEQanswers (http://seqanswers.com/forums/showthread.php?p=188559#post188559) last week but have had no answer so far. Maybe I will be luckier here.
I have only recently started using Sequenza. After running the Python script, I ran the 3 functions sequenza.extract(), sequenza.fit() and sequenza.results() with default parameters. My data is a set of 5 tumor-normal WES pairs.
Now I would like to tune the parameters more precisely. The problem is that I do not clearly understand some of them, even after reading the vignette, the discussions in this Google group and the documentation of the R functions.
Here are the parameters:
1) sequenza.extract
- window=10^6
"size of windows used when plotting mean and quartile ranges of depth ratios and B-allele frequencies. Smaller windows will take more time to compute"
Is this parameter only used for plotting?
I changed it to 500 and did not see any change in genome_view.pdf.
- overlap=1
"integer specifying the number of overlapping windows"
If we consider a specific window, it can only overlap with 0, 1 or 2 other windows, right?
- min.type.freq=0.9
"minimum frequency of aberrant types"
What does it mean?
- weighted.mean=TRUE
"boolean to select if the segments should be calculated using the read depth as weights to calculate depth ratio and B-allele frequency means"
What does this mean?
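To make the weighted.mean question concrete, here is a minimal sketch of how I currently read the description: each position contributes to the segment mean in proportion to its read depth. This is plain Python and only my interpretation, not Sequenza's code; the function name and the toy numbers are made up for illustration.

```python
def depth_weighted_mean(values, depths):
    """Mean of `values` where each value is weighted by its read depth."""
    total_depth = sum(depths)
    if total_depth == 0:
        raise ValueError("total depth is zero, weighted mean is undefined")
    return sum(v * d for v, d in zip(values, depths)) / total_depth


# Toy segment (made-up numbers): three positions with their depth ratio
# and the read depth observed at each position.
depth_ratio = [1.0, 1.2, 0.4]
read_depth = [100, 80, 5]

plain_mean = sum(depth_ratio) / len(depth_ratio)
weighted = depth_weighted_mean(depth_ratio, read_depth)

# The low-depth position (depth 5) pulls the plain mean down much more
# than it pulls the depth-weighted mean.
print(plain_mean, weighted)
```

If this reading is right, weighted.mean=TRUE would make poorly covered positions count less in the segment averages. Is that the intended behaviour?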
2) sequenza.fit
- N.ratio.filter=10
"Threshold of minimum number of observation of depth ratio in a segment"
Is this the minimum number of variants in a segment?
- N.BAF.filter=1
"threshold of minimum number of observation of B-allele frequency in a segment"
Is this also the minimum number of variants in a segment?
Why is the default value not the same as for N.ratio.filter?
- segment.filter=3x10^6
"threshold segment length (in base pairs) to filter out short segments, that can cause noise when fitting the cellularity and ploidy parameters. The threshold will not affect the allele-specific segmentation"
Is it the minimum length of a segment?
What is the usual range? 3x10^6 seems big.
- ratio.priority=FALSE
"logical, if TRUE only the depth ratio will be used to determine the copy number state, while the Bf value will be used to determine the number of B-alleles"
Does this mean that with FALSE, both the depth ratio and the BAF are used to determine the copy number, and only the BAF is used to determine the number of B-alleles?
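To check my understanding of the three filters in sequenza.fit, here is a small sketch of how I imagine they act together: a segment is only used for fitting cellularity and ploidy if it has enough depth-ratio observations, enough BAF observations, and is long enough. This is plain Python with made-up segments and hypothetical field names, and the ">=" comparisons are my assumption; it is not taken from Sequenza's source.

```python
def segments_used_for_fit(segments, n_ratio_filter=10, n_baf_filter=1,
                          segment_filter=3_000_000):
    """Return the segments that pass all three thresholds (assumed logic)."""
    kept = []
    for seg in segments:
        length = seg["end"] - seg["start"]  # segment length in base pairs
        if (seg["n_ratio"] >= n_ratio_filter
                and seg["n_baf"] >= n_baf_filter
                and length >= segment_filter):
            kept.append(seg)
    return kept


# Made-up segments; "n_ratio" and "n_baf" are hypothetical names for the
# number of depth-ratio and BAF observations in each segment.
toy_segments = [
    {"name": "long_ok", "start": 0, "end": 5_000_000, "n_ratio": 40, "n_baf": 6},
    {"name": "too_short", "start": 0, "end": 1_000_000, "n_ratio": 40, "n_baf": 6},
    {"name": "no_baf", "start": 0, "end": 8_000_000, "n_ratio": 40, "n_baf": 0},
]

kept_names = [seg["name"] for seg in segments_used_for_fit(toy_segments)]
print(kept_names)
```

If that is roughly what happens, then with WES data a 3x10^6 threshold could discard many segments from the fit, which is why I am asking about the usual range.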
My last concern is the gamma parameter, which is crucial for segmentation.
Since I have WES data, I chose breaks.method="full" and now I want to determine gamma.pcf and kmin.pcf.
From what I understood, for cancer data gamma is often chosen in the range 15 to 40.
I started with 40. Going from 140 to 40, the copy number estimates did not change, but 2 out of 5 cellularity estimates did: from 0.46 to 0.97 and from 0.35 to 0.26.
Do you have a recommended way to determine gamma? Is it possible to use the plotGamma function of the copynumber package for this?
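To explain why I think gamma matters so much: as far as I understand, piecewise constant fitting picks the segmentation that minimizes the squared error within segments plus a penalty of gamma for every segment. The sketch below is a toy version of that idea in plain Python. It is not the copynumber package's code; it ignores kmin and any scaling of gamma, and the function name segment_pcf is made up.

```python
def segment_pcf(values, gamma):
    """Penalized least-squares segmentation (toy dynamic programme).

    Minimizes sum of within-segment squared errors + gamma * number of
    segments. Returns a list of (start, end) index pairs, end exclusive.
    """
    n = len(values)
    # Prefix sums so the squared error of any stretch is O(1) to compute.
    s, s2 = [0.0], [0.0]
    for v in values:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: minimal cost of values[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + gamma
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Walk back through the stored segment starts to recover the breaks.
    bounds, j = [], n
    while j > 0:
        bounds.append((prev[j], j))
        j = prev[j]
    return bounds[::-1]


# Made-up signal: 10 values at 0 followed by 10 values at 1.
signal = [0.0] * 10 + [1.0] * 10
low_penalty = segment_pcf(signal, gamma=0.5)   # the step is kept as a break
high_penalty = segment_pcf(signal, gamma=10.0) # the step is merged away
print(low_penalty, high_penalty)
```

In this toy case a small gamma keeps the breakpoint and a large gamma merges everything into one segment, so I would expect the choice of gamma to change which segments Sequenza sees when fitting cellularity.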
I am surprised to get cellularities so far from what we expect. We are pretty sure the samples have high purity, most likely >90%, but Sequenza returned values between 0.35 and 0.5 with default settings.
Sorry for the large number of questions and the long post! I do not know Sequenza well and I am not an expert in segmentation.
Thank you in advance for your help.
Jane