Questions on BAF and LogR


Emily

Jan 27, 2016, 1:06:40 PM
to Sequenza User Group
Dear users,

I am using Sequenza-generated profiles for tumor and normal samples on WES data. When I reviewed the outputs, I noticed that the BAF in Sequenza lies between 0 and 0.5, whereas we normally define BAF from 0 to 1, with a BAF of 0.5 indicating a heterozygous position. If BAF only ranges from 0 to 0.5, how do we define heterozygous positions?

Another question concerns calculating the logR ratio from the copy number. I see both the depth ratio and CNt in the output, but the depth ratio is not CNt/2, i.e. it is not the copy ratio we usually use. I need logR for other calculations; could you let me know how to obtain it?

These may be naive questions, but I hope someone can help me clarify them.

Thanks a lot for the help, it's appreciated.

Emily

Francesco Favero

Jan 29, 2016, 8:55:27 AM
to Emily, Sequenza User Group
Dear Emily,
No question is naive here; there are too many formats and conventions to keep up with in bioinformatics nowadays.

To simplify the model fitting, we identify the B-allele frequency, defining the B-allele as the minor allele.
So the minor allele can only have a frequency between 0 and 0.5, while the major allele would have a range between 0.5 and 1.
The seqz file is produced by comparing the normal and tumor mpileups position by position. We define heterozygous positions based on the normal sample genotype, and for those positions only we annotate the A and B allele frequencies of the tumor sample in the seqz file.

So the A and B frequencies in the seqz file should not be used to define zygosity (if the sample has high cellularity, e.g. a cell line, the A/B frequencies in LOH regions would be 1/0 respectively).
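
To illustrate the convention in a minimal way (this is just a sketch, not the package code; the vector name and values are hypothetical): folding any allele frequency onto the minor allele keeps the B-allele frequency in [0, 0.5].

    # Hypothetical tumor allele frequencies at positions that are
    # heterozygous in the matched normal sample.
    tumor_af <- c(0.48, 0.52, 0.95, 0.03)

    # The B allele is defined as the minor allele, so its frequency is
    # the smaller of the two allele fractions, i.e. always in [0, 0.5].
    Bf <- pmin(tumor_af, 1 - tumor_af)
    Bf  # 0.48 0.48 0.05 0.03 -- values near 0 suggest LOH in a high-purity sample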

Regarding the depth.ratio, the numbers in the seqz file are merely the ratio between the depths of the two pileups. The GC normalization is applied later during the processing in R; otherwise the average ratio would depend on the two library sizes.

The normalization step in R is very simple: in practice we calculate the mean ratio value for each GC "window", and then use the results to normalize the depth ratio according to the respective GC content. This way the average ratio is 1, which takes care of both library-size and GC normalization.

You can obtain the logR by taking the log2 of the normalized depth ratio.
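
As a rough base-R sketch of the two steps above (illustrative only, not the sequenza package code; the toy data frame stands in for a seqz table with its 'depth.ratio' and 'GC.percent' columns):

    # Toy stand-in for a seqz table (real files have many more columns).
    seqz <- data.frame(depth.ratio = c(1.10, 0.95, 1.30, 0.80, 1.05, 0.90),
                       GC.percent  = c(40,   40,   55,   55,   40,   55))

    # Mean raw depth ratio per GC window, then divide each position's
    # ratio by the mean of its window: the adjusted ratio averages ~1,
    # which handles both library-size and GC normalization.
    gc_bin   <- round(seqz$GC.percent)
    gc_mean  <- tapply(seqz$depth.ratio, gc_bin, mean)
    adjusted <- seqz$depth.ratio / gc_mean[as.character(gc_bin)]

    # The logR is then simply the log2 of the normalized depth ratio.
    logR <- log2(adjusted)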

If you are interested in having raw data you can also run VarScan2: it will give you SNP data with more information than the seqz file, plus a copynumber file with already log2-transformed/normalized data.

It is possible to import the VarScan calls with Sequenza in R.

I hope this was somewhat useful; please keep the questions coming if you have any problems or doubts.

Best

Francesco

Ryan Morin

Mar 22, 2016, 2:08:14 PM
to Sequenza User Group
Was this the answer you were looking for regarding accessing/computing logR? I'm sorry, but the response didn't make sense to me. Where can we get the normalized depth ratio so that we can derive the logR? The output files appear to contain only depth.ratio, N.ratio and sd.ratio.

Francesco Favero

Apr 7, 2016, 6:24:22 PM
to Sequenza User Group
Hi Ryan,

I'm sorry I missed your post for so long.

In the segments results, the depth.ratio refers to the adjusted depth ratio.

So if you need to use the "logR" for the segmented data, you can just log2-transform that column.
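
For example (the file name here is hypothetical; the snippet assumes the usual tab-separated segments output with a 'depth.ratio' column):

    # Read the segments output and add a logR column by log2-transforming
    # the adjusted depth ratio.
    seg <- read.table("sample_segments.txt", header = TRUE, sep = "\t")
    seg$logR <- log2(seg$depth.ratio)
    head(seg[, c("chromosome", "start.pos", "end.pos", "depth.ratio", "logR")])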

Best

Francesco

Tommy Tang

Feb 1, 2017, 6:16:27 PM
to Sequenza User Group
Hi Francesco,

In the segments file I have a depth.ratio of 0.9, but the CNt is 4. How is that possible?
I thought a depth ratio of 1 means the CNt is 2.

Thanks,
Tommy

Francesco Favero

Feb 1, 2017, 7:52:04 PM
to Tommy Tang, Sequenza User Group
Hi Tommy,

Generally you are right, but it depends on many things: for instance, if your ploidy estimate is more than 2 (more than diploid), then your assumption no longer holds.
For instance, if you have a ploidy estimate of 5 or more, then the CNt for a segment with a depth.ratio of ~0.9 would be around 4.
Everything also depends on the purity estimate; a step of 0.1 (1 - 0.9) can mean a lot or not much… there are no fixed ratio values that convert to specific copy number values; everything is inferred by a model that depends on a few parameters, and depending on those parameters the results can change quite drastically.
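
To give an idea of the kind of relationship involved (a sketch following the mixture-model formulation described in the Sequenza paper, assuming a diploid normal; the function is illustrative, not the package's fitting code):

    # Expected depth ratio for a segment with tumor copy number CNt,
    # given cellularity (purity) and average tumor ploidy; the matched
    # normal is assumed to carry 2 copies everywhere.
    expected_ratio <- function(CNt, cellularity, ploidy) {
      (cellularity * CNt + 2 * (1 - cellularity)) /
        (cellularity * ploidy + 2 * (1 - cellularity))
    }

    expected_ratio(CNt = 2, cellularity = 1.0, ploidy = 2)  # 1: pure diploid baseline
    expected_ratio(CNt = 4, cellularity = 0.5, ploidy = 5)  # ~0.86: CNt 4 already sits below 1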

If you are able to provide more information about your results (e.g. the purity and ploidy estimates, and some more detail about the segments), I'll be able to help you more.


Best

Francesco


Tommy Tang

Feb 2, 2017, 10:09:14 AM
to Sequenza User Group, tangmi...@gmail.com
Hi Francesco,

Thanks very much for your reply. I am new to sequenza, and a bit confused.

The tumor that we sequenced has a cellularity of 0.69 and a ploidy of 4.6 as predicted by Sequenza. (I saw several threads discussing manually choosing among the solutions, but I do not know how to do that.)

I want to better understand the Sequenza output, specifically the meaning of each column:
| chromosome | start.pos | end.pos  | Bf               | N.BAF | sd.BAF            | depth.ratio     | N.ratio | sd.ratio         | CNt | A | B |
|------------|-----------|----------|------------------|-------|-------------------|-----------------|---------|------------------|-----|---|---|
| 1          | 10013     | 55564357 | 0.27783663159116 | 2122  | 0.142598436464601 | 0.9145693510693 | 547001  | 0.29785879987071 | 4   | 3 | 1 |


Bf is the B-allele frequency; what are N.BAF and N.ratio?

The depth.ratio is 0.91 here (I saw in another thread that this is the GC- and library-size-normalized depth ratio), and the CNt is 4.

How is the ploidy taken into account here to arrive at a CNt of 4?

If I want to check which genes have a copy gain or copy loss, should I just use the CNt values?

What does a cellularity of 0.69 mean? Does it mean that 69% of the sequenced cells are tumor cells?

" We estimated a cellularity of ~ 50% which mean that half reads observed in tumor come from the normal cells"

I am confused. 

In my understanding: suppose we sequenced a total of 400 cells in the tumor sample. A cellularity of 50% means that 200 cells are tumor cells and 200 cells are normal cells. If the ploidy of the tumor is 4, then 1/3 of the total reads will come from the normal cells and 2/3 from the tumor cells.

Sorry for the many questions.

Thanks,
Tommy

Francesco Favero

Feb 2, 2017, 7:35:54 PM
to Tommy Tang, Sequenza User Group
Hi Tommy,

Your consideration about the ploidy is correct: higher ploidy means more DNA, so if you have an admixture of 1 tetraploid tumor cell and 1 normal cell, you would have 6 alleles in total, 4 from the tumor and 2 from the normal (this corresponds approximately to the proportion of reads); however, the cellularity is still 50%.

N.BAF and N.ratio are the numbers of observations used to calculate the Bf and the depth.ratio, respectively.

In your case, you have a ploidy estimate of 4.6, which means that at depth.ratio = 1 a segment should have copy number 4.6; however, the model considers only integer copy numbers, and a non-integer copy number would imply a subclonal fraction (we treat those scenarios as a separate workflow).
So your segment with a ratio of 0.9 is rightly assigned a CNt of 4. The rather low Bf value means that the A and B alleles are not present in even proportions: the model estimates 1 copy of B and 3 copies of A.
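
Plugging your estimates into the same illustrative relationship sketched earlier in the thread (diploid normal assumed; this is a sanity check, not the package's exact fit):

    cellularity <- 0.69
    ploidy      <- 4.6
    ratio_for <- function(CNt) {
      (cellularity * CNt + 2 * (1 - cellularity)) /
        (cellularity * ploidy + 2 * (1 - cellularity))
    }
    ratio_for(4)  # ~0.89, close to the observed 0.9145
    ratio_for(5)  # ~1.07, so CNt = 4 fits this segment better than 5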

To detect copy gain/loss, everything that is estimated above 5 is a copy gain, and anything below 5 is a loss.

To visually inspect the results, you should use the model fit plot and the CP contour plot (cellularity vs ploidy estimates), and look at the chromosome view to see whether the profile looks regular or there are clear biases in the data.

The most relevant, in my opinion, is the model fit plot with the alternative solutions (the PDF with the yellow/red halo and the black dots). The black dots are your segments; the colored spots are the positions predicted by the model. Visual inspection is important because the human brain can pick up odd situations in an instant (with a bit of experience), for instance whether aneuploidy is caused by overfitting noise, or whether one of the alternative solutions makes more sense than the top-ranked one.

I hope this helps you move forward with your results.

Best

Francesco


Tommy Tang

Feb 3, 2017, 12:25:53 PM
to Sequenza User Group, tangmi...@gmail.com

Thanks.
Just to make sure: is the depth.ratio the read depth from the tumor cells of the tumor sample divided by the read depth of the normal control?
Does this depth.ratio take the tumor purity into account?

A simple case:

Suppose the tumor purity is 0.5, and we have 1 normal cell (diploid) and 1 tumor cell (ploidy of 4) sequenced in the tumor sample.
We also have 1 normal cell sequenced in the normal control sample.

Is the depth.ratio = (2 alleles + 4 alleles) / 2 alleles = 3?

The genome view of my real sample, which has a predicted ploidy of 4.6:


How should I read the figures and work out whether the ploidy estimation is correct?

The alternative solutions are:

"cellularity" "ploidy" "SLPP"
0.69 4.6 0.127104505129711
0.65 5.7 0.116550925080056
0.78 3.1 3.16170360778432e-47


I appreciate your help very much!

Tommy

emreko...@googlemail.com

Mar 8, 2019, 2:56:32 PM
to Sequenza User Group
Hi Tommy and Francesco, 

I am currently struggling with the same question: how to pick the best result for ploidy (and therefore cellularity) when there are several options. It seems that SLPP is not helpful in every case, and manual curation of each model fit cannot be the solution (I am working with n > 600 samples).
@Tommy Did you find a convenient solution in the meantime?

Best
Emre

Bruno Batista de Souza

May 27, 2019, 10:00:04 AM
to Sequenza User Group
Hi Emre, Tommy and Francesco,

I have the same question as Emre and Tommy. Did you guys find a solution?

Thanks
Bruno