non-matched normals

igor

Jul 16, 2020, 11:18:00 PM

to Sequenza User Group

The sequenza-utils guide covers the scenario with a non-matched normal. There is even a specific parameter to deal with those cases (--normal2). However, the Sequenza R guide does not discuss that case. I also cannot find anything relevant in the documentation. Processing those sample pairs leads to very strange results. Much of the output is related to point mutations, so it makes sense those results would be off, but copy number profiles look very strange as well. Do you have any guidelines on how to process samples without a matching normal? What is the appropriate way to deal with those seqz files in R?

Thank you.

Francesco Favero

Jul 17, 2020, 4:32:27 AM

to igor, Sequenza User Group

Hi Igor,

In theory, if you used the --normal2 parameter to create the seqz file, you should be able to run the sequenza R package without major issues.

You should have used the tumor sample twice in the parameters:

... --tumor <your_tumor_sample>.bam --normal <your_tumor_sample>.bam --normal2 <your_unrelated_normal_sample>.bam (or pileup, or whichever input format applies).

I know this is a bit confusing, and I’m not sure I’ve covered it properly in the docs. Using the tumor itself as the normal still lets you get the SNPs “right”, so that the changing BAF can be captured.
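A minimal dry-run sketch of the invocation described above. All file names are hypothetical placeholders, and the flags other than --tumor, --normal and --normal2 (--fasta, -gc, --output) are assumptions that should be checked against `sequenza-utils bam2seqz --help` for the installed version.

```shell
# Hypothetical input and output names; replace with real paths.
tumor="tumor.bam"
unrelated_normal="unrelated_normal.bam"
reference="genome.fa"
gc_wiggle="genome.gc50.wig.gz"
output="tumor_nonmatched.seqz.gz"

# Dry run: print the command instead of executing it.
# To actually run it, change the function body to: "$@"
dry_run() { printf '%s\n' "$*"; }

# The tumor is passed as both --tumor and --normal, so heterozygous SNPs are
# taken from the tumor itself; --normal2 supplies the depth for the ratio.
cmd=$(dry_run sequenza-utils bam2seqz \
    --tumor "$tumor" \
    --normal "$tumor" \
    --normal2 "$unrelated_normal" \
    --fasta "$reference" \
    -gc "$gc_wiggle" \
    --output "$output")

echo "$cmd"
```

Printing the command first makes it easy to confirm that the same BAM appears for both --tumor and --normal before starting a long run.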

The caveat is that you need to pick a threshold on the allele frequency (leaving the default should be fine). If you are using a real normal sample this wouldn’t be much of a problem, but using a tumor for this can lead to missing regions if the tumor is very pure (e.g. if you are using a cell line).

I think the weirdness you are talking about could be caused by missing or unmatched BAF values?

If the problem arises because the sample used in --normal2 has some copy number changes as well, you could try the ignore.normal option in the R package.

This should use the normalized tumor depth alone instead of the ratio vs the normal.
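As a rough sketch of that suggestion, the snippet below writes a small R script that passes ignore.normal = TRUE to sequenza.extract and then runs the usual fit and results steps. The seqz file name, sample ID and output directory are hypothetical, and all other parameters are left at their defaults, which may not suit every data set.

```shell
# Write a small R script; the file and sample names are hypothetical.
cat > run_sequenza_nonmatched.R <<'EOF'
library(sequenza)
# ignore.normal = TRUE: use the normalized tumor depth alone
# instead of the depth ratio against the (unrelated) normal.
extracted <- sequenza.extract("tumor_nonmatched.seqz.gz", ignore.normal = TRUE)
fit <- sequenza.fit(extracted)
sequenza.results(sequenza.extract = extracted, cp.table = fit,
                 sample.id = "tumor_nonmatched", out.dir = "sequenza_out")
EOF

# Run only when Rscript is available and the input file exists.
if command -v Rscript >/dev/null 2>&1 && [ -f tumor_nonmatched.seqz.gz ]; then
    Rscript run_sequenza_nonmatched.R
else
    echo "Wrote run_sequenza_nonmatched.R (not executed)"
fi
```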

In any case, with a non-matching normal, the mutation results should be discarded. But you shouldn’t have any results at all for mutations, since your normal and tumor inputs are the same file.

I hope you manage to clean up the data a bit with these suggestions :)

Cheers

Francesco


igor

Jul 17, 2020, 2:44:44 PM

to Sequenza User Group

Hi Francesco.

Thank you for the quick reply. I have actually been doing what you describe. As you mentioned, it's described in the docs. I am attaching a result for a normal sample compared to a different normal sample in non-matched mode with the "ignore.normal" setting. The copy numbers range from 5 to 20, which does not make sense. Maybe the visual representation will help.

Based on an independent matched and non-matched analysis with a different tool, I get a flat profile as would be expected. Thus, I don't expect to find any copy number alterations.

nonmatched_genome_view.pdf

Francesco Favero

Jul 17, 2020, 5:25:29 PM

to igor, Sequenza User Group

Can you try running it with the latest version of the package on Bitbucket?

You should be able to install it with devtools:

devtools::install_bitbucket("sequenzatools/sequenza@master")

or something similar.

I have pushed a fix on the BAF calculation.

However, the data you sent me doesn't look like an ideal situation.

You might need to tweak the results manually at your discretion.

(Don't expect the algorithm to give you the right results with such noisy data; treat the solutions more as suggestions.)


igor

Sep 4, 2020, 1:04:40 AM

to Sequenza User Group

Sorry about the slow response. I installed the latest Bitbucket version of Sequenza. The output is essentially the same.

You also mentioned I may need to manually tweak the results. What kind of tweaks would you suggest?