Different result between sequenza2 and sequenza3

120 views
Skip to first unread message

rkendar

unread,
Nov 28, 2019, 2:57:15 AM11/28/19
to Sequenza User Group
Hi Francesco,

Now I am evaluating the output of previous sequenza 2.1.2 and 3.0.0 version. I use the same version of supporting tools in the pipeline, except for samtools and R.
- Sequenza 2.1.2 = samtools 1.09 and R 3.4.1
- Seqeunza 3.0.0 = samtools 1.8 and R 3.6.0

The reason I use different samtools is because I get block counting error when I combine between samtools 1.09 + Seqeunza 3.0.0. But the error disappeared when I change to samtools 1.8. Is it expected?

As you can see on the attachment, the results show more noise (genome_view.pdf) and less mutation was detected (mutation.txt) in sequenza3.
I did notice difference in the base quality mapping in the pileup file that produced by different samtools version. Thus I also get different seqz file. 
So, my questions is:

1. Is different samtools version really give a huge impact to the end result of sequenza?
2. Is there any other suspected factors that may contribute to the different result?


Best regards,
R
sequenza2.zip
sequenza3.zip

Francesco Favero

unread,
Nov 28, 2019, 5:29:08 AM11/28/19
to rkendar, Sequenza User Group
Hi,

Thanks for the exciting comparison between the two version.
The mutation part between sequenza2 and seqenza3 is practically identical. it may had some code refactoring ion the new version but the filtering and concepts are the same.

The on the other hand. the python program to generate the seqz file, is quite changed. I can’t really track down a version history with specific samtools changes, but indeed there is a dependencies on newer samtools version, and the sequenza-utils commands relay on samtools or some other program (there is a snp2seqz file that takes VCFs) to read the BAM files.

Are you using the mutations.txt file from sequenza in downstream analyses?
I’m really interested in the use case of mutation calls from sequenza.
Although sequenza ultimately annotates the mutations with the number of variant alleles (Mt) vs the total number of alleles (CNt) in the tumor, the actual mutation call is performed in a very naive way, compared to highly advances and specialised methods available now-days (such as, eg mutect2, strelka, caveman…).

Thanks again

Best regards

Francesco

--
You received this message because you are subscribed to the Google Groups "Sequenza User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sequenza-user-group/f386ea5f-96ef-473f-bf2c-751c0eb43547%40googlegroups.com.
<sequenza2.zip><sequenza3.zip>

Message has been deleted

rkendar

unread,
Dec 2, 2019, 1:35:08 AM12/2/19
to Sequenza User Group
Hi Francesco,

Thank you for your prompt reply.
And well noted that the mutation step is the same but there's some changes in .seqz step. 

Since I have different mpilepup result because of different samtools version, I plan to use the bam directly instedad of pileup file. However, I didn't see any option in bam2seqz to exclude any reads with low mapping quality (MAPQ) (exp. MAPQ = 0).
I use -q option in samtools mpileup to exlcude low MAPQ. I do noitice bam2seqz have -q option, but it use to set the base quality which is equals to -Q (minimum base quality for base to be considered) in samtools mpileup options. Please correct me if I am wrong.
Do you have option like -q in samtools mpileup to set the min. MAPQ in bam2seqz or it will automatically remove low MAPQ?

For your question about the mutation.txt, unfortunately we didn't use that. We use mutect2 and strelka as you pointed out for the variant calling.

Regards,
R

rkendar

unread,
Dec 6, 2019, 2:51:33 AM12/6/19
to Sequenza User Group
Hi Francesco,

I found the answer from here https://sequenza-utils.readthedocs.io/en/latest/api.html mpileup in bam2seqz will remove -q 20 and -Q 20.

Thank you.

Reply all
Reply to author
Forward
0 new messages