Basic questions about snp2seqz


rkendar

Feb 1, 2020, 7:15:04 AM
to Sequenza User Group
Hi Francesco,

Currently I am using snp2seqz to generate seqz files for my WGS samples (paired tumor-normal).
I used bam2seqz before with --parallel and it saved a lot of time (from 2 days to 12 hours), but I want to go further and shorten the pipeline by using snp2seqz.
I have some basic questions regarding snp2seqz:

1. In general, will the result be the same as with bam2seqz? Which one is more accurate?

2. Do I need to have both germline and somatic VCFs? Are there any drawbacks if I only use the somatic VCF?

3. I now have two VCFs: a somatic VCF from Mutect2 and a germline VCF from HaplotypeCaller.
Using --preset mutect on the Mutect2 VCF didn't work, but running without any preset, as below, did. Is this the correct thing to do?
sequenza-utils snp2seqz --vcf ${somvcf} -gc ${params.gc_wig} -o ${patient}/${sample}_somatic_seqz.gz

Meanwhile, neither --preset mutect nor running without a preset worked for the HaplotypeCaller VCF. Do you have any suggestion as to why, and which preset I should use?
sequenza-utils snp2seqz --vcf ${germvcf} -gc ${params.gc_wig} --preset mutect -o ${patient}/${sample}_germline_seqz.gz

Below is the head of both VCF:
> mutect2 VCF
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  WHT724  WHT725
chr1    14522   rs1441808061    G       A       .       map_qual;normal_artifact        CONTQ=93;ClippingRankSum=-2.03;DP=112;ECNT=1;FS=8.031;GERMQ=93;MBQ=41,41;MFRL=345,328;MMQ=35,22;MPOS=49;MQ=33.04;MQ0=0;MQRankSum=-3.438;NALOD=-1.923;NLOD=3.32;POPAF=6;ReadPosRankSum=0.834;SEQQ=50;STRANDQ=36;TLOD=10.05;ANN=A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*113G>A|||||113|,A|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00000278267|transcript|ENST00000619216.1|miRNA||n.*2847C>T|||||2847|,A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000450305.2|transcribed_unprocessed_pseudogene||n.*852G>A|||||852|,A|intron_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000488147.1|unprocessed_pseudogene|10/10|n.1254-21C>T||||||     GT:AD:AF:DP:F1R2:F2R1:SB        0/0:25,1:0.071:26:11,1:13,0:17,8,1,0    0/1:81,4:0.058:85:39,2:39,2:49,32,4,0
chr1    16534   rs15642 C       T       .       map_qual;normal_artifact;weak_evidence  CONTQ=93;ClippingRankSum=1.715;DP=147;ECNT=1;FS=11.521;GERMQ=93;MBQ=41,41;MFRL=307,391;MMQ=23,22;MPOS=12;MQ=28.01;MQ0=0;MQRankSum=-0.683;NALOD=-1.023;NLOD=11.14;POPAF=6;ReadPosRankSum=-1.566;SEQQ=1;STRANDQ=31;TLOD=3.28;ANN=T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000456328.2|processed_transcript||n.*2125C>T|||||2125|,T|downstream_gene_variant|MODIFIER|MIR6859-1|ENSG00000278267|transcript|ENST00000619216.1|miRNA||n.*835G>A|||||835|,T|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|transcript|ENST00000450305.2|transcribed_unprocessed_pseudogene||n.*2864C>T|||||2864|,T|intron_variant|MODIFIER|WASH7P|ENSG00000227232|transcript|ENST00000488147.1|unprocessed_pseudogene|8/10|n.1067+73G>A||||||   GT:AD:AF:DP:F1R2:F2R1:SB        0/0:51,1:0.036:52:25,1:26,0:21,30,0,1   0/1:86,4:0.053:90:39,3:47,1:40,46,0,4

> haplotypecaller VCF from normal sample
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  WHT724
chr1    15274   .       A       G       340.09  .       AC=2;AF=1.00;AN=2;DP=14;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MMQ=60,22;MQ=24.09;MQ0=0;QD=28.34;SOR=0.693       GT:AD:DP:GQ:PL  1/1:0,12:12:36:342,36,0
chr1    15903   .       G       GC      519.06  .       AC=2;AF=1.00;AN=2;DP=14;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MMQ=60,36;MQ=34.08;MQ0=0;QD=25.36;SOR=3.912        GT:AD:DP:GQ:PL  1/1:0,14:14:45:533,45,0
chr1    16378   .       T       C       63.60   .       AC=1;AF=0.500;AN=2;BaseQRankSum=2.160;ClippingRankSum=-0.631;DP=63;ExcessHet=3.0103;FS=8.184;MLEAC=1;MLEAF=0.500;MMQ=43,23;MQ=35.93;MQ0=0;MQRankSum=-3.113;QD=1.01;ReadPosRankSum=0.010;SOR=2.077 GT:AD:DP:GQ:PL  0/1:54,9:63:71:71,0,1649

Need your suggestion. Thank you.

Francesco Favero

Feb 2, 2020, 4:36:59 AM
to rkendar, Sequenza User Group
Hi,

Mutect2 offers very little data for reliable copy number calls, but if you also use the HaplotypeCaller output you can get basically a similar result to running bam2seqz. You should run it so that both samples are in the VCF.
The mutect preset has a bug that I have been meaning to fix for some time. I'll have a look next week at whether HaplotypeCaller needs a preset or whether it may work with the current options in snp2seqz.
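
As a rough aid to the point about having both samples in the VCF, here is a minimal Python sketch that lists the sample columns of a VCF header and pulls the per-sample allelic depths (the AD field) from one record. It is not part of sequenza-utils; the function names vcf_sample_names and allelic_depths are made up for illustration, and the demo record is the first Mutect2 line from this thread with its INFO column shortened to DP only. It assumes a tab-delimited VCF with an AD entry in FORMAT.

```python
# Illustrative sketch only: not part of sequenza-utils.

def vcf_sample_names(lines):
    """Return the sample column names from the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed (CHROM..FORMAT); the rest are samples.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the records without finding a header
    return []


def allelic_depths(record, samples):
    """Map each sample name to its AD (ref, alt, ...) read counts."""
    fields = record.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    if "AD" not in keys:
        return {}
    ad = keys.index("AD")
    depths = {}
    for name, column in zip(samples, fields[9:]):
        values = column.split(":")
        # Skip samples whose AD is missing (".") or truncated.
        if ad < len(values) and "." not in values[ad]:
            depths[name] = tuple(int(n) for n in values[ad].split(","))
    return depths


# First Mutect2 record from this thread, with INFO shortened to DP only.
header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "WHT724", "WHT725"])
record = "\t".join(["chr1", "14522", "rs1441808061", "G", "A", ".",
                    "map_qual;normal_artifact", "DP=112",
                    "GT:AD:AF:DP:F1R2:F2R1:SB",
                    "0/0:25,1:0.071:26:11,1:13,0:17,8,1,0",
                    "0/1:81,4:0.058:85:39,2:39,2:49,32,4,0"])

samples = vcf_sample_names([header])
print(samples)                           # ['WHT724', 'WHT725']
print(allelic_depths(record, samples))   # {'WHT724': (25, 1), 'WHT725': (81, 4)}
```

Run against the HaplotypeCaller header shown in the question, the same check would return only WHT724, which is one quick way to see that the germline VCF carries a single sample.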

I’ll probably change the name of snp2seqz into vcf2seqz :)

Best

Francesco


