sequenza-util snp2seqz error with mutect maf

Lihua Zou

Feb 13, 2020, 10:07:48 PM

to Sequenza User Group

I have a VCF file produced by mutect. I checked the vcf.py code, and sequenza doesn't seem able to handle this format yet, only strelka and caveman.

Could you verify this? All my VCFs are based on mutect2, so I wonder if there is a workaround, since the differences between mutect and strelka/caveman are not that large. Thanks!

sequenza-utils snp2seqz -v f6e2ea5a-0970-4e02-b11e-d83180dddf55.mutect2_somatic.PASS.vep.vcf -gc gc50.hg38.txt.gz --preset mutect -o test.seqz.gz

Traceback (most recent call last):
  File "/home/lze6063/.conda/envs/sequenza/bin/sequenza-utils", line 11, in <module>
    sys.exit(main())
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/commands.py", line 39, in main
    modules[args.module](subparsers, args.module, extra, log)
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/programs/snp2seqz.py", line 98, in snp2seqz
    for vcf_line in seqz_vcf:
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/vcf.py", line 57, in vfc2seqz
    for line in vcf_gc:
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/izip.py", line 30, in next
    self.c1_line = next(self.c1)
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/vcf.py", line 130, in vcf_parse
    alleles[0], ref_alt, ':', ',', preset)
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/vcf.py", line 157, in split_format
    alleles, ref_alt, ':', ',', None)
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/vcf.py", line 139, in split_format
    format_str = format_str.split(split_char1)
AttributeError: 'list' object has no attribute 'split'
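
The last two frames of the traceback show split_format calling itself (line 157) and then failing on format_str.split (line 139), which means the value reaching format_str on the second call was a list rather than a string. The failure mode can be reproduced in isolation with the minimal sketch below. The function parse_format_field and the sample FORMAT string are hypothetical stand-ins for illustration only; this is not sequenza's vcf.py, and it assumes the list came from an earlier split.

```python
# Hypothetical stand-in for a FORMAT-field parser; NOT sequenza's vcf.py.
def parse_format_field(format_str, sep=":"):
    # str.split exists only on strings; a list has no .split method.
    return format_str.split(sep)


# A string input is split as expected.
fields = parse_format_field("GT:AD:AF:DP")
print(fields)

try:
    # Assumption: feeding the already-split list back in mimics the
    # second, failing call seen in the traceback.
    parse_format_field(fields)
except AttributeError as err:
    message = str(err)
    print(message)
```

The second call raises the same kind of AttributeError as in the traceback, because it receives a list where a string is expected.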

Francesco Favero

Feb 14, 2020, 3:59:05 AM

to Lihua Zou, Sequenza User Group

Hi Lihua,

Sorry for the problem. It's actually a known issue: for mutect/mutect2 you should run the program without a preset. It should work out of the box.

I’ll fix the problem in the next release.

Best

Francesco

Lihua Zou

Feb 14, 2020, 10:01:20 AM

to Francesco Favero, Sequenza User Group

Dear Francesco,

Thank you for your email.

I tried without the preset as well. The snp2seqz step finishes, but sequenza.extract then fails on the resulting seqz.gz file. I copied the error below:

sequenze Error in data.frame(base.count = as.integer(n.base.mut), maj.base.freq = as.numeric(max.freqs[, : arguments imply differing number of rows: 263, 250