sequenza-utils snp2seqz error with Mutect VCF


Lihua Zou

Feb 13, 2020, 10:07:48 PM
to Sequenza User Group
I have a VCF file produced by Mutect. I checked the vcf.py code, and sequenza doesn't seem able to handle this format yet, only Strelka and CaVEMan.

Could you verify this? All my VCFs are based on Mutect2, so I wonder if there is a workaround, since the differences between Mutect and Strelka/CaVEMan output are not that large. Thanks!

sequenza-utils snp2seqz -v f6e2ea5a-0970-4e02-b11e-d83180dddf55.mutect2_somatic.PASS.vep.vcf -gc gc50.hg38.txt.gz --preset mutect -o test.seqz.gz
Traceback (most recent call last):
  File "/home/lze6063/.conda/envs/sequenza/bin/sequenza-utils", line 11, in <module>
    sys.exit(main())
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/commands.py", line 39, in main
    modules[args.module](subparsers, args.module, extra, log)
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/programs/snp2seqz.py", line 98, in snp2seqz
    for vcf_line in seqz_vcf:
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/vcf.py", line 57, in vfc2seqz
    for line in vcf_gc:
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/izip.py", line 30, in next
    self.c1_line = next(self.c1)
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/vcf.py", line 130, in vcf_parse
    alleles[0], ref_alt, ':', ',', preset)
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/vcf.py", line 157, in split_format
    alleles, ref_alt, ':', ',', None)
  File "/home/lze6063/.conda/envs/sequenza/lib/python3.7/site-packages/sequenza/vcf.py", line 139, in split_format
    format_str = format_str.split(split_char1)
AttributeError: 'list' object has no attribute 'split'
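
The last frame of the traceback shows `split` being called on a value that is already a list rather than a string. A minimal Python sketch of that failure mode follows; `split_fields` is a hypothetical stand-in for illustration, not sequenza's actual `split_format`:

```python
def split_fields(format_str, sep=":"):
    # Hypothetical stand-in: expects a string such as "GT:AD:DP".
    return format_str.split(sep)


# Works on a string.
fields = split_fields("GT:AD:DP")

# Passing the already-split list back in raises AttributeError.
try:
    split_fields(fields)
except AttributeError as err:
    message = str(err)
    print(message)
```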


Francesco Favero

Feb 14, 2020, 3:59:05 AM
to Lihua Zou, Sequenza User Group
Hi Lihua,

Sorry for the problem. It is actually a known issue: for Mutect/Mutect2 you should run the program without a preset, and it should work out of the box.

I’ll fix the problem in the next release.

Best

Francesco


Lihua Zou

Feb 14, 2020, 10:01:20 AM
to Francesco Favero, Sequenza User Group
Dear Francesco,

Thank you for your email.

I tried without the preset as well. The snp2seqz step finishes, but sequenza.extract then fails on the resulting seqz.gz file. I copied the error below:

sequenze Error in data.frame(base.count = as.integer(n.base.mut), maj.base.freq = as.numeric(max.freqs[, : arguments imply differing number of rows: 263, 250

It seems there is some inconsistency in how the VCF format is parsed.
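
One way to narrow this down is to scan the VCF for records whose sample columns look irregular. The sketch below is only a diagnostic under assumptions: `check_vcf_records` is a hypothetical helper, not part of sequenza, and the checks (multi-allelic ALT, FORMAT/sample length mismatch, AD count) are guesses at what might produce differing row counts, not a confirmed cause.

```python
def check_vcf_records(lines):
    """Return (line_number, reason) for records whose sample fields look irregular."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        if len(cols) < 10:
            problems.append((number, "fewer than 10 columns"))
            continue
        alts = cols[4].split(",")   # ALT alleles
        keys = cols[8].split(":")   # FORMAT keys, e.g. GT:AD:DP
        if len(alts) > 1:
            problems.append((number, "multi-allelic ALT"))
        for sample in cols[9:]:
            values = sample.split(":")
            if len(values) != len(keys):
                problems.append((number, "FORMAT/sample length mismatch"))
            elif "AD" in keys:
                # AD is expected to hold one depth per allele (REF + each ALT).
                depths = values[keys.index("AD")].split(",")
                if len(depths) != len(alts) + 1:
                    problems.append((number, "AD count does not match alleles"))
    return problems


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:AD:DP\t0/0:30,0:30\t0/1:20,10:30",
    "chr1\t200\t.\tC\tT,G\t.\tPASS\t.\tGT:AD:DP\t0/0:30,0,0:30\t0/1/2:10,10,10:30",
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT:AD:DP\t0/0:30,0:30\t0/1:20",
]
report = check_vcf_records(example)
print(report)
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `check_vcf_records(open("my.vcf"))`, where the file name is a placeholder.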

Please let me know your thoughts. Thanks again!