snp2seqz 0 copy-number segments

145 views

Skip to first unread message

Judit Bercek

unread,

Feb 26, 2021, 6:10:31 AM2/26/21

to Sequenza User Group

Hi Francesco,

I used sequenza before on bam files and I was excited to try out the snp2seqz module. However, I have some difficulties with it.

I am trying to run sequenza-utils snp2seqz on filtered WES somatic vcf files generated by Mutect2. First, I used vcftools to filter out indels from the vcf files. Then I checked the header of my vcf files in order to decide which parameters are appropriate for my data. This is the header of my vcf file:

head(vcf)
[1] "***** Object of class 'vcfR' *****"
[1] "***** Meta section *****"
[1] "##fileformat=VCFv4.2"

[1] "##filtering_status=These calls have been filtered by FilterMutectCalls to label false positives with a list of failed filters and true positives with PASS."
[1] "##normal_sample=P001_normal"
[1] "##source=FilterMutectCalls"
[1] "##source=Mutect2"
[1] "##tumor_sample=P001"
[1] "##SnpSiftVersion="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
[1] "##SnpSiftCmd="SnpSift Filter '(GEN[0].AF >= 0.05) & (GEN[0].AD[1] >= 3)'"

[1] "##FILTER=<ID=PASS,Description=\"All filters passed\">"
[1] "##FILTER=<ID=FAIL,Description=\"Fail the site if all alleles fail but [Truncated]"
[1] "##FILTER=<ID=base_qual,Description=\"alt median base quality\">"
[1] "##FILTER=<ID=clustered_events,Description=\"Clustered events observed [Truncated]"
[1] "##FILTER=<ID=contamination,Description=\"contamination\">"
[1] "First 6 rows."
[1]
[1] "***** Fixed section *****"
     CHROM POS        ID REF ALT QUAL FILTER
[1,] "1"   "745347"   NA "T" "C" NA   "PASS"
[2,] "1"   "1225748" NA "C" "T" NA   "PASS"
[3,] "1"   "6008145" NA "G" "A" NA   "PASS"
[4,] "1"   "6259764" NA "G" "A" NA   "PASS"
[5,] "1"   "15814028" NA "C" "T" NA   "PASS"
[6,] "1"   "16270448" NA "C" "T" NA   "PASS"
[1]
[1] "***** Genotype section *****"
     FORMAT                     P001
[1,] "GT:AD:AF:DP:F1R2:F2R1:SB" "0/1:35,5:0.116:40:16,3:18,2:26,9,3,2"
[2,] "GT:AD:AF:DP:F1R2:F2R1:SB" "0/1:15,11:0.444:26:8,9:7,2:9,6,6,5"
[3,] "GT:AD:AF:DP:F1R2:F2R1:SB" "0/1:23,27:0.528:50:12,10:10,17:14,9,18,9"
[4,] "GT:AD:AF:DP:F1R2:F2R1:SB" "0/1:59,52:0.478:111:22,22:31,24:32,27,32,20"
[5,] "GT:AD:AF:DP:F1R2:F2R1:SB" "0/1:11,20:0.628:31:8,13:3,6:2,9,4,16"
[6,] "GT:AD:AF:DP:F1R2:F2R1:SB" "0/1:5,5:0.5:10:3,2:2,3:1,4,0,5"
     P001_normal
[1,] "0/0:17,0:0.059:17:2,0:14,0:14,3,0,0"
[2,] "0/0:19,0:0.056:19:9,0:10,0:13,6,0,0"
[3,] "0/0:27,0:0.05:27:8,0:19,0:16,11,0,0"
[4,] "0/0:73,0:0.015:73:35,0:33,0:43,30,0,0"
[5,] "0/0:13,0:0.072:13:2,0:7,0:2,11,0,0"
[6,] "0/0:9,0:0.101:9:4,0:4,0:1,8,0,0"
[1]
[1] "Unique GT formats:"
[1] "GT:AD:AF:DP:F1R2:F2R1:SB"            "GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB"
[1]

Finally, I generated seqz files from the vcf files using this command without preset:
sequenza-utils snp2seqz -o ${sequenza_folder}${file}2seqz.gz -v ${vcf_no_indels}${file} --vcf-samples t/n -gc ${gc_folder}grch37.gc50Base.txt.gz

However, when I read in the resulting seqz files and ran sequenza.extract() with default parameters, I saw that they do not have heterozygous positions.

extract <- sequenza.extract(data.file, chromosome.list=c(1:24))

Collecting GC information . done

Processing 1:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   136 homozygous positions.
Processing 2:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   138 homozygous positions.
Processing 3:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   91 homozygous positions.
Processing 4:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   75 homozygous positions.
Processing 5:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   42 homozygous positions.
Processing 6:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   103 homozygous positions.
Processing 7:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   96 homozygous positions.
Processing 8:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   69 homozygous positions.
Processing 9:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   54 homozygous positions.
Processing 10:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   40 homozygous positions.
Processing 11:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   90 homozygous positions.
Processing 12:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   70 homozygous positions.
Processing 13:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   38 homozygous positions.
Processing 14:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   46 homozygous positions.
Processing 15:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   62 homozygous positions.
Processing 16:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   50 homozygous positions.
Processing 17:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   97 homozygous positions.
Processing 18:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   30 homozygous positions.
Processing 19:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   93 homozygous positions.
Processing 20:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   51 homozygous positions.
Processing 21:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   33 homozygous positions.
Processing 22:
   1 variant calls.
   0 copy-number segments.
   0 heterozygous positions.
   36 homozygous positions.

Thus, I decided to change the --vcf-samples switch to n/t. It seemed to work, i.e., the resulting seqz files had heterozygous positions. However, the number variant calls remained 1 and the number of copy-number segments was still 0 in most cases and running sequenza.fit() gave the following error:

Error in result$LPP - max.lik : non-numeric argument to binary operator
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Could you please help me to figure out what is the problem? Am I using wrong files or parameter settings?