-1 quality score for all site in vcf

109 views
Skip to first unread message

Emanuele Berrilli

unread,
Jan 26, 2025, 6:14:12 PMJan 26
to Stacks

Dears,

I am currently running the ref_map pipeline, and after processing my data through populations, I noticed an issue with the resulting VCF file. When inspecting the quality scores (QUAL column), I found that all sites have a value of -1. This result seems unusual, and I am not entirely sure what could be causing it. I would greatly appreciate your insight into what might be responsible for this and any recommendations you might have to resolve the issue.

Thank you very much in advance for your time and support!


CHROM     POS  QUAL 
OZ076838.1 64555 -1 
OZ076838.1 118964 -1 
OZ076838.1 157517 -1 
OZ076838.1 165755 -1  


Emanuele 

Catchen, Julian

unread,
Jan 26, 2025, 6:59:20 PMJan 26
to stacks...@googlegroups.com

Hi Emanuele,

 

It is not clear what Stacks output file you are referencing in your message. Stacks will output VCF files, of course, but they would have columns such as

 

CHROM  POS     ID      REF     ALT     QUAL    FILTER...

 

And the software does not populate the QUAL field (it would be a “.”). If you want more troubleshooting, we would need to know what version of Stacks  you ran and what commands you executed? What sort of data are you processing?

 

Best,


Julian

Emanuele Berrilli

unread,
Jan 27, 2025, 6:15:45 PMJan 27
to stacks...@googlegroups.com
Hi Julia,
yes sorry if I didn't explain well. So, I first index and alignm my reads (I'm working with ddRAD data) to a reference genome using bwa index and bwa mem. Then I used this script: 

ref_map.pl -T 12 -o . --popmap ../popmap/pop_tot_loc --samples ../alignment/aligned_reads/

populations -P ../../stacks/ -M ../../popmap/pop_tot_loc -O . -R 0.8 --write-single-snp --vcf --plink

After that, I check the VCF file quality with:

vcftools --gzvcf populations.snps.vcf --freq2 --out $OUT --max-alleles 2 |
vcftools --gzvcf populations.snps.vcf --depth --out $OUT |
vcftools --gzvcf populations.snps.vcf --site-mean-depth --out $OUT |
vcftools --gzvcf populations.snps.vcf --site-quality --out $OUT |
vcftools --gzvcf populations.snps.vcf --missing-indv --out $OUT |
vcftools --gzvcf populations.snps.vcf --missing-site --out $OUT |
vcftools --gzvcf populations.snps.vcf --het --out $OUT

When I try to filter using vcftools --vcf input.vcf --maf 0.05 --max-missing 0.8 --minQ 30 --recode --out filtered_combined, I end up with an "empty" VCF file as output (with the message: "After filtering, kept 0 out of a possible 11,532 sites"). The issue seems to be related to the use of the --minQ parameter. I believe this is because the quality scores (which I checked with vcftools --gzvcf populations.snps.vcf --site-quality --out $OUT) are set to -1 for all sites, as I mentioned in my previous email.  
Tell me if I can give you more information and thanks in advance for your helo.

Best,

Emanuele 

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/stacks-users/PH7PR11MB672203950736D12547F7B525A7ED2%40PH7PR11MB6722.namprd11.prod.outlook.com.


--
Emanuele Berrilli, Phd 
PostDoc at Department of Life, Health and Environmental Sciences
University of L'Aquila
Via Vetoio, 40
67100 Coppito AQ

Catchen, Julian

unread,
Jan 27, 2025, 6:20:37 PMJan 27
to stacks...@googlegroups.com

Hi Emanuele,

 

I can’t help you with your execution of vcftools (that said, it does not make sense to run vcftools 7 times, connected by UNIX pipes, but to supply the same, input file, --gzvcf populations.snps.vcf, to each stage). If you can show me that the populations program output a VCF file with -1 for the quality parameter, I could look for software bugs, however, if vcftools is making the conversion, then you need to modify your use of that program.

Angel Rivera-Colón

unread,
Jan 30, 2025, 6:20:17 PMJan 30
to Stacks
Hi Emanuele,

Expanding on what Julian said above. Several of these same statistics are present in the populations.log.distribs file, which is generated by default. You probably don't need to re-run VCF tools for those.

Regarding the QUAL filter, Stacks does not report this value explicitly in the VCF (it is present as a "." in the file, as specified by the VCF standard). The sites are then filtered by VCFtools when you try to apply the filter (the "." is internally processed as a missing value of "-1", for some reason). Removing the filter should solve this issue. Stacks does include other quality information, like the per-individual genotype quality (GQ). That could be an useful filter instead.

One note on these filters is that some of the general quality filters for variant sites and genotypes are already processed internally in gstacks. The values for QUAL and GQ are related to the var-alpha and gt-alpha present in the genotyping model in Stacks. QUAL and the var-alpha control how the call is being made by site (i.e., confidence of seeing the site as variant), while GQ and gt-alpha refer to the calling of the individual's genotype. In other words, Stacks is already performing a baseline filter on these low quality positions, both at the variant site and genotype level.

Hope this information is helpful,
Angel

Emanuele Berrilli

unread,
Feb 9, 2025, 11:03:01 AMFeb 9
to Stacks

Dear Angel,

Thank you very much for your explanation and your help! I was asking about this because I had seen in some papers that part of the filtering on the VCF was performed after populations, so the presence of the "." value in the QUAL column had surprised me. In any case, I will follow your advice regarding the alternatives to assess the quality of the VCF and filter it accordingly.

Many thanks for your help!

Best


Emanuele


Reply all
Reply to author
Forward
0 new messages