Populations Aborted - Bad GStacks files


Ayress Grinage

Nov 17, 2022, 2:20:50 PM
to Stacks
Hi,

I am trying to run the ref_map.pl script but I keep getting an error with Populations. The error is as follows:

Processing data in batches:
  * load a batch of catalog loci and apply filters
  * compute SNP- and haplotype-wise per-population statistics
  * write the above statistics in the output files
  * export the genotypes/haplotypes in specified format(s)
More details in '/home/adg223/data/Sabal_minor_RAD_Oct2022/Mapped_reads_using_BWA/sorted_bam_files/stacks/populations.log.distribs'.
Now processing...
scf7180000022817
scf7180000018166
scf7180000017965
scf7180000017991
scf7180000022902
scf7180000018164
scf7180000017629
scf7180000021179
scf7180000017930
scf7180000017892
scf7180000018225
scf7180000017187
scf7180000022348
Error: Malformed genomic position 'scf7180000018368:F:6151:scf7180000019300:F:9221:+'.
Error: Locus 13713
Error: Bad GStacks files.
Aborted.

ref_map.pl: Aborted because the last command failed (1).

I used bwa-mem2 to generate the input alignments.

I am also attaching the reference map log file as well as the gstacks log distribution file.

I would really appreciate any insight into this problem, thank you!
Ayress
ref_map.log
gstacks.log.distribs

Blair Flannery

Nov 30, 2022, 2:17:19 PM
to Stacks
Hi Julian, 

I also hit an "Aborted (core dumped)" problem with populations. I am working through your 2017 protocol using the test samples, performing the reference-based analysis (p. 2650) with Stacks 2.62. Since the protocol was written for version 1, I adapted it to the current version by sorting the .bam files and running gstacks, then analyzed the gstacks output with populations. Populations appeared to be working fine at first, but then it dumped core. Could you please provide some assistance to both Ayress and me? The populations log is below:

$ populations -P ./ -M ../../info/popmap.test_samples.tsv -r 0.65 --vcf --genepop --fstat --smooth --hwe -t 8
Logging to './populations.log'.
Locus/sample distributions will be written to './populations.log.distribs'.
populations parameters selected:
  Percent samples limit per population: 0.65
  Locus Population limit: 1
  Percent samples overall: 0
  Minor allele frequency cutoff: 0
  Maximum observed heterozygosity cutoff: 1
  Applying Fst correction: none.
  Pi/Fis kernel smoothing: on
  F-stats kernel smoothing: on
  Bootstrap resampling: off

Parsing population map...
The population map contained 12 samples, 4 population(s), 1 group(s).
Working on 12 samples.
Working on 4 population(s):
    cs: cs_1335.01, cs_1335.13, cs_1335.15
    pcr: pcr_1193.10, pcr_1193.11, pcr_1211.04
    sj: sj_1483.06, sj_1484.07, sj_1819.36
    wc: wc_1218.04, wc_1221.01, wc_1222.02
Working on 1 group(s) of populations:
    defaultgrp: cs, pcr, sj, wc

SNPs and calls will be written in VCF format to './populations.snps.vcf'
Haplotypes will be written in VCF format to './populations.haps.vcf'
Polymorphic sites in GenePop format will be written to './populations.snps.genepop'
Polymorphic loci in GenePop format will be written to './populations.haps.genepop'
Raw haplotypes will be written to './populations.haplotypes.tsv'
Population-level summary statistics will be written to './populations.sumstats.tsv'
Population-level haplotype summary statistics will be written to './populations.hapstats.tsv'


Processing data in batches:
  * load a batch of catalog loci and apply filters
  * compute SNP- and haplotype-wise per-population statistics
  * compute SNP- and haplotype-wise deviation from HWE
    * smooth per-population statistics
  * compute F-statistics
    * smooth F-statistics

  * write the above statistics in the output files
  * export the genotypes/haplotypes in specified format(s)
More details in './populations.log.distribs'.

Now processing...
groupI
groupII
groupIII
groupIV
assertion "has_lik(gt)" failed: file "src/utils.h", line 121, function: double GtLiks::at(std::size_t) const
Aborted (core dumped)

thanks,

Blair

Catchen, Julian

Nov 30, 2022, 5:09:55 PM
to stacks...@googlegroups.com

Hi Ayress,

 

I think this problem comes from the contig IDs in your reference genome. The Stacks programs use colons internally to separate the contig, basepair, and strand of each locus. The ID 'scf7180000018368:F' appears to contain a colon itself (you would need to check the FASTA file to know for sure), which would break the expectations of the parser in populations.
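One way to make that check on the FASTA file is sketched below in Python. This is only an illustration, not part of Stacks: the function name `ids_with_colons` is made up here, the example headers are hypothetical (modelled on the position string in the error message), and the sequence ID is assumed to be the first whitespace-delimited word of each header line.

```python
def ids_with_colons(fasta_lines):
    """Return the sequence IDs (first word of each '>' header) that contain ':'."""
    flagged = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if ":" in seq_id:
                flagged.append(seq_id)
    return flagged


# Hypothetical headers for illustration only; the real names must be read
# from the actual reference FASTA file.
example = [
    ">scf7180000018368:F:6151:scf7180000019300:F\n",
    "ACGTACGT\n",
    ">scf7180000017965\n",
    "TTGACCA\n",
]
print(ids_with_colons(example))
```

For a real reference, the same function can be given an open file handle instead of a list, for example `ids_with_colons(open("reference.fa"))`, where `reference.fa` is a placeholder path.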

 

Best,

julian

Blair Flannery

Dec 2, 2022, 6:02:22 PM
to Stacks
Hi Julian, 
I found a solution in an older post: I removed the --vcf option and populations worked. Apparently, you need a lot of RAM to produce a VCF file.
thanks,
Blair

Ayress Grinage

Dec 6, 2022, 10:39:52 PM
to Stacks
Hi Julian,

Thank you for your response!

The problem seems to have been resolved when I replaced the colons in all the affected scaffold names with underscores.
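For anyone hitting the same error, a rough Python sketch of that kind of renaming is below. It is not the exact command I ran: the helper name `rename_header` is made up for illustration, and it assumes the scaffold ID is the first whitespace-delimited word of each FASTA header. Any alignments made against the old names would still refer to the colon-containing IDs, so the BAM files presumably need to be regenerated or renamed to match.

```python
import re

# '>' followed by the ID (no whitespace), then the rest of the line, if any.
HEADER = re.compile(r"^>(\S*)(.*)$", re.DOTALL)


def rename_header(line):
    """Replace ':' with '_' in the ID of a FASTA header line.

    Lines that are not headers (the sequence itself) are returned unchanged,
    and any description after the ID is left as it is.
    """
    match = HEADER.match(line)
    if match is None:
        return line
    seq_id, rest = match.groups()
    return ">" + seq_id.replace(":", "_") + rest


# Hypothetical header for illustration only.
print(rename_header(">scf7180000018368:F:6151 example description\n"), end="")
```

Applied to every line of the reference and written to a new file, this leaves the sequences untouched. It is worth checking afterwards that no two scaffolds ended up with the same name.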

Thank you so much for your help!
Ayress
