gstacks run time?


KV

Feb 2, 2022, 1:27:59 PM
to Stacks
Hi all,
I'm using gstacks v2.52 with GBS data representing 96 genotypes. I designated 12 CPUs, but it looks like only 6 are being used, so I guess I could've set that flag lower. I am curious about the run time I should expect here. gstacks has been running for 6 days, and the log reports 200000 loci processed so far. The catalog.calls file is at 290G. Is this an unusually long run time? I'm doing this for a manuscript revision with a deadline, and am hoping the run completes successfully soon! Thanks for any insights.

Catchen, Julian

Feb 3, 2022, 5:09:01 AM
to stacks...@googlegroups.com

Hi, what you describe is not a typical gstacks run. However, without any information about the type of analysis, it is difficult to say more. I can say that 290 GB is pretty huge for a catalog.calls file (particularly for 96 samples? I'm not sure if that is what you meant by “genotypes”), and that if you specify 12 threads, it will use 12 threads (or more, if you give it more). Six days is a long time for it to run.

Kelly Vining

Feb 3, 2022, 11:26:51 AM
to stacks...@googlegroups.com
Hi Julian,
Thank you for your prompt response - much appreciated! Yes, I did say in the initial post that there are 96 genotypes in this population. I am relieved to report that gstacks completed overnight. The final size of the catalog.calls file is 311G. A few years ago, I processed the same data set aligned to a fragmented earlier draft of the reference genome I'm using now, and the run time was definitely not a week. I'm pasting the end of the log below in case it provides any information that could point to the cause of the long run time. Maybe it'll be helpful to anyone else who encounters this issue in the future.

#####################
Read 1590131406 BAM records:
  kept 1061536796 primary alignments (68.2%), of which 521144188 reverse reads
  skipped 370417367 primary alignments with insufficient mapping qualities (23.8%)
  skipped 94457451 excessively soft-clipped primary alignments (6.1%)
  skipped 29947200 unmapped reads (1.9%)
  skipped some suboptimal (secondary/supplementary) alignment records

  Per-sample stats (details in 'gstacks.log.distribs'):
    read 16563868.8 records/sample (7310576-25045255)
    kept 62.0%-71.2% of these

Built 307023058 loci comprising 540392608 forward reads and 490961511 matching paired-end reads; mean insert length was 328.9 (sd: 106.5).
Removed 49431097 unpaired (forward) reads (9.1%); kept 490961511 read pairs in 289031677 loci.
Removed 52790974 read pairs whose insert length had already been seen in the same sample as putative PCR duplicates (10.8%); kept 438170537 read pairs.

Genotyped 289031677 loci:
  effective per-sample coverage: mean=1.9x, stdev=0.2x, min=1.3x, max=2.4x
  mean number of sites per locus: 297.5
  a consistent phasing was found for 809596 of out 812948 (99.6%) diploid loci needing phasing
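
For anyone reading this log later, here is a minimal sketch of a few ratios worth checking. It uses only figures copied from the log above; the variable names are made up for illustration and are not gstacks terminology, and the sample count of 96 assumes one sample per genotype. It does not establish the cause of the long run time; it only makes the scale of the run easier to see.

```python
# Figures copied verbatim from the gstacks log above.
kept_primary = 1_061_536_796      # primary alignments kept
low_mapq = 370_417_367            # skipped: insufficient mapping quality
soft_clipped = 94_457_451         # skipped: excessively soft-clipped
unmapped = 29_947_200             # skipped: unmapped reads
loci_built = 307_023_058          # "Built ... loci"
forward_reads = 540_392_608       # forward reads in those loci
loci_genotyped = 289_031_677      # "Genotyped ... loci"
loci_needing_phasing = 812_948    # diploid loci needing phasing
n_samples = 96                    # assumption: 96 genotypes == 96 samples

# The log's percentages are relative to the primary records it lists.
primary_total = kept_primary + low_mapq + soft_clipped + unmapped
kept_pct = 100 * kept_primary / primary_total

# Forward reads per locus, pooled over all samples and per sample.
reads_per_locus = forward_reads / loci_built
reads_per_locus_per_sample = reads_per_locus / n_samples

# Share of genotyped loci that needed phasing at all.
phasing_pct = 100 * loci_needing_phasing / loci_genotyped

print(f"kept primary alignments:       {kept_pct:.1f}%")
print(f"forward reads per locus:       {reads_per_locus:.2f}")
print(f"  per sample (mean):           {reads_per_locus_per_sample:.3f}")
print(f"loci needing phasing:          {phasing_pct:.2f}%")
```

With these numbers, the loci average fewer than two forward reads each across the whole population, and well under 1% of genotyped loci needed phasing. That may suggest the catalog is dominated by very sparsely covered loci, which would be consistent with the unusually large catalog.calls file, but the thread does not confirm this.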

