gstacks

Bryson Sjodin

Jun 6, 2018, 4:29:35 PM
to Stacks
Hello,

I was hoping you could answer some questions concerning gstacks. I am currently running a large ddRAD dataset (~700 individuals) through gstacks, and because of the size I am only able to run half of my samples at a time. Is there a way to merge these two gstacks runs so that I can run them through populations together? My main concern is that the loci IDs in the catalogs will be different, so I want to avoid accidentally including a locus twice in my final file. I was thinking of creating a whitelist from one half of the dataset and using that for the populations run for the second dataset, but again, the loci names will likely be different. Any thoughts or suggestions are appreciated.
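The batching workaround described above amounts to splitting the population map in two. A minimal sketch, with made-up sample and file names standing in for the real popmap:

```shell
# Hypothetical popmap split: divide a gstacks population map
# (one "sample<TAB>population" line per sample) into two batches.
# Sample names and file names below are placeholders.
printf 'rat01\tpopA\nrat02\tpopA\nrat03\tpopB\nrat04\tpopB\n' > all_rats.txt

n=$(wc -l < all_rats.txt)    # total number of samples
half=$(( (n + 1) / 2 ))      # size of the first batch
head -n "$half" all_rats.txt > batch1.txt
tail -n +"$((half + 1))" all_rats.txt > batch2.txt
```

Each batch file could then be passed to a separate gstacks run via `-M`, though, as noted in this thread, the two runs assign locus IDs independently, so the resulting catalogs cannot simply be concatenated.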

Thanks,
Bryson Sjodin

Julian Catchen

Jun 7, 2018, 3:15:09 PM
to Bryson Sjodin, stacks...@googlegroups.com
Hi Bryson,

It is not currently possible to merge separate gstacks runs. Why does your data set not "fit"? Are you running out of memory? What size machine is it, what is your command, etc.?

julian

Bryson Sjodin

Jun 8, 2018, 3:49:10 PM
to Stacks
Hi Julian,

I seem to run out of memory when processing the full dataset; it will run for days on end with no indication of progress (usually stuck at 5000k loci). I am running it on a 12-core Linux machine with 64 GB of RAM. My command is as follows:

gstacks -I /mnt/Promise_RAID/Data/Rattus_HaidaGwaii/aligned/ -M all_rats.txt -O ./ --unpaired -t 24

I have also run it without multi-threading, with the same results.
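One way to confirm that a stalled run like this is memory-bound rather than merely slow is to poll the process's resident memory while it runs. A minimal sketch, with `sleep` standing in for the actual gstacks command:

```shell
# Hypothetical memory check: poll the resident set size (RSS, in kB)
# of a long-running process. `sleep 2` is a stand-in for gstacks.
sleep 2 &
pid=$!
ps -o rss= -p "$pid"   # prints the process's current resident memory in kB
wait "$pid"
```

If the reported RSS climbs toward the machine's physical memory, the run is likely thrashing in swap rather than making progress.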

Thanks,
Bryson

Julian Catchen

Jun 20, 2018, 3:40:12 PM
to stacks...@googlegroups.com, Bryson Sjodin
Hi Bryson,

There is no simple answer to this problem except to suggest you use a computer with more memory. Likely there are a few loci in your aligned data that have extremely high coverage (say >10,000x), and getting all the reads from such a locus, from all individuals in the population, into 64 GB of memory at one time is not possible. You could try to identify these loci with samtools and remove them from the data, or use other awkward approaches.
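The samtools-based screen suggested above could start from per-position depths. A rough sketch, where in practice the depth file would come from something like `samtools depth -a sample.bam > depth.tsv`, and the 10000x cutoff is only an example; a tiny stand-in file is created here so the sketch runs on its own:

```shell
# Hypothetical high-coverage screen. depth.tsv has samtools-depth columns:
# chrom, position, depth. Stand-in data replaces real `samtools depth` output.
printf 'scaf1\t100\t52\nscaf1\t101\t24210\nscaf2\t55\t9000\n' > depth.tsv

# Keep only positions whose depth exceeds the (example) 10000x cutoff.
awk '$3 > 10000 { print $1 "\t" $2 "\t" $3 }' depth.tsv > high_cov.tsv
cat high_cov.tsv
```

Positions flagged this way could then be excluded from the alignments (for instance with `samtools view -L` over a BED of the regions to keep) before re-running gstacks.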

julian

wesley larson

Jul 12, 2018, 12:10:46 PM
to Stacks
Hi Julian,

We have run into the same problem and are moving to analyzing our data on a cluster. However, the ability to reliably genotype individuals at different times with the same catalog is very important for us. For example, we used the same catalog to make two different linkage maps for sockeye salmon and to analyze two different datasets from wild populations. We are worried about not being able to easily compare data from different Stacks runs that use the same catalog. Do you anticipate adding this functionality to Stacks 2 in the future?

Thanks,
Wes Larson  