Investing in a new server for STACKS analyses. More memory or threads?

newbie

Jan 6, 2015, 3:46:18 AM
to stacks...@googlegroups.com
Dear Julian,

I am considering buying a server for STACKS (and other related analyses). How does computation time scale with the number of threads and the amount of RAM?

Best regards

Julian Catchen

Jan 6, 2015, 10:45:46 PM
to stacks...@googlegroups.com, pals...@gmail.com
Hi,

Stacks will take advantage of extra processors, particularly in ustacks and cstacks. The populations program spends a lot of time reading from disk, but if you are doing kernel-smoothing or bootstrapping, the extra processors/cores will give you a large speed-up.
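
For reference, here is a rough sketch of how the thread count is passed on the command line. These flags match the Stacks 1.x documentation as I understand it (-p for ustacks and cstacks, -t for populations), but please verify them against --help for your installed version; the file names, paths, and parameter values below are only placeholders.

    # ustacks: assemble stacks within one sample, running 16 threads (-p)
    ustacks -t gzfastq -f ./samples/sample_01.fq.gz -o ./stacks -i 1 -m 3 -M 2 -p 16

    # cstacks: build the catalog across samples, again with 16 threads (-p)
    cstacks -b 1 -s ./stacks/sample_01 -s ./stacks/sample_02 -o ./stacks -p 16

    # populations: note that here -t is the thread count; -k turns on kernel-smoothing,
    # which is one of the steps that benefits most from extra cores
    populations -b 1 -P ./stacks -M ./popmap.txt -t 16 -k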

As for memory, more memory will allow ustacks to handle larger numbers of raw reads, and it will allow more loci in a populations analysis. However, if you use even modest filters in the populations program, you will generally reduce the number of loci to something close to the true number of RAD loci in your organism, so the amount of memory you need is naturally limited by the number of loci in the population you are studying. Similarly, the memory ustacks needs is limited by the number of raw reads you commit to a single sample.
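
As a concrete illustration of the "modest filters" point, a populations run along the following lines restricts the analysis to loci seen in most individuals and in more than one population, which also caps the number of loci held in memory. The flags are Stacks 1.x-style and the thresholds are only example values, so adjust both to your own data and version:

    # -r: a locus must be present in at least 80% of individuals within a population
    # -p: a locus must be present in at least 2 populations
    # -m: minimum stack depth required to include an individual at a locus
    populations -b 1 -P ./stacks -M ./popmap.txt -t 16 -r 0.80 -p 2 -m 6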

If I were building a server to do Stacks work, I would want at least 64 GB of memory, and preferably 128 GB, just so there would be nothing to worry about; memory is quite cheap these days.

I know a lot of other people on the list have experience here, so please chime in.

Best,

julian

Eric Normandeau

Jan 7, 2015, 10:23:58 AM
to stacks...@googlegroups.com, pals...@gmail.com, jcat...@illinois.edu
Hi,

We ran a few projects with up to 800 individuals, each with ~4 million reads, split into 20 populations. Sixteen processors is a good middle ground, and we finished those analyses in 1-2 weeks. Of course, this is NOT REALISTIC if you are doing this for the first time: you will need to spend a lot of time with your data before STACKS and with your results after STACKS, and you will probably do quite a few runs to test different parameters.

As for memory, 64 GB is not always enough for projects of this size. We used up to 800 GB, but that was with no filtering at all, for test purposes. As Julian mentions, filtering is very important to reduce the memory footprint. 128 GB should be plenty for most projects, but it is not impossible to go beyond that; you would need a pretty big project to do so (~1000 eukaryote individuals or more, or 1 billion reads or more).

Hope this helps!

Eric

newbie

Jan 7, 2015, 11:49:32 AM
to stacks...@googlegroups.com, pals...@gmail.com, jcat...@illinois.edu
Dear Eric,

Thank you. We are gearing up for 2000+ individuals (mammal), so the 768 GB of RAM and 60 cores we are looking at seem like what we should aim for.

Thanks

Eric Normandeau

Jan 7, 2015, 11:54:23 AM
to stacks...@googlegroups.com, pals...@gmail.com, jcat...@illinois.edu
That server sounds like plenty. I hope you are also going to use it for LOTS of other projects, because it seems like overkill for just one STACKS project :)

You probably have this thought out too, but you may want around 2-4 GB of disk space per project. However, if you have multiple projects, do not simply multiply the number of projects by 4 GB; that may be too much. Again, it depends on how many parallel analyses you want to perform while still keeping the intermediate results/files.