Populations memory requirements

Anders Goncalves da Silva

May 20, 2018, 11:52:53 PM
to Stacks
Hi.

I have a catalog with almost 600 individuals across 85 populations (total ~150GB). Roughly how much RAM will I need to run populations? Does it depend on the number of threads I set? Is there a good rule of thumb?

So far my HPC request of 128GB has failed. I have been incrementally trying larger values (up to 512GB) with relatively small walltime requests to see what would be sufficient. But loading the data into RAM has been slow, and now I am running out of walltime before it finishes loading the data.

Any assistance would be greatly appreciated. 

Thank you.

Anders.


Nicolas Rochette

May 22, 2018, 3:51:04 PM
to Stacks
Hi Anders,

Which version do you use?

Best,

Nicolas

Anders Goncalves da Silva

May 22, 2018, 7:48:25 PM
to Stacks
Thank you, Nicolas.

The latest version: 2.0b.

Anders.

Nicolas Rochette

May 23, 2018, 10:55:48 AM
to Stacks

Hi Anders,

Could you show your populations.log?

Best,

Nicolas

Julian Catchen

May 23, 2018, 11:16:18 AM
to stacks...@googlegroups.com, Anders Goncalves da Silva
Hi Anders,

In Stacks 2.0, populations uses a streaming model (it processes the data in batches), so it never 'finishes' loading the data until the program completes. With some technical exceptions, adding more memory to a process will not speed it up; it only allows you to store bigger things.

You don't indicate that the process is being killed due to a lack of memory, just that it doesn't have enough wall-clock time (walltime) to finish. What sort of walltime are you specifying, and why don't you just increase it?
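
To illustrate, a minimal job-script sketch, assuming a PBS/Torque-style scheduler since you mention walltime (the numbers and paths are placeholders, not recommendations; on Slurm the equivalents are --time, --mem and --cpus-per-task):

#PBS -l walltime=48:00:00
#PBS -l mem=64gb
#PBS -l nodes=1:ppn=8
populations -P <gstacks_dir> -M <popmap.tsv> -t 8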

You can control memory usage in populations by changing the batch size (10k loci by default): set it smaller and it will process more, smaller batches; set it larger and it will process fewer, larger batches. In any case, with this new model the program is not generally memory-bound, so I think you have already requested more than you will need.
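
For reference, a run with half the default batch size would look something like the following, plus your usual filtering and output options (the flag is spelled --batch_size in 2.0b, 5000 is purely illustrative, and the paths are placeholders for your gstacks directory and population map):

populations -P <gstacks_dir> -M <popmap.tsv> -t 8 --batch_size 5000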

julian

Anders Goncalves da Silva

May 23, 2018, 8:29:30 PM
to Stacks

Dear Nicolas and Julian.

Thank you for your replies and willingness to help. I am sorry for not giving all the details in the first question. Here are some clarifying points:

1. I am going through the reference-based approach. My draft genome is from the same species as my samples and has about 10K scaffolds.
2. I was experimenting with different memory requests before starting a long job (I didn't want to set up a long job, wait in the queue for 2 days, and then have it die within 20 minutes of starting). Successfully running just 35 samples from two populations gave me some idea, but when I tried the full dataset I ran into problems. So my jobs had short walltimes in order to get launched quickly. I monitored memory usage, and almost always the job was shut down because of insufficient memory; the times my jobs were cancelled because of insufficient walltime were probably because I did not request enough time. I have been monitoring the memory and CPU usage of each job (one way to record the peak is sketched after this list), and in every case it kept increasing continuously until either time or memory ran out.
3. In every case, even though I requested multiple threads with the -t option, I noticed that only a single CPU was being used. My best guess is that it never really gets past the phase of loading data into memory.
4. In these tests, the log file usually gets stuck at the line "Now processing..." (although I did get a run to conclude successfully with the small test set of 2 populations and 35 individuals). The one time I got past this line with the full dataset was by requesting 1TB of RAM and 4 hours of walltime (the job was cancelled when it ran out of time, having completed only 1 scaffold).
5. I have tried multiple combinations of threads (1, 2, 8, 24) and batch_size (1, 10, 1000), although I am not sure what the batch size means when using the reference-based approach: does it refer to the number of chromosomes/scaffolds?
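
One way to record the peak memory of a single run, as mentioned in point 2 (a sketch assuming GNU time is installed as /usr/bin/time; it writes a "Maximum resident set size" line into the report file):

/usr/bin/time -v -o peak_mem.txt populations -O populations1 -P resources/gstacks -M resources/popmap.tsv -t 8 -r 0.65
grep 'Maximum resident set size' peak_mem.txt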

Below is an excerpt from the populations.log file from my latest attempt (I requested 16GB of RAM and the job was shut down because of insufficient memory):

populations v2.0b, executed 2018-05-23 16:50:33
populations -O populations1 -P resources/gstacks -M resources/popmap.tsv -t 8 -r 0.65 --renz SbfI --merge_sites --hwe --fstats -k --bootstrap --fasta_samples --batch_size 1 --verbose
Locus/sample distributions will be written to 'populations1/populations.log.distribs'.
populations parameters selected:
  Percent samples limit per population: 0.65
  Locus Population limit: 1
  Log liklihood filtering: off; threshold: 0
  Minor allele frequency cutoff: 0
  Maximum observed heterozygosity cutoff: 1
  Applying Fst correction: none.
  Pi/Fis kernel smoothing: on
  Fstats kernel smoothing: on
  Bootstrap resampling: on, exact; 100 reptitions

Parsing population map...
The population map contained 548 samples, 85 population(s), 36 group(s).
Working on 548 samples.
Working on 85 population(s): 
...
Processing data in batches:
  * load a batch of catalog loci and apply filters
  * compute SNP- and haplotype-wise per-population statistics
  * compute SNP- and haplotype-wise deviation from HWE
  * compute F-statistics
  * smooth per-population statistics
  * smooth F-statistics
  * write the above statistics in the output files
  * export the genotypes/haplotypes in specified format(s)
More details in 'populations1/populations.log.distribs'.
Now processing...
<END OF FILE> 
 

Any help would be greatly appreciated.

Thank you again.

Anders. 

Anders Goncalves da Silva

May 26, 2018, 11:11:23 PM
to Stacks
Hi.

Just an update.

I have now tried running the same command as in the log I posted earlier, requesting 1TB of RAM and 2 days of compute. The job was killed just shy of completing one day of computation because it ran out of memory. The log seems to indicate it was still working on the first scaffold.

Again, any assistance would be very helpful. Could there be an issue with the catalog file? Should I try to regenerate it?

Anders.

Nicolas Rochette

May 28, 2018, 6:54:15 PM
to Stacks

Hi Anders,

I think there is some confusion regarding the filtering of your data. The -r parameter sets a per-population minimum, but you typically also need to specify a value for -p; otherwise any locus present in 65% of the individuals of any single population will be retained. In particular, if some populations comprise a single individual, all loci present in those individuals will always be retained unless you specify a value for -p.
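
For example, to keep only loci genotyped in at least 65% of the individuals in each of at least 60 of your 85 populations, the filters would look something like this (60 is purely illustrative, and the paths are placeholders for your gstacks directory and population map):

populations -P <gstacks_dir> -M <popmap.tsv> -p 60 -r 0.65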

Also --batch-size has no effect in reference-based mode; in this case populations processes one chromosome/scaffold at a time.

Best,

Nicolas

Anders Goncalves da Silva

Jun 5, 2018, 6:11:59 PM
to Stacks
Thank you Nicolas.

Sorry for the delayed reply, our HPC was down for maintenance all of last week. 

I am now experimenting with a smaller dataset (3 to 5 individuals per region --- 36 regions in total, covering 175 individuals). I have added the "-p 1.0" flag to the call:

populations -P est_5inds_short -M test_5inds_short/popmap_test5inds.tsv -t 32 -p 1.0 -r 0.65 --hwe --fstats -k --bootstrap --fasta_samples --genepop

I requested 32 cores and 128GB of RAM. It crashes out after 2 hours with an out of memory error.

At least it seems to finish the first scaffold and it is working on the second one.

The catalog.calls file is about 50GB and catalog.fa.gz is about 1.3GB.

I do have a combination of paired-end and single-end libraries --- I don't know if that is relevant.

With this iteration of the dataset, if I request anything less than 80GB of RAM it crashes pretty fast, before doing any computations.

I appreciate all the help so far. I am still hoping I can get past this...

Best. 

Anders.

Julian Catchen

Jun 6, 2018, 5:49:10 PM
to stacks...@googlegroups.com, Anders Goncalves da Silva
Hi Anders,

Try running your data without the Hardy-Weinberg calculations. I haven't
tested this with really large data sets. Also, --bootstrap is not
implemented at the moment, so I would also remove that.
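
Concretely, that means re-running your earlier command with --hwe and --bootstrap removed, i.e. something like:

populations -P est_5inds_short -M test_5inds_short/popmap_test5inds.tsv -t 32 -p 1.0 -r 0.65 --fstats -k --fasta_samples --genepop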

Anders Goncalves da Silva

Jun 7, 2018, 3:31:25 PM
to Stacks
Hi Julian.

Thank you for the continued assistance. I have now tried it with the following command: 

populations -P /scratch/andersgs/test_5inds_short -M /scratch/andersgs/test_5inds_short/popmap_test5inds.tsv -t 32 -p 1.0 -r 0.65 --fstats --fasta_samples --genepop

It crashed after about 40 minutes instead of 1 hour, but it seems to crash at the same point: it finishes the first scaffold, and while working on the second one there seems to be a spike in memory usage. I am not sure whether that happens right at the start, during, or at the end. I will try without --fstats now.

The job has 185GB of RAM available and 32 cores.

Best. Anders.

Anders Goncalves da Silva

Jun 8, 2018, 1:16:06 AM
to Stacks
Hi Julian and Nicolas.

Removing "--fstats" allowed me to move forward a bit. It ran for almost 5 hours and got pretty close to the end, but again ran out of memory (185GB requested). 

I was able to monitor most of the run with htop, and memory just kept creeping up. Even though I had requested 32 CPUs, they were only used for a fraction of a second, just before a scaffold was finished. I am assuming the parallel part of populations is focused on calculating the statistics and matters less when parsing/filtering the loci.
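
For reference, the same kind of trace can be captured without htop with something like the following (a rough sketch; $PID stands for the populations process id and the 60-second interval is arbitrary):

# appends resident memory (kB) and %CPU of the process every minute
while sleep 60; do ps -o rss=,pcpu= -p "$PID" >> mem_trace.txt; done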

As a feature request for the future: it would be useful to have checkpoints at this stage, so that I could restart at the last completed scaffold. Along the same lines, it would be useful from an HPC perspective to be able to isolate the computation for each scaffold into a separate job and then merge the results at the end (a map/reduce approach).

In any case, removing the calculation of all the statistics seems to let me move forward. I haven't yet added the different statistics back individually to see whether the issue lies only with --fstats or with all of them.

Thank you.

Anders.