Stacks 2 reference-based paired-end alignments

Carol

Oct 30, 2017, 4:18:16 AM
to Stacks
Hello stacks users,

I am testing Stacks 2.0Beta2 with my 339 samples (paired-end alignments).
I have already run gstacks without any issues.

/home/buitracn/RADseq/tools/stacks-2.0Beta2/gstacks --paired -I ./bam-files/ -M spis-popmap-339samples -O ./gstacks-339samples -t 40 &>> ./gstacks-339samples/log.gstacks.txt

Yet, when I run the populations module the way I used to (and without any bootstrapping), it takes very long (>2 days). With Stacks v1.46 this would usually take only a couple of hours. Is this normal?

/home/buitracn/RADseq/tools/stacks-2.0Beta2/populations -P ./gstacks-339samples -M ./spis-popmap-339samples -O ./gstacks-339samples/test-m3-p6-r50-339samples -p 6 -r 0.5 -m 3 --min_maf 0.05 -e pstI --merge_sites --fstats --fst_correction p_value --verbose --vcf --fasta_strict  --vcf_haplotypes --genepop --structure --log_fst_comp -t 40 &>> ./gstacks-339samples/test-m3-p6-r50-339samples/Log-pop-m3-p6-r50_Spis339samples-BETA2.txt


Additionally, I have noticed that some files that used to be output are now missing:
batch_1.fst_summary.tsv
batch_1.sumstats_summary.tsv

Are these going to be added back in the next release? I think they were very informative and important for getting an overview of the differentiation between populations.

Cheers,

Carol

Carol Buitrago

Oct 30, 2017, 2:15:14 PM
to stacks...@googlegroups.com
To add to my previous message: after running for almost 2 days, the populations module of Stacks 2.0Beta2 did not complete the analysis and crashed with the message:
Segmentation fault (core dumped)

I followed the progress and memory consumption over time; memory use never reached 4 GB. The process was run on a cluster with 500 GB of memory.

Additionally, I checked the log file and found that the populations module stopped at batch 2480. (I'm not sure how many batches I have in my data; how can I find out?)


Cheers

Carol

Nicolas Rochette

Oct 30, 2017, 3:10:29 PM
to Stacks

Hi Carol,

Could you attach/provide us with the log file?

Lower memory consumption is normal, and the files `batch_1.fst_summary.tsv` and `batch_1.sumstats_summary.tsv` are missing because they are written at the end of the run, when all loci have been processed.

Best,

Nicolas

Julian Catchen

Oct 30, 2017, 11:00:04 PM
to stacks...@googlegroups.com, carol.b...@gmail.com
Hi Carol,

While we look for the bug you are seeing, you can try running
populations with a larger batch size. By default, it loads 10k loci at a
time, but you can increase this using the --batch_size XX switch since
you have more memory. You could probably put it up to 100k or even more.
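
For example, taking the command from your first message, adding the
switch would look something like this (a sketch; everything else stays
the same):

/home/buitracn/RADseq/tools/stacks-2.0Beta2/populations -P ./gstacks-339samples -M ./spis-popmap-339samples --batch_size 100000 [... rest of your options ...]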

We are looking for feedback on performance issues.

julian

Carol Buitrago

Oct 31, 2017, 2:24:19 AM
to Julian Catchen, stacks...@googlegroups.com
Hi Julian and Nicolas,

Thanks for the quick reply. 

I'll re-run the populations module as you suggested, Julian. However, when the run crashes, the error on the screen is "Segmentation fault (core dumped)", which according to my searching is due to a memory shortage, so I'm not sure whether increasing --batch_size will help (anyway, I'll give it a try and let you know my results).

Attached you can find the "populations.log" file and the screen log "Log-pop-m3-p6-r50_Spis339samples-BETA2.txt". In addition, I'm also attaching the gstacks.log file.

Could you please let me know how I can find out how many "batches" I have? Are batches the same as RAD loci?
Why does the populations module in this version take so much longer than in version 1 (1.46)? Is it maybe because the RAD loci are longer (paired-end contigs)?

Looking forward to hearing from you,

Cheers,

Carol



Nicolas Rochette

Oct 31, 2017, 12:58:48 PM
to Stacks

Hi Carol,

Thank you for the files; we're going to look into the bug. "Segmentation fault" just means that the program tried to access an area of memory that, according to the operating system, doesn't belong to it. It is quite a general error, but it could be caused, for instance, by accessing an element of an array at an index past its end.

Best,

Nicolas

Carol Buitrago

Nov 1, 2017, 7:51:11 AM
to stacks...@googlegroups.com
Dear Julian and Nicolas

I ran the populations module as suggested by Julian with a larger --batch_size (see command line below), but the process didn't improve in speed and it crashed again at "batch 2480" with the same error, "Segmentation fault (core dumped)".

/home/buitracn/RADseq/tools/stacks-2.0Beta2/populations -P ./gstacks-339samples -M ./spis-popmap-339samples -O ./gstacks-339samples/m3-p6-r50-339samples -p 6 -r 0.5 -m 3 --min_maf 0.05 -e pstI --merge_sites --batch_size 100000 --fstats --fst_correction p_value --verbose --vcf --fasta_strict  --vcf_haplotypes --genepop --structure --log_fst_comp -t 40 &>> ./gstacks-339samples/m3-p6-r50-339samples/Log-pop-m3-p6-r50_Spis339samples-BETA2.txt

The memory consumption was never greater than 4 GB (attached you can find the log files).

I hope you can find the bug that crashes the populations module in the Stacks 2.0Beta2 version.

Please let me know if there is anything else I can do.

Cheers, 

Carol

Julian Catchen

Nov 1, 2017, 4:01:19 PM
to stacks...@googlegroups.com, carol.b...@gmail.com
Hi Carol,

With respect to my earlier message, I thought your data were de novo
assembled, in which case the batch size is arbitrary (small enough to
fit into memory, large enough to reduce I/O). But looking at your logs,
I see your data are reference aligned, which means you will have one
batch per chromosome (or scaffold).
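
If you want to check that number, one way (assuming samtools is
available; the BAM filename below is just a placeholder) is to count the
@SQ lines in the header of one of your BAM files:

samtools view -H ./bam-files/one-of-your-samples.bam | grep -c '^@SQ'

That prints the number of reference sequences in the header, which
should roughly correspond to the number of batches populations works
through.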

Can you please try running the software without the extra parameters;
just run the program with as few options as possible (do include just -r
and -p).
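
For instance, something along these lines, using the paths from your
earlier commands:

/home/buitracn/RADseq/tools/stacks-2.0Beta2/populations -P ./gstacks-339samples -M ./spis-popmap-339samples -p 6 -r 0.5 -t 40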

Also, can you run GDB and provide us a back trace?

If you can, just prefix your existing command like:

gdb --args ~/RADseq/tools/stacks-2.0Beta2/populations -P ...

If it segfaults, GDB will stop and you can ask it for a back trace by
typing 'bt' at the prompt (and 'quit' to exit GDB).
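
The session would then look roughly like this (the exact output will of
course differ):

(gdb) run      <- starts populations under GDB
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt       <- prints the back trace (frames #0, #1, ...)
(gdb) quit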

This will tell us where the code is crashing and give us some hints as
to where to look.

If you could copy/paste this stack trace into an email, that would be great.

Your data set is large and we are not seeing anything similar on our own
data sets.

Best,

julian
