Stacks 2.0 Beta 1

1,215 views
Skip to first unread message

Julian Catchen

unread,
Oct 9, 2017, 3:41:12 PM10/9/17
to stacks...@googlegroups.com
Hi All,

After a year of hard work, I am pleased to release Stacks 2.0 Beta 1 today:

http://catchenlab.life.illinois.edu/stacks/stacks_v2.php

This first, major overhaul of the software includes the following changes:

1. Paired-end sequencing data can be utilized fully. In particular, when
the shearing-based original protocol of Baird et al. (2008) is used, or
one of the similar analogous protocols, the software will assemble a
local contig from the paired reads across the population, possibly
overlap it with the forward-reads region, then align all reads to the
assembled contig. This new approach also fully supports double-digest
protocols.

To do this, we have written a custom de Bruijn graph assembler to put
the contigs together from the paired-end reads and a suffix tree-based
alignment algorithm to put the reads back against the assembled contigs.

2. Haplotype calling and diploidy-violation detection now rely on a
novel, more powerful algorithm.

We are aiming to provide rich haplotypes from each fully assembled locus.

3. SNP and genotype-calling now uses the diploid models of Maruki and
Lynch (2017). This should ensure good statistical properties when the
number of samples is large or when the local coverage is low.

4. The rxstacks program has been replaced with the gstacks program, and
there is no need to re-run some of the earlier steps of the pipeline
anymore.

5. The populations program has been overhauled to simplify the logic of
the program, and improve the outputs. The memory footprint of the
populations program has been considerably reduced and can now be scaled
for any size data set.

6. The reference-based pipeline has been simplified, and now only
comprises two steps: gstacks and populations.

7. In addition, numerous interface and performance improvements have
been made. Some of these changes have been released as part of the
latest Stacks-v1 versions.

One of the major changes is that we flipped the data organization.
Whereas before data were processed per sample, now, after the samples
are put through the initial pipeline (which is unchanged from Stacks
v1), the data are reorganized by *locus*, so all reads from the
population of all samples are available at each locus. This allows us to
incorporate a population-level Bayesian component to the SNP caller, and
have all the reads in one place to assemble/align them.

There are still a number of minor things that are not complete, some
export formats are not reimplemented in populations, and some features
have not updated yet. However, the code is tested and working well on
our data sets, we are ready to open it up to a wider audience and would
be happy to receive your feedback/suggestions.

The code was designed and implemented by my postdoc, Nicolas Rochette,
and myself, with simulation and testing help from my student, Angel
Rivera-Colón.

Best,

julian

--
Julian M Catchen, Ph.D.
Assistant Professor
Department of Animal Biology
University of Illinois, Urbana-Champaign
--
jcat...@illinois.edu; @jcatchen

Thierry Gosselin

unread,
Oct 11, 2017, 8:55:27 AM10/11/17
to Stacks
Thanks Julian, Nicolas and Angel for the hard work. 
Looks very promising and look forward using/testing the new algorithms.

I like tsv2bam output saying how many loci and percentage of ind. loci 
were found to have a one-to-one relationship with the catalog. 


Questions

1. SAMtools is terrible at using multiple processors when merging,
is there a faster way of merging hundreds of .bam files ?  

2. Sambamba is faster, but will gstacks be able to read that huge file ?

3. The resulting catalog bam file, nothing to do on that file before moving to the next gstacks module ? (e.g. sorting, indexing ?)

Best,
Thierry

Carol

unread,
Oct 11, 2017, 12:49:42 PM10/11/17
to Stacks
Thanks a lot for this new STACKS v 2 release!!!

The major changes seem pretty awesome and I'm looking forward to give it a try with my data!

However, there is an error that prevent me from moving forward with the installation (see error bellow). Could you please advise?  

In file included from src/DNANSeq.cc:33:0:
src/DNANSeq.h:121:8: error: redefinition of ‘struct std::hash<DNANSeq>’
 struct hash<DNANSeq> {
        ^
src/DNANSeq.h:108:8: error: previous definition of ‘struct std::hash<DNANSeq>’
 struct hash<DNANSeq> {
        ^
Makefile:1015: recipe for target 'src/libcore_a-DNANSeq.o' failed
make[2]: *** [src/libcore_a-DNANSeq.o] Error 1

Thierry Gosselin

unread,
Oct 11, 2017, 1:13:28 PM10/11/17
to Stacks
Hi Carol,

It's usually easier to debug when you post the command you've used, but in this case it happened to me ...

Version 2.0Beta1 doesn't work with Google’s SparseHash, 
If you have all dependencies installed you could try:

wget http://catchenlab.life.illinois.edu/stacks/source/stacks-2.0Beta1.tar.gz
tar -xvf stacks-2.0Beta1.tar.gz
cd stacks-2.0Beta1
./configure --enable-bam
make -j8 # if you have 8 CPU (to make faster...)
sudo make install # this will install the binaries in /usr/local/bin
Cheers
Thierry


Carol Buitrago

unread,
Oct 11, 2017, 1:23:26 PM10/11/17
to stacks...@googlegroups.com
Thanks Thierry,

Indeed it works when using ./configure without --enable-sparsehash 

:)

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to a topic in the Google Groups "Stacks" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stacks-users/JAs16qz6R1I/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stacks-users+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/stacks-users.
For more options, visit https://groups.google.com/d/optout.

Nicolas Rochette

unread,
Oct 11, 2017, 1:56:10 PM10/11/17
to stacks...@googlegroups.com
Hi Thierry,

Thank you for your feedback. The merging prodecure is something we're still considering. There are multiple performance and simplicity of use questions but the current setup should already work well (especially for people who are used to using samtools) but more user input is welcome.

To answer your specific questions:

1. Merging is going to be slow regardless of what you do, because it will quickly be I/O bound. tsv2bam outputs sorted BAM files, so there should be very little computations apart for the uncompressing/recompressing operations (BAM is a gzipped format). We're open to discussion though, and we've also considered trying to not require the merge at all.

2. All the files are already sorted in our case. Sambamba may be faster for merging as well, though. Otherwise, gstacks should have no problems opening and processing a gigantic file.

3. You can use the file directly. It's sorted, and you can index it if you want to, but gstacks doesn't need or use the index (and wouldn't benefit from doing so).

Best,

Nicolas
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.

Nicolas Rochette

unread,
Oct 11, 2017, 2:06:39 PM10/11/17
to Stacks
Indeed, we don't support sparsehash anymore, as with past improvements the benefit had become quite limited (this is true for the recent v1 versions already). We're going to take --google-sparsehash and the rest of the related code out.

Best,

Nicolas
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.

Josie Jackson

unread,
Oct 31, 2017, 4:36:49 PM10/31/17
to Stacks
Hello,
I have just run denovo_map.pl using the new Stacks2.0 Beta1 it was all fine but then aborted in the stage when trying to merge BAM files with samtools. The error is: [pretty_header] invalid header. 
I have 206 samples, paired end RAD and I used the following command for the wrapper: 

denovo_map.pl -m 10 -T 15 -b 1 -S -o ./stacksB1 --samples ./plate1and2 --paired-ends -O ./popmap.txt -X populations:-r 0.90 -p 3 --fstats --fasta_strict --structure --genepop --vcf


Any ideas would be much appreciated!

Thanks,
Josie 

Nicolas Rochette

unread,
Nov 1, 2017, 11:23:50 AM11/1/17
to stacks...@googlegroups.com

Hi Josie,

Could you try to send me the SAM headers (output of `samtools view -H file.bam`)? That could give me an idea of what went wrong. Otherwise, you probably want to use the newer v2.0Beta2.

Best,

Nicolas

Josie Jackson

unread,
Nov 1, 2017, 3:03:08 PM11/1/17
to Stacks
Hi Nicolas,
Thanks very much for your response, after some trouble shooting today I realise the issue was with my samtools version (1.1) rather than the newer version 1.6. After updating that error was fixed but now I have the following error when running gstacks

gstacks
==========
/trinity/shared/apps/site-local/stacks/2.0Beta1/bin/gstacks -P ./stacksB1.1 -t 15 2>&1

Logging to './stacksB1.1/gstacks.log'.
Configuration for this run:
  Input mode: de novo
  Input file: './stacksB1.1/catalog.bam'
  Output to: './stacksB1.1/gstacks.*'
  Model: marukilow (var_alpha: 0.05, gt_alpha: 0.05)

Processing all loci...
gstacks: src/locus.cc:373: void CLocAlnSet::merge_paired_reads(): Assertion `r1->sample == r2->sample' failed.

denovo_map.pl: Aborted because the last command failed. (6)


Thanks very much in advance!
Best wishes,
Josie

Tom Brekke

unread,
Nov 9, 2017, 9:50:33 AM11/9/17
to Stacks
Hello Julian and Nicolas,

Thanks for keeping Stacks current!
I've been excited to try v2 and have been working with Beta4 on some PE GBS data.

I have two questions, first: I can't seem to find the old 'genotypes' script - is there something that replaced it? I have an F2 mapping panel and I'd love an rqtl file like the old version would make.

Second, when running gstacks I get the following error:

gstacks -P ./ -M population_map.txt

Logging to './gstacks.log'.

Configuration for this run:
  Input mode: denovo
  Population map: 'population_map.txt'
  Input files: 147, e.g. './EEEE0-1F.matches.bam'
  Output to: './gstacks.*'

  Model: marukilow (var_alpha: 0.05, gt_alpha: 0.05)

Processing all loci...
Assertion failed: (cigar_length_ref(r.cigar) == ref_.length()), function add, file src/locus.h, line 296.
Abort trap: 6

I can't figure out why it doesn't like the cigar strings. They seem fine when I look at them in the bam files (see the example below - every one is either 140M if it's a first read or 1X139S if it's a second). I thought it might be something with the paired reads, and so I re-ran tsv2bam with only the first reads, but it came up with the same error.

Here's the first couple lines from a bam:
4_21609_12160_8045      65      1       1       255     140M    *       0       -1      GTGTGCAAGTCCACAGACTCATAAGTTTCAGTGCTTCTATGGGGAATGGGAGGCAGAGAGCAGAAAATCCTGGGCACATTGTCCAACCAGCCTGACAAGGCAGTGGTGAACAAGAGAGATCTTGTCTCAAATAAGGTACA    *       RG:Z:1
4_21410_10916_10772     65      1       1       255     140M    *       0       -1      GTGTGCAAGTCCACAGACTCATAAGTTTCAGTGCTTCTATGGGGAATGGGAGGCAGAGAGCAGAAAATCCTGGGCACATTGTCCAACCAGCCTGACAAGGCAGTGGTGAACAAGAGAGATCTTGTCTCAAATAAGGTACA    *       RG:Z:1
4_11612_21827_13094     65      1       1       255     140M    *       0       -1      GTGTGCAAGTCCACAGACTCATAAGTTTCAGTGCTTCTATGGGGAATGGGAGGCAGAGAGCAGAAAATCCTGGGCACATTGTCCAACCAGCCTGACAAGGCAGTGGTGAACAAGAGAGATCTTGTCTCAAATAAGGTACA    *       RG:Z:1
4_21603_4538_1634       65      1       1       255     140M    *       0       -1      GTGTGCAAGTCCACAGACTCATAAGTTTCAGTGCTTCTATGGGGAATGGGAGGCAGAGAGCAGAAAATCCTGGGCACATTGTCCAACCAGCCTGACAAGGCAGTGGTGAAAAAGAGAGATCTTGTCTCAAATTAGGTACA    *       RG:Z:1
4_21410_10916_10772     145     1       1       255     1X139S  *       0       -1      TATGGGGTATGGGAGGCAGAGAGCAGAAAATCCTGGGCACATTGTCCAACCAGCCTGACAAGGCAGTGGTGAACAAGAGAGATCTTGTCTCAAATAAGGTACAAGGACTGATATCCAAGGCCATCCTCTGACCTTCACAT    *       RG:Z:1
4_21603_4538_1634       145     1       1       255     1X139S  *       0       -1      TATGGGGAATGGGAGGCAGAGAGCAGAAAATCCTGGGCACATTGTCCAACCAGCCTGACAAGGCAGTGGTGAACAAGAGAGATCTTGTCTCAAATAAGGTACAAGGACTGATATCCAAGGCCATCCTCTGACCTTCACAT    *       RG:Z:1
4_21609_12160_8045      145     1       1       255     1X139S  *       0       -1      TATGGGGAATGGGAGGCAGAGAGCAGAAAATCCTGGGCACATTGTCCAACCAGCCTGACAAGGCAGTGGTGAACAAGAGAGATCTTGTCTCAAATAAGGTACAAGGACTGATATCCAAGGCCATCCTCTGACCTTCACAT    *       RG:Z:1
4_11612_21827_13094     145     1       1       255     1X139S  *       0       -1      TATGGGGAATGGGAGGCAGAGAGCAGAAAATCCTGGGCACATTGTCCAACCAGCCTGACAAGGCAGTGGTGAACAAGAGAGATCTTGTCTCAAATAAGGTACAAGGACTGATATCCAAGGCCATCCTCTGACCTTCACAT    *       RG:Z:1
4_11504_15836_10860     65      2       1       255     140M    *       0       -1      ATGCTCATCTCTAAGCCAGGCTCTGCCTTCTGCGGCCTGCTGGGCTGATTCTGGCCCTCATCTGGAACTGGTTCTAACTGTTTTAAATCTTATTGCCAGTATCTTTAAACAGAAGATTTCTAAACACTCCAGAAAAAAAT    *       RG:Z:1
4_21509_7792_12734      65      2       1       255     140M    *       0       -1      ATGCTCATCTCTAAGCCAGGCTCTGCCTTCTGCGGCCTGCTGGGCTGATTCTGGCCCTCATCTGGAACTGGTTCTAACTGTTTTAAATCTTATTGCCAGTATCTTTAAACAGAAGATTTCTAAACACTCCAGAAAAAAAT    *       RG:Z:1
4_21510_4429_19523      65      2       1       255     140M    *       0       -1      ATGCTCATCTCTAAGCCAGGCTCTGCCTTCTGCGGCCTGCTGGGCTGATTCTGGCCCTCATCTGGAACTGGTTCTAACTGTTTTAAATCTTATTGCCAGTATCTTTAAACAGAAGATTTCTAAACACTCCAGAAAAAAAT    *       RG:Z:1
3_21507_10629_19277     65      2       1       255     140M    *       0       -1      ATGCTCATCTCTAAGCAAGGCTCTGCCTTCTGCGGCCTGCTGGGCTGATTCTGGCCCTCATCTGGAACTGGTTCTAACTGTTTTAAATCTTATTGCCAGTATCTTTAAACAGAAGATTTCTATACACTCCAGAAAAAAAT    *       RG:Z:1
2_21111_21594_16288     65      2       1       255     140M    *       0       -1      ATGCTCATCTCTAAGCCAGGCTCTGCCTTCTGCGGCCTGCTGGGCTGATTCTGGCCCTCAACTGGAACTGGTTCTAACTGTTTTAAATCTTATTGCCAGTATCTTTAAACAGAAGATTTCTAAACACTCCAGAAAAAAAT    *       RG:Z:1
2_21111_21594_16288     145     2       1       255     1X139S  *       0       -1      ACTGCTGGGCTGATTCTGGACCTCATCAGGAACTTTTTCTAACTGTTTTAAATCTTATTGCCAGTATCTTTTAACAGAAGTTTTCTAAACTCTCCAGAAATAAATGAGTAGAGACCTGACAGTGGCCGTCTATCATCCAT    *       RG:Z:1
4_11504_15836_10860     145     2       1       255     1X139S  *       0       -1      CCTGCTGGGCTGATTCTGGCCCTCATCTGGAACTGGTTCTAACTGTTTTAAATCTTATTGCCAGTATCTTTAAACAGAAGATTTCTAAACACTCCAGAAAAAAATGAGTAGAGACCTGACAGTGGCCGTCTATCATCCAT    *       RG:Z:1

Any thoughts you have would be greatly appreciated.
Thanks so much!

Tom

Nicolas Rochette

unread,
Nov 9, 2017, 12:50:49 PM11/9/17
to stacks...@googlegroups.com

Hi Tom,

We have removed genotypes for the moment, as with all the v2 changes it became deprecated, but we are looking forward to reincorporate its functionalities into populations.

Regarding the error, it is a quite general consistency check (a read failed to be added to the alignment matrix of its locus) so at the moment it's hard for me to tell where it's coming from. Could I ask you to let me know (1) whether the error persists if you give the --ignore-pe-reads option, and (2) what happens if you try to run the "stacks-gdb gstacks -P ./ -M population_map.txt" command (the GDB program may be available on your cluster by default, as a module, or it may not be available in which case just let me know that it's not available to you).

Best,

Nicolas

Tom Brekke

unread,
Nov 15, 2017, 9:10:21 AM11/15/17
to stacks...@googlegroups.com
Hi Nicolas,
I've run gstacks with the --ignore-pe-reads and the same error came up:

gstacks -P ./ -M population_map.txt --ignore-pe-reads

Logging to './gstacks.log'.
Configuration for this run:
  Input mode: denovo
  Population map: 'population_map.txt'
  Input files: 147, e.g. './EEEE0-1F_R1_clipped_passed-re-filter.matches.bam'

  Output to: './gstacks.*'
  Model: marukilow (var_alpha: 0.05, gt_alpha: 0.05)
  Ignoring paired-end reads.


Processing all loci...
Assertion failed: (cigar_length_ref(r.cigar) == ref_.length()), function add, file src/locus.h, line 296.
Abort trap: 6



I had to instal GDB and ran the stacks-gdb as you requested:

stacks-gdb gstacks -P -M population_map.txt
Reading symbols from gstacks...done.
(gdb) Catchpoint 1 (throw)
(gdb) Starting program: /usr/local/bin/gstacks -P -M population_map.txt
Unable to find Mach task port for process-id 40377: (os/kern) failure (0x5).
 (please check gdb is codesigned - see taskgated(8))
(gdb) No stack.
(gdb) quit

That doesn't mean much to me, hopefully you understand it, though it's entirely possible that I messed up the install.

Best,
Tom

To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users+unsubscribe@googlegroups.com.
You received this message because you are subscribed to a topic in the Google Groups "Stacks" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stacks-users/JAs16qz6R1I/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stacks-users+unsubscribe@googlegroups.com.



--
Tom Brekke, Ph.D.
Postdoctoral Research Officer
School of Biological Sciences
Bangor University
tom.b...@gmail.com

Nicolas Rochette

unread,
Nov 15, 2017, 11:06:24 AM11/15/17
to stacks...@googlegroups.com

Hi Tom,

This apparently just means that GDB is not entirely installed. A solution is proposed at:
https://stackoverflow.com/questions/11504377/gdb-fails-with-unable-to-find-mach-task-port-for-process-id-error

But actually, since we're on the Beta1 version thread, could you first confirm that you are using Beta4, and if not that the bug still occurs after upgrading?

Best,

Nicolas

To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.

Tom Brekke

unread,
Nov 16, 2017, 9:08:07 AM11/16/17
to stacks...@googlegroups.com
Hi Nicolas,
I am using Beta4.
I've gotten gdb to work now (it seems) and here's the new output:

sudo stacks-gdb gstacks -P ./ -M population_map.txt
Reading symbols from gstacks...done.
(gdb) Catchpoint 1 (throw)
(gdb) Starting program: /usr/local/bin/gstacks -P ./ -M population_map.txt
[New Thread 0x1903 of process 6367]
warning: unhandled dyld version (15)

Logging to './gstacks.log'.
Configuration for this run:
  Input mode: denovo
  Population map: 'population_map.txt'
  Input files: 147, e.g. './EEEE0-1F_R1_clipped_passed-re-filter.matches.bam'
  Output to: './gstacks.*'
  Model: marukilow (var_alpha: 0.05, gt_alpha: 0.05)

Processing all loci...
Assertion failed: (cigar_length_ref(r.cigar) == ref_.length()), function add, file src/locus.h, line 296.

Thread 2 received signal SIGABRT, Aborted.
0x00007fff8bed1d42 in ?? ()
(gdb) #0  0x00007fff8bed1d42 in ?? ()
#1  0x00007fff8bfbf457 in ?? ()
#2  0x00007fff5fbfe910 in ?? ()
#3  0x00000307001fe028 in ?? ()
#4  0x00007fff5fbfe910 in ?? ()
#5  0x0000000000000128 in ?? ()
#6  0x00007fff5fbfe920 in ?? ()
#7  0x00007fff8be37420 in ?? ()
#8  0xffffffff00000030 in ?? ()
#9  0x00007fff5fbfe930 in ?? ()
#10 0x00007fffffffffdf in ?? ()
#11 0x00000001000d7220 in ?? ()
#12 0x00007fff5fbfe960 in ?? ()
#13 0x00007fff8bdfe893 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) quit
A debugging session is active.

    Inferior 1 [process 6367] will be killed.

Quit anyway? (y or n) [answered Y; input not from terminal]


Does that help at all?
Best,
Tom

Nicolas Rochette

unread,
Nov 16, 2017, 2:39:40 PM11/16/17
to stacks...@googlegroups.com

Hi Tom,

Thanks for helping with this, I really appreciate it. The #0-#14 is the series of functions that the program was in when it crashed, but for some reason the function names are missing.

To your knowledge, have you done anything special to the Stacks executables, e.g. used the 'strip' command on them?

What is the output if you run the command

```
file $(which gstacks)
```

Do you know which compiler you're using? (GCC or Clang?)

Also, could you send me you denovo_map.log file? (Or the individual logs, if you ran the pipeline manually.) I might be able to make a guess.

Best,
Nicolas
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.

Tom Brekke

unread,
Nov 17, 2017, 5:46:14 AM11/17/17
to stacks...@googlegroups.com
Hi Nicolas,
I'm glad to be able to help!
I had played around a bit with the raw code trying to track the error down myself and didn't get anywhere, but I re-compiled everything from a new download before writing to you the first time so everything we've discussed is from a clean install. I didn't do anything like a strip ever. Also I'm running Mac OS Sierra 10.12.6 - not sure if that helps or not.

file $(which gstacks)
/usr/local/bin/gstacks: Mach-O 64-bit executable x86_64


I'm using GCC as a compiler.

I did run the pipeline manually - attached are the log files for process_radtags (just one individual), tsv2bam, gstacks, and stacks-gdb.  I don't have any for ustacks, cstacks, or sstacks.

Tom


gstacks.log
SSSS0-1M_R1_clipped_passed-re-filter.log
stacks-gdb.log
tsv2bam.log

Ludovic

unread,
Jun 19, 2018, 10:28:52 PM6/19/18
to Stacks
Hi all,

thank you for this awesome work.

he parameter -m I believe has disappeared from the help of denovo_map.pl . But I think stacks is still accepting it.

Best

Ludo
Reply all
Reply to author
Forward
0 new messages