How to best use Stacks


Loic Pittet

Jan 10, 2025, 4:15:40 AM
to Stacks
Dear all,

I have been using Stacks for many years, but I am still unsure of the best way to process my new data, and I would be happy to hear your opinions. My dataset consists of:
  • 100 bp single-end RAD-seq reads
  • 75 samples representing three species (two diploid parents, one tetraploid)
  • No reference genome for any of the species
Should I build a de novo reference using only the parent species and then map the tetraploid to it? Is it actually possible to do so? I would really appreciate your advice!

Best,

Loïc

Catchen, Julian

Jan 10, 2025, 1:20:50 PM
to stacks...@googlegroups.com

Hi Loïc,

 

What you decide to do would depend on how the three species are related to one another, and how you expect the RAD loci to be shared among them. But, you can experiment with different setups. Without knowing the details of your system, I would probably try constructing three different de novo datasets: 1) Parent/species 1 catalog then matching the tetraploids to that catalog; 2) Parent/species 2 catalog and the tetraploids matched to it; 3) Maybe all three together in one catalog – it depends again on how close they all are to one another. I would also optimize my assembly parameters independently for the three species before I decided on a final set of parameters. Then, you would want to see how the tetraploids map to the different parental species, but again, it depends on what you are hoping to show. A final alternative would be to select the best de novo dataset (that is, the one where the most total loci can be mapped to the catalog), then turn that into a reference and align the other species to it doing a reference-based analysis.

 

Best,


Julian
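
The three setups described above can be laid out as population maps before running anything. The following is a minimal sketch, not part of the original reply: the sample names, species labels, and the `design_popmaps` helper are hypothetical, and it assumes the usual two-column, tab-separated population map (sample name, population). It also assumes that the catalog samples are matched back to their own catalog along with the tetraploids.

```python
from typing import Dict


def popmap_text(samples: Dict[str, str]) -> str:
    """Render a sample -> population mapping as tab-separated popmap lines."""
    return "".join(f"{name}\t{pop}\n" for name, pop in samples.items())


def design_popmaps(species_of, parents=("parent1", "parent2"), hybrid="tetraploid"):
    """Return, for each setup, the popmap used to build the catalog and the
    popmap of samples to match against it (hypothetical helper)."""
    hybrids = {s: sp for s, sp in species_of.items() if sp == hybrid}
    designs = {}
    # Setups 1 and 2: catalog from a single parental species, tetraploids matched to it.
    for parent in parents:
        catalog = {s: sp for s, sp in species_of.items() if sp == parent}
        designs[f"{parent}_catalog"] = {
            "catalog": popmap_text(catalog),
            # Assumption: the catalog samples are matched as well as the tetraploids.
            "match": popmap_text({**catalog, **hybrids}),
        }
    # Setup 3: all three species together in one catalog.
    designs["joint_catalog"] = {
        "catalog": popmap_text(species_of),
        "match": popmap_text(species_of),
    }
    return designs


# Hypothetical sample names, for illustration only.
species_of = {"P1_01": "parent1", "P2_01": "parent2", "T_01": "tetraploid"}
designs = design_popmaps(species_of)
print(designs["parent1_catalog"]["match"], end="")
```

Each "catalog" text would be written to the population map given to the catalog-building step, and each "match" text to the one given to the matching step, so the number of loci matched can be compared across setups.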

Claude Patrick Millet

Feb 2, 2026, 8:23:42 AM
to Stacks
Hi Julian, Loïc. Allow me to piggyback on this question. When you say "A final alternative would be to select the best de novo dataset (that is, the one where the most total loci can be mapped to the catalog), then turn that into a reference and align the other species to it doing a reference-based analysis.", do you mean as in Heller et al. 2021 (https://doi.org/10.1111/1755-0998.13324), parsing the catalog.tags.tsv (output from cstacks) into a FASTA file, as they do? And if so, should the different consensus sequences be concatenated (interspersed with NNNNs) as they do, or would one FASTA record per consensus sequence work?
I am asking because I have a haplodiploid species and ran de novo on the males (haploid), keeping -M low (1 and 2, ended up going with 1) and varying -n to optimize. I then wanted to map all my samples, males and females, to a reference generated from the cstacks catalog.
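
For readers following along, here is a minimal sketch of the parsing step being asked about: turning consensus rows of a catalog tags file into one FASTA record per locus. The column positions are an assumption (locus ID in the 2nd column, sequence type in the 3rd, sequence in the 6th, with `#` comment lines); they should be checked against the manual for the Stacks version in use. The function names and the `locus_` prefix are hypothetical, and the example rows are made up.

```python
from typing import Iterable, Iterator, Tuple


def catalog_consensus(lines: Iterable[str], locus_col: int = 1,
                      type_col: int = 2, seq_col: int = 5) -> Iterator[Tuple[str, str]]:
    """Yield (locus_id, sequence) for each 'consensus' row.

    Column indices are 0-based and are an ASSUMPTION about the file layout.
    """
    needed = max(locus_col, type_col, seq_col)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= needed or fields[type_col] != "consensus":
            continue  # skip short rows and non-consensus rows
        yield fields[locus_col], fields[seq_col]


def to_fasta(records: Iterable[Tuple[str, str]], prefix: str = "locus_",
             width: int = 60) -> str:
    """Format records as FASTA, one record per locus, wrapped at `width` bases."""
    out = []
    for locus_id, seq in records:
        out.append(f">{prefix}{locus_id}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


# Made-up rows in the assumed layout, for illustration only.
example = [
    "# hypothetical header line\n",
    "0\t1\tconsensus\t\t1_1,2_4\tACGTACGT\t0\t0\t0\n",
    "0\t2\tconsensus\t\t1_7\tTTGGCCAA\t0\t0\t0\n",
]
fasta = to_fasta(catalog_consensus(example))
print(fasta, end="")

# For a real (possibly gzipped) file, something like:
#   import gzip
#   with gzip.open("catalog.tags.tsv.gz", "rt") as fh:
#       fasta = to_fasta(catalog_consensus(fh))
```

Keeping the catalog locus ID in each record name means alignments can be traced back to individual RAD loci afterwards.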

 


Catchen, Julian

Feb 4, 2026, 6:11:26 PM
to stacks...@googlegroups.com
Hi Claude,

I don’t see a reason to concatenate the sequences, especially if you want to keep track of each RAD locus independently after you do your read alignments. Once you have a FASTA file of the consensus sequences, and assuming you will align reads against it using BWA or similar, the loci do not need to be concatenated unless some downstream package assumes or requires that each “chromosome” be one continuous sequence, even if its loci are separated artificially by Ns (say, for linkage analysis).

Best,

Julian
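
If a downstream tool does require one continuous sequence, the bookkeeping cost of concatenation can be kept small by recording where each locus lands. This is a minimal sketch under stated assumptions, not a method from the thread or from Heller et al.: the spacer length, the function names, and the example sequences are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def concatenate_loci(records: Iterable[Tuple[str, str]],
                     spacer_len: int = 100) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Join loci with runs of N between them.

    Returns the concatenated sequence and a map of
    locus_id -> (start, end), 0-based and half-open.
    """
    spacer = "N" * spacer_len
    parts = []
    offsets: Dict[str, Tuple[int, int]] = {}
    pos = 0
    for i, (locus_id, seq) in enumerate(records):
        if i > 0:  # spacer only between loci, not before the first one
            parts.append(spacer)
            pos += spacer_len
        offsets[locus_id] = (pos, pos + len(seq))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


def locus_at(offsets: Dict[str, Tuple[int, int]], position: int) -> Optional[str]:
    """Return the locus covering a 0-based position, or None if it falls in a spacer."""
    for locus_id, (start, end) in offsets.items():
        if start <= position < end:
            return locus_id
    return None


# Made-up sequences and a short spacer, for illustration only.
pseudo_chr, offsets = concatenate_loci([("1", "ACGT"), ("2", "GG")], spacer_len=3)
print(pseudo_chr)
print(offsets)
```

With one FASTA record per locus, none of this coordinate translation is needed, which is the practical reason to avoid concatenating unless something downstream requires it.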

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/

Claude Patrick Millet

Feb 9, 2026, 6:21:02 AM
to Stacks
Thank you very much for your answer. On a somewhat related note, I had run parameter optimization (with the intent of keeping the best run to make my pseudo-reference). I accidentally deleted the tsv file, so I had to run it again (using the same samples). When I did, the number of R80 loci differed slightly (by more than 100). I reran the parameter values just above and below to confirm, and their counts also differed (although both sets of runs converged on the same parameters).
Is that normal? Does that mean the de novo pipeline is not guaranteed to output the exact same catalog each time it is run? Or did I do something wrong? (I don't see what it could be, as I just reran the scripts.)
Thanks,
Patrick

Claude Patrick Millet

Feb 9, 2026, 6:24:17 AM
to Stacks
Actually, I rechecked, and the difference is smaller than that: fewer than 10 loci. However, this run-to-run variability is greater than the difference between my optimal parameter value and the next one after it...
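
One way to frame the comparison described above is to count R80 loci the same way for every run and then ask whether two parameter settings differ by more than the spread between reruns. This is a minimal sketch, taking R80 to mean loci present in at least 80% of samples; the function names are hypothetical and all counts below are made-up placeholders, not results from this thread.

```python
from typing import Dict, Sequence, Set


def count_r_loci(presence: Dict[str, Set[str]], n_samples: int,
                 min_percent: int = 80) -> int:
    """Count loci present in at least `min_percent` percent of samples.

    `presence` maps locus_id -> set of sample names carrying that locus.
    Integer arithmetic avoids floating-point edge cases at exactly 80%.
    """
    return sum(1 for samples in presence.values()
               if len(samples) * 100 >= min_percent * n_samples)


def separated_by_more_than_noise(runs_a: Sequence[int], runs_b: Sequence[int]) -> bool:
    """True if the ranges of repeated counts for two settings do not overlap."""
    return min(runs_a) > max(runs_b) or min(runs_b) > max(runs_a)


# Made-up presence data: 5 samples, 3 loci.
presence = {
    "1": {"a", "b", "c", "d", "e"},  # 5/5 samples
    "2": {"a", "b", "c", "d"},       # 4/5 samples, exactly 80%
    "3": {"a", "b", "c"},            # 3/5 samples
}
n_r80 = count_r_loci(presence, n_samples=5)

# Made-up repeated R80 counts for two neighbouring parameter values.
reruns_best = [1000, 1006]
reruns_next = [1003, 1001]
distinguishable = separated_by_more_than_noise(reruns_best, reruns_next)
print(n_r80, distinguishable)
```

When the rerun ranges overlap, as in this placeholder example, the two settings cannot be ranked on R80 count alone.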