Hi, I am not sure what you mean by “cross mapping”. Are you trying to align your different species against a reference genome, or do a de novo assembly across the species? Either way, you will want to provide the single and paired-end reads to Stacks as separate files – as you would receive them from process_radtags.
If you haven’t already, you will want to take a look at one of our protocol papers to get a better idea of how to put your data through the pipeline, I would recommend: https://link.springer.com/protocol/10.1007/978-1-0716-2313-8_7
Hi Nora,
Okay, got your meaning now. When processing paired-end data in Stacks, for a genetic map or otherwise, the process is the same. The pipeline works in the following steps:
ustacks -> build loci from single-end reads that are anchored into stacks by the restriction enzyme cutsute
cstacks/sstacks -> make a catalog of all loci in the meta population and match individuals against it
tsv2bam -> bring in paired-end reads for each single-end read that was assembled into a locus by ustacks
gstacks -> expand loci built in first half of pipeline by building contigs from single+paired end reads; call SNPs again
populations -> output mappable markers (and many other things)
Based on the example in section 4.4.3 of the manual, this all applies, you just have to supply tsv2bam with a path to the paired-end read files and it will process them and pull in the paired-reads to match the single-end reads already processed. You can run denovo_map.pl --dry-run and it will output all these commands for your data without executing anything itself. In other words, you can’t and shouldn’t put the paired-end reads into ustacks.
That said, for your issue: “I get loci that are variable between species but fixed within the species”, does not seem congruent to me, since ustacks is run per individual, it should identify any SNPs variable within that individual. Cstacks will match those loci across individuals, but if a SNP (identified by ustacks) is not added to the catalog, then it cannot be found in other individuals when matching with sstacks. Perhaps that is what happened with your approach (“map one species using the catalog of the other species”)?
In a mapping cross, the parents of the cross define all loci that can be found in the progeny so we can get away with only building the catalog from the parents. However, if you have a separate cross from a different species, and you want to match it against the loci in the first cross, the representative SNPs/loci have to be in the catalog for them to be identified across individuals.
Anyway, I think the best approach for getting a good, overall look at the data, is to put everything into the catalog (or at least any ‘parents’ from the different species that you crossed) and see how many shared loci there are. That is, just run denovo_map.pl
with all your data as a generic population (not a mapping cross), and then take a look at the populations outputs to see how many loci there are and how they are shared across the species.
An alternative would be making a pseudo reference genome. You turn the loci from your first species into a reference (export them from populations after a generic denovo_map.pl run and index them as a reference genome), then use BWA to align all reads from both species against this ‘reference’ and proceed like you had a reference genome from the start.
Best,
julian
--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
stacks-users...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/stacks-users/766f5870-53a9-46d8-a83e-9ad61038d6c5n%40googlegroups.com.