Cross mapping using paired ends/cross mapping between species

89 views
Skip to first unread message

Nora Gavin-Smyth

unread,
Oct 5, 2023, 12:06:50 AM10/5/23
to Stacks
I am trying to use the cross mapping approach that is explained in the Stacks manual. I have ddRAD data, so how do I handle the paired reads using this approach? I tried concatenating r1 and r2 reads before ustacks, but gstacks reports that it does not detect paired reads. 

My dataset is comprised of ten populations of two different species, 5 populations of each, and my goal is to find any loci that are variable in both species. I am also open to suggestions of a better approach!

Many thanks!

Catchen, Julian

unread,
Oct 5, 2023, 11:06:13 AM10/5/23
to stacks...@googlegroups.com

Hi, I am not sure what you mean by “cross mapping”. Are you trying to align your different species against a reference genome, or do a de novo assembly across the species? Either way, you will want to provide the single and paired-end reads to Stacks as separate files – as you would receive them from process_radtags.

 

If you haven’t already, you will want to take a look at one of our protocol papers to get a better idea of how to put your data through the pipeline, I would recommend: https://link.springer.com/protocol/10.1007/978-1-0716-2313-8_7

Nora Gavin-Smyth

unread,
Oct 5, 2023, 5:20:28 PM10/5/23
to Stacks
Hi Julian and thanks for your response. Sorry, I mean doing a de novo Genetic Mapping Cross, as in part 4.4.3 of the Stacks manual. The idea was to map one species using the catalog of the other species. I have no problem running these data through the denovo_pl or aligning to a reference. What I'm unclear on is how to run ustacks manually with the r1 and r2 reads files--the example for Genetic Mapping Cross in the manual has a single file for each sample. I have also tried providing each read file separately but then I am unclear on at what point the r1 and r2 can be integrated as the same sample when doing the denovo pipeline manually.

More background on why I'm trying to do this: When I run the denovo_pl on both species together, I get loci that are variable between species but fixed within the species. I am looking for any loci that are variable within both species. 

Many thanks!

Catchen, Julian

unread,
Oct 9, 2023, 3:24:21 PM10/9/23
to stacks...@googlegroups.com

Hi Nora,

 

Okay, got your meaning now. When processing paired-end data in Stacks, for a genetic map or otherwise, the process is the same. The pipeline works in the following steps:

 

ustacks -> build loci from single-end reads that are anchored into stacks by the restriction enzyme cutsute

cstacks/sstacks -> make a catalog of all loci in the meta population and match individuals against it

tsv2bam -> bring in paired-end reads for each single-end read that was assembled into a locus by ustacks

gstacks -> expand loci built in first half of pipeline by building contigs from single+paired end reads; call SNPs again

populations -> output mappable markers (and many other things)

 

Based on the example in section  4.4.3 of the manual, this all applies, you just have to supply tsv2bam with a path to the paired-end read files and it will process them and pull in the paired-reads to match the single-end reads already processed. You can run denovo_map.pl --dry-run and it will output all these commands for your data without executing anything itself. In other words, you can’t and shouldn’t put the paired-end reads into ustacks.

 

That said, for your issue: “I get loci that are variable between species but fixed within the species”, does not seem congruent to me, since ustacks is run per individual, it should identify any SNPs variable within that individual. Cstacks will match those loci across individuals, but if a SNP (identified by ustacks) is not added to the catalog, then it cannot be found in other individuals when matching with sstacks. Perhaps that is what happened with your approach (“map one species using the catalog of the other species”)?

 

In a mapping cross, the parents of the cross define all loci that can be found in the progeny so we can get away with only building the catalog from the parents. However, if you have a separate cross from a different species, and you want to match it against the loci in the first cross, the representative SNPs/loci have to be in the catalog for them to be identified across individuals.


Anyway, I think the best approach for getting a good, overall look at the data, is to put everything into the catalog (or at least any ‘parents’ from the different species that you crossed) and see how many shared loci there are. That is, just run denovo_map.pl with all your data as a generic population (not a mapping cross), and then take a look at the populations outputs to see how many loci there are and how they are shared across the species.

 

An alternative would be making a pseudo reference genome. You turn the loci from your first species into a reference (export them from populations after a generic denovo_map.pl run and index them as a reference genome), then use BWA to align all reads from both species against this ‘reference’ and proceed like you had a reference genome from the start.

 

Best,

 

julian

 

 

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/stacks-users/766f5870-53a9-46d8-a83e-9ad61038d6c5n%40googlegroups.com.

Nora Gavin-Smyth

unread,
Oct 11, 2023, 2:15:57 PM10/11/23
to Stacks
Thanks so much for this response. You've given me a lot of good ideas here and I'm going to look into them. Really appreciate you taking the time!

Nora

Reply all
Reply to author
Forward
0 new messages