mapping to catalog.fa.gz


kym...@gmail.com

Jul 16, 2021, 11:06:42 AM
to Stacks
Hi

I'm trying to resolve batch effects between two ddRAD sequencing runs. My plan is to map the reads from one run to the catalog.fa.gz file produced from the other run, and then do SNP calling/genotyping so that I have a common set of SNPs across the two runs. I have a reference genome available for my species, so my sample inputs are reference-aligned BAM files. I'm confused about whether what I propose is possible in the refmap pipeline. From the manual, I can see that I could call SNPs using an existing catalog through sstacks in the de novo pipeline, but I'm unsure whether sstacks can take the reference-aligned BAM files that I have. In the reference map pipeline, on the other hand, I can't see an option to supply an existing catalog file to gstacks for SNP calling from reference-aligned BAMs.

I am also wondering whether it might be a better approach to create a whitelist from the catalog and use this at the populations step instead?

Would someone be able to offer a suggestion of the best way I could do SNP calling against an existing catalog for reference-aligned samples?

Thanks
Kym

Julian Catchen

Jul 19, 2021, 5:35:34 PM
to stacks...@googlegroups.com, kym...@gmail.com
Hi Kym,

It is not clear to me why you want to assemble loci against an existing
catalog if you have a reference genome. Why not just align all samples
to the reference genome and then use that to call SNPs across the whole
data set?

If you did not have a reference genome, one possible approach would be
to assemble loci de novo, and use the resulting catalog as a reference
genome. But then, you would still need to run all of the samples against
that 'reference genome' to get a common set of SNPs, just like above.

Best,

julian
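
A minimal sketch of the approach suggested in the reply above: treat the samples from both sequencing runs as one data set and run the reference-based pipeline once. The helper below is hypothetical and not part of Stacks. It assumes the BAM files from both runs are named `<sample>.bam` and have been placed (or symlinked) in a single directory, and it writes a tab-separated population map with one `sample<TAB>group` line per sample. Using the sequencing run as the group label is only an illustrative choice, and the directory and file names are placeholders. The `ref_map.pl` options shown (`--samples`, `--popmap`, `-o`, `-T`) should be checked against the manual for the installed Stacks version.

```python
from pathlib import Path


def sample_name(bam_path):
    """Return the sample name implied by a BAM file name (name without '.bam')."""
    name = Path(bam_path).name
    if not name.endswith(".bam"):
        raise ValueError(f"not a BAM file: {bam_path}")
    return name[: -len(".bam")]


def build_popmap(bams_by_run):
    """Build population-map text: one 'sample<TAB>group' line per BAM file.

    bams_by_run maps a group label (here: the sequencing run, an
    illustrative assumption) to a list of BAM paths.
    """
    lines = []
    seen = set()
    for run, bams in bams_by_run.items():
        for bam in sorted(bams):
            sample = sample_name(bam)
            if sample in seen:
                # Sample names must be unique across the combined data set.
                raise ValueError(f"duplicate sample name: {sample}")
            seen.add(sample)
            lines.append(f"{sample}\t{run}")
    return "\n".join(lines) + "\n"


def ref_map_command(samples_dir, popmap_path, out_dir, threads=8):
    """Assemble a ref_map.pl command line as an argument list (not executed here)."""
    return [
        "ref_map.pl",
        "--samples", str(samples_dir),
        "--popmap", str(popmap_path),
        "-o", str(out_dir),
        "-T", str(threads),
    ]


if __name__ == "__main__":
    # Placeholder file names; replace with the real BAM files from each run.
    popmap_text = build_popmap({
        "run1": ["aligned/sampleA.bam", "aligned/sampleB.bam"],
        "run2": ["aligned/sampleC.bam"],
    })
    print(popmap_text, end="")
    print(" ".join(ref_map_command("aligned", "popmap.tsv", "stacks_out")))
```

Because every sample is genotyped in the same run against the same reference coordinates, the SNPs are shared across both sequencing runs by construction, with no need to pass a catalog from one run to the other.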
