Removing paralogous loci: multiple loci matching the same catalog locus

CaffeSospeso

Oct 28, 2018, 11:08:30 AM

to Stacks

Hello,

I'm working with single-digest, paired-end RAD-seq data. As part of the data processing, I'm now dealing with finding and excluding paralogous loci. I know that several parameters in the Stacks pipeline can be tuned to remove potential paralogous loci.

The Catchen et al. 2011 paper states that the sstacks program can, among other things, identify sample loci that match more than one catalog locus and exclude them. The same paragraph also mentions that multiple loci from a sample can still match the same single catalog locus, and that users can exclude these loci later in the analysis. It is not clear to me which of the other programs does this. Previous versions of Stacks had the rxstacks program, which could do it, but rxstacks no longer exists in Stacks v2.0. Is this option implemented elsewhere in Stacks?

Otherwise, I have found a new program called PMERGE, which works on the output of the Stacks pipeline and identifies catalog loci with high similarity, that is, putative paralogs.

What is your opinion?

Thank you,

Gabriele

CaffeSospeso

Oct 29, 2018, 7:50:27 AM

to Stacks

A possible solution I was thinking about is to simply identify the sample loci that match the same catalog locus and remove them from the matches.tsv file. Would that be correct? Could this create issues in the next steps of the Stacks pipeline?

Catchen, Julian

Oct 30, 2018, 9:11:53 PM

to stacks...@googlegroups.com, CaffeSospeso

Hi Gabriele,

The sstacks controls are still in effect for Stacks 2.0 with de novo
analyses. Otherwise, the main way to control for paralogous loci is to
use the maximum heterozygosity filter in the populations program. The
idea is that if two distinct loci are incorrectly merged together,
differences in the sequence will be called incorrectly as SNPs; however,
nearly all individuals will appear heterozygous for these 'SNPs'. So, we
can filter loci that have too high a level of heterozygosity using the
filter in populations.

Best,

julian
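
To illustrate the reasoning behind the heterozygosity filter, here is a minimal Python sketch. It is not how populations implements the filter; it only shows the idea that a site where nearly every individual looks heterozygous is suspicious. The function names, the genotype representation (a pair of allele strings, or None for missing data), the site labels, and the 0.7 threshold are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# A genotype is a pair of alleles; None means missing data for that individual.
# This representation is an assumption for the sketch, not a Stacks format.
Genotype = Optional[Tuple[str, str]]


def observed_heterozygosity(genotypes: List[Genotype]) -> Optional[float]:
    """Fraction of genotyped individuals that are heterozygous at one site."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no data, so heterozygosity is undefined
    het = sum(1 for a, b in called if a != b)
    return het / len(called)


def flag_high_het(sites: Dict[str, List[Genotype]], max_obs_het: float) -> List[str]:
    """Return the IDs of sites whose observed heterozygosity exceeds the cutoff."""
    flagged = []
    for site_id, genotypes in sites.items():
        h = observed_heterozygosity(genotypes)
        if h is not None and h > max_obs_het:
            flagged.append(site_id)
    return flagged


if __name__ == "__main__":
    # Hypothetical data: the first site mimics two merged paralogs, where
    # fixed differences between the copies look like heterozygous calls.
    sites = {
        "locus_1:pos_10": [("A", "G")] * 9 + [("A", "A")],
        "locus_2:pos_35": [("C", "C")] * 6 + [("C", "T")] * 3 + [("T", "T")],
    }
    print(flag_high_het(sites, max_obs_het=0.7))  # → ['locus_1:pos_10']
```

In the example, 9 of 10 individuals are heterozygous at the first site (0.9), so it is flagged, while the second site (0.3) is kept. A sensible cutoff depends on the data set; 0.7 is only a placeholder.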

CaffeSospeso

Oct 31, 2018, 6:27:49 AM

to Stacks

Hi Julian,

Thank you for your reply. This was already clear to me; however, what you just wrote does not necessarily get rid of the multiple sample loci that match the same catalog locus, right?

I wrote a small script that removes from the .matches.tsv.gz file the sample loci that match the same catalog locus. I then tried to run the next step in Stacks (tsv2bam) but, as expected, it gave me an error, probably because I need to modify other files as well. Could you point me to the files I need to modify to make this work with the next steps of Stacks?

Thank you a lot,

Gabriele

Catchen, Julian

Oct 31, 2018, 8:11:17 AM

to stacks...@googlegroups.com, CaffeSospeso

Hi Gabriele,

It is not necessary to do that. If two loci from the same sample both
independently match the same catalog locus, the data from that sample
will be excluded by the populations program.

julian
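
For anyone who only wants to see how many catalog loci are affected, without editing any Stacks files, a read-only check along these lines may help. This is a sketch under assumptions: it treats the matches file as tab-separated with '#' comment lines, and it assumes the catalog locus ID is in the first column and the sample locus ID in the third. Please check those column positions against the manual for your Stacks version; they are exposed as parameters for that reason. The function name is made up for this example.

```python
import csv
import gzip
from collections import defaultdict
from typing import Dict, Set


def multi_matched_catalog_loci(
    path: str,
    catalog_col: int = 0,  # assumed position of the catalog locus ID
    locus_col: int = 2,    # assumed position of the sample locus ID
) -> Dict[str, Set[str]]:
    """Map each catalog locus matched by more than one sample locus
    to the set of sample locus IDs that matched it. Nothing is modified."""
    matches: Dict[str, Set[str]] = defaultdict(set)
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and comment/header lines.
            if not row or row[0].startswith("#"):
                continue
            # A set is used because one sample locus can appear on several
            # lines (for example, one line per haplotype).
            matches[row[catalog_col]].add(row[locus_col])
    return {cat: loci for cat, loci in matches.items() if len(loci) > 1}
```

Calling `multi_matched_catalog_loci("sample_01.matches.tsv.gz")` (a hypothetical file name) returns a dictionary whose length is the number of catalog loci hit by two or more loci from that sample. The file itself is left untouched, so the downstream Stacks steps are not affected.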
