Global minimum samples in ref_map.pl / Phasing in catalog.fa.gz?


Inbar Maayan

Jan 20, 2023, 10:53:40 PM
to Stacks
Hi Julian, 

I've got a couple questions about ref_map.pl which I hope you can help me out with. 

1) Is there a way to specify a global minimum number of samples that a locus should be in to be counted? I know that I can specify how many populations a locus needs to be in and what proportion of individuals from each population it should be in through the populations program (e.g., -p 10 -r 0.75), but is there a way to do so agnostic of the population map? In other words, can I tell ref_map.pl to only output loci to catalog.fa.gz that are present in, say, 20 or more individuals out of my total input samples?

2) What do the + and - mean in the naming of loci in catalog.fa.gz? Does this refer to the phasing of the locus or something else?
>1 pos=scaffold_1:387:+ NS=4
CATGCACATTCACAAG.......
>2 pos=scaffold_1:390:- NS=1
CATGCTCTGGCCTGGG......

Thanks a heap!
Inbar

Catchen, Julian

Jan 23, 2023, 11:54:37 AM
to stacks...@googlegroups.com

Hi Inbar,

 

  1. You can use the -R parameter to populations to set a global number of required samples, no matter the popmap used.
  2. The +/- refer to which strand a particular locus is found on.
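
To make the header layout concrete, here is a minimal Python sketch that splits a catalog.fa.gz header line into its fields, including the strand. The field layout (`>ID pos=CHROM:BP:STRAND NS=N`) is inferred only from the two example headers quoted in this thread, and the names `CatalogHeader` and `parse_header` are hypothetical, not part of Stacks.

```python
import re
from typing import NamedTuple


class CatalogHeader(NamedTuple):
    """Fields of one catalog FASTA header (layout assumed from the thread's examples)."""
    locus_id: int
    chrom: str
    pos: int
    strand: str      # "+" or "-": the strand the locus is found on
    n_samples: int   # the NS= value


# Assumed layout: ">1 pos=scaffold_1:387:+ NS=4"
HEADER_RE = re.compile(r"^>(\d+) pos=(\S+):(\d+):([+-]) NS=(\d+)")


def parse_header(line: str) -> CatalogHeader:
    """Parse one header line; raise ValueError if it does not match the assumed layout."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized catalog header: {line!r}")
    locus_id, chrom, pos, strand, ns = match.groups()
    return CatalogHeader(int(locus_id), chrom, int(pos), strand, int(ns))


header = parse_header(">2 pos=scaffold_1:390:- NS=1")
print(header.strand, header.n_samples)
```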

 

Best,

 

julian

Inbar Maayan

Jan 23, 2023, 2:08:45 PM
to stacks...@googlegroups.com
Thanks a heap Julian!

Inbar Maayan
PhD Candidate | she/her
Department of Organismic and Evolutionary Biology
Harvard University



Inbar Maayan

Mar 3, 2023, 10:03:56 AM
to Stacks
Hi Julian, 

I've tried running ref_map.pl with the -X "populations:--min-samples-overall 0.15" addition (I have 267 individuals in my dataset so I'm shooting for a minimum of 40, which is about 15%), but when I look at my catalog.fa.gz there are still many loci with NS<40:

>1 pos=scaffold_1:387:+ NS=4
CATGCACA.....

>2 pos=scaffold_1:390:- NS=1
CATGCTC....
>3 pos=scaffold_1:715:+ NS=1
CATGCTG....
>4 pos=scaffold_1:1557:- NS=1
CATGCC....
>5 pos=scaffold_1:1619:- NS=1
CATGC...
>6 pos=scaffold_1:1934:+ NS=8
CATGCA...
>7 pos=scaffold_1:2932:- NS=2
CATGC....
>8 pos=scaffold_1:2935:- NS=2
ATGCAT.....
>9 pos=scaffold_1:2936:- NS=5
CATGCAT.....
>10 pos=scaffold_1:4861:- NS=2
GCGATA......
>11 pos=scaffold_1:4861:+ NS=65
CATGCC.......

Does the populations module add-on only influence which loci go into population-genetic assessments, or can it modify the catalog.fa.gz? If the former, is there a way to get the catalog.fa.gz to only keep loci with over 40 individuals? 
Happy to hear any additional advice you might have for making this part work (based on the manual I assume "--min-samples-overall" and "-R" are equivalent). 
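
As a quick check of the arithmetic in this message, the sketch below converts the target of 40 samples out of 267 into a fraction. How populations rounds or compares the fraction is not stated in this thread, so the ceiling calculation is only an assumption showing what a strict reading of 0.15 would imply; `fraction_for_min_samples` is a hypothetical helper name.

```python
import math


def fraction_for_min_samples(min_samples: int, total_samples: int) -> float:
    """Fraction of all samples that corresponds to a minimum sample count."""
    return min_samples / total_samples


total = 267   # individuals in the dataset (from the message above)
target = 40   # desired minimum number of individuals per locus

print(f"{fraction_for_min_samples(target, total):.4f}")  # 0.1498

# ASSUMPTION: if the fraction were applied as a strict lower bound on
# 0.15 * 267 = 40.05, a locus would need 41 samples, not 40.
print(math.ceil(0.15 * total))  # 41
```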

Thank you kindly, Inbar

Catchen, Julian

Mar 6, 2023, 5:58:14 PM
to stacks...@googlegroups.com

Hi Inbar,

 

The catalog is not affected by the filters you provide. The catalog always contains everything found in the dataset. Instead, the flag you specified is passed to populations, and the populations-specific output will respect the filtering parameters you asked for. More generally, the core pipeline is designed to be run once (given an optimized set of de novo assembly parameters), and then populations is designed to be run multiple times, with different population maps and/or filters or export formats. I suggest you take a look at our protocol, which explains a good bit of this strategy: https://link.springer.com/protocol/10.1007/978-1-0716-2313-8_7.
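
If a copy of the catalog restricted by sample count is still wanted for some downstream purpose, one option is to filter the FASTA after the fact, outside of Stacks. The following is a minimal sketch of that idea, not a Stacks feature: it assumes each header carries an `NS=<count>` field as in the examples quoted in this thread, and `filter_catalog` and the file paths are hypothetical.

```python
import gzip
import re

# ASSUMPTION: every header line contains "NS=<integer>", as in the thread's examples.
NS_RE = re.compile(r"\bNS=(\d+)")


def filter_catalog(in_path: str, out_path: str, min_samples: int) -> int:
    """Copy records whose NS value is >= min_samples; return how many were kept."""
    kept = 0
    keep_record = False
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.startswith(">"):
                # A new record starts: decide once, then apply to its sequence lines.
                match = NS_RE.search(line)
                keep_record = match is not None and int(match.group(1)) >= min_samples
                if keep_record:
                    kept += 1
            if keep_record:
                dst.write(line)
    return kept


# Example call (paths are placeholders):
# filter_catalog("catalog.fa.gz", "catalog.min40.fa.gz", min_samples=40)
```

This leaves the original catalog untouched and writes a separate file, in line with the advice above to run the core pipeline once and do all filtering downstream.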

 

Best,

 

julian

 


Inbar Maayan

Mar 10, 2023, 6:29:17 PM
to Stacks
Thanks a heap Julian!