populations output phylip files

lin...@slu.edu

Oct 31, 2016, 6:28:22 PM
to Stacks
Hello,

I am trying to obtain a phylip file containing the concatenated SNPs of each individual in the analysis, to use as input for SVDquartets. Running the command below, I only get one concatenated sequence per population:
populations -P ../ustacks9/ -M ../ustacks9/popmap_popschem1.2.txt --min_maf 0.05 --max_obs_het 0.75 -b 9 -k -p 8 -m 3 -r 0.8 -t 4 --structure --phylip_var_all --phylip_var --phylip

My output looks like this:
pop1 ATGCGS
pop2 AGTCTC
pop3 ....etc


I would like to get:
indiv1 ATGCGS
indiv2 AGTCTC
indiv3 ....etc

I could obviously change my popmap so that each individual is its own population, but I do not know whether that would be appropriate.

Julian Catchen

Oct 31, 2016, 9:10:13 PM
to stacks...@googlegroups.com, lin...@slu.edu
Hi,

Yes, the best way is to create a population map with one individual per
population. Before doing that, I would consider identifying a good
subset of SNPs available across the set of populations and using those to
create a whitelist of SNPs. I would then supply the whitelist to
populations when you use the one-sample-per-population popmap.

julian
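The one-sample-per-population popmap suggested above can be generated from an existing popmap. This is a minimal sketch, assuming the popmap is tab-separated with the sample name in the first column; `one_sample_per_population` is a hypothetical helper, not part of Stacks. It reuses each sample name as its population ID, so the phylip rows should then be labelled by individual (indiv1, indiv2, ...), assuming populations accepts text population IDs, as the Male/Female labels later in this thread suggest.

```python
def one_sample_per_population(popmap_lines):
    """Rewrite popmap lines so that each sample becomes its own population.

    Assumes tab-separated "sample<TAB>population" lines; the sample name
    is reused as the new population ID.
    """
    out = []
    for line in popmap_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines in the input
        sample = line.split("\t")[0]
        out.append(f"{sample}\t{sample}")
    return out


if __name__ == "__main__":
    # Hypothetical example input; read your real popmap file instead.
    example = ["indiv1\tpop1", "indiv2\tpop1", "indiv3\tpop2"]
    print("\n".join(one_sample_per_population(example)))
```

Writing the returned lines to a new file gives a popmap to pass with `-M`, leaving the original popmap untouched for other runs.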

lin...@slu.edu

Nov 1, 2016, 4:23:47 PM
to Stacks, lin...@slu.edu, jcat...@illinois.edu
Thanks so much for the reply.
I have created a whitelist of SNPs (taken from the sumstats.tsv file) and remade the popmap file with one individual per population. The manual indicated that continuing to use filters may reduce the number of loci, so I turned them off (I simply left them out of the command). After running the command below, I went from 1377 SNPs to 26 SNPs (as seen in the .phylip file), and I don't know why. The manual also states that loci may be lost because they are fixed within a population, but I don't understand how to account for this.
"If you change your population map after creating the whitelist, you may see SNPs drop out of the analysis because introducing a population map may change if a locus is fixed. In a large, single population a locus may be polymorphic, but once you subset your data into multiple populations that locus may become fixed in one or more subpopulations and will not be output in those populations."
The command I ran was:
populations -P ../ -M ../popmap_popschem1.3.txt -W ../whitelist01_r60p60_PS1.3.txt -b 9 -t 4 --structure --phylip_var_all --phylip_var --phylip --vcf --vcf_haplotypes
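The whitelist step described above (SNPs taken from sumstats.tsv, restricted to those available across populations as Julian suggested) could be scripted roughly as follows. This is a minimal sketch, not the poster's actual procedure, and it rests on two assumptions worth checking: that in a Stacks 1.x batch_X.sumstats.tsv the locus ID, SNP column, and population ID are the 2nd, 5th, and 6th tab-separated fields (compare with your file's header line), and that populations accepts whitelist lines of the form `locus<TAB>column` (compare with the populations manual). `shared_snps` and `whitelist_lines` are hypothetical names.

```python
from collections import defaultdict

# ASSUMED field positions in a Stacks 1.x sumstats.tsv row; verify against your header.
LOCUS_FIELD = 1  # "Locus ID"
SNP_FIELD = 4    # "Col" (SNP position within the locus)
POP_FIELD = 5    # "Pop ID"


def shared_snps(sumstats_lines, n_pops):
    """Return sorted (locus, column) pairs reported in all n_pops populations."""
    pops_seen = defaultdict(set)
    for line in sumstats_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        key = (int(fields[LOCUS_FIELD]), int(fields[SNP_FIELD]))
        pops_seen[key].add(fields[POP_FIELD])
    return sorted(k for k, pops in pops_seen.items() if len(pops) == n_pops)


def whitelist_lines(snps):
    """Format (locus, column) pairs as tab-separated whitelist lines (assumed format)."""
    return [f"{locus}\t{col}" for locus, col in snps]
```

Requiring a SNP in every population is the strictest version of "available across the set of populations"; relaxing `len(pops) == n_pops` to a minimum count would keep more SNPs at the cost of more missing data.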

juliannichol...@gmail.com

Nov 1, 2016, 4:27:32 PM
to Stacks, lin...@slu.edu, jcat...@illinois.edu
Building on this topic: I am running populations for the same purpose, using this command:

/usr/local/bin/populations -b 3211 -P some/folder -O /some/folder -M some/file.txt -s -r 20 --fasta_strict --phylip 

The plan, as in the original post, is then to screen my database based on the output phylip.log file.

With the parameters above I get only one locus in the phylip.log file, and when I look up that ID in the database I see both haplotypes in both populations.

Here is the output from the Phylip file:

# Stacks v1.44;  Phylip sequential; November 01, 2016
# Seq Pos Locus ID Column Population
0 188016 23 Male:T,Female:C,
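To screen the database from phylip.log as planned, the data lines can be split into locus ID, column, and per-population alleles. A minimal sketch, assuming the layout of the example line above (Seq Pos, Locus ID, Column, then a comma-separated list of `Population:Allele` entries); `parse_phylip_log_line` and `locus_ids` are hypothetical helpers, not part of Stacks.

```python
def parse_phylip_log_line(line):
    """Split one data line of phylip.log into its fields.

    Assumes: Seq Pos, Locus ID, Column, then "Pop:Allele,Pop:Allele," entries.
    """
    seq_pos, locus_id, column, pops = line.split(None, 3)
    alleles = {}
    for entry in pops.strip().strip(",").split(","):
        pop, allele = entry.rsplit(":", 1)
        alleles[pop] = allele
    return {
        "seq_pos": int(seq_pos),
        "locus_id": int(locus_id),
        "column": int(column),
        "alleles": alleles,
    }


def locus_ids(log_lines):
    """Collect locus IDs from the data lines, skipping '#' comment lines."""
    return [parse_phylip_log_line(l)["locus_id"]
            for l in log_lines if l.strip() and not l.startswith("#")]
```

For the example line, this yields locus 188016, column 23, with Male fixed for T and Female fixed for C; the locus IDs are what would be looked up in the database.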

And here is the output from the database for that locus:

Population Female
  FM1    0
  FSF6   0
  FSF2   -2.77
  FSF1   0
  FM43   0
  FLF6   0
  F335   -4.02
Population Male
  MSM6   0
  MSM4   0
  MM21   -1.86
  MLM5   -1.39
  MLM4   -1.86
  MLM2   -1.39


So, as far as I can tell, the locus in my phylip file is not in fact "fixed-within, and variant among populations". There may be some oversight in how I am interpreting the output or how I am running the pipeline.
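The "fixed-within, and variant among populations" criterion can be made concrete with a small check. This is a simplified model for illustration only, not Stacks' implementation: it looks only at which alleles were observed in each population and ignores genotype-calling statistics, filters, and missing data. `fixed_within_variant_among` is a hypothetical name.

```python
def fixed_within_variant_among(pop_alleles):
    """Simplified test of whether a site is fixed within, and variant among, populations.

    pop_alleles maps population name -> set of alleles observed in that population.
    True only if every population has exactly one allele and at least two
    populations are fixed for different alleles.
    """
    if not pop_alleles:
        return False
    if any(len(alleles) != 1 for alleles in pop_alleles.values()):
        return False  # some population is polymorphic at this site
    fixed = {next(iter(alleles)) for alleles in pop_alleles.values()}
    return len(fixed) > 1
```

Under this model, a site where both alleles are seen in both populations (as described above) would not qualify, and with one individual per population any heterozygous individual makes its "population" polymorphic. Julian's reply suggests the discrepancy comes from which genotype calls Stacks treats as significant, which this sketch does not model.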

Julian Catchen

Nov 1, 2016, 8:58:43 PM
to juliannichol...@gmail.com, Stacks
In this case you need to click on the alleles in the web interface. Most
likely, at that sequencing depth, your SNP calls are not statistically
significant and are therefore excluded during the phylip export.

julian

Julian Catchen

Nov 1, 2016, 9:02:04 PM
to lin...@slu.edu, Stacks
You can specify --phylip or --phylip_var to populations, but not both at
the same time. Whichever one you specify last is the file you will get.
--phylip_var_all writes into a separate file, so you can specify it in
parallel with the other flags.
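One way to avoid passing both --phylip and --phylip_var by accident is to build the command so that only one of them can be chosen. A minimal sketch in which `phylip_flags` and `populations_cmd` are hypothetical helpers; it uses only the flags that appear in this thread (`-P`, `-M`, `-W`, `-b`, `-t`, and the phylip options).

```python
def phylip_flags(variable_sites, include_all=False):
    """Choose exactly one of --phylip / --phylip_var, optionally adding --phylip_var_all."""
    # Only the last of --phylip / --phylip_var takes effect, so emit just one.
    flags = ["--phylip_var" if variable_sites else "--phylip"]
    if include_all:
        # --phylip_var_all writes its own file, so it can be combined freely.
        flags.append("--phylip_var_all")
    return flags


def populations_cmd(stacks_dir, popmap, whitelist, batch_id, threads,
                    variable_sites, include_all=False):
    """Build a populations argument list from the options used in this thread."""
    cmd = ["populations", "-P", stacks_dir, "-M", popmap,
           "-W", whitelist, "-b", str(batch_id), "-t", str(threads)]
    return cmd + phylip_flags(variable_sites, include_all)


if __name__ == "__main__":
    print(" ".join(populations_cmd("../", "../popmap_popschem1.3.txt",
                                   "../whitelist01_r60p60_PS1.3.txt", 9, 4,
                                   variable_sites=True, include_all=True)))
```

On a machine with Stacks installed, the returned list could be passed to `subprocess.run(cmd, check=True)`.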