fasta output populations

440 views
Skip to first unread message

astrid...@unibas.ch

unread,
Feb 18, 2015, 10:30:51 AM2/18/15
to stacks...@googlegroups.com

Hi Julian
I used the fasta output from populations on a denovomap run to filter for loci present only in a certain subset of my samples.
This mostly seems to work fine
However, I cam across a locus that according to the fasta file is present in only nine samples as deduced from the headers:
>CLocus_5514_Sample_11_Locus_3883_Allele_0
>CLocus_5514_Sample_30_Locus_22063_Allele_0
>CLocus_5514_Sample_36_Locus_19838_Allele_0
>CLocus_5514_Sample_38_Locus_22415_Allele_0
>CLocus_5514_Sample_42_Locus_45705_Allele_0
>CLocus_5514_Sample_45_Locus_40484_Allele_0
>CLocus_5514_Sample_47_Locus_46616_Allele_0
>CLocus_5514_Sample_54_Locus_28212_Allele_0
>CLocus_5514_Sample_55_Locus_5051_Allele_0

However, when I filter for the ID 5514 over the webinterface I get the picture above, so many more samples with that locus
Any idea why this is happening?
I used version 1.20 for this
Thanks in advace
Astrid

Julian Catchen

unread,
Feb 23, 2015, 10:35:33 PM2/23/15
to stacks...@googlegroups.com, astrid...@unibas.ch
Hi Astrid,

I'm not sure what is happening here. It looks like only the nine
homozygotes are being output (but you would have to check that the
homozygotes in locus 5514 are the sample IDs you listed below. (In the
latest Stacks release, we print the sample name into these headers to
make this checking easier.) Looking at the code, I can't see any
immediate reason all the loci aren't being output. What was your
populations command?

Best,

julian

astrid...@unibas.ch wrote:
> <https://lh6.googleusercontent.com/-6zwB9j_odXw/VOSwFvn3AnI/AAAAAAAAAAM/imTvLGFQDO4/s1600/Screen%2BShot%2B2015-02-18%2Bat%2B16.30.10.png>

astrid boehne

unread,
Feb 24, 2015, 3:27:13 AM2/24/15
to stacks...@googlegroups.com
Hi Julian
The Populations command I used is the following
populations -t 20 -b 1 -P
$HOME/RAD_seq_Eugene/labstrain/denovomapPop_20141010_repeated \
-M $HOME/RAD_seq_Eugene/labstrain/population_file_labstrain \
-B
$HOME/RAD_seq_Eugene/labstrain/denovomapPop_20141010/blacklisted_markers \
-e sbfI --fasta -p 1 -m 3 -s -f p_value --fstats

All the best
Astrid
--
Astrid Böhne
Universität Basel
Zoologisches Institut
Evolutionsbiologie
Vesalgasse 1
CH-4051 Basel
Switzerland
Phone +41 (0)61 267 03 05
Fax +41 (0) 61 267 03 01

lp

unread,
Feb 24, 2015, 10:55:10 AM2/24/15
to stacks...@googlegroups.com
Astrid et al.,
I have had similarly unexpected results in the fasta output. After mapping my individual names to the internal IDs (thanks Julian), I see some inconsistencies. The number of samples output is as expected based on filtering for read depth, but a number of individuals that are heterozygous on the web interface are homozygous on the batch_XXX.fa output.  I can show my code/output if needed but just wanted to add my observation here to Astrid's. 

On another note, is there a suggested way to parse the .fa output? Its a difficult (for me anyway) to convert it to other popgen formats (e.g. Nexus, arlequin).

Louis

Julian Catchen

unread,
Feb 25, 2015, 7:36:28 PM2/25/15
to stacks...@googlegroups.com, louis....@gmail.com, astrid...@unibas.ch
Hi Astrid and LP,

I have verified that the FASTA (both strict and non strict) appear to be
working as intended, at least on my test set. To move forward I will
need a test case where the populations program demonstrably outputs the
wrong FASTA sequences.

You should verify that a particular SNP position hasn't been filtered
out by the populations program (this would cause the web interface and
the FASTA files to not look the same). You can find all of the SNPs for
a locus that survive filtering in the batch_X.sumstats.tsv file.

I have spent some time refining the minor allele frequency filtering
code and covered some missing cases. I also added logging so it will
tell you which polymorphic positions are being filtered and why in the
populations log file. This should help narrow down whether the FASTA
properly matches the web interface. You can try it, although it is not
released yet:

http://creskolab.uoregon.edu/stacks/source/stacks-1.27.tar.gz

Can either of you provide a subset of your *.tags.tsv, *.matches.tsv and
catalog files that can reproduce the bug you see? Perhaps zip them up
and share them on Dropbox?

Best,

julian

zeamne T

unread,
Mar 10, 2016, 10:47:22 PM3/10/16
to Stacks, louis....@gmail.com, astrid...@unibas.ch, jcat...@illinois.edu
Dear Julian,

Has this been resolved? I think I am having similar issues, although I am not using the web interface to check because I didn't manage to figure out MySQL. 

I encountered these 2 issues:
1. According to my vcf file, there should be 81 samples that pass the filter at locus 3995:
un 3995 49 A G . PASS NS=81;AF=0.574,0.426 GT:DP:AD:GL 0/0:14:14,0:.,19.41,. 1/1:20:0,20:.,27.73,. 0/0:15:15,0:.,20.79,. ./.:0:.,.:.,.,. 0/0:22:22,0:.,30.5,. ./.:0:.,.:.,.,. 0/1:16:8,8:.,22.18,. 1/1:14:0,14:.,19.41,. 0/0:15:15,0:.,20.79,. 1/1:18:0,18:.,24.95,. 0/1:20:13,7:.,27.73,. 0/1:14:8,6:.,19.41,. 0/1:20:9,11:.,27.73,. 0/1:20:10,10:.,27.73,. 1/1:10:0,10:.,13.86,. 0/0:13:13,0:.,18.02,. 0/1:13:7,6:.,18.02,. ./.:0:.,.:.,.,. 0/0:22:22,0:.,30.5,. 0/0:10:10,0:.,13.86,. ./.:0:.,.:.,.,. 0/0:12:12,0:.,16.64,. 0/0:14:14,0:.,19.41,. 0/1:21:9,12:.,29.11,. 0/1:19:5,14:.,26.34,. ./.:0:.,.:.,.,. 1/1:10:0,10:.,13.86,. 1/1:18:0,18:.,24.95,. 1/1:18:0,18:.,24.95,. 0/0:12:12,0:.,16.64,. ./.:0:.,.:.,.,. 1/1:15:0,15:.,20.79,. 1/1:17:0,17:.,23.57,. 0/0:18:18,0:.,24.95,. ./.:0:.,.:.,.,. 0/1:19:9,10:.,26.34,. 1/1:15:0,15:.,20.79,. 0/1:18:8,10:.,24.95,. 0/1:15:5,10:.,20.79,. 1/1:16:0,16:.,22.18,. 0/0:12:12,0:.,16.64,. ./.:0:.,.:.,.,. 1/1:16:0,16:.,22.18,. 1/1:13:0,13:.,18.02,. 0/0:16:16,0:.,22.18,. 1/1:12:0,12:.,16.64,. 1/1:18:0,18:.,24.95,. 0/1:19:10,9:.,26.34,. 0/1:16:9,7:.,22.18,. 1/1:11:0,11:.,15.25,. 0/1:13:8,5:.,18.02,. ./.:0:.,.:.,.,. 0/1:23:12,11:.,31.88,. 1/1:20:0,20:.,27.73,. 0/0:14:14,0:.,19.41,. 0/1:19:9,10:.,26.34,. 0/0:21:21,0:.,29.11,. 1/1:29:0,29:.,40.2,. 0/1:18:7,11:.,24.95,. 0/1:17:8,9:.,23.57,. 0/1:13:7,6:.,18.02,. 1/1:10:0,10:.,13.86,. ./.:0:.,.:.,.,. 0/1:17:9,8:.,23.57,. 0/0:25:25,0:.,34.66,. 1/1:21:0,21:.,29.11,. 0/1:28:14,14:.,38.82,. 0/1:14:6,8:.,19.41,. 0/1:30:17,13:.,41.59,. 0/1:20:9,11:.,27.73,. 0/1:22:11,11:.,30.5,. 0/0:17:17,0:.,23.57,. 0/0:14:14,0:.,19.41,. 0/0:15:15,0:.,20.79,. 0/0:19:19,0:.,26.34,. 0/0:10:10,0:.,13.86,. 0/1:28:14,14:.,38.82,. 0/0:15:15,0:.,20.79,. ./.:0:.,.:.,.,. 0/0:30:30,0:.,41.59,. 0/0:26:26,0:.,36.04,. 1/1:18:0,18:.,24.95,. 0/0:26:26,0:.,36.04,. 0/1:18:10,8:.,24.95,. 0/0:25:25,0:.,34.66,. 0/0:17:17,0:.,23.57,. ./.:0:.,.:.,.,. 0/0:16:16,0:.,22.18,. 0/0:11:11,0:.,15.25,. 0/0:15:15,0:.,20.79,. 0/0:14:14,0:.,19.41,. 0/0:13:13,0:.,18.02,. 0/0:15:15,0:.,20.79,. ./.:0:.,.:.,.,.
However, in my fasta output file there are 88 samples (by counting the number of headers containing "CLocus_3995" in the file) at this locus.

2. What is more worrying is that the sample_IDs in the fasta headers do not seem to tally with my sample numbers. For instance, one of my fasta headers looks like this:
>CLocus_3995_Sample_74_Locus_36685_Allele_0 [js1P04.fltchim]
I assume Sample_74 comes from the order in which my samples were specified in my population map file, which in this case #74 should be js1F04.fltchim instead.

Is there any other way that I can share my files with you (e.g. google drive)? My dropbox is too full and I am unable to upload it there.. It should not be more than 400MB if I share the catalog and just one set of .tsv files.

Best regards,
yujie

Julian Catchen

unread,
Mar 17, 2016, 1:28:28 PM3/17/16
to zeamne T, Stacks
Dear Yujie,

Did you specify --fasta or --fasta_strict for your output, only the
latter will match your VCF file.

The FASTA header lines will tell you the sample and the allele from each
sample that is output, so you should be able to match them up to what
you see in the VCF file.

The sample ID does not match your population map. The sample ID is
assigned when the sample is run through pstacks or ustacks and is fixed
in your various Stacks output files (*.tags/alleles/snps/matches.tsv).

julian

zeamne T wrote:
> Dear Julian,
>
> Has this been resolved? I think I am having similar issues, although I
> am not using the web interface to check because I didn't manage to
> figure out MySQL.
>
> I encountered these 2 issues:
> 1. According to my vcf file, there should be 81 samples that pass the
> filter at locus 3995:
> un399549AG.PASSNS=81;AF=0.574,0.426GT:DP:AD:GL0/0:14:14,0:.,19.41,.1/1:20:0,20:.,27.73,.0/0:15:15,0:.,20.79,../.:0:.,.:.,.,.0/0:22:22,0:.,30.5,../.:0:.,.:.,.,.0/1:16:8,8:.,22.18,.1/1:14:0,14:.,19.41,.0/0:15:15,0:.,20.79,.1/1:18:0,18:.,24.95,.0/1:20:13,7:.,27.73,.0/1:14:8,6:.,19.41,.0/1:20:9,11:.,27.73,.0/1:20:10,10:.,27.73,.1/1:10:0,10:.,13.86,.0/0:13:13,0:.,18.02,.0/1:13:7,6:.,18.02,../.:0:.,.:.,.,.0/0:22:22,0:.,30.5,.0/0:10:10,0:.,13.86,../.:0:.,.:.,.,.0/0:12:12,0:.,16.64,.0/0:14:14,0:.,19.41,.0/1:21:9,12:.,29.11,.0/1:19:5,14:.,26.34,../.:0:.,.:.,.,.1/1:10:0,10:.,13.86,.1/1:18:0,18:.,24.95,.1/1:18:0,18:.,24.95,.0/0:12:12,0:.,16.64,../.:0:.,.:.,.,.1/1:15:0,15:.,20.79,.1/1:17:0,17:.,23.57,.0/0:18:18,0:.,24.95,../.:0:.,.:.,.,.0/1:19:9,10:.,26.34,.1/1:15:0,15:.,20.79,.0/1:18:8,10:.,24.95,.0/1:15:5,10:.,20.79,.1/1:16:0,16:.,22.18,.0/0:12:12,0:.,16.64,../.:0:.,.:.,.,.1/1:16:0,16:.,22.18,.1/1:13:0,13:.,18.02,.0/0:16:16,0:.,22.18,.1/1:12:0,12:.,16.64,.1/1:18:0,18:.,24.95,.0/1:19:
10,9:.,26.34,.0/1:16:9,7:.,22.18,.1/1:11:0,11:.,15.25,.0/1:13:8,5:.,18.02,../.:0:.,.:.,.,.0/1:23:12,11:.,31.88,.1/1:20:0,20:.,27.73,.0/0:14:14,0:.,19.41,.0/1:19:9,10:.,26.34,.0/0:21:21,0:.,29.11,.1/1:29:0,29:.,40.2,.0/1:18:7,11:.,24.95,.0/1:17:8,9:.,23.57,.0/1:13:7,6:.,18.02,.1/1:10:0,10:.,13.86,../.:0:.,.:.,.,.0/1:17:9,8:.,23.57,.0/0:25:25,0:.,34.66,.1/1:21:0,21:.,29.11,.0/1:28:14,14:.,38.82,.0/1:14:6,8:.,19.41,.0/1:30:17,13:.,41.59,.0/1:20:9,11:.,27.73,.0/1:22:11,11:.,30.5,.0/0:17:17,0:.,23.57,.0/0:14:14,0:.,19.41,.0/0:15:15,0:.,20.79,.0/0:19:19,0:.,26.34,.0/0:10:10,0:.,13.86,.0/1:28:14,14:.,38.82,.0/0:15:15,0:.,20.79,../.:0:.,.:.,.,.0/0:30:30,0:.,41.59,.0/0:26:26,0:.,36.04,.1/1:18:0,18:.,24.95,.0/0:26:26,0:.,36.04,.0/1:18:10,8:.,24.95,.0/0:25:25,0:.,34.66,.0/0:17:17,0:.,23.57,../.:0:.,.:.,.,.0/0:16:16,0:.,22.18,.0/0:11:11,0:.,15.25,.0/0:15:15,0:.,20.79,.0/0:14:14,0:.,19.41,.0/0:13:13,0:.,18.02,.0/0:15:15,0:.,20.79,../.:0:.,.:.,.,.
Reply all
Reply to author
Forward
0 new messages