discrepancies with number of loci retained and outputted to files with populations script


Kerin Bentley

Oct 31, 2015, 5:13:40 PM
to Stacks
Hi,
I have run the populations script several times on different groupings of my dataset. It works great, except that it reports finding X loci and printing them to the genepop files etc., but when I open those files they contain fewer loci.

For example, the populations script output said it found 560 loci but the genepop file only had 533. Another output said it found 1115 loci but there were only 1039 loci in the genepop file. 

Why is this? We are running Stacks version 1.35 on the zcluster here at UGA where I am having the issue.

Thanks!
Kerin

Eleanor Bors

Feb 16, 2016, 12:08:35 AM
to Stacks
Hi Kerin-

Did you ever make progress figuring out the root of the discrepancies? I have a similar issue: I'm seeing a difference in the number of loci in the (1) structure, (2) genepop, and (3) sumstats output files from the populations program. There are up to 2x as many loci in the summary statistics file as in the others. Unfortunately this may be affecting my summary statistics calculations in different downstream programs.

I have been running populations with the --write_single_snp flag and at first I thought that the difference in the # of loci in my sumstats file was because it contained loci that had more than 1 SNP while the other formats didn't.  I think I found a good way to check for that using python and it actually doesn't seem to be true (please correct me if you've found otherwise!).
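
A check of this kind could look roughly like the sketch below. It is not the script referred to above. It assumes the SNP records have already been read from the sumstats file as (locus ID, column) pairs; the function names and that input layout are hypothetical, and the parsing step is left out because the column order depends on the Stacks version.

```python
from collections import defaultdict


def snps_per_locus(records):
    """Count distinct SNP columns per locus.

    `records` is an iterable of (locus_id, column) pairs. The same SNP
    may appear several times (for example once per population), so
    columns are collected in a set before counting.
    """
    columns = defaultdict(set)
    for locus_id, column in records:
        columns[locus_id].add(column)
    return {locus_id: len(cols) for locus_id, cols in columns.items()}


def multi_snp_loci(records):
    """Return the sorted IDs of loci that carry more than one SNP."""
    counts = snps_per_locus(records)
    return sorted(locus_id for locus_id, n in counts.items() if n > 1)


# Made-up records: locus "10" has one SNP listed twice, locus "11" has two SNPs.
example = [("10", 5), ("10", 5), ("11", 20), ("11", 33)]
print(multi_snp_loci(example))  # ['11']
```

If this list comes back empty, multi-SNP loci cannot be what inflates the sumstats count.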

I know there have been posts in a similar vein in the past, but perhaps not exactly the same. If anyone knows of one, please point us in the right direction.

Thank you!
-Ellie

David

May 31, 2016, 4:29:09 PM
to Stacks
Hey guys,

any progress on this? I have the same problem with Stacks 1.34... for me it's a minor difference (~2%) but it's a difference and I can't explain it!
Any help appreciated!

Cheers,
David

Julian Catchen

Jun 1, 2016, 4:51:38 PM
to stacks...@googlegroups.com, comb...@g.harvard.edu
Hi David,

This question comes up pretty frequently on the list. The last time I
answered it was here:

https://groups.google.com/d/msg/stacks-users/BJxvnQ79OG0/XKLw1L3wHgAJ

The output for the different Stacks export files (sumstats, genepop,
structure, VCF, etc) should all be identical with some small variation
possible due to different rules in the exported file format.

The loci printed in each formatted file are provided, so all it takes is
a little UNIX to find which loci are different between files if there
are any differences at all. As far as I know there are no bugs in the
Stacks output in terms of exporting the same loci.

At the very least, you need to identify which loci are in one file but
not the other.
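
As a minimal sketch of that comparison (written in Python rather than UNIX tools), assuming the marker labels have already been pulled out of each export file into plain lists. It also assumes that labels of the form "locus_column" (for example "1234_56") should be reduced to the locus part before comparing. The function names are hypothetical and nothing here is part of Stacks itself.

```python
def locus_of(marker):
    """Return the locus part of a marker label such as '1234_56'.

    Labels without an underscore are returned unchanged (minus whitespace).
    """
    return marker.strip().split("_", 1)[0]


def compare_loci(markers_a, markers_b):
    """Return (only_in_a, only_in_b) as sorted lists of locus IDs."""
    loci_a = {locus_of(m) for m in markers_a if m.strip()}
    loci_b = {locus_of(m) for m in markers_b if m.strip()}
    return sorted(loci_a - loci_b), sorted(loci_b - loci_a)


# Made-up labels standing in for two export files.
genepop_markers = ["10_5", "11_20", "12_7"]
sumstats_loci = ["10", "11", "12", "13"]

only_genepop, only_sumstats = compare_loci(genepop_markers, sumstats_loci)
print(only_genepop)   # []
print(only_sumstats)  # ['13']
```

The loci reported on either side are the ones worth looking up individually.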

The web interface will allow you to view each locus and see the
different genotypes by eye. You can compare what you see in the web
interface for a particular locus with what you find in the exported file.

If you can find an inconsistency, i.e. provide a test case, I will look
for a bug, but the likelihood of a bug is small (but not zero).

Best,

julian



Kerin Bentley

Jun 1, 2016, 4:52:13 PM
to stacks...@googlegroups.com, Julian Catchen
Hi guys, 

I was never able to resolve this. Let me know if you figure it out. Julian, any ideas?

Best,
Kerin

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/

--
Kerin Bentley, PhD
Department of Genetics
University of Georgia
Athens, GA USA