write single SNP option for Fst calculations

474 views
Skip to first unread message

AFK

unread,
Feb 25, 2014, 11:25:43 AM2/25/14
to stacks...@googlegroups.com
Hi Julian,

I was wondering if it would be possible to add a feature to apply the "write_single_snp"-option also
to the Fst calculations and especially to the window-based kernel smoothing calculations?

My apologies if I am wrong in my following reasoning, but the thing is that several SNPs occurring
on the same locus seem to strongly influence these calculations (naturally, as they are weighed by physical proximity).
But, at least in my data, this leads to the result that candidate outlier loci are often those that harbor
several SNPs. In other words, the signal of moderately differentiated SNPs might be strongly amplified
by their neighboring sites, if they are in perfect linkage and show the same signal (quite common for 100bp reads even in old natural
populations, I guess). Of course there is always some bias as restriction sites are never perfectly
homogeneously spread over the genome. And this might of course be more of problem if
you have a rather low marker density (e.g. due to a low polymorphism rate as in my case)
which might render the whole window approach questionable.

But I guess the main idea behind the window-based approach is to reduce the effect of stochastic variance,
e.g. sampling bias, which does not really hold true for several SNPs stemming from the same locus/molecule, no?

Anyway, I think it would be great to have this option to test for the effect of loci harboring multiple SNPs, if that is 
reasonably easy to code for you.
Alternatively, one could do this by blacklisting single SNPs (not loci), but that might perhaps be computationally
more challenging to incorporate into "populations"?

Thanks anyway for all the hard work and all the improvements and nice features of Stacks so far.

Best wishes,
Andi

 


Julian Catchen

unread,
Feb 28, 2014, 6:38:04 AM2/28/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de
Hi Andi,

I'll have to think about that one. I don't think it would be too hard to
implement.

In the meantime, I would suggest that you try out our new implementation
of Phi_st, which is a haplotype version of Fst. This statistic is new in
Stacks 1.13 and should deal with any issues stemming from multiple SNPs
at a locus since Phi_st considers the haplotype across each locus
instead of the individual SNPs. Our general plan with Stacks development
is to move from SNPs to haplotypes (although we need to finish our
corrections mechanisms first and work out a few issues).

Best,

julian

lj

unread,
Feb 28, 2014, 3:32:47 PM2/28/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de, jcat...@uoregon.edu
Hi all,

I have kind of a similar question about ddRAD sequences.
As suggested in previous posts, I used my PE ddRAD data so that the two ends of the fragment (each digested with their own RE) are considered as separate loci in the pipeline.
But if the sequenced fragments are around 300 bp long, and SNPs occur at both ends, then these are linked and should really be considered as a single locus, right?

Would it make sense to look at all polymorphic loci and see if there are PE reads within that data-set, and then get rid of the locus at one of the ends?
If yes, what would be the fastest way for me to do that?

Thanks,
Ljerka
Message has been deleted

AFK

unread,
Mar 5, 2014, 4:05:45 AM3/5/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de, jcat...@uoregon.edu
Hi Julian,

thanks for your reply. I've only seen the new release after I posted this. But yes, the haplotype-based statistics should indeed get rid of these effects. The only problem is of course
that right now there is no bootstrapping involved. But maybe you are already planning to implement this. Further, since I used the 1.20beta version with the rxstacks module
to create my files I had to change them manually to make them compatible with the new version (simple bash scripts, so no big deal though).

Anyway, haplotpye-based statistics is a great idea, but I ran some analyses and I get PhiST-values of up to and sometimes even larger than 2. Is that normal?
I thought PhiST is supposed to be analogous to Fst (a range from 0 to 1). I had a look at the Excoffier, Smouse, & Quattro, (1992) paper, but cannot really figure out how exactly
this is calculated.
An error in the manual conversion from the 1.20beta files to the new version doesn't seem to be the problem, as Fst statistics are identical.

Could you maybe clarify this. Is everything working fine and max. values are simply 2 instead of 1, or is something going wrong there?

An example phistats.tsv file with the --log-fst-comp is attached.

Best,
Andi
batch_1.phistats.tsv

Julian Catchen

unread,
Mar 25, 2014, 5:37:14 PM3/25/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de
Hi Andi,

I went back in to the Phi_st code over the last few days and indeed
found a bug in the calculation. That is now fixed and released in Stacks
version 1.16. I think the code is quite robust now.

You are correct that Phi_st (and Fst) is a value between 0 and 1, but
depending on the method used to calculate it you can still get values
that are slightly negative. This is true of the Phi_st implementation
usually in the case of really small sample sizes. And, other Fst
implementations also suffer from this. Anyway, you should no longer have
Phi_st values larger than 1.0.

Best,

julian

AFK

unread,
Mar 26, 2014, 4:55:07 AM3/26/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de, jcat...@uoregon.edu
Hi Julian,

thanks a lot. I wanted to try the new release, but am getting some errors during compilation.
I get no errors in the first step creating the Makefile, but when I want to "make" I get the following:


[ro...@stacks-1.16]# make
make  all-am
make[1]: Entering directory `/computation/software/stacks-1.16'
g++ -DHAVE_CONFIG_H -I.    -fopenmp  -g -O2 -std=gnu++0x -MT src/ustacks-ustacks.o -MD -MP -MF src/.deps/ustacks-ustacks.Tpo -c -o src/ustacks-ustacks.o `test -f 'src/ustacks.cc' || echo './'`src/ustacks.cc
In file included from src/ustacks.h:77,
                 from src/ustacks.cc:29:
src/gzFasta.h: In constructor ‘GzFasta::GzFasta(const char*)’:
src/gzFasta.h:41: error: ‘gzbuffer’ was not declared in this scope
src/gzFasta.h: In constructor ‘GzFasta::GzFasta(std::string)’:
src/gzFasta.h:49: error: ‘gzbuffer’ was not declared in this scope
In file included from src/ustacks.h:78,
                 from src/ustacks.cc:29:
src/gzFastq.h: In constructor ‘GzFastq::GzFastq(std::string)’:
src/gzFastq.h:40: error: ‘gzbuffer’ was not declared in this scope
src/gzFastq.h: In constructor ‘GzFastq::GzFastq(const char*)’:
src/gzFastq.h:48: error: ‘gzbuffer’ was not declared in this scope
src/ustacks.cc: In function ‘int write_results(std::map<int, MergedStack*, std::less<int>, std::allocator<std::pair<const int, MergedStack*> > >&, std::map<int, Stack*, std::less<int>, std::allocator<std::pair<const int, Stack*> > >&, std::map<int, Rem*, std::less<int>, std::allocator<std::pair<const int, Rem*> > >&)’:
src/ustacks.cc:1431: error: ‘gzbuffer’ was not declared in this scope
src/ustacks.cc:1622: error: ‘gzbuffer’ was not declared in this scope
make[1]: *** [src/ustacks-ustacks.o] Error 1
make[1]: Leaving directory `/computation/software/stacks-1.16'
make: *** [all] Error 2


I guess it has something to do with the support for gzip files, no?
The last version I used was 1.13 which compiled without any problems.

In case it's important, our server is running a: Linux version 2.6.32-358.23.2.el6.x86_64 (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) )


Do you have any idea what is going on or how to fix this?

Best,
Andi







Zophonías O. Jónsson

unread,
Mar 26, 2014, 2:47:29 PM3/26/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de, jcat...@uoregon.edu
Just a mee-too for your information.

I get the same error message with stacks 1.16 and gcc version 4.4.6 20120305 whereas stacks 1.06 compiles fine.  On the other hand 1.16 compiles without problems with gcc version 4.8.2 

All the best,

Zophonías

Thierry Gosselin

unread,
Mar 26, 2014, 3:49:39 PM3/26/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de, jcat...@uoregon.edu
Stacks version 1.16 is functional on Mac OSX 10.9.2 with gcc version 4.9.0,
but doesn't compile on our Server running SUSE Linux Enterprise Server 11 with gcc version 4.3.4.

Best,
Thierry

Julian Catchen

unread,
Mar 26, 2014, 6:52:34 PM3/26/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de, thierry...@me.com, zee...@gmail.com
Hi Thierry, Zophonias, and Andi,

I think the problem has to do with the version of zlib installed on your
systems. The gzbuffer() function was added to zlib version 1.2.4 (in
2010) and it appears that the zlib version you are using may be prior to
that version. You can check by looking at the zlib header file, on my
system:

more /usr/include/zlib.h

To fix it, you could upgrade your version of zlib, or remove those calls
to gzbuffer() from the code (just delete the lines containing the calls
to gzbuffer()).

I'm not sure the best way to handle it in the source, any thoughts? We
could try to conditionally compile based on the version of zlib present,
or something like that, but that would be a pain.

julian

Thierry Gosselin

unread,
Mar 26, 2014, 7:33:53 PM3/26/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de, thierry...@me.com, zee...@gmail.com, jcat...@uoregon.edu
Hi Julian,
In my case, updating the Linux server with GCC v4.7.2 solved my compilation problems.

Best,
Thierry

Julian Catchen

unread,
Mar 26, 2014, 7:39:34 PM3/26/14
to stacks...@googlegroups.com, andrea...@uni-konstanz.de, zee...@gmail.com
Hi Zophonias and Andi,

I have added some conditional compilation instructions to try and handle
the missing function. Can you try out the code?

http://creskolab.uoregon.edu/stacks/source/stacks-1.17.tar.gz

Of course, the decompression code will be slower with an older version
of zlib, but at least it will work.

Please let me know if you are able to try it and if it works for you,
otherwise I may remove the conditionals in the next release.

Thanks,

julian

"Zophonías O. Jónsson"

unread,
Mar 26, 2014, 7:53:13 PM3/26/14
to Julian Catchen, stacks...@googlegroups.com, andrea...@uni-konstanz.de
Hi Julian,

it compiles just fine now with the old zlib and gcc.

Zophonías

AFK

unread,
Mar 27, 2014, 4:22:31 AM3/27/14
to stacks...@googlegroups.com, Julian Catchen, andrea...@uni-konstanz.de
Hi Julian,

same here. Everything is fine now.

Thanks!
Reply all
Reply to author
Forward
0 new messages