Absolute divergence dxy

jacques toto

Oct 12, 2014, 2:50:18 PM
to stacks...@googlegroups.com

Dear Julian and Stacks users,

I am wondering whether there are plans to provide per-locus estimates of dxy (absolute divergence; Nei, 1987) between populations in an upcoming Stacks release, or whether someone has scripts they could share for this task.

I know the popoolation package can help, but I am not doing pool-seq and I have performed all my other analyses so far with Stacks.

On a related note, would it be possible to have kernel smoothing applied to the overall pi in batch.fst.tsv?

I have also noticed that with Stacks versions 1.20 and 1.21 the smoothed Fis values are strongly negative, reaching below -6. Am I doing something wrong in my procedure?

Many thanks for your help. 


Jacques

 

Julian Catchen

Oct 13, 2014, 6:23:26 PM
to stacks...@googlegroups.com
Hi Jacques,

Can you provide a specific link to the Nei calculation you are referring
to? We provide several divergence statistics in our haplotype analysis
(batch_X.hapstats.tsv) but I need to know exactly which calculation you
are referring to.

If you want kernel smoothing on the overall Pi measure, simply create a
new population map that includes the two subpopulations as a single
population and use the standard calculation of Pi.
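
The merged population map suggested above could be produced with a short script. This is only a sketch: it assumes the usual two-column, tab-separated map (sample name, then population), and the function name `merge_populations` and the label `all` are made up for illustration.

```python
import csv
import io


def merge_populations(popmap_text, merged_label="all"):
    """Return a population map in which every sample is assigned to one population.

    Assumes a tab-separated map with the sample name in the first column.
    Blank lines and lines starting with '#' are skipped.
    """
    reader = csv.reader(io.StringIO(popmap_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        writer.writerow([row[0], merged_label])
    return out.getvalue()


# Hypothetical sample names and population labels.
example = "sample_01\tnorth\nsample_02\tnorth\nsample_03\tsouth\n"
print(merge_populations(example), end="")
```

Running the populations program with the rewritten map would then report Pi, and its smoothed value, for the pooled samples.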

There was a bug in Stacks 1.20 where Fis could be listed as -7. This is
not a true calculated value but is instead a method to hold state
internally in the code for kernel-smoothing purposes. This should never
be exposed to the users of the code. It should be fixed in 1.21 and
should not appear. If you do see it in 1.21, please provide me the
details. At the same time, these values can safely be interpreted as 0,
which is their true value as calculated internally.
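
For anyone post-processing affected output, a small filter along these lines may help. It is a sketch, not Stacks behaviour: mapping the -7 placeholder to 0 follows the explanation above, while treating any other value outside [-1, 1] as missing is my own assumption, based on Fis = 1 - Hobs/Hexp not being able to fall below -1 at a biallelic site.

```python
SENTINEL = -7.0  # internal placeholder value described above for Stacks 1.20


def clean_fis(value):
    """Return a usable Fis value, or None if the value cannot be a real Fis.

    -7 (the placeholder) is mapped to 0.0, as suggested in the reply above.
    Values inside [-1, 1] are returned unchanged.
    Anything else is reported as None (assumption: treat it as missing).
    """
    if abs(value - SENTINEL) < 1e-9:
        return 0.0
    if -1.0 <= value <= 1.0:
        return value
    return None


for raw in (-7.0, -0.25, -6.9535):
    print(raw, "->", clean_fis(raw))
```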

Best,

julian

jacques toto

Oct 13, 2014, 8:50:55 PM
to stacks...@googlegroups.com, jcat...@uoregon.edu

Dear Julian,


Thank you very much for your prompt reply.

The dxy formula is given by equation (18) in Nei and Jin, Mol. Biol. Evol. 6(3):290-300, 1989. The statistic is also described in great detail in Box 1 of the recent paper by Cruickshank and Hahn, Molecular Ecology (2014) 23, 3133–3157.
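
To make the request concrete, here is a rough sketch of dxy in its usual form, the average number of differences per site between a haplotype drawn from population X and one drawn from population Y (the sum over haplotype pairs of x_i * y_j * d_ij). The input format (equal-length haplotype strings per locus) and all names are assumptions for illustration; this is not something Stacks outputs directly.

```python
from collections import Counter


def hamming_fraction(h1, h2):
    """Proportion of sites at which two equal-length haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def dxy(haps_x, haps_y):
    """Per-site absolute divergence between two samples of haplotypes.

    haps_x, haps_y: lists of haplotype strings sampled at one locus from
    populations X and Y. Computes sum_ij x_i * y_j * d_ij, where x_i and
    y_j are sample haplotype frequencies and d_ij is the per-site distance.
    """
    counts_x, counts_y = Counter(haps_x), Counter(haps_y)
    n_x, n_y = len(haps_x), len(haps_y)
    total = 0.0
    for hap_x, k_x in counts_x.items():
        for hap_y, k_y in counts_y.items():
            total += (k_x / n_x) * (k_y / n_y) * hamming_fraction(hap_x, hap_y)
    return total


# Hypothetical haplotypes at a single 4-bp locus.
pop_x = ["ACGT", "ACGT", "ACGA"]
pop_y = ["TCGT", "TCGT"]
print(round(dxy(pop_x, pop_y), 4))
```

Unlike Fst, this quantity does not depend on the diversity within either population, which is why it is described as an absolute measure.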

 

Thank you again for providing so many robust nucleotide- and haplotype-based statistics in Stacks. I have used them, and they are both informative and complementary. However, in some cases it would be useful to be able to test both relative and absolute measures of divergence, as emphasized by several authors (for instance: (i) Cruickshank and Hahn, Molecular Ecology (2014) 23, 3133–3157; (ii) Noor and Bennett, Heredity 2009, 103(6): 439–444; and (iii) Charlesworth, Mol. Biol. Evol. 15(5):538–543, 1998).

 

Please find below a sample of batch_x.sumstats.tsv from a test I ran on 9 individuals with Stacks version 1.21.

 

# Batch ID	Locus ID	Chr	BP	Col	Pop ID	P Nuc	Q Nuc	N	P	Obs Het	Obs Hom	Exp Het	Exp Hom	Pi	Smoothed Pi	Smoothed Pi P-value	Fis	Smoothed Fis	Smoothed Fis P-value	Private
1	2549	1	19267	21	1	G	A	7	0.92857143	0.1429	0.8571	0.1327	0.8673	0.1429	0.0021	0.0000	0.0000	-6.9535	0.0000	0
1	2826	1	20545	5	1	C	T	4	0.87500000	0.2500	0.7500	0.2188	0.7812	0.2500	0.0021	0.0000	0.0000	-6.9534	0.0000	0
1	2845	1	20706	24	1	G	T	6	0.91666667	0.1667	0.8333	0.1528	0.8472	0.1667	0.0021	0.0000	-0.0000	-6.9534	0.0000	0
1	4403	1	26643	22	1	A	C	9	0.94444444	0.1111	0.8889	0.1049	0.8951	0.1111	0.0021	0.0000	0.0000	-6.9534	0.0000	0
1	4522	1	27052	36	1	C	T	6	0.83333333	0.3333	0.6667	0.2778	0.7222	0.3030	0.0021	0.0000	-0.1000	-6.9534	0.0000	0
1	5270	1	29975	43	1	T	C	6	0.83333333	0.3333	0.6667	0.2778	0.7222	0.3030	0.0021	0.0000	-0.1000	-6.9534	0.0000	0
1	6951	1	36420	59	1	C	A	3	0.66666667	0.6667	0.3333	0.4444	0.5556	0.5333	0.0021	0.0000	-0.2500	-6.9533	0.0000	0
1	10982	1	57550	4	1	T	C	9	0.50000000	1.0000	0.0000	0.5000	0.5000	0.5294	0.0022	0.0000	-0.8889	-6.9531	0.0000	0
1	232	1	108495	10	1	G	A	5	0.70000000	0.6000	0.4000	0.4200	0.5800	0.4667	0.0023	0.0000	-0.2857	-6.9525	0.0000	0
1	2133	1	175702	8	1	T	G	5	0.90000000	0.2000	0.8000	0.1800	0.8200	0.2000	0.0024	0.0000	-0.0000	-6.9515	0.0000	0
1	3016	1	214752	29	1	G	A	6	0.66666667	0.6667	0.3333	0.4444	0.5556	0.4848	0.0025	0.0000	-0.3750	-6.9509	0.0000	0
1	3053	1	216420	14	1	G	A	5	0.80000000	0.4000	0.6000	0.3200	0.6800	0.3556	0.0025	0.0000	-0.1250	-6.9508	0.0000	0
1	3057	1	216720	4	1	C	T	3	0.83333333	0.3333	0.6667	0.2778	0.7222	0.3333	0.0025	0.0000	0.0000	-6.9508	0.0000	0
1	3608	1	236517	6	1	C	T	6	0.83333333	0.3333	0.6667	0.2778	0.7222	0.3030	0.0025	0.0000	-0.1000	-6.9505	0.0000	0
1	3658	1	238123	80	1	T	C	5	0.90000000	0.2000	0.8000	0.1800	0.8200	0.2000	0.0025	0.0000	-0.0000	-6.9505	0.0000	0
1	4461	1	268030	16	1	G	A	9	0.88888889	0.2222	0.7778	0.1975	0.8025	0.2092	0.0025	0.0000	-0.0625	-6.9501	0.0000	0
1	4706	1	277977	24	1	G	A	6	0.75000000	0.5000	0.5000	0.3750	0.6250	0.4091	0.0025	0.0000	-0.2222	-6.9500	0.0000	0
1	4735	1	279117	31	1	C	A	7	0.50000000	1.0000	0.0000	0.5000	0.5000	0.5385	0.0025	0.0000	-0.8571	-6.9500	0.0000	0
1	4761	1	280065	19	1	T	A	6	0.83333333	0.3333	0.6667	0.2778	0.7222	0.3030	0.0025	0.0000	-0.1000	-6.9500	0.0000	0
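
As a sanity check on the sample above: the per-site values appear consistent with Pi = 2pq * n/(n - 1), where n = 2N is the number of sampled chromosomes, and Fis = 1 - Obs Het / Pi. These relationships are inferred from the numbers in the table rather than taken from the Stacks source, so treat the sketch below as an assumption. Under them, every per-site Fis in the sample lies between -1 and 0, so a smoothed value near -6.95 cannot be a weighted average of them.

```python
def pi_unbiased(p, n_individuals):
    """Per-site nucleotide diversity for a biallelic site in diploids.

    p: frequency of the major allele (column P).
    n_individuals: number of genotyped individuals (column N).
    """
    n = 2 * n_individuals  # number of sampled chromosomes
    return 2 * p * (1 - p) * n / (n - 1)


def fis(obs_het, pi):
    """Inbreeding coefficient as 1 - observed / expected heterozygosity."""
    return 1 - obs_het / pi if pi > 0 else 0.0


# Values copied from three rows of the sample (loci 10982, 4735, 6951):
# (P, N, Obs Het, reported Pi, reported Fis)
rows = [
    (0.50000000, 9, 1.0000, 0.5294, -0.8889),
    (0.50000000, 7, 1.0000, 0.5385, -0.8571),
    (0.66666667, 3, 0.6667, 0.5333, -0.2500),
]
for p, n_ind, obs_het, pi_reported, fis_reported in rows:
    pi = pi_unbiased(p, n_ind)
    print(f"Pi {pi:.4f} vs {pi_reported:.4f}   "
          f"Fis {fis(obs_het, pi):.4f} vs {fis_reported:.4f}")
```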

Best,

CGO

Jul 28, 2016, 10:29:27 AM
to Stacks, jcat...@uoregon.edu
Jacques (and Julian),

Have you found a way to calculate pairwise Dxy between populations from RAD-tag haplotypes, or has Stacks since been updated to include this? (I've been using v1.29.)

I'm working with a highly selfing species, with some populations in areas that likely have a history of bottlenecks. Consequently, there is very little variation (and high homozygosity) within many populations, and pairwise Fst values are strongly influenced by the amount of within-population variation. For example, Fst values between two northern Scandinavian populations are nearly 1 and greatly exceed Fst values between northern and southern Europe. As you can imagine, a neighbor-joining tree based on pairwise Fst is uninterpretable. I am looking for an alternative measure of pairwise distance and/or divergence that is less problematic when within-population variation is low and/or highly variable, and pairwise Dxy seems like a good solution for the reasons cited above.
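
To illustrate the problem with a toy calculation: the sketch below uses made-up allele frequencies at ten sites and one common form of Fst, 1 - (mean within-population diversity) / dxy, without any sample-size correction. The populations, frequencies, and function names are all hypothetical.

```python
def site_pi(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)


def site_dxy(p1, p2):
    """Probability that alleles drawn from two populations differ at a site."""
    return p1 * (1 - p2) + p2 * (1 - p1)


def dxy_and_fst(freqs_x, freqs_y):
    """Return (dxy, Fst) averaged over sites, invariant sites included."""
    n_sites = len(freqs_x)
    dxy = sum(site_dxy(a, b) for a, b in zip(freqs_x, freqs_y)) / n_sites
    pi_within = sum(site_pi(a) + site_pi(b)
                    for a, b in zip(freqs_x, freqs_y)) / (2 * n_sites)
    fst = 1 - pi_within / dxy if dxy > 0 else float("nan")
    return dxy, fst


# Two northern populations with no internal variation, fixed for
# different alleles at one site out of ten.
north_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
north_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# A southern population that is polymorphic at four sites and fixed
# for the alternative allele at two more.
south = [0.5, 0.5, 0.5, 0.5, 1, 1, 0, 0, 0, 0]

print("north_a vs north_b:", dxy_and_fst(north_a, north_b))
print("north_a vs south:  ", dxy_and_fst(north_a, south))
```

In this toy case Fst is higher for the two northern populations than for the north versus south comparison, while dxy ranks them the other way round, which is the pattern described above.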

Thanks in advance,

Chris