Precision population name order import SNP data format pop_ids model


Mathilde SALAMON

Aug 18, 2023, 10:48:25 AM
to dadi-user
Hi Ryan,

I imported my data using the SNP data format, and I was wondering whether the columns for each population have to be in the same order as in pop_ids? Or does dadi automatically recognize the population names when importing the dataset?

Also, I just wanted to confirm that the order of the population parameters in the model (population 1, population 2) is the same as the order specified in pop_ids?

Below is an example of my SNP data format dataset and of the specified model, where the population names are in the same order. However, I also have a case where the order of the names in the dataset differs from the order in pop_ids.

Thank you for your help,
Best wishes,
Mathilde Salamon
--
REF ALT Allele1 PB PG Allele2 PB PG Gene Position
-A- -G- A 5 0 G 75 80 100 10597
-A- -G- A 0 17 G 80 63 100 11098
-A- -G- A 0 9 G 80 71 100 11542
-A- -G- A 80 77 G 0 3 100 11598
-A- -T- A 0 7 T 80 73 100 11615
-A- -C- A 0 13 C 80 67 100 11689
-T- -G- T 0 5 G 80 75 100 12994
-T- -G- T 0 11 G 80 69 100 13429
-T- -G- T 0 4 G 80 76 100 13456
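As a quick sanity check on this layout, the rows above can be parsed in plain Python to confirm that, for every SNP, the two allele counts in each population add up to the sample size passed in ns. This is a minimal sketch. It assumes whitespace-separated columns laid out as in the header: context columns, then Allele1, one count column per population, Allele2, the same population columns in the same order, and finally extra columns. `parse_snp_rows` is a hypothetical helper, not part of dadi.

```python
# The SNP rows from the post, pasted verbatim (header line first).
SNP_TEXT = """\
REF ALT Allele1 PB PG Allele2 PB PG Gene Position
-A- -G- A 5 0 G 75 80 100 10597
-A- -G- A 0 17 G 80 63 100 11098
-A- -G- A 0 9 G 80 71 100 11542
-A- -G- A 80 77 G 0 3 100 11598
-A- -T- A 0 7 T 80 73 100 11615
-A- -C- A 0 13 C 80 67 100 11689
-T- -G- T 0 5 G 80 75 100 12994
-T- -G- T 0 11 G 80 69 100 13429
-T- -G- T 0 4 G 80 76 100 13456
"""


def parse_snp_rows(text):
    """Hypothetical helper: return (population names, list of {pop: (Allele1 count, Allele2 count)}).

    Assumes the header reads: <context columns> Allele1 <pops> Allele2 <same pops> <extra columns>.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    a1, a2 = header.index("Allele1"), header.index("Allele2")
    pops = header[a1 + 1:a2]  # population names, in file order
    snps = []
    for row in rows:
        snps.append({pop: (int(row[a1 + 1 + k]), int(row[a2 + 1 + k]))
                     for k, pop in enumerate(pops)})
    return pops, snps


pops, snps = parse_snp_rows(SNP_TEXT)
# Distinct per-SNP totals (Allele1 + Allele2) observed in each population
totals = {pop: sorted({sum(snp[pop]) for snp in snps}) for pop in pops}
print(totals)  # {'PB': [80], 'PG': [80]}
```

All nine rows total 80 chromosomes in both PB and PG, which matches ns = [80, 80]. Projecting down would mean passing smaller values in ns, for example [40, 40].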

####### Import data and generate fs #######
import dadi

dd = dadi.Misc.make_data_dict("dadi_input_filtercov_PB_PG")
# ns = number of chromosomes sampled per population (80 here, matching the data).
# To project down for faster computation, pass smaller sizes, e.g. [40, 40] (half the chromosomes).
pop_ids, ns = ['PB', 'PG'], [80, 80]
fs = dadi.Spectrum.from_data_dict(dd, pop_ids, ns, polarized=False)  # folded, because the ancestral state of SNPs is unknown
# Mask frequency classes 0-4 in each population (low confidence in frequency calling)
fs.mask[0:5, :] = True
fs.mask[:, 0:5] = True
# Save the extracted spectrum to disk
fs.to_file('PG_PB.fs')
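The two mask lines hide frequency classes 0 to 4 along each axis, that is, SNPs seen in 0 to 4 copies in a population. The sketch below applies the same slicing to a plain NumPy masked array with the 81 x 81 shape that ns = [80, 80] gives, so you can count which cells end up excluded. It does not use dadi, and the zero-filled array is only a stand-in for a real spectrum.

```python
import numpy as np

n1, n2 = 80, 80
shape = (n1 + 1, n2 + 1)
# Stand-in for a 2D spectrum: rows = allele count in pop 1, columns = allele count in pop 2.
fs = np.ma.masked_array(np.zeros(shape), mask=np.zeros(shape, dtype=bool))

fs.mask[0:5, :] = True  # rows 0-4: SNPs with 0-4 copies in pop 1
fs.mask[:, 0:5] = True  # columns 0-4: SNPs with 0-4 copies in pop 2

n_masked = int(fs.mask.sum())
print(n_masked)  # 5*81 + 5*81 - 5*5 = 785
```

The 25 cells in the corner where both slices overlap are counted only once, which gives 785 masked cells.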


####### Define a custom demography model #######
def split_mig_bottlegrowth(params, ns, pts):
    """
    Model of split with migration, then an instantaneous size change (bottleneck)
    in both populations followed by exponential size change.
    Tb is a fixed parameter (computed inside the function, not fitted); theta is
    inferred with the other parameters using multinom=False during optimization.

    params list is

    nu1: Size of population 1 after split.
    nu2: Size of population 2 after split.
    nu1B: Ratio of population size after instantaneous change to ancient population size for pop1
    nu1F: Ratio of contemporary to ancient population size for pop1
    nu2B: Ratio of population size after instantaneous change to ancient population size for pop2
    nu2F: Ratio of contemporary to ancient population size for pop2
    mri: The scaled migration rate from pop 2 (invaded) into pop 1 (refuge); passed as m12
    mir: The scaled migration rate from pop 1 (refuge) into pop 2 (invaded); passed as m21
    Ts: The time between the split and bottleneck
    theta: Population-scaled mutation rate, passed to dadi as theta0

    Tb (not in params): The scaled time between the bottleneck and present.
    ns = (n1,n2): Size of fs to generate.
    pts: Number of points to use in grid for evaluation.
    """
    nu1, nu2, nu1B, nu1F, nu2B, nu2F, mri, mir, Ts, theta = params
    # RNG: Removed Tb from params list. It's not a parameter you're fitting in this function.
    n1, n2 = ns
    # RNG: Need to specify theta0 throughout function
    # Define the grid we'll use
    xx = yy = dadi.Numerics.default_grid(pts)

    # phi for the equilibrium ancestral population
    phi = dadi.PhiManip.phi_1D(xx, theta0=theta)
    # phi for the ancestral population split
    phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)

    # phi for the migration between pop 1 and pop 2 post split
    phi = dadi.Integration.two_pops(phi, xx, Ts, nu1, nu2, m12=mri, m21=mir, theta0=theta)

    # Define a function to describe the bottleneck then exponential growth in populations 1 and 2
    # Use lambda for this function
    # RNG: Changed Tb = to T = . The error message you were getting was just about the name of that argument.
    # Fixed bottleneck time converted to dadi units: t_gen * 2 * mu * L / theta
    # (presumably t_gen = 12 generations, mu = 7.6e-09 per site per generation, L = 7158678 sites)
    Tb = (12*2*7.6e-09*7158678)/theta
    nu1_func = lambda t: nu1B*(nu1F/nu1B)**(t/Tb)
    nu2_func = lambda t: nu2B*(nu2F/nu2B)**(t/Tb)
    phi = dadi.Integration.two_pops(phi, xx, T=Tb, nu1=nu1_func, nu2=nu2_func,
                                    m12=mri, m21=mir, theta0=theta)

    # Finally, calculate the spectrum.
    sfs = dadi.Spectrum.from_phi(phi, (n1, n2), (xx, yy))
    return sfs
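The model converts the fixed bottleneck time from generations to dadi's time units inside the function. This reading is an assumption, since the post does not say so explicitly: the constants in that line are taken to be the bottleneck age in generations (12), the per-site, per-generation mutation rate (7.6e-09) and the sequence length L (7158678). Under that reading, the conversion follows from dadi's scaling theta = 4 N_ref mu L, with time measured in units of 2 N_ref generations. The size functions interpolate exponentially between the post-bottleneck and contemporary sizes.

```latex
\begin{aligned}
\theta &= 4 N_{\mathrm{ref}}\,\mu L
  \;\Longrightarrow\; N_{\mathrm{ref}} = \frac{\theta}{4\mu L},\\
T_b &= \frac{t_{\mathrm{gen}}}{2 N_{\mathrm{ref}}} = \frac{2\,\mu L\,t_{\mathrm{gen}}}{\theta}
  = \frac{12 \cdot 2 \cdot 7.6\times 10^{-9} \cdot 7158678}{\theta},\\
\nu_1(t) &= \nu_{1B}\left(\frac{\nu_{1F}}{\nu_{1B}}\right)^{t/T_b},
  \qquad \nu_1(0) = \nu_{1B},\quad \nu_1(T_b) = \nu_{1F}.
\end{aligned}
```

The same form applies to population 2, with nu2B and nu2F. Because theta is one of the fitted parameters, Tb in dadi units changes with theta during optimization, while the bottleneck age in generations stays fixed.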


Ryan Gutenkunst

Aug 19, 2023, 12:19:54 PM
to dadi...@googlegroups.com
Hello Mathilde,

> On Aug 18, 2023, at 7:48 AM, Mathilde SALAMON <mathilde...@gmail.com> wrote:
> I imported my data using the SNP data format, and I was wondering whether the columns for each population have to be in the same order as in pop_ids? Or does dadi automatically recognize the population names when importing the dataset?

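Order is not important. dadi reads the population names from the header line and matches them to the names in pop_ids, so the columns in the file can be in any order.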
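To illustrate why column order doesn't matter: the header line names the population for every count column, so counts can be stored by population name and looked up later through pop_ids. The sketch below uses a hypothetical, simplified parser (not dadi's own make_data_dict code). It runs on the first SNP from the post, written once with the columns as PB, PG and once swapped to PG, PB. Both versions give the same per-population counts.

```python
def counts_by_pop(header_line, row_line):
    """Hypothetical simplified parser: map population name -> (Allele1 count, Allele2 count)."""
    header, row = header_line.split(), row_line.split()
    a1, a2 = header.index("Allele1"), header.index("Allele2")
    result = {}
    for col in range(a1 + 1, a2):  # count columns listed under Allele1
        pop = header[col]
        # Find the column for the same population under Allele2, by name
        col2 = a2 + 1 + header[a2 + 1:].index(pop)
        result[pop] = (int(row[col]), int(row[col2]))
    return result


original = counts_by_pop(
    "REF ALT Allele1 PB PG Allele2 PB PG Gene Position",
    "-A- -G- A 5 0 G 75 80 100 10597",
)
swapped = counts_by_pop(
    "REF ALT Allele1 PG PB Allele2 PG PB Gene Position",
    "-A- -G- A 0 5 G 80 75 100 10597",
)
print(original == swapped)  # True
print(original)             # {'PB': (5, 75), 'PG': (0, 80)}
```

Once the counts are keyed by name, selecting populations through pop_ids gives the same result whatever the column order in the file.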

> Also, I just wanted to confirm that the order of the population parameters in the model (population 1, population 2) is the same as the order specified in pop_ids?

Yes. Population 1 in the model is always the first entry in pop_ids, population 2 the second.
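In other words, pop_ids fixes which population sits on which axis of the spectrum, and the model's population 1 (nu1, nu1B, nu1F) is whichever population is on axis 0. The NumPy sketch below builds a toy joint spectrum from per-population allele counts using a hypothetical helper, joint_sfs, which is not a dadi function. Reversing pop_ids transposes the result, which is why the parameter meanings follow pop_ids rather than the column order in the file.

```python
import numpy as np


def joint_sfs(snp_counts, pop_ids, ns):
    """Hypothetical helper: tally SNPs into an array whose axis k corresponds to pop_ids[k].

    snp_counts is a list of {population name: allele count} dicts.
    """
    sfs = np.zeros([n + 1 for n in ns], dtype=int)
    for snp in snp_counts:
        index = tuple(snp[pop] for pop in pop_ids)
        sfs[index] += 1
    return sfs


# Allele1 counts for the first three SNPs in the post
snps = [{"PB": 5, "PG": 0}, {"PB": 0, "PG": 17}, {"PB": 0, "PG": 9}]

a = joint_sfs(snps, ["PB", "PG"], [80, 80])
b = joint_sfs(snps, ["PG", "PB"], [80, 80])
print(np.array_equal(a, b.T))  # True: swapping pop_ids swaps the axes
print(a[5, 0], b[0, 5])        # 1 1
```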

> Below is an example of my SNP data format dataset and of the specified model, where the population names are in the same order. However, I also have a case where the order of the names in the dataset differs from the order in pop_ids.

Best,
Ryan

Mathilde SALAMON

Sep 4, 2023, 6:45:26 AM
to dadi-user
Hello Ryan,

Thank you for your answer, this was very useful.

Best wishes,
Mathilde