Precision population name order import SNP data format pop_ids model

Mathilde SALAMON

Aug 18, 2023, 10:48:25 AM

to dadi-user

Hi Ryan,

I imported my data using the SNP data format, and I was wondering whether the columns for each population have to be in the same order as the one specified in pop_ids. Or does dadi automatically recognize the names when importing the dataset?

Also, I just wanted to confirm that the order of the population parameters in the model is the same as the order specified in pop_ids?

Below is an example of my SNP data format dataset and of the specified model, where the population names are in the same order. I also have a case where the order of the names in the dataset is not the same as in pop_ids.

Thank you for your help,

Best wishes,

Mathilde Salamon

--

REF ALT Allele1 PB PG Allele2 PB PG Gene Position
-A- -G- A 5 0 G 75 80 100 10597
-A- -G- A 0 17 G 80 63 100 11098
-A- -G- A 0 9 G 80 71 100 11542
-A- -G- A 80 77 G 0 3 100 11598
-A- -T- A 0 7 T 80 73 100 11615
-A- -C- A 0 13 C 80 67 100 11689
-T- -G- T 0 5 G 80 75 100 12994
-T- -G- T 0 11 G 80 69 100 13429
-T- -G- T 0 4 G 80 76 100 13456

####### Import data and generate fs #######
import dadi

dd = dadi.Misc.make_data_dict("dadi_input_filtercov_PB_PG")
# ns = [80, 80] matches the 80 chromosomes per population in the data above (no projection);
# smaller values, e.g. [40, 40], would project down for faster computation
pop_ids, ns = ['PB', 'PG'], [80, 80]
# fs is folded (polarized=False) because the ancestral state of the SNPs is unknown
fs = dadi.Spectrum.from_data_dict(dd, pop_ids, ns, polarized=False)
# Mask the entries with allele counts 0 to 4 in either population because of low confidence in frequency calling
fs.mask[0:5,:] = True
fs.mask[:,0:5] = True
# We can save our extracted spectrum to disk
fs.to_file('PG_PB.fs')


####### Define a custom demography model #######
def split_mig_bottlegrowth(params, ns, pts):
    """
    Model of split with migration, then an instantaneous size change (bottleneck) in both populations.
    Tb is fixed (computed from theta inside the function), and theta is inferred with the other parameters using multinom=False during optimization.
    
    params list is

    nu1: Size of population 1 after split.
    nu2: Size of population 2 after split.
    nu1B: Ratio of population size after instantaneous change to ancient population size for pop 1.
    nu1F: Ratio of contemporary to ancient population size for pop 1.
    nu2B: Ratio of population size after instantaneous change to ancient population size for pop 2.
    nu2F: Ratio of contemporary to ancient population size for pop 2.
    mri: The scaled migration rate from pop 2 (invaded) to pop 1 (refuge); passed to dadi as m12.
    mir: The scaled migration rate from pop 1 (refuge) to pop 2 (invaded); passed to dadi as m21.
    Ts: The scaled time between the split and the bottleneck.
    theta: The population-scaled mutation rate, passed as theta0 and used to compute Tb.

    Tb, the scaled time between the bottleneck and the present, is computed inside the function and is not in params.

    ns = (n1,n2): Size of fs to generate.
    pts: Number of points to use in grid for evaluation.
    """
    nu1,nu2, nu1B, nu1F, nu2B, nu2F, mri, mir, Ts, theta = params
    #RNG: Removed Tb from params list. It’s not a parameter you’re fitting in this function.
    n1,n2 = ns
    # RNG: Need to specify theta0 throughout function
    # Define the grid we'll use
    xx = yy = dadi.Numerics.default_grid(pts)

    # phi for the equilibrium ancestral population
    phi = dadi.PhiManip.phi_1D(xx, theta0=theta)
    
    # phi for the ancestral population split 
    phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)

    # phi for the migration between pop 1 and pop 2 post split
    phi = dadi.Integration.two_pops(phi, xx, Ts, nu1, nu2, m12=mri, m21=mir, theta0=theta)

    # Define a function to describe the bottleneck then exponential growth in populations 1 and 2
    # Use lambda for this function
    # RNG: Changed Tb = to T = . The error message you were getting was just about the name of that argument.
    Tb = (12*2*7.6e-09*7158678)/theta
    nu1_func = lambda t: nu1B*(nu1F/nu1B)**(t/Tb)
    nu2_func = lambda t: nu2B*(nu2F/nu2B)**(t/Tb)
    phi = dadi.Integration.two_pops(phi, xx, T = Tb, nu1=nu1_func, nu2=nu2_func, 
                                    m12=mri, m21=mir, theta0 = theta)

    # Finally, calculate the spectrum.
    sfs = dadi.Spectrum.from_phi(phi, (n1,n2), (xx,yy))
    return sfs
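
The fixed value of Tb in the model can be read as a conversion from generations into dadi's time units. dadi measures time in units of 2*Nref generations and defines theta = 4*Nref*mu*L, so T = t / (2*Nref) = 2*t*mu*L / theta. The minimal sketch below assumes that the constants in the model stand for t = 12 generations, mu = 7.6e-09 per site per generation, and L = 7158678 sites. These labels are an interpretation, since the post does not name them, and scaled_time is a hypothetical helper rather than a dadi function.

```python
def scaled_time(generations, mu, seq_len, theta):
    """Convert a time in generations into units of 2*Nref generations.

    Hypothetical helper for illustration; it is not part of dadi.
    """
    n_ref = theta / (4 * mu * seq_len)  # from theta = 4*Nref*mu*L
    return generations / (2 * n_ref)

# Assumed meaning of the constants in the model (not stated in the post)
generations = 12
mu = 7.6e-09
seq_len = 7158678

theta = 1000.0  # arbitrary illustrative value; in the model it is a fitted parameter
tb = scaled_time(generations, mu, seq_len, theta)
tb_model = (12 * 2 * 7.6e-09 * 7158678) / theta  # expression used in the model
print(tb, tb_model)  # the two values agree up to floating-point rounding
```

Because Tb is divided by theta, a larger fitted theta implies a larger Nref and therefore a shorter scaled time for the same number of generations.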

Ryan Gutenkunst

Aug 19, 2023, 12:19:54 PM

to dadi...@googlegroups.com

Hello Mathilde,

> On Aug 18, 2023, at 7:48 AM, Mathilde SALAMON <mathilde...@gmail.com> wrote:
> I imported my data using the SNP data format, and I was wondering whether the columns for each population have to be in the same order as the one specified in pop_ids. Or does dadi automatically recognize the names when importing the dataset?

Order is not important. dadi matches the data columns to pop_ids by population name, so it adjusts automatically.
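
A simplified sketch of name-based lookup shows why the column order in the file does not matter. It is an illustration only, not dadi's actual parser: parse_calls is a hypothetical function, and the swapped header is a made-up variant of the first data row in the question.

```python
def parse_calls(lines):
    """Return one {pop_name: (allele1_count, allele2_count)} dict per SNP row."""
    header = lines[0].split()
    a1 = header.index("Allele1")
    a2 = header.index("Allele2")
    pops = header[a1 + 1:a2]  # population names listed after Allele1
    calls = []
    for line in lines[1:]:
        cols = line.split()
        row = {}
        for i, pop in enumerate(pops):
            # Same column offset under Allele1 and under Allele2
            row[pop] = (int(cols[a1 + 1 + i]), int(cols[a2 + 1 + i]))
        calls.append(row)
    return calls

same_order = [
    "REF ALT Allele1 PB PG Allele2 PB PG Gene Position",
    "-A- -G- A 5 0 G 75 80 100 10597",
]
swapped = [
    "REF ALT Allele1 PG PB Allele2 PG PB Gene Position",
    "-A- -G- A 0 5 G 80 75 100 10597",
]

pop_ids = ["PB", "PG"]
for data in (same_order, swapped):
    calls = parse_calls(data)
    # Counts are fetched by name, so both files give the same result
    print([calls[0][pop] for pop in pop_ids])
```

Both files yield the same counts for PB and PG because the lookup goes through the population name rather than the column position.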

> Also, I just wanted to confirm that the order of the population parameters in the model is the same as the order specified in pop_ids?

Yes, pop 1 is always the first in pop_ids.
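
As a rough illustration of what "first in pop_ids" means, the sketch below tallies SNPs into a two-dimensional table whose first axis follows pop_ids[0]. It is not how dadi builds a spectrum (there is no folding, projection, or masking here), and joint_counts is a hypothetical helper; the counts are taken from the first three rows of the example dataset.

```python
def joint_counts(calls, pop_ids, ns):
    """Tally SNPs by Allele1 count; axis 0 follows pop_ids[0], axis 1 pop_ids[1]."""
    table = [[0] * (ns[1] + 1) for _ in range(ns[0] + 1)]
    for row in calls:
        i = row[pop_ids[0]][0]  # Allele1 count in the first listed population
        j = row[pop_ids[1]][0]  # Allele1 count in the second listed population
        table[i][j] += 1
    return table

# (Allele1, Allele2) counts for the first three SNPs of the example dataset
calls = [
    {"PB": (5, 75), "PG": (0, 80)},
    {"PB": (0, 80), "PG": (17, 63)},
    {"PB": (0, 80), "PG": (9, 71)},
]

pb_first = joint_counts(calls, ["PB", "PG"], [80, 80])
pg_first = joint_counts(calls, ["PG", "PB"], [80, 80])

# The same SNP lands at transposed positions depending on pop_ids order
print(pb_first[5][0], pg_first[0][5])  # prints: 1 1
```

With pop_ids = ['PB', 'PG'], the pop 1 parameters of the model (nu1, nu1B, nu1F) therefore describe PB and the pop 2 parameters describe PG; reversing pop_ids swaps those roles.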

> Below is an example of my SNP data format dataset and of the specified model, where the population names are in the same order. I also have a case where the order of the names in the dataset is not the same as in pop_ids.

Best,
Ryan

Mathilde SALAMON

Sep 4, 2023, 6:45:26 AM

to dadi-user

Hello Ryan,

Thank you for your answer; this was very useful.

Best wishes,

Mathilde
