Basic question regarding msprime VCF writing

241 views
Skip to first unread message

Keric Lamb

unread,
May 25, 2021, 6:37:49 PM5/25/21
to msprime-users
Hello all,

I have what's hopefully a pretty easy question. I'm attempting to produce a VCF file from a demographic simulation with 2 populations diverging, then later coming into secondary contact (see below). When I check the demography debugger, it appears to be working as expected. However, I'm not sure sure how to tell which individuals sort into which of the populations in the resulting VCF, which seems to just label individuals tsk_n (e.g. tsk_0, tsk_1, etc.).

With a sample size of 100 for each population, is it simply that the first 50 belong to p1 (due to diploidy) and the second to p2?

Thanks,
Keric

#simplified secondary contact model parameters
m12=0.5
m21=3.5
time_m=1000
time_d=5000
population_configurations=[
        msprime.PopulationConfiguration(sample_size = 100, initial_size=1e4),
        msprime.PopulationConfiguration(sample_size = 100, initial_size=1e4)]

demographic_events=[
        msprime.MigrationRateChange(0, rate=m12, matrix_index=(0, 1)),
        msprime.MigrationRateChange(0, rate=m21, matrix_index=(1, 0)),
        msprime.MigrationRateChange(time_m, rate= 0),
        msprime.MassMigration(time_d, source=1, dest=0, proportion=1)]

migration_matrix= np.array([[0,m12],[m21,0]])

ts = msprime.simulate(Ne=1000, length=1e4, recombination_rate=2e-8, mutation_rate=2e-8,
                      population_configurations=population_configurations,
                      demographic_events=demographic_events,
                      migration_matrix=migration_matrix)

dp = msprime.DemographyDebugger(
        Ne=1000,
        population_configurations=population_configurations,
        migration_matrix=migration_matrix,
        demographic_events=demographic_events)
dp.print_history()

with open("output.vcf", "w") as vcf_file:
    ts.write_vcf(vcf_file, 2)


LaKisha David

unread,
Mar 9, 2022, 11:30:34 PM3/9/22
to msprime-users
I had the same question. I wrote a script to create a pandas dataframe to address this question. I assumed that tsk_0 in the VCF file is individual 0 in the tree sequence, etc. 

# Get the population paramters
for item in mts.populations():
    print(item)

# Create the dataframe using the population id
import pandas as pd

mts_data = []
mts_df = pd.DataFrame()

for item in mts.nodes():
    if item.individual != -1:
        if item.population == 0:
            population = 0
            pop = "AFR"
        elif item.population == 3:
            population = 3
            pop = "ADMIX"
        mts_data.append(
            {
                'node': item.id,
                'individual': item.individual,
                'population': population,
                'pop': pop
            }
        )
    else:
        break

mts_df = pd.DataFrame(mts_data)
mts_df.set_index("node", inplace=True)

mts_df
Reply all
Reply to author
Forward
0 new messages