Obtaining ancestral/derived state in VCF output

33 views
Skip to first unread message

JMR JMR

unread,
May 11, 2021, 8:52:57 AM5/11/21
to msprime-users
Dear all,

I would like to know if it's possible to obtain the ancestral/derived status from an ms-prime VCF output file. Can I safely assume the 0 allele is the ancestral allele?

This is the model I am simulating:

import sys,msprime,gzip
from math import exp

nhafr = 200 # num. sampled African haplotypes
nheur = 800 # num. sampled European haplotypes
nhnea = 200 # num. sampled Neand. haplotypes
nbp = 10000000 # simulated 10 Mb
rho = 1.0e-8 # recombination rate
mu = 2.4e-8 # mutation rate of base simulation

seed = int(sys.argv[1]) # we used seeds 1-100
outprefix = sys.argv[2] # output file prefix

NNE = 1500 # Neandertal effective size
NA = 15000 # Ancestral African eff. size
NOoA = 2000 # Out-of-African eff. size
r = .02 # growth rate since agriculture

# times are in generations before present
TNeand = 16000 # neandertal-human split time
TOoA = 2400 # time of Out of Africa event
Tintrostart = 2000 # start of introgression
Tintroend = Tintrostart-20 # end introgression
Tgrowth = 200 # time of recent growth

# migration rates are proportion of population made of new immigrants each generation
mAFEU = 1.0e-5 # mig. rate betw. Afr. and Eur.
mintro = 0.0015 # introgression from Neandertal into Europe

NEUnow = NOoA*exp(Tgrowth*r)
NAFnow = NA*exp(Tgrowth*r)

pop_config = [
msprime.PopulationConfiguration(sample_size = None, initial_size = NAFnow,growth_rate = r),
msprime.PopulationConfiguration(sample_size = None, initial_size = NEUnow, growth_rate = r),
msprime.PopulationConfiguration(sample_size = None, initial_size = NNE)]

samples = [msprime.Sample(0,0) for i in range(nhafr)] + [msprime.Sample(1,0) for i in range(nheur)]
+ [msprime.Sample(2,Tintrostart) for i in range(nhnea)] # Neand. haplotypes sampled 2000 gen. ago  

# migration between Africa and Europe
mig_mat = [[0,mAFEU,0],[mAFEU,0,0],[0,0,0]]

# onset of growth with agriculture
growth_event = [msprime.PopulationParametersChange(time = Tgrowth, growth_rate = 0.0)]

# neanderthal introgression
intro_event = [msprime.MigrationRateChange(time = Tintroend, rate = mintro, matrix_index = (1,2)),
msprime.MigrationRateChange(time = Tintrostart, rate = 0, matrix_index = (1,2))]

# human out of Africa
ooa_event = [msprime.MigrationRateChange(time=TOoA-.01, rate=0), msprime.MassMigration(time=TOoA,source=1, destination=0)]

# Neanderthal out of Africa
neand_event = [msprime.MassMigration(time=TNeand, source=2, destination=0)]

events = growth_event + intro_event + ooa_event + neand_event

# make the simulation
treeseq = msprime.simulate(population_configurations = pop_config, samples = samples,
migration_matrix = mig_mat, demographic_events = events, length = nbp, recombination_rate =
rho,mutation_rate = mu, random_seed = seed)

# output the vcf
with gzip.open(outprefix+".vcf.gz","w") as vcffile:
  treeseq.write_vcf(vcffile,2)  

Thanks,

Javier

Jerome Kelleher

unread,
May 11, 2021, 9:31:57 AM5/11/21
to msprim...@googlegroups.com
Hi Javier,

Yes, in this simulation you're guaranteed that 0 is the ancestral state
and 1 the derived. More generally, the REF allele is always the
ancestral state, in tskit output.

Hope this helps,
Jerome
> --
> You received this message because you are subscribed to the Google
> Groups "msprime-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to msprime-user...@googlegroups.com
> <mailto:msprime-user...@googlegroups.com>.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/msprime-users/a76f2e01-4ebf-4309-baf8-9a1a67c7a60an%40googlegroups.com
> <https://groups.google.com/d/msgid/msprime-users/a76f2e01-4ebf-4309-baf8-9a1a67c7a60an%40googlegroups.com?utm_medium=email&utm_source=footer>.

JMR JMR

unread,
May 11, 2021, 9:50:19 AM5/11/21
to msprime-users
Great, that's good to know.

Thanks for the very quick reply.

Best,

Javier

Jerome Kelleher

unread,
May 11, 2021, 10:04:17 AM5/11/21
to msprim...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages