Hi to all!
I am just starting to use msprime to simulate tree genomes. We have specified the demographic model, but the biggest issue so far has been simulation time. Each chromosome is on the order of 2e09 bp long, and there are 12 chromosomes. I am currently running just one chromosome with the following specifications:
chromosomeSize = 2.5e09
recombinationRate = 7.5e-10
mutationRate = 1.5e-13
I am including only two demographic events: starting from a huge initial effective population size of 550,000, I simulate two bottlenecks that shrink the effective population size down to 500 and later to 100.
import msprime
from random import randint

# Set present population parameters
population_configuration = [msprime.PopulationConfiguration(sample_size = 100,
                                                            initial_size = 100,
                                                            growth_rate = 0)]
# Specify past demographic events - back in time (most recent first)
demographic_events = [msprime.PopulationParametersChange(time = 5,
                                                         initial_size = 100,
                                                         growth_rate = -0.805),
                      msprime.PopulationParametersChange(time = 7,
                                                         initial_size = 550000,
                                                         growth_rate = 0)]
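As a sanity check on the growth rate, here is a minimal sketch of the size I expect it to produce between the two events. It assumes msprime's exponential model, in which the size t generations further back in time is initial_size * exp(-growth_rate * t). The helper name size_back_in_time is only for illustration and is not part of msprime.

```python
import math

def size_back_in_time(initial_size, growth_rate, generations):
    """Population size `generations` further back in time than the point
    where `initial_size` applies, assuming exponential size change."""
    return initial_size * math.exp(-growth_rate * generations)

# The two events are at time 5 and time 7, i.e. 2 generations apart.
# Going backwards from size 100 with growth_rate = -0.805:
size_before_second_event = size_back_in_time(100, -0.805, 2)
print(round(size_before_second_event))
```

This comes out at roughly 500, which is the intermediate bottleneck size I intended before the jump back to 550,000 at time 7.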
And then running the simulation with:
pop = msprime.simulate(length = chromosomeSize,
                       recombination_rate = recombinationRate,
                       mutation_rate = mutationRate,
                       Ne = 100,
                       random_seed = randint(1, 2**32 - 1),
                       population_configurations = population_configuration,
                       demographic_events = demographic_events)
So I wanted to ask: approximately how long should such a simulation be expected to run? And can the simulation code be optimized to be more efficient, or is there another way to simulate the data faster?
Thanks!