Simulating HIV sequences

13 views
Skip to first unread message

Anushka Jain

unread,
Jun 3, 2024, 5:59:25 AMJun 3
to slim-discuss
Hi Everyone,
Thank you to the moderators for having me. I have some questions that I am looking for help with.

I am trying to run a simulation with SLiM to create a .fasta file containing simulated sequences of HIV from patients. My aim is to generate random sequences and store them in a .fasta file. I am also trying to include replicates in my simulated set of sequences to make the simulation closer to real data. I have written some code to be able to do that. But I am facing issues with minor things in code that I am finding difficult to address. Please help me out here. I will need to run the .fasta file it is generating in MEGAX to create a phylogenetic tree. I also want to store the .fasta file in my C: drive. Can you help with specifying that file path as well?

Here is my code.
initialize() {
    // Define simulation parameters
    defineConstant("genomeLength", 9000); // Simplified genome length
    defineConstant("popSize", 107); // Initial population size
    defineConstant("mutationRate", 1e-4); // High mutation rate
    defineConstant("recombinationRate", 1e-5); // Recombination rate
    defineConstant("selectionCoefficient", 0.1); // Selection coefficient for beneficial mutations
    defineConstant("bases", c("A", "T", "C", "G")); // Possible bases
    defineConstant("numReplicates", 3); // Number of replicates per individual

    // Initialize mutation rate, types, and genomic elements
    initializeMutationRate(mutationRate);
    initializeMutationType("m1", 0.5, "f", 0.0); // Neutral mutations
    initializeMutationType("m2", 0.5, "f", selectionCoefficient); // Beneficial mutations
    initializeGenomicElementType("g1", c("m1", "m2"), c(9, 1)); // 90% neutral, 10% beneficial
    initializeGenomicElement("g1", 0, genomeLength - 1);
    initializeRecombinationRate(recombinationRate);
}

1 early() {
    sim.addSubpop("p1", popSize);
}

10000 late() {
    string filePath = "simulation_output.fasta";
    integer sequenceLength = 9000; // Adjusted to full genome length for realistic simulation
    string allIndividualData = "";
   
    for (individual in p1.individuals) {
        for (integer replicate = 1; replicate <= numReplicates; replicate++) {
            string sequence = generateInitialSequence(sequenceLength); // Generate a random initial sequence
            // Mutate sequence
            sequence = mutateSequence(sequence, individual.genomes);
            string header = ">Individual_" + (individual.index + 1) + "_Replicate_" + replicate + "\n";
            string individualData = header + sequence + "\n";
            allIndividualData = concat(allIndividualData, individualData);
        }
    }
    writeFile(filePath, allIndividualData);
    sim.simulationFinished();
}

// Generate a random initial sequence of the specified length
function string generateInitialSequence(integer length) {
    string sequence = "";
    for (integer i = 0; i < length; i++) {
        sequence = concat(sequence, sample(bases, 1, TRUE));
    }
    return sequence;
}

// Apply mutations to a sequence based on the individual's genomes
function string mutateSequence(string sequence, object genomes) {
    for (integer i = 0; i < length(sequence); i++) {
        if (runif(1) < mutationRate) {
            sequence[i] = sample(bases, 1, TRUE); // Mutate to a new random base
        }
    }
    return sequence;
}

Any inputs are highly appreciated here. Thank you!
Sincerely,
Anushka Jain

Ben Haller

unread,
Jun 3, 2024, 6:58:18 AMJun 3
to Anushka Jain, slim-discuss
Hi Anushka!

So I guess there are two questions here?  One is about running MEGAX on your generated FASTA files; I don't know anything about that.  Perhaps someone else here does, but I'd look at the doc for that software, or look for a help group for it.  I've never even heard of it before.  :->  The other is how to specify a C: path when running on Windows, I guess.  I'm not a Windows user, but I've maybe got two suggestions for that.  (1) If you change to the directory you want to write to, in your command-line terminal app, and then run `slim` from that directory, then that should be the current working directory for your `slim` run, and files saved with a relative path (e.g., "myfile.fasta") should be written there.  (2) If you want to write to an absolute path (e.g., starting with "C:"), try dragging the folder you want to write to into your script in SLiMgui; I think that inserts the path to that folder into your script, which you can then use directly.  I've never understood how paths are represented on Windows, but that ought to work.  :->  Sorry for the less-than-helpful replies; maybe someone else will say more.

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Anushka Jain wrote on 6/3/24 11:59 AM:
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/32a9ece9-226b-4d69-be47-ada5db4ea14cn%40googlegroups.com.

Reply all
Reply to author
Forward
0 new messages