How to deal with running out of memory during bootstrapping?


Nick englewrye

Oct 9, 2024, 3:33:04 PM
to PhyloNetworks users

Hello everyone,


I keep running out of memory while bootstrapping with bootsnaq. Could anyone please look at my code below and tell me how to reduce memory use, or how to set up a loop that breaks the bootstrap into more manageable pieces? I’ve tried checkpointing and running smaller batches of bootstrap replicates, but haven’t had success. I’ve also attached screenshots of the memory available on the computer I will be running this on.


Thanks for any help,

Nick


#bootstraps 

# Put all bootstrap files from RAxML in a folder called bootstraps, then work in that folder from the command line (not julia) to make boots.txt


ls /mnt/Symsym/nick/phylonetworks/spruceup.phylonetworks.1hybrid.no.outgroup/bootstraps/RAxML_bootstrap.Locus_* > boots.txt

cp boots.txt ../  #copy it into the working directory that julia is in 



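# now in julia do the following

using PhyloNetworks  # must be loaded before readBootstrapTrees is called


# read the bootstrap trees:

bootTrees = readBootstrapTrees("boots.txt")


### Julia crashed mid-bootstrap, so in the new session I need to reload net1 (and rerun the lines above)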


# Load the network from the output file

net1 = readTopology("net1.out")


# Verify the loaded network

println(net1)


# Continue with further steps such as bootstrapping or plotting



## # Add 4 processors to speed things up. I think I saw on the Google group that bootstrapping only ever uses 1 processor anyway, so skip this.

## using Distributed

## addprocs(4)  # Adjust the number of processors as needed

## @everywhere using PhyloNetworks

## bootnet = bootsnaq(net1, bootTrees, hmax=5, nrep=100, runs=10, filename="bootsnaq1_raxmlboot")



## bootnet = bootsnaq(net1, bootTrees, hmax=5, nrep=100, runs=10, filename="bootsnaq1_raxmlboot") # this should have worked but I ran out of memory.
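
To see how close a run gets to the limit, memory could be logged right before each bootsnaq call. The lines below are only a minimal sketch: `memory_report` is a hypothetical helper, not part of PhyloNetworks, and it relies only on Julia's built-in `Sys.free_memory`, `Sys.total_memory` and `Sys.maxrss`.

```julia
# Hypothetical helper (not part of PhyloNetworks): summarize memory in GiB
function memory_report()
    gib = 1024.0^3
    return (free_gib  = round(Sys.free_memory() / gib, digits=2),   # RAM currently free on the machine
            total_gib = round(Sys.total_memory() / gib, digits=2),  # total RAM visible to Julia
            peak_gib  = round(Sys.maxrss() / gib, digits=2))        # peak memory used by this Julia process
end

report = memory_report()
@info "Memory status" free_gib=report.free_gib total_gib=report.total_gib peak_gib=report.peak_gib
```

Calling this before and after a single replicate would show roughly how much memory one replicate needs, which helps decide whether batching can help at all.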



########## The first runs, using 4 processors and then 1 processor, both got killed, so I tried the following code to checkpoint progress and free up memory, but I still ran out of memory.


# using Distributed # this and the next 2 lines add processors, but I ran out of memory, so I am skipping them to use only 1 processor

# addprocs(4)  # Adjust the number of processors to 4

# @everywhere using PhyloNetworks

using JLD2  # For saving and loading checkpoints

using Serialization  # Alternative if you prefer binary serialization


# Define the number of bootstrap replicates per batch (checkpoint)

nrep_total = 100

batch_size = 1  # Save every 1 replicate

hmax = 5

runs = 10

filename = "bootsnaq1_raxmlboot"


# Define checkpoint file

checkpoint_file = "bootsnaq_checkpoint.jld2"


# Load previous checkpoint if it exists

if isfile(checkpoint_file)

    @info "Loading previous checkpoint from $checkpoint_file"

    bootnet, completed_reps = JLD2.load(checkpoint_file, "bootnet", "completed_reps")

else

    @info "Starting fresh bootstrap estimation"

    bootnet = HybridNetwork[]  # bootsnaq returns a vector of networks, so accumulate into a vector

    completed_reps = 0

end


# Run the remaining batches

while completed_reps < nrep_total

    nrep_current = min(batch_size, nrep_total - completed_reps)

    

    # Run bootsnaq for the current batch, with batch-specific output files so earlier batches are not overwritten

    batch_file = "$(filename)_rep$(completed_reps + 1)"

    batchnets = bootsnaq(net1, bootTrees, hmax=hmax, nrep=nrep_current, runs=runs, filename=batch_file)

    

    # Keep the networks from every batch, not only the last one, and update the count

    append!(bootnet, batchnets)

    completed_reps += nrep_current

    

    # Save checkpoint (name, value, name, value form of save)

    @info "Saving checkpoint after $completed_reps replicates"

    JLD2.save(checkpoint_file, "bootnet", bootnet, "completed_reps", completed_reps)


    # Drop the per-batch result; the accumulated bootnet is kept

    batchnets = nothing

    

    # Trigger garbage collection

    GC.gc()  # Run garbage collection to free up memory

end


@info "Bootstrap estimation complete."
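
The batching-with-checkpoint idea can also be tested on its own, without PhyloNetworks, to confirm that results accumulate and that a restart resumes where it left off. This is only a sketch under assumptions: `run_in_batches` and `demo_batch` are hypothetical names, the checkpoint uses the standard-library Serialization module instead of JLD2, and `demo_batch` is a cheap stand-in for the expensive bootstrap step.

```julia
using Serialization  # standard library

# Hypothetical helper: run `nrep_total` replicates in batches and save progress after each batch.
# `run_batch(first_rep, n)` must return a vector with one result per replicate.
function run_in_batches(run_batch, nrep_total::Integer, batch_size::Integer, checkpoint::AbstractString)
    # Resume from the checkpoint if it exists, otherwise start with no results
    results = isfile(checkpoint) ? deserialize(checkpoint) : Any[]
    while length(results) < nrep_total
        n = min(batch_size, nrep_total - length(results))
        append!(results, run_batch(length(results) + 1, n))
        serialize(checkpoint, results)  # overwrite the checkpoint with all results so far
        GC.gc()                         # release temporaries from the finished batch
    end
    return results
end

# Stand-in for the expensive step: returns the replicate indices it was asked to run
demo_batch(first_rep, n) = collect(first_rep:first_rep + n - 1)

checkpoint = joinpath(mktempdir(), "demo_checkpoint.bin")
results = run_in_batches(demo_batch, 10, 3, checkpoint)
```

In the bootstrap setting, `run_batch` could wrap a call to bootsnaq with `nrep=n` and a filename that includes `first_rep`, assuming the returned networks serialize cleanly. Note that batching only limits how much work is lost in a crash; if a single replicate already exceeds the available memory, smaller batches will not help.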




