Hi,
I’ve seen you also opened an issue about the parallel processing.
I think it’s possible, however it may be too convoluted to include the support in the package.
Basically, you could easily run sequenza.extract for a single chromosome in parallel, then run for each object sequenza.fit, separately.
Each sequenza.fit instance will give a separate CP object
The CP object, is a list with the info:
ploidy: A numeric vector of tested ploidy values
cellularity: A numeric vector of tested cellularity values
lpp: Anumeric matrix of log-posterior probability for each (ploidy, cellularity) pair.
Be aware that sequenza.fit may fail if there are no segments to feed the model, so you may setup some error checking try() to fallback to a default CP object, with all 0s in the lpp matrix (or some other approach you may end up with).
The lpp of all the resulting CP objects can be sum to get the “final” CP object, which would equivalent to the CP object as you had run the program sequentially.
It may be a little overhead, but it should work :). I’m not expert on how to parallelize this, I would do something like:
data.file <- system.file("data", "example.seqz.txt.gz",
chromosome <- c(1:22, c("X", "Y"))
n_cpu = 4 # or any other amount of parallel processes you want
sample_stats <- gc.sample.stats(data.file)
extract_chroms <- mclapply(chromosome, FUN = function(x, seqz_file, gc_stats) {
sequenza.extract(seqz_file, chromosome.list = x, gc.stats= gc_stats)
seqz_file = data.file, gc_stats = sample_stats, mc.cores = n_cpu)
CP_chroms <- mclapply(extract_chroms, FUN = function(x) {
sequenza.fit(x, mc.cores = 1) # adjust mc.cores, beware that each process
# will try to use the number of cpu you set
# here so it will fork exponentially
CP_chroms_lpp <- sapply(1:length(CP_chroms), FUN = function(x, y) y[[x]]["lpp"], y = CP_chroms)
CP_chroms_lpp[is.na(CP_chroms_lpp)] <- NULL CP_lpp <- Reduce("+", CP_chroms_lpp)
CP <- list(cellularity = CP_chroms[[1]]$cellularity, ploidy = CP_chroms[[1]]$ploidy, lpp = CP_lpp)
cp.plot.contours(CP, add = T)
res$max.cellularity
res$max.ploidy
Francesco