sequenza.extract error

513 views
Skip to first unread message

huilei xu

unread,
Aug 30, 2017, 9:47:58 AM8/30/17
to Sequenza User Group
Hi,

I have tested both on 2.1.2 and 2.1.0 version, which gave me the same error msg:

test <- sequenza.extract('my.seqz.gz')
Processing 1: Error in data.frame(base.count = as.integer(n.base.mut), maj.base.freq = as.numeric(max.freqs[, : arguments imply differing number of rows: 1962, 235

I think the error comes from mut.fractions step that generates unequal length of n.base.mut vs max.freqs due to isgnoring . NA rows. My original bam files are sorted.recalibrated bam and it's using human genome hs37d5. Thank you.

Francesco Favero

unread,
Aug 30, 2017, 9:52:09 AM8/30/17
to huilei xu, Sequenza User Group
Hi,

I’ve actually just pushed a commit to fix it in the cleanup branch.

Can you try to install the cleanup branch and see if the bug is resolved?

library(devtools)
install_bitbucket("sequenza_tools/sequenza@cleanup")


Thanks

Francesco


--
You received this message because you are subscribed to the Google Groups "Sequenza User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

huilei xu

unread,
Aug 30, 2017, 1:53:30 PM8/30/17
to Francesco Favero, Sequenza User Group
Hi Francesco,

The updated cleanup version works so far (It is now processing chr #9). One irrelevant question: is there a way in the sequenza R package to parallel processing different chromosome and combine the results somehow to boost the speed? My goal is purity estimation only (no need to get CNV). Thank you.

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsub...@googlegroups.com.

Francesco Favero

unread,
Aug 31, 2017, 9:23:08 AM8/31/17
to huilei xu, Sequenza User Group
Hi,

I’ve seen you also opened an issue about the parallel processing.
I think it’s possible, however it may be too convoluted to include the support in the package.

Basically, you could easily run sequenza.extract for a single chromosome in parallel, then run for each object sequenza.fit, separately.
Each sequenza.fit instance will give a separate CP object
The CP object, is a list with the info:
    ploidy: A numeric vector of tested ploidy values
    cellularity: A numeric vector of tested cellularity values
    lpp: Anumeric matrix of log-posterior probability for each (ploidy, cellularity) pair.

Be aware that sequenza.fit may fail if there are no segments to feed the model, so you may setup some error checking try() to fallback to a default CP object, with all 0s in the lpp matrix (or some other approach you may end up with).


The lpp of all the resulting CP objects can be sum to get the “final” CP object, which would equivalent to the CP object as you had run the program sequentially.

It may be a little overhead, but it should work :). I’m not expert on how to parallelize this, I would do something like:

library(sequenza)
library(parallel)

data.file <- system.file("data", "example.seqz.txt.gz",
    package = "sequenza")

chromosome <- c(1:22, c("X", "Y"))

n_cpu = 4 # or any other amount of parallel processes you want

sample_stats <- gc.sample.stats(data.file)

extract_chroms <- mclapply(chromosome, FUN = function(x, seqz_file, gc_stats) {
        sequenza.extract(seqz_file, chromosome.list = x, gc.stats= gc_stats)
    },
    seqz_file = data.file, gc_stats = sample_stats, mc.cores = n_cpu)


CP_chroms <- mclapply(extract_chroms, FUN = function(x) {
 sequenza.fit(x, mc.cores = 1) # adjust mc.cores, beware that each process
                               # will try to use the number of cpu you set
                               # here so it will fork exponentially
}, mc.cores = n_cpu)

CP_chroms_lpp <- sapply(1:length(CP_chroms), FUN = function(x, y) y[[x]]["lpp"], y = CP_chroms)
CP_chroms_lpp[is.na(CP_chroms_lpp)] <- NULL
CP_lpp <- Reduce("+", CP_chroms_lpp)

CP <- list(cellularity = CP_chroms[[1]]$cellularity, ploidy = CP_chroms[[1]]$ploidy, lpp = CP_lpp)
cp.plot(CP)
cp.plot.contours(CP, add = T)

res <- get.ci(CP)

res$max.cellularity

res$max.ploidy


Francesco

huilei xu

unread,
Aug 31, 2017, 9:30:36 AM8/31/17
to Francesco Favero, Sequenza User Group
Thank you, I will tweak your code and give it a try, good to know the estimation result won't be affected whether taking the whole data or breaking down by chromosome. 

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsubscribe...@googlegroups.com.

Francesco Favero

unread,
Aug 31, 2017, 9:42:30 AM8/31/17
to huilei xu, Sequenza User Group
I do not guarantee it will not effected the estimation at all, it actually give a slightly different CP image, but the preferred solution is the same - in the test I’ve run -

Francesco

huilei xu

unread,
Sep 5, 2017, 8:24:37 AM9/5/17
to Francesco Favero, Sequenza User Group
Hi Francesco,

I have tested one file using both parallel or single mode and the results are consistent between each other, however the results are not the same as I expected (third party test). I am wondering does it have anything to do with the input BAM: I am using duplicate-removed, re-calibrated BAM files, does it matter? Thanks.

Huilei

Christoffer Flensburg

unread,
Nov 23, 2017, 12:08:08 AM11/23/17
to Sequenza User Group
Hey both,

Just an FYI that I had the same error, and the below code resolved it, thanks.

cheers,
/Christoffer
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsub...@googlegroups.com.

jianq...@gmail.com

unread,
Oct 10, 2018, 1:55:06 AM10/10/18
to Sequenza User Group
hi francesco,
I had the same problem,but the solution can not work:

> install_bitbucket("sequenza_tools/sequenza@cleanup")
Downloading bitbucket repo sequenza_tools/sequenza@cleanup
Installation failed: Not Found (HTTP 404).



在 2017年8月30日星期三 UTC+8下午9:52:09,Francesco Favero写道:
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsub...@googlegroups.com.

Marco van het Hoog

unread,
Oct 18, 2018, 9:32:17 AM10/18/18
to Sequenza User Group
I have the same Not Found (HTTP 404) problem. It seems the entire bitbucket sequenza_tools site is gone.

Francesco Favero

unread,
Oct 18, 2018, 10:34:15 AM10/18/18
to Marco van het Hoog, Sequenza User Group
I know :P, I’ve announced the change in the develop page name few days ago in the group

The new profile is at 


Reason for the change is that with the “_” character in the group name it was impossible to have a page for the project hosted at bitbucket cloud.

As a results now we have one: the project web page is now available here:


Francesco

On 18 Oct 2018, at 15.32, Marco van het Hoog <marc...@gmail.com> wrote:

I have the same Not Found (HTTP 404) problem. It seems the entire bitbucket sequenza_tools site is gone.

--
You received this message because you are subscribed to the Google Groups "Sequenza User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages