Memory error (paste exceeding 2^31-1 bytes)


Davide Cittaro

May 26, 2024, 11:44:18 AM
to Sequenza User Group
Hello, I'm trying to run sequenza on a dataset; the seqz file is 1.8 GB. The sequenza.extract function crashes when called like this:

seq.extr = sequenza.extract(data.file,
                            chromosome.list = chromosomes,
                            normalization.method = 'median',
                            min.reads = 50,
                            mufreq.treshold = 0.05,
                            min.reads.normal = 20,
                            min.reads.baf = 10,
                            kmin = 5,
                            max.mut.types = 3,
                            parallel = 2)


with this error

Processing chr1:
Error in paste(mstrsplit(res), collapse = "\n") :
  result would exceed 2^31-1 bytes
Calls: sequenza.extract ... read_tsv -> <Anonymous> -> standardise_path -> paste
Execution halted


> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/cittaro.davide/miniforge3/envs/cnvpytor/lib/libopenblasp-r0.3.27.so

locale:
 [1] LC_CTYPE=en_US.utf-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf-8        LC_COLLATE=en_US.utf-8
 [5] LC_MONETARY=en_US.utf-8    LC_MESSAGES=en_US.utf-8
 [7] LC_PAPER=en_US.utf-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sequenza_3.0.0

loaded via a namespace (and not attached):
 [1] XVector_0.38.0         magrittr_2.0.3         zlibbioc_1.44.0
 [4] GenomicRanges_1.50.2   BiocGenerics_0.44.0    hms_1.1.3
 [7] IRanges_2.32.0         R6_2.5.1               rlang_1.1.3
[10] pbapply_1.7-2          fansi_1.0.6            GenomeInfoDb_1.34.9
[13] tools_4.2.3            iotools_0.3-5          parallel_4.2.3
[16] squash_1.0.9           utf8_1.2.4             cli_3.6.2
[19] copynumber_1.29.0.9000 tibble_3.2.1           lifecycle_1.0.4
[22] GenomeInfoDbData_1.2.9 readr_2.1.5            tzdb_0.4.0
[25] bitops_1.0-7           vctrs_0.6.5            S4Vectors_0.36.2
[28] RCurl_1.98-1.14        glue_1.7.0             compiler_4.2.3
[31] pillar_1.9.0           seqminer_9.4           stats4_4.2.3
[34] pkgconfig_2.0.3

Any idea how to solve this?

Davide Cittaro

May 27, 2024, 7:43:20 AM
to Sequenza User Group
I've just realized that the seqz file only contains chromosome 1 and that the tumor and normal BAM files had different chromosome orderings. I don't know if this is causing the error; in the meantime I'm harmonizing the data and will let you know.
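
For anyone who wants to check for this kind of mismatch, comparing the BAM headers is enough. A quick sketch in R (Rsamtools is an assumption here, it is not loaded in the session above; file names are placeholders):

# Compare chromosome names and their order in the tumor and normal BAM headers.
library(Rsamtools)
hdr <- scanBamHeader(c("tumor.bam", "normal.bam"))
identical(names(hdr[[1]]$targets), names(hdr[[2]]$targets))  # TRUE only if same names in the same order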

d

Davide Cittaro

Jun 2, 2024, 3:34:07 AM
to Sequenza User Group
Everything has been fixed in preprocessing, but the error is still there:

Processing chr10:

Davide Cittaro

Jun 4, 2024, 4:11:44 AM
to Sequenza User Group
This is just to let you know that I've solved it by binning the seqz file into 50 bp windows.
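
For reference, the binning step I mean is the seqz_binning command from sequenza-utils; run from R it looks roughly like this (a sketch: it assumes sequenza-utils is on the PATH, and the file names are placeholders):

# Rebin the seqz file into 50 bp windows with sequenza-utils seqz_binning;
# file names are placeholders, adjust to your own data.
system2("sequenza-utils",
        args = c("seqz_binning",
                 "--seqz", "sample.seqz.gz",
                 "-w", "50",
                 "-o", "sample_bin50.seqz.gz"))

The binned file is then passed to sequenza.extract as before.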

Michael Knudsen

May 15, 2025, 12:15:04 AM
to Sequenza User Group
I have had the same issue (it happens with noisy samples), and I've traced the problem to this part of the code:

read.seqz.tbi <- function(file, chr_name, col_names, col_types) {
    # read all records for chr_name from the tabix-indexed seqz file
    res <- tabix.read(file, chr_name)
    # paste everything into a single string and hand it to read_tsv;
    # this string is what can exceed R's 2^31-1 byte limit
    res <- read_tsv(file = paste(mstrsplit(res), collapse = "\n"),
                    col_types = col_types, skip = 0, n_max = Inf,
                    col_names = col_names, progress = FALSE)
}

If the seqz file is tabix-indexed, this code takes all the records for a single chromosome, pastes them into one string, and passes that string to read_tsv. For noisy samples that string can exceed R's 2^31-1 byte limit on a single object. The problem went away when I skipped tabix indexing of the seqz file; the code then falls back to the more basic read.seqz function.
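
In practice that just means making sure no .tbi index sits next to the seqz file, e.g. something like this (the path is a placeholder):

# If a .tbi index exists next to the seqz file, removing it (or never creating
# it in the first place) should make sequenza.extract use the plain read.seqz
# reader instead of read.seqz.tbi. Path below is a placeholder.
data.file <- "sample.seqz.gz"
tbi <- paste0(data.file, ".tbi")
if (file.exists(tbi)) file.remove(tbi)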