Memory error (paste exceeding 2^31-1 bytes)


Davide Cittaro

May 26, 2024, 11:44:18 AM
to Sequenza User Group
Hello, I'm trying to run sequenza on a dataset; the seqz file is 1.8 GB. The sequenza.extract function crashes when called like this:

seq.extr = sequenza.extract(data.file,
                            chromosome.list = chromosomes,
                            normalization.method = 'median',
                            min.reads = 50,
                            mufreq.treshold = 0.05,
                            min.reads.normal = 20,
                            min.reads.baf = 10,
                            kmin = 5,
                            max.mut.types = 3,
                            parallel = 2)


with this error

Processing chr1:
Error in paste(mstrsplit(res), collapse = "\n") :
  result would exceed 2^31-1 bytes
Calls: sequenza.extract ... read_tsv -> <Anonymous> -> standardise_path -> paste
Execution halted


> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/cittaro.davide/miniforge3/envs/cnvpytor/lib/libopenblasp-r0.3.27.so

locale:
 [1] LC_CTYPE=en_US.utf-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf-8        LC_COLLATE=en_US.utf-8
 [5] LC_MONETARY=en_US.utf-8    LC_MESSAGES=en_US.utf-8
 [7] LC_PAPER=en_US.utf-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sequenza_3.0.0

loaded via a namespace (and not attached):
 [1] XVector_0.38.0         magrittr_2.0.3         zlibbioc_1.44.0
 [4] GenomicRanges_1.50.2   BiocGenerics_0.44.0    hms_1.1.3
 [7] IRanges_2.32.0         R6_2.5.1               rlang_1.1.3
[10] pbapply_1.7-2          fansi_1.0.6            GenomeInfoDb_1.34.9
[13] tools_4.2.3            iotools_0.3-5          parallel_4.2.3
[16] squash_1.0.9           utf8_1.2.4             cli_3.6.2
[19] copynumber_1.29.0.9000 tibble_3.2.1           lifecycle_1.0.4
[22] GenomeInfoDbData_1.2.9 readr_2.1.5            tzdb_0.4.0
[25] bitops_1.0-7           vctrs_0.6.5            S4Vectors_0.36.2
[28] RCurl_1.98-1.14        glue_1.7.0             compiler_4.2.3
[31] pillar_1.9.0           seqminer_9.4           stats4_4.2.3
[34] pkgconfig_2.0.3

Any idea how to solve this?

Davide Cittaro

May 27, 2024, 7:43:20 AM
to Sequenza User Group
I've just realized that the seqz file only contains chromosome 1 and that the tumor and normal BAM files had different chromosome orderings. I don't know if this is causing the error; in the meantime I'm harmonizing the data and will let you know.
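
For anyone who wants to check for this kind of mismatch, comparing the BAM headers is enough. A quick sketch in R (Rsamtools is an assumption here, it is not loaded in the session above; file names are placeholders):

# Compare chromosome names and their order in the tumor and normal BAM headers.
library(Rsamtools)
hdr <- scanBamHeader(c("tumor.bam", "normal.bam"))
identical(names(hdr[[1]]$targets), names(hdr[[2]]$targets))  # TRUE only if same names in the same order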

d

Davide Cittaro

Jun 2, 2024, 3:34:07 AM
to Sequenza User Group
Everything has been fixed in preprocessing, but the error is still there:

Processing chr10:

Davide Cittaro

Jun 4, 2024, 4:11:44 AM
to Sequenza User Group
This is just to let you know that I've solved it by binning the seqz file into 50 bp windows.
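
For reference, the binning step I mean is the seqz_binning command from sequenza-utils; run from R it looks roughly like this (a sketch: it assumes sequenza-utils is on the PATH, and the file names are placeholders):

# Rebin the seqz file into 50 bp windows with sequenza-utils seqz_binning;
# file names are placeholders, adjust to your own data.
system2("sequenza-utils",
        args = c("seqz_binning",
                 "--seqz", "sample.seqz.gz",
                 "-w", "50",
                 "-o", "sample_bin50.seqz.gz"))

The binned file is then passed to sequenza.extract as before.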

Michael Knudsen

May 15, 2025, 12:15:04 AM
to Sequenza User Group
I have had the same issue (it happens with noisy samples), and I've traced the problem to this part of the code:

read.seqz.tbi <- function(file, chr_name, col_names, col_types) {
    # read all records for chr_name from the tabix-indexed seqz file
    res <- tabix.read(file, chr_name)
    # paste everything into a single string and hand it to read_tsv;
    # this string is what can exceed R's 2^31-1 byte limit
    res <- read_tsv(file = paste(mstrsplit(res), collapse = "\n"),
                    col_types = col_types, skip = 0, n_max = Inf,
                    col_names = col_names, progress = FALSE)
}

If the seqz file is tabix-indexed, this code takes all the records for a single chromosome, pastes them into one string, and passes that string to read_tsv. For noisy samples that string can exceed R's 2^31-1 byte limit on a single object. The problem went away when I skipped tabix indexing of the seqz file; the code then falls back to the more basic read.seqz function.
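
In practice that just means making sure no .tbi index sits next to the seqz file, e.g. something like this (the path is a placeholder):

# If a .tbi index exists next to the seqz file, removing it (or never creating
# it in the first place) should make sequenza.extract use the plain read.seqz
# reader instead of read.seqz.tbi. Path below is a placeholder.
data.file <- "sample.seqz.gz"
tbi <- paste0(data.file, ".tbi")
if (file.exists(tbi)) file.remove(tbi)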