memory usage of bwa mem


Ro Bert

Aug 8, 2023, 7:09:09 AM

to NGLess

In my functional annotation pipeline, I'm receiving the following error:

[Thu 03-08-2023 14:39:24] Line 17: Will run process /fast/AG_Forslund/rob/Programs/miniconda_VM2/miniconda3/envs/ngless/bin/bwa mem -t 10 -K 100000000 -p -a /fast/AG_Forslund/shared/NGLESSmodules/Modules/gmgc.ngm/1.0/cached/gmgc.fna.splits_200000m.0-bwa-0.7.17.fna -

[Thu 03-08-2023 16:32:57] Line 23: fd:50: hPutBuf: resource vanished (Broken pipe)

While the bwa mem command was running, I observed that memory usage rose steadily until it exceeded the 500G of available memory.

In my script I specify the following options: gmgc_mapped = map (non_human_reads, reference='gmgc',mode_all=True,block_size_megabases=200000)

which should limit the used memory. But I also see that when I change block_size_megabases, this doesn't affect the value of the -K parameter in the bwa mem command. Shouldn't block_size_megabases translate to -K or am I misunderstanding something here?

Renato Alves

Aug 8, 2023, 7:40:51 AM

to ngl...@googlegroups.com

Hi Robert,

The block_size_megabases and the -K option are related in purpose but do different things.

block_size_megabases splits the FASTA reference into chunks of the specified size, given in megabases.
-K sets how many input bases bwa processes per batch, given in bases. Fixing it makes the batch size independent of the number of threads.

In your script you specified a block size of 200,000 megabases, i.e. 200 gigabases, which is still quite large.
The -K value passed to bwa defaults to 100 megabases (the -K 100000000 in your log), which is identical to the batch size BWA itself would use by default when running with 10 threads.
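
To make the units concrete, here is a minimal sketch of the arithmetic in Python. The function names are made up for illustration, and the figure of 10,000,000 bases per thread for bwa's built-in batch size is an assumption inferred from the "100 megabases with 10 threads" statement above, so please check it against your bwa version.

```python
# Unit arithmetic only; nothing here calls NGLess or bwa.
# ASSUMPTION: without -K, bwa mem sizes its batch as 10,000,000 bases per thread.
BWA_BASES_PER_THREAD = 10_000_000


def block_size_in_bases(block_size_megabases: int) -> int:
    """Convert an NGLess block_size_megabases value to bases."""
    return block_size_megabases * 1_000_000


def bwa_default_batch_bases(threads: int) -> int:
    """Batch size bwa would pick on its own for a given thread count (assumed)."""
    return BWA_BASES_PER_THREAD * threads


block_bases = block_size_in_bases(200_000)
print(block_bases // 10**9, "gigabases per reference block")
print(bwa_default_batch_bases(10), "bases per bwa batch with 10 threads")
```

So block_size_megabases=200000 asks for reference blocks of up to 200 gigabases each, while -K 100000000 only fixes how many read bases bwa handles per batch.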
See: https://github.com/CCDG/Pipeline-Standardization/issues/2 for additional context about -K

Note that changing -K would also affect the results, in a way that is different again from splitting the reference via block_size_megabases.
Both options essentially split the data into chunks: block_size_megabases chunks the reference, -K chunks the input reads. The impact comes from how this affects the statistics of what is considered the "best hit" for any given sequence.
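
As a toy illustration of the "best hit" point, the following Python sketch uses invented scores and gene names. It is not how bwa or NGLess compute or merge hits, only a picture of why a per-chunk view can differ from a view over the whole reference.

```python
# Hypothetical alignment scores of one read against four reference sequences.
# All names and numbers are invented for illustration.
scores = {"geneA": 60, "geneB": 95, "geneC": 90, "geneD": 40}


def best_hit(candidates):
    """Return the highest-scoring reference among the given candidates."""
    return max(candidates, key=lambda name: scores[name])


def best_hits_per_block(blocks):
    """Return the best hit found independently within each reference block."""
    return [best_hit(block) for block in blocks]


whole_reference = best_hit(list(scores))
split_reference = best_hits_per_block([["geneA", "geneC"], ["geneB", "geneD"]])

print("unsplit reference:", whole_reference)
print("split into two blocks:", split_reference)
```

Within its own block, geneC looks like the best hit even though geneB scores higher overall, so anything derived from the per-block ranking can change with the block size.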

Hope this helps,
Renato


--
Dr. Renato Alves (PhD),
EMBL Bio-IT community & project manager - https://bio-it.embl.de | Staff Association Representative
EMBL Heidelberg, Germany

ORCID: 0000-0002-7212-0234
Github/Gitlab: @unode | 🐦 @renato_alvs | 🦣 @renato...@mstdn.science
