Hi everyone,
I wrote a program that scans a BAM file and noticed that it consumes a lot of memory, although I see no reason why it should. I simplified it to the following:

open Core.Std
open CFStream

let () =
  let open Biocaml in
  let open Sam.Flags in
  let update accu = function
    | `alignment { Sam.flags = al } ->
      if secondary_alignment al then accu
      else accu + 1
    | _ -> accu
  in
  In_channel.with_file Sys.argv.(1) ~f:(fun ic ->
      Bam.in_channel_to_item_stream_exn ic
      |> Stream.fold ~init:0 ~f:update
    )
  |> print_int

What bothers me most is that the memory footprint grows with the size of the input file, whereas I would expect this program to run in constant memory. If I add two calls to [Gc.major] in the [update] function, I get the expected memory behavior, so I'm not claiming there is a memory leak.
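To make the workaround concrete, here is a minimal sketch of a fold that forces a major collection as it goes. It uses only the standard library (not Core or Biocaml), so the [Seq.t] merely stands in for the BAM item stream. The names [count_with_periodic_gc] and [gc_every] are made up for illustration, and calling [Gc.major] every N items rather than on every item is an assumption that has not been measured on the BAM case.

```ocaml
(* Sketch only: plain stdlib (OCaml >= 4.11), no Core, no Biocaml.
   [count_with_periodic_gc] and [gc_every] are hypothetical names;
   the [Seq.t] argument stands in for the BAM item stream. *)
let count_with_periodic_gc ?(gc_every = 100_000) (keep : 'a -> bool)
    (items : 'a Seq.t) : int =
  let seen = ref 0 in
  Seq.fold_left
    (fun accu item ->
       incr seen;
       (* Force a full major collection every [gc_every] items. *)
       if !seen mod gc_every = 0 then Gc.major ();
       if keep item then accu + 1 else accu)
    0 items

let () =
  (* Stand-in input: the integers 1..1_000_000, produced lazily. *)
  let items =
    Seq.unfold (fun i -> if i > 1_000_000 then None else Some (i, i + 1)) 1
  in
  let kept = count_with_periodic_gc (fun i -> i mod 2 = 0) items in
  (* [heap_words] gives a rough view of the major heap size at the end. *)
  Printf.printf "%d items kept, heap words: %d\n"
    kept (Gc.quick_stat ()).Gc.heap_words
```

Printing [heap_words] for inputs of different sizes, with and without the [Gc.major] call, is one way to check whether the heap really grows with the input.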
Has anyone run into this problem before? I'm not sure whether it is related to transforms alone, to their interaction with streams, or to something else. I guess I could tweak the GC parameters, but maybe there is a more elegant way to fix the issue. In particular, IMHO this style should be supported out of the box.
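For reference, tweaking the GC parameters could look roughly like the sketch below. It uses the standard library's [Gc] module rather than Core's, [with_space_overhead] is a made-up helper name, and the value 40 is an arbitrary illustration, not a recommendation. A lower [space_overhead] makes the major GC work more eagerly, at the cost of more CPU time.

```ocaml
(* Sketch only: stdlib [Gc] (OCaml >= 4.08 for [Fun.protect]).
   [with_space_overhead] is a hypothetical helper; 40 is an arbitrary value. *)
let with_space_overhead (overhead : int) (f : unit -> 'a) : 'a =
  let saved = Gc.get () in
  Gc.set { saved with Gc.space_overhead = overhead };
  (* Restore the previous settings even if [f] raises. *)
  Fun.protect ~finally:(fun () -> Gc.set saved) f

let () =
  (* Stand-in workload; the real program would run its fold here. *)
  let total =
    with_space_overhead 40 (fun () -> List.fold_left ( + ) 0 [1; 2; 3])
  in
  print_int total;
  print_newline ()
```

The same parameter can be set without recompiling through the environment, for example OCAMLRUNPARAM='o=40'.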
ph.