biogo/hts scalability for multiple threads


Manolis Maragkakis

Apr 27, 2021, 11:46:20 AM
to biogo-user
Hello,

We are building a tool that provides an SQL interface to sam/bam files (https://www.biorxiv.org/content/10.1101/2021.02.03.429524v1) and we are seeing poor scalability above ~4 threads for bam, similar to what is seen here. I'm wondering what the fundamental reason is that a plateau is reached so early in biogo/hts. Intuitively I would expect disk IO to define the plateau, but samtools seems to scale much better, so maybe I'm missing something. Any ideas?

Thanks

Dan Kortschak

Apr 27, 2021, 5:48:10 PM
to biogo...@googlegroups.com
Hi,

To see where that's coming from, it would be worth looking at
synchronisation traces. This can be done by adding runtime/trace
set-up (https://golang.org/pkg/runtime/trace/#example_) and using the
trace tool (https://golang.org/cmd/trace/). This would give at least a
starting point for understanding where things are happening.
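
A minimal sketch of that set-up (the output file name is arbitrary):

package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Everything between Start and Stop is recorded, including
	// goroutine scheduling and blocking events.
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... run the BAM reading workload here ...
}

Running go tool trace trace.out on the result opens the viewer; the
synchronisation blocking profile there is the interesting view.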

I did do a lot of CPU profiling when the library was written to get
CPU usage down as much as possible, and from memory the major
component was GZIP decompression cost. This is unlikely to have
changed, so looking at ways to reduce that would probably be a
worthwhile avenue.
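
If you want to do that kind of CPU profiling yourself, runtime/pprof
is enough (a minimal sketch; the file name is arbitrary):

package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Sample CPU usage for the lifetime of the workload; inspect
	// the result with: go tool pprof cpu.prof
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// ... run the BAM reading workload here ...
}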

Dan


Manolis Maragkakis

Apr 27, 2021, 7:33:02 PM
to biogo-user
Thanks Dan,

If I understand this correctly, GZIP decompression happens in goroutines, so I expected that part to scale well with more workers. What we observe is that regardless of how many threads are given to the reader, the program seems to plateau around ~400% CPU usage, indicating that it is not distributing the compute fast enough to take advantage of the available resources. I'm trying to think what could conceptually be the bottleneck here. Could it be the way the channels are set up in hts/bgzf? Sorry if I'm thinking about this superficially; I haven't dived into all the code details.
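
For concreteness, this is roughly how we construct the reader (a minimal sketch; the file name and the worker count of 8 are placeholders):

package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/biogo/hts/bam"
)

func main() {
	f, err := os.Open("example.bam")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The second argument sets the decompression concurrency of
	// the underlying bgzf.Reader; 0 means use GOMAXPROCS workers.
	br, err := bam.NewReader(f, 8)
	if err != nil {
		log.Fatal(err)
	}
	defer br.Close()

	var n int
	for {
		if _, err := br.Read(); err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		n++
	}
	fmt.Println("records:", n)
}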

Dan Kortschak

Apr 27, 2021, 8:13:36 PM
to Manolis Maragkakis, biogo-user
On Tue, 2021-04-27 at 16:33 -0700, Manolis Maragkakis wrote:
> If I understand this correctly, GZIP decompression happens in
> goroutines, so I expected that part to scale well with more workers.
> What we observe is that regardless of how many threads are given to
> the reader, the program seems to plateau around ~400% CPU usage,
> indicating that it is not distributing the compute fast enough to
> take advantage of the available resources. I'm trying to think what
> could conceptually be the bottleneck here. Could it be the way the
> channels are set up in hts/bgzf? Sorry if I'm thinking about this
> superficially; I haven't dived into all the code details.

Yeah, if there were contended shared resources in gzip reading (...
there aren't, AFAICS) then you would see this behaviour. The only way
to get at it is to instrument the reader in flagstat (and maybe
elsewhere) with runtime/trace and see where synchronisation is
costing time.
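
If finer-grained attribution would help, the read loop can be wrapped
in a trace region so it shows up grouped in the trace viewer (a
sketch; readAll is a hypothetical stand-in for flagstat's record
loop):

package flagstat // hypothetical package name

import (
	"context"
	"io"
	"runtime/trace"

	"github.com/biogo/hts/bam"
)

// readAll wraps the record read loop in a trace region so time
// spent (and blocked) inside it is grouped in the trace viewer.
func readAll(ctx context.Context, br *bam.Reader) error {
	var err error
	trace.WithRegion(ctx, "bam.read", func() {
		for {
			if _, err = br.Read(); err != nil {
				break
			}
		}
	})
	if err == io.EOF {
		return nil
	}
	return err
}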

Before that's done, there's not really any point speculating.


Manolis Maragkakis

Apr 28, 2021, 12:07:57 PM
to biogo-user
I see. Thanks for the feedback.