On Tue, 2021-04-27 at 16:33 -0700, Manolis Maragkakis wrote:
> If I understand this correctly, GZIP decompression is happening in
> goroutines so I expected that part to scale well with more workers.
> What we observe is that, regardless of how many threads are given to
> the reader, the program seems to plateau around ~400% CPU usage,
> indicating that it is not distributing the compute fast enough to
> take advantage of the resources. I'm trying to think what could
> conceptually be the bottleneck in this. Could it be the way the
> channels are set up in hts/bgzf? Sorry if I'm thinking about this
> superficially, as I haven't dived into all the code details.
Yeah, if there are contended shared resources in gzip reading (...
there aren't AFAICS) then you would see this behaviour. The only way to
get at it is to instrument the reader in flagstat with runtime/trace
and see where synchronisation is costing time, in the reader and maybe
elsewhere.
Before that's done, there's not really any point speculating.