On Tue, Nov 14, 2017 at 07:05:49AM -0800, Brad Chapman wrote:
> Great news that you have time and support for working on
> sambamba. For us the number one help would be improving stability
> of the multicore steps. We're still stuck with sambamba
> segfaulting on multiple platforms and ended up working around it
> by transferring functionality to either samtools (which has a lot
> more multicore now) or mosdepth (which is very fast and flexible
> for depth calculations).
The segfaulting should be easy to fix once I can reproduce it. Are you
still using the 0.6.6 binary? I think it still has the threading
segfault that was fixed in ldc 1.0. The pre-release binary on GitHub
should be OK.
If you help me reproduce it, I can help you. Let's start with a
command line that is known to crash (now and then).
> sambamba is the only tool I know of that
> correctly subsets a BAM file using a BED file. samtools doesn't
> use the index, so it is very slow, and we work around this by
> splitting the work into sections using command-line regions.
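A minimal sketch of that region workaround, assuming a plain BED file with
at least chrom/start/end columns. bed_to_regions is a hypothetical helper,
not part of sambamba or samtools: it merges overlapping or touching
intervals and converts BED's 0-based half-open coordinates to the 1-based
inclusive chrom:start-end strings used for command-line regions.

```python
from typing import List, Tuple


def bed_to_regions(bed_lines: List[str]) -> List[str]:
    """Merge overlapping/adjacent BED intervals into 1-based region strings.

    Hypothetical helper for illustration; chromosomes are sorted
    lexicographically here, not in BAM header order.
    """
    intervals: List[Tuple[str, int, int]] = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    intervals.sort()

    merged: List[Tuple[str, int, int]] = []
    for chrom, start, end in intervals:
        # Half-open intervals that touch (start == previous end) are merged too.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))

    # BED is 0-based half-open; command-line regions are 1-based inclusive.
    return [f"{c}:{s + 1}-{e}" for c, s, e in merged]
```

Merging first keeps the number of separate region queries down and avoids
fetching the same reads twice where BED intervals overlap.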
> In terms of new features, some useful things would be:
> - First class CRAM support. I'm not sure where this is currently at in
> sambamba but I suspect we'll be moving more and more to CRAM soon.
> - Support downsampling BAMs/CRAMs to a maximum coverage (only
> downsample if above a certain coverage). We added this in VariantBam and it's
> really useful for reducing out-of-control runtimes on WGS runs in collapsed
> repeats, but we're stuck trying to make it fast enough to be useful:
> https://github.com/walaj/VariantBam/issues/13
Multithreaded programming in C++...
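For the record, a minimal single-threaded sketch of coverage-capped
downsampling, assuming reads on one chromosome are given as hypothetical
(start, end) half-open pairs sorted by start, as in a coordinate-sorted BAM.
cap_coverage is an illustrative name, not a sambamba or VariantBam API. A
read is dropped only when max_cov already-kept reads overlap its start, so
regions below the cap pass through untouched and kept coverage never
exceeds max_cov anywhere.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical read representation: (start, end), 0-based half-open,
# single chromosome, sorted by start position.
Read = Tuple[int, int]


def cap_coverage(reads: Iterable[Read], max_cov: int) -> List[Read]:
    """Greedily keep reads so that kept coverage never exceeds max_cov."""
    kept: List[Read] = []
    active_ends: List[int] = []  # min-heap of end positions of kept reads
    for start, end in reads:
        # Retire kept reads that end at or before this read's start.
        while active_ends and active_ends[0] <= start:
            heapq.heappop(active_ends)
        # Keep the read only if coverage at its start is below the cap.
        if len(active_ends) < max_cov:
            kept.append((start, end))
            heapq.heappush(active_ends, end)
    return kept
```

This greedy version always keeps the first reads it sees at a position,
which biases which reads survive; a real implementation would more likely
pick randomly among overlapping reads. It is a single streaming pass, which
is the property that matters for runtime.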
> Hope this helps with ideas, and thanks again for all your great work on
> sambamba.
Sounds to me like we should work on the segfaults for sure. mpileup
and downsampling could be next. At least there is clear demand :)
Pj.