sambamba error: "Unable to write to stream"


Sarah Diehl

May 25, 2016, 3:29:13 AM
to biovalidation
Dear all,

I keep having trouble with sambamba recently, with both sort and merge in different pipelines. The actual error message is "Unable to write to stream", which usually means that the disk is full. However, in my case there's plenty of space left on the disk. It might be a memory issue, but I'm not 100% certain. Usually I run bcbio-nextgen with -t local -n 12. If I reduce that to -n 2 it runs through fine, but that slows the whole pipeline down a lot. Is there a way to limit just sambamba?

Thank you very much in advance for your support!

Best wishes,
Sarah

Brad Chapman

May 25, 2016, 8:56:44 AM
to Sarah Diehl, biovalidation

Sarah;
Sorry about the issues; I don't believe there have been any major changes to
the way we use sambamba that would cause global problems. Could you provide
the command lines for some of the failures you're seeing? That might help
us diagnose the specific problem or suggest solutions. Generally, sambamba
uses the same memory settings as samtools, so you can control memory per
core with:

resources:
  samtools:
    memory: 2G

Hope this helps some,
Brad

Sarah Diehl

May 26, 2016, 9:14:23 AM
to Brad Chapman, biovalidation
Dear Brad,

Thank you very much for the quick response. I pasted two examples below.

In the first case the problem is that around 10 of those merges are running at the same time. If only 2 are running (by specifying -n 2) it's fine. Since there are no settings for cpu or memory in the command, I guess changes in bcbio_system.yaml will not have any influence on it?

In the second case there is only one sample and the memory limits are below what's available on the machine. I played around a little bit with the settings and also isolated the failing command, but each test run takes about half a day. Since all the sorting creates a LOT of temporary files, maybe this "kills" the file system? So I thought about giving it more memory. However, even though I specified "cores: 4" and "memory: 4G", the pipeline only puts 1G in the command.

The error did not appear after an update or anything like that. In fact, the first case is quite an old installation of bcbio-nextgen, while the second is more recent. The first case used to work before; maybe what changed is the size of the input data, or the kernel, FS clients, etc. on the cluster. The second case I only started last week, so it has never worked so far.

Best wishes,
Sarah



---- FIRST CASE (bcbio-nextgen 0.8.9, sambamba v0.5.2)----

/mnt/gaiagpfs/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen/bin/sambamba merge /mnt/lustre/projects/epipgx/sdiehl/bcbng_run/disk14/disk14-call0/work/bamprep/PABQWNQ_2-1_1219052_2016-01-26/tx/tmpAq83ME/PABQWNQ_2-1_1219052_2016-01-26.PE.ra.md-reorder-fixrgs-gatkfilter-dedup-prep.bam -t 1 `cat /mnt/lustre/projects/epipgx/sdiehl/bcbng_run/disk14/disk14-call0/work/bamprep/PABQWNQ_2-1_1219052_2016-01-26/tx/tmp1d_Yaw/PABQWNQ_2-1_1219052_2016-01-26.PE.ra.md-reorder-fixrgs-gatkfilter-dedup-prep.list`



---- SECOND CASE (bcbio-nextgen 0.9.7, sambamba 0.5.9) ----

/work/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen-v0.9.4/anaconda/bin/samtools sort -n -@ 4 -m 1G -O sam -T /mnt/lustre/projects/melanomics/sdiehl/work/tx/tmpAqpcpg/patient_2_PM.filtered-reorder-fixrgs-gatkfilter-dedup-namesort /mnt/lustre/projects/melanomics/sdiehl/work/bamclean/patient_2_PM/patient_2_PM.filtered-reorder-fixrgs-gatkfilter-dedup.bam | /work/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen-v0.9.4/anaconda/bin/samblaster --addMateTags  --splitterFile >(/work/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen-v0.9.4/anaconda/bin/samtools sort -@ 4 -m 1G -T /mnt/lustre/projects/melanomics/sdiehl/work/bamclean/patient_2_PM/tx/tmp_6Vv_O/patient_2_PM.filtered-reorder-fixrgs-gatkfilter-dedup-dedup-sorttmp-spl -o /mnt/lustre/projects/melanomics/sdiehl/work/bamclean/patient_2_PM/tx/tmpzsw1sl/patient_2_PM.filtered-reorder-fixrgs-gatkfilter-dedup-sr.bam /dev/stdin) --discordantFile >(/work/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen-v0.9.4/anaconda/bin/samtools sort -@ 4 -m 1G -T /mnt/lustre/projects/melanomics/sdiehl/work/bamclean/patient_2_PM/tx/tmp_6Vv_O/patient_2_PM.filtered-reorder-fixrgs-gatkfilter-dedup-dedup-sorttmp-disc -o /mnt/lustre/projects/melanomics/sdiehl/work/bamclean/patient_2_PM/tx/tmpRKfOcO/patient_2_PM.filtered-reorder-fixrgs-gatkfilter-dedup-disc.bam /dev/stdin) | /work/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen-v0.9.4/anaconda/bin/samtools view -b -S -u - | /work/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen-v0.9.4/anaconda/bin/sambamba sort  -t 4 -m 1G --tmpdir /mnt/lustre/projects/melanomics/sdiehl/work/bamclean/patient_2_PM/tx/tmp_6Vv_O/patient_2_PM.filtered-reorder-fixrgs-gatkfilter-dedup-dedup-sorttmp-full -o /mnt/lustre/projects/melanomics/sdiehl/work/bamclean/patient_2_PM/tx/tmp_6Vv_O/patient_2_PM.filtered-reorder-fixrgs-gatkfilter-dedup-dedup.bam /dev/stdin

Brad Chapman

Jun 1, 2016, 10:31:44 AM
to Sarah Diehl, biovalidation

Sarah;
Glad this helps, and that the memory changes sorted out the practical issues so
you can get it analyzed.

> For the sorting my test runs finished and indeed giving it more memory
> solves the problem. I'm also considering changing the setting for the tmp
> directory, because it turned out that our GPFS file system can handle it.
>
> Regarding the resource/memory settings, I still don't understand why only
> 1G per command. With 4G per core and 4 cores, so 16G total, 4 (sort)
> processes (that get memory limits) in that piped command and only one of
> these piped commands running, that would still leave 4G for each sort
> process.
> Unfortunately most of our cluster nodes have 5G per core ;-).

The memory specification in the samtools command line is per thread, so:

-@ 4 -m 1G

is 4Gb per program. That should give you 16Gb for the 4 samtools programs
running in the pipe. Sorry, there are a lot of memory calculations involved.

If you'd like more fine-grained memory usage to maximize your 5G/core, you
could specify it in megabytes -- 5000M -- bcbio is just not smart enough right
now to convert between units there.
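To make the arithmetic in this thread concrete, here is a small sketch. It is a hypothetical model, not bcbio's actual code: it assumes bcbio splits the configured per-core memory (integer division) across the memory-limited programs in one piped command (three steps reproduces the 4G -> 1G and 6G -> 2G figures discussed here), and that samtools applies `-m` per sort thread. Both function names are made up for illustration.

```python
def mem_in_command(mem_per_core_gb, pipe_steps=3):
    """Hypothetical model of how bcbio derives the -m value it writes
    into the command: the configured per-core memory is divided across
    the memory-limited programs sharing one piped command."""
    return mem_per_core_gb // pipe_steps

def mem_per_program(m_flag_gb, threads):
    """samtools sort's -m limit applies per thread, so one sort process
    started with -@ threads can use up to threads * m_flag_gb."""
    return m_flag_gb * threads

# With "memory: 4G" bcbio emits -m 1G; with 6G it would emit -m 2G.
print(mem_in_command(4), mem_in_command(6))  # 1 2

# -@ 4 -m 1G -> up to 4 GB per sort process, so the four sort
# processes in the pipe can together reach 16 GB.
print(mem_per_program(1, 4) * 4)  # 16
```

This is why a per-core setting well below the machine's physical memory can still matter: the per-thread limits multiply back up across threads and piped processes.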

> I just ran into another small unrelated issue. The tool and database
> versions for snpEff don't match on my recent bcbio-nextgen installation. I
> pasted the error at the end. I ran the upgrade with "-u stable --tools
> --data --genome GRCh37", but the error remains. I manually downloaded the
> appropriate database from snpEff, but it contains far fewer files.

Sorry, that's strange -- it should identify that the snpEff database is out of
date and re-download it. If you remove the directory and re-run, bcbio should
download it for you and save you the manual steps.
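For reference, the remove-and-refresh approach described here would look roughly like the following. This is a sketch: the database path is taken from the snpEff error below, and the exact upgrade flags (e.g. `--genomes` vs `--genome`, `-u skip`) can differ between bcbio versions, so check `bcbio_nextgen.py upgrade --help` on your install first.

```shell
# Remove the stale snpEff database directory so bcbio re-fetches it
# (path taken from the error message in this thread)
rm -rf /mnt/gaiagpfs/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen/genomes/Hsapiens/GRCh37/snpeff/GRCh37.75

# Re-run the data upgrade; bcbio should download a database matching
# the installed snpEff version. Flags may vary by bcbio version.
bcbio_nextgen.py upgrade -u skip --data --genomes GRCh37
```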

Hope this helps,
Brad

>
> ------------- snpEff error ---------------
>
> CalledProcessError: Command 'set -o pipefail;
> /work/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen-v0.9.4/anaconda/bin/snpEff
> -Xms750m -Xmx4g
> -Djava.io.tmpdir=/tmp/bcbiotx/7db65472-2585-49f4-b817-ed2997390178/tmpWOLr2D/tmp
> eff -dataDir
> /mnt/gaiagpfs/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen/genomes/Hsapiens/GRCh37/snpeff
> -cancer -noHgvs -noLog -i vcf -o vcf -s
> /mnt/lustre/projects/melanomics/sdiehl/work/structural/patient_2_PM/manta/tumorSV-patient_2_PM-effects-stats.html
> GRCh37.75
> /mnt/lustre/projects/melanomics/sdiehl/work/structural/patient_2_PM/manta/tumorSV-patient_2_PM.vcf.gz
> |
> /work/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen-v0.9.4/anaconda/bin/pbgzip
> -n 3 -c >
> /tmp/bcbiotx/7db65472-2585-49f4-b817-ed2997390178/tmpWOLr2D/tumorSV-patient_2_PM-effects.vcf.gz
> java.lang.RuntimeException: Database file
> '/mnt/gaiagpfs/projects/lcsbsoft/workflows/pipelines/NGS/bcbio-nextgen/genomes/Hsapiens/GRCh37/snpeff/GRCh37.75/snpEffectPredictor.bin'
> is not compatible with this program version:
> Database version : '4.1'
> Program version : '4.2'
> Try installing the appropriate database.
>
>
> ---------------------------------------
>
> 2016-05-27 20:49 GMT+02:00 Brad Chapman <chap...@50mail.com>:
>
>>
>> Sarah;
>> Thanks for all the details, this helps a lot for trying to sort out issues.
>>
>> > In the first case the problem is that around 10 of those merges are running
>> > at the same time. If only 2 are running (by specifying -n 2) it's fine.
>> > Since there are no settings for cpu or memory in the command, I guess
>> > changes in bcbio_system.yaml will not have any influence on it?
>>
>> Sorry, I don't have a good way to fix this without upgrading versions of
>> bcbio. We've moved away from sambamba merge because of scaling issues like
>> you're seeing, so we no longer take this approach. I think your diagnosis is
>> correct -- it's probably due to running too many at once, but we don't have a
>> good workaround for it by tweaking bcbio_system.yaml.
>>
>> > In the second case there is only one sample and the memory limits are below
>> > what's available on the machine. I played around a little bit with the
>> > settings and also isolated the failing command, but each test run takes
>> > about half a day. Since all the sorting creates A LOT of temporary files
>> > maybe this "kills" the file system? So I thought about actually giving it
>> > more memory. However, even though I specified "cores: 4" and "memory: 4G"
>> > the pipeline only puts 1G in the command.
>>
>> Your diagnosis seems spot on again. I'm a little surprised that Lustre is
>> falling over like this and not letting you write to disk, but depending on the
>> setup and other ongoing work on the cluster your thoughts sound reasonable. If
>> you only see this under load and you're not nearing the memory requirements of
>> the machine, this could be the cause.
>>
>> The other option is that you're having memory issues. What kind of memory and
>> cores do the machines you're running on have? The reason it uses 1G
>> memory/core instead of the 4G per core you specified is that there are
>> multiple processes running simultaneously in that piped command, so it's
>> dividing up the available memory. If you set it to 6G per core it should use
>> 2G for each step, and you can see if that improves the runs or makes it worse.
>>
>> Sorry for not having definite answers -- tuning at scale depends a lot on the
>> system -- but hope this discussion helps,
>> Brad