Sealer: closed gaps = zero (Illumina 150 bp reads)

Christian Rinke

Oct 22, 2015, 12:55:58 PM
to ABySS
Hello,
I recently ran into a problem with Sealer, and several colleagues I asked confirmed the same behaviour: Sealer doesn't close a single gap when it is fed 150 bp Illumina reads. Is this a bug in the software, or a problem with my run settings?

I called Sealer like this:
abyss-sealer -S assembly.fasta -j 16 -F 700 -B 5000 -P 20 -k 90 -k 80 -k 70 -k 60 -k 50 -k 40 -o result_sealed reads_FW_1.fastq.gz reads_FW_2.fastq.gz

The assembly file is rather large (~3 Gb), and I used trimmed Illumina 150-base FW reads.
The run finished without any errors, but no gaps were closed; see the log file below.

Thank you,
Chris


P.S. Log file:

result_sealed_log.txt

Finding flanks
974016 gaps found

974015 flanks extracted


Building bloom filter

Starting K run with k = 90

Flanks inserted into k run = 974015

0 unique gaps closed for k90

No start/goal kmer: 165

No path: 0

Unique path: 0

Multiple paths: 0

Too many paths: 0

Too many branches: 973850

Too many path/path mismatches: 0

Too many path/read mismatches: 0

Contains cycle: 0

Exceeded mem limit: 0

Skipped: 0

974015 flanks left

k90 run complete

Total gaps closed so far = 0


Building bloom filter

Starting K run with k = 80

Flanks inserted into k run = 974015

0 unique gaps closed for k80

No start/goal kmer: 153

No path: 0

Unique path: 0

Multiple paths: 0

Too many paths: 0

Too many branches: 973862

Too many path/path mismatches: 0

Too many path/read mismatches: 0

Contains cycle: 0

Exceeded mem limit: 0

Skipped: 0

974015 flanks left

k80 run complete

Total gaps closed so far = 0


Building bloom filter

Starting K run with k = 70

Flanks inserted into k run = 974015

0 unique gaps closed for k70

No start/goal kmer: 135

No path: 0

Unique path: 0

Multiple paths: 0

Too many paths: 0

Too many branches: 973880

Too many path/path mismatches: 0

Too many path/read mismatches: 0

Contains cycle: 0

Exceeded mem limit: 0

Skipped: 0

974015 flanks left

k70 run complete

Total gaps closed so far = 0


Building bloom filter

Starting K run with k = 60

Flanks inserted into k run = 974015

0 unique gaps closed for k60

No start/goal kmer: 124

No path: 0

Unique path: 0

Multiple paths: 0

Too many paths: 0

Too many branches: 973891

Too many path/path mismatches: 0

Too many path/read mismatches: 0

Contains cycle: 0

Exceeded mem limit: 0

Skipped: 0

974015 flanks left

k60 run complete

Total gaps closed so far = 0


Building bloom filter

Starting K run with k = 50

Flanks inserted into k run = 974015

0 unique gaps closed for k50

No start/goal kmer: 121

No path: 0

Unique path: 0

Multiple paths: 0

Too many paths: 0

Too many branches: 973894

Too many path/path mismatches: 0

Too many path/read mismatches: 0

Contains cycle: 0

Exceeded mem limit: 0

Skipped: 0

974015 flanks left

k50 run complete

Total gaps closed so far = 0


Building bloom filter

Starting K run with k = 40

Flanks inserted into k run = 974015

0 unique gaps closed for k40

No start/goal kmer: 111

No path: 0

Unique path: 0

Multiple paths: 0

Too many paths: 0

Too many branches: 973904

Too many path/path mismatches: 0

Too many path/read mismatches: 0

Contains cycle: 0

Exceeded mem limit: 0

Skipped: 0

974015 flanks left

k40 run complete

Total gaps closed so far = 0


K sweep complete

Creating new scaffold with gaps closed...

New scaffold complete

Gaps closed = 0

0%

Ben Vandervalk

Oct 22, 2015, 1:17:40 PM
to Christian Rinke, ABySS
Hi Chris,

Thank you for providing the log; it is very helpful. I see that nearly all attempts to close gaps are failing with the "Too many branches" outcome. That means the graph search algorithm that tries to connect the two flanks is hitting its internal limit on the breadth of the search frontier. In other words, the graph search space gets big and hairy, and the search gives up.
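
For a log this long, it can help to tally the outcome counters per k run rather than read them by eye. The following is a minimal Python sketch, not part of Sealer. It assumes the log keeps the line format shown in this thread ("Starting K run with k = N", followed by "outcome: count" lines), and the function names are made up for illustration.

```python
import re
from collections import OrderedDict

# Short excerpt of the k = 90 run from the log posted in this thread.
SAMPLE = """\
Starting K run with k = 90
Flanks inserted into k run = 974015
0 unique gaps closed for k90
No start/goal kmer: 165
No path: 0
Unique path: 0
Too many branches: 973850
974015 flanks left
k90 run complete
"""


def summarize_sealer_log(text):
    """Return {k: {outcome: count}} for each k run found in the log text."""
    runs = OrderedDict()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # the posted log is double-spaced; skip blank lines
        start = re.match(r"Starting K run with k = (\d+)$", line)
        if start:
            current = runs.setdefault(int(start.group(1)), OrderedDict())
            continue
        if re.match(r"k\d+ run complete$", line):
            current = None
            continue
        outcome = re.match(r"(.+?): (\d+)$", line)
        if current is not None and outcome:
            current[outcome.group(1)] = int(outcome.group(2))
    return runs


def dominant_outcome(counts):
    """Return the (outcome, count) pair with the largest count."""
    return max(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for k, counts in summarize_sealer_log(SAMPLE).items():
        name, count = dominant_outcome(counts)
        share = count / sum(counts.values())
        print(f"k={k}: {name} accounts for {share:.2%} of recorded outcomes")
```

Running this over the full log file instead of the excerpt should show whether a single outcome dominates every k run, which is the pattern described above.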

The first thing to check is that the Bloom filter built by Sealer has a reasonable false positive rate (FPR). The Bloom filter FPR should be reported somewhere in the log file, and you should make sure that it is below 20%; the lower, the better. If the FPR is too high, you can fix that by increasing the amount of memory allocated to the Bloom filter with the `-b` option (e.g. `-b 20G`).
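
As a rough guide to how `-b` affects the FPR, the textbook Bloom filter estimate is FPR ≈ (1 - e^(-h*n/m))^h for n distinct k-mers, m bits, and h hash functions. The sketch below is Python, not Sealer code. It assumes the `-b` size is given in bytes and that a single hash function is used (both are assumptions; check them against your ABySS version), and the k-mer count is a made-up placeholder, not a measurement from this data set.

```python
import math


def bloom_fpr(num_items, size_bytes, num_hashes=1):
    """Textbook estimate of a Bloom filter's false positive rate."""
    bits = size_bytes * 8
    return (1.0 - math.exp(-num_hashes * num_items / bits)) ** num_hashes


def bytes_for_fpr(num_items, target_fpr, num_hashes=1):
    """Smallest filter size in bytes whose estimated FPR is at most target_fpr."""
    per_hash = target_fpr ** (1.0 / num_hashes)
    bits = -num_hashes * num_items / math.log(1.0 - per_hash)
    return math.ceil(bits / 8)


if __name__ == "__main__":
    # Hypothetical number of distinct k-mers in the reads (placeholder value).
    distinct_kmers = 20_000_000_000
    for size in (20e9, 40e9):
        print(f"{size / 1e9:.0f} GB -> estimated FPR {bloom_fpr(distinct_kmers, size):.3f}")
    print("bytes needed for FPR <= 0.20:", bytes_for_fpr(distinct_kmers, 0.20))
```

The point of the estimate is only the trend: for a fixed set of k-mers, doubling the filter size lowers the expected FPR, which is why raising `-b` is the first thing to try.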

If the Bloom filter FPR is not the problem, you may be able to improve your results by adjusting the parameter settings. Things that might help are:

* Increasing -k. A larger k results in a less tangled de Bruijn graph. This will decrease run time.
* Decreasing -F. This parameter represents 2*flank_length + max_gap_size, where flank_length is 100 bp by default. Decreasing -F lowers the depth limit of the search algorithm. This will decrease run time.
* Increasing -B. This raises the breadth limit of the search (i.e. the "Too many branches" limit). This will increase run time.
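
To make the -F relationship concrete, here is a small Python sketch of the arithmetic given in the list above (F = 2*flank_length + max_gap_size, with flank_length = 100 bp by default). The function names are illustrative only and are not part of ABySS.

```python
def max_gap_size(max_frag, flank_length=100):
    """Largest gap (bp) that a given -F value allows, per F = 2*flank + gap."""
    gap = max_frag - 2 * flank_length
    if gap < 0:
        raise ValueError("-F is smaller than the two flanks combined")
    return gap


def frag_for_gap(max_gap, flank_length=100):
    """-F value needed to allow gaps of up to max_gap bp."""
    return 2 * flank_length + max_gap


if __name__ == "__main__":
    print(max_gap_size(700))   # 500: gap size allowed by the -F 700 used above
    print(frag_for_gap(300))   # 500: -F needed for gaps of up to 300 bp
```

With the -F 700 from the original command, the search is allowed to span gaps of up to 500 bp, so lowering -F trades away the longest gaps for a shallower search.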

Good luck!

- Ben

Christian Rinke

Oct 23, 2015, 12:41:59 PM
to ABySS, christi...@gmail.com
Hi Ben,
Thank you for your immediate reply!
I didn't find the Bloom filter FPR anywhere in the log file. However, as a first step, I increased the memory to 20G as you suggested, to see whether this improves the results.
Cheers,
Chris

Christian Rinke

Oct 27, 2015, 1:41:40 PM
to ABySS, christi...@gmail.com
Hi Ben,
It worked with -b 40G.
Thanks,
Chris


Ben Vandervalk

Oct 27, 2015, 1:42:08 PM
to Christian Rinke, ABySS
Happy to hear!

- Ben
