abyss performance

jca...@gmail.com

Feb 27, 2018, 3:46:17 PM
to ABySS
Dear colleagues,

I am trying to assemble a large genome sequenced at low coverage with short reads. I ran abyss-pe without the Bloom filter mode, using multiple threads and a maximum of ~120 GB of RAM.

The assembly has been in the "Generating adjacency" step for two weeks. Is this run time normal for a default ABySS configuration with this memory limit, and do you know whether it is likely to finish soon?

Apart from trying low-memory assembly strategies, would running ABySS with more threads and more memory reduce the run time? I have been monitoring the thread usage of ABySS, and most of the time it seems to use only 2 or 3 threads. Is that normal?

Any advice or suggestion will be greatly appreciated. I will be very grateful for your help.

Best regards,
Julia

Shaun Jackman

Mar 26, 2018, 7:05:29 PM
to ABySS
Hi, Julia. Assembling a human genome (3 Gbp) with ABySS without using the Bloom filter takes about 1 terabyte of RAM. With the Bloom filter it takes about 35 GB of RAM. There's a good chance that your genome will take more than 120 GB of RAM. I'd recommend using the Bloom filter mode.

Cheers,
Shaun

Ben Vandervalk

Mar 27, 2018, 11:11:01 AM
to jca...@gmail.com, abyss...@googlegroups.com
Hi Julia,

I agree with Shaun that 120 GB is not enough RAM for a large genome, unless you use the Bloom filter assembly mode.

It is difficult to tell without seeing your abyss-pe command and verbose log output, but you may be hitting this issue: https://github.com/bcgsc/abyss/wiki/ABySS-Users-FAQ#2-my-abyss-assembly-jobs-hang-when-i-run-them-with-high-k-values-eg-k250

Also, you should try to determine how much memory your ABySS job is using, by using `top` or `ps`.  That way you could confirm if inadequate memory is really the problem.
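A minimal sketch of that check, in Python, assuming a `ps` that accepts `-o rss= -p PID` (as procps on Linux does) and that you already know the PID of the ABySS process. The helper names `parse_rss_kib`, `rss_kib` and `kib_to_gib` are made up for this example and are not part of ABySS.

```python
import subprocess


def parse_rss_kib(ps_output):
    """Parse the output of `ps -o rss= -p PID`: one number, in KiB."""
    return int(ps_output.strip())


def rss_kib(pid):
    """Return the current resident set size of one process in KiB, via ps."""
    result = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_rss_kib(result.stdout)


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / 1024**2
```

Calling `kib_to_gib(rss_kib(pid))` every few minutes and comparing the result with the machine's physical RAM should show whether the job is approaching the limit. This reports a single process only; if the run spawns several processes, each one would need to be checked.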

- Ben

--
You received this message because you are subscribed to the Google Groups "ABySS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to abyss-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Julia Carballo

Mar 28, 2018, 7:55:25 AM
to Ben Vandervalk, abyss...@googlegroups.com
Hi Ben,

Thank you for your reply. I now have more memory available, so I hope these new resources will let me obtain an assembly of my data.

Meanwhile, I have been testing the Bloom filter mode, and I ran into memory problems again. I normalised my input to 50X coverage and tried to assemble that dataset. I also reduced the k-mer length from 70 bp to 55 bp, in case 70 bp was too high. The resulting Ac-Mix-1.fa file was 55 GB.

Command exited with non-zero status 2
        Command being timed: "abyss-pe k=55 np=45 j=45 name=NNN in=1_50X_paired.fastq.gz 2_50X_paired.fastq.gz B=500M H=1 kc=3 v=-vv"
        User time (seconds): 4345687.72
        System time (seconds): 2722.35
        Percent of CPU this job got: 4289%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 28:09:34
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 130186120
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 41695706
        Minor (reclaiming a frame) page faults: 125143316
        Voluntary context switches: 15904049
        Involuntary context switches: 149582831
        Swaps: 0
        File system inputs: 122042208
        File system outputs: 114486288
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 2
------
Extending read: ST-xxx:xxx:21860/2
Extending read: ST-xxx:xxx::33269/2
Extending read: ST-xxx:xxx:51219/2
Extended 74805569 of 208879590 reads (35.8%), assembled 27688563301 bp so far
Assembly complete
AdjList -vv   -k55 -m50 --dot Ac-Mix-1.fa >Ac-Mix-1.dot
Reading `Ac-Mix-1.fa'...
/bin/bash: line 1: 45329 Killed                  AdjList -vv -k55 -m50 --dot Ac-Mix-1.fa > Ac-Mix-1.dot
/usr/bin/abyss-pe:512: recipe for target 'Ac-Mix-1.dot' failed
make: *** [Ac-Mix-1.dot] Error 137
make: *** Deleting file 'Ac-Mix-1.dot'
------     
 
Any advice will be greatly appreciated.

Thank you very much again for your attention.

Julia


Ben Vandervalk

Mar 28, 2018, 12:18:34 PM
to jca...@gmail.com, abyss...@googlegroups.com
Argh, this is unfortunate. Yes, we have had several reports of Bloom filter assemblies using a large amount of memory at the `AdjList` stage.
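The numbers in the log above can be read as follows; this is a small worked sketch in Python, not output from ABySS. It assumes the usual shell convention that a command killed by signal N is reported with status 128 + N, and that GNU `time` reports "kbytes" in units of 1024 bytes. The helper names are hypothetical.

```python
import signal


def signal_from_exit_status(status):
    """Map a shell exit status of 128 + N to signal N, else return None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128)
    except ValueError:
        # Status above 128 that does not correspond to a known signal.
        return None


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / 1024**2


# "Error 137" is what make reported for the AdjList recipe.
print(signal_from_exit_status(137).name)   # SIGKILL
# "Maximum resident set size (kbytes): 130186120" from the time report.
print(round(kib_to_gib(130186120), 1))     # 124.2
```

So `AdjList` was terminated with SIGKILL, and the peak resident memory recorded for the run was about 124 GiB. A SIGKILL at that point is consistent with the kernel or a job scheduler killing the process for exceeding available memory, although the log alone does not prove it. The peak figure covers the whole `abyss-pe` run, so it is not necessarily the `AdjList` step that set it.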

Possible workarounds to reduce memory usage by `AdjList` (in order of preference):

* increase `k`
* increase `kc` (minimum k-mer occurrence cutoff) 
* add `m=k-1` to your `abyss-pe` command, to tell `AdjList` to find only overlaps of exactly k-1 bp. (`AdjList` builds a compacted de Bruijn graph by finding end-to-end overlaps between pairs of sequences.) Normally `AdjList` finds overlaps between 50 bp and k-1 bp, but finding overlaps shorter than k-1 bp requires building a suffix array, which in turn requires more memory.
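To make the last workaround concrete: as far as I understand, `m` expects a number rather than the literal text `k-1`, so for `k=55` it would be `m=54`. The Python sketch below rebuilds the command from earlier in the thread with `m` derived from `k`. The function `abyss_pe_command` is made up for this example; every other option is copied unchanged from that command, and the values of `k` and `kc` are left as they were (raising them is a separate choice).

```python
import shlex


def abyss_pe_command(k, kc, extra):
    """Build an abyss-pe argument list with m set to k - 1."""
    return ["abyss-pe", f"k={k}", f"kc={kc}", f"m={k - 1}", *extra]


# Remaining options copied from the command shown earlier in the thread.
extra = [
    "np=45",
    "j=45",
    "name=NNN",
    "in=1_50X_paired.fastq.gz 2_50X_paired.fastq.gz",
    "B=500M",
    "H=1",
    "v=-vv",
]

cmd = abyss_pe_command(k=55, kc=3, extra=extra)
# shlex.join quotes the in=... argument because it contains a space.
print(shlex.join(cmd))
```

Deriving `m` from `k` in one place avoids the two drifting apart when `k` is changed between test runs.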

- Ben


Julia Carballo

Mar 28, 2018, 2:07:15 PM
to Ben Vandervalk, abyss...@googlegroups.com
Oh, okay! Thank you very much for the suggestions. I will test these new settings and let you know whether I get better results.

Have a nice Easter!
