How to solve for Jellyfish and Butterfly failing to allocate memory in Trinity on an HPC cluster


mira...@umn.edu

Mar 22, 2015, 12:21:30 PM
to trinityrn...@googlegroups.com
Hello Trinity-ers,

I had an issue where, if many reads were being assembled on an HPC cluster, I'd run into this wall. I found a way around it with major assistance from Tiago Hori from UNL and Mengxing Cheng from FIU. This issue pops up when you have a "largish" number of reads. It didn't come up with 20 million PE reads, but it did with 60 M PE reads.

>>>>>> terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::array<j$
  what():  Failed to allocate 67527304624 bytes of memory
Error, cmd: /home/applications/trinity/2.0.3/trinity-plugins/jellyfish/bin/jellyfish count -t 32 -m 25 -s 17531127145  both.fa died with ret 134 at /home/applications/trinity/2.0.3/Trinity line 2110.

Trinity run failed. Must investigate error above. <<<<<< 
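
To put the failed allocation in perspective, the byte count in the error can be converted to gigabytes (it comes to roughly 63 GiB). This is only a minimal sketch: the function name and the regular expression are illustrative and are not part of Trinity or Jellyfish.

```python
import re

def failed_allocation_gib(log_text):
    """Return the size of a failed allocation in GiB, or None if no such line exists."""
    match = re.search(r"Failed to allocate (\d+) bytes of memory", log_text)
    if match is None:
        return None
    return int(match.group(1)) / 1024**3

# The line reported in the error above
log_line = "  what():  Failed to allocate 67527304624 bytes of memory"
print(f"{failed_allocation_gib(log_line):.1f} GiB")
```

If that number is larger than what the job was actually granted on the node, the allocation cannot succeed no matter how much RAM the node physically has.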

The issue is that Jellyfish takes its RAM allocation from the Butterfly settings, and the memory allocation needs to match what's requested per core (called CPU in the Trinity pipeline, or sometimes thread elsewhere) and the total memory to be used. This is actually stated in the Trinity "extensive" options shown if you use the "--show_full_usage_info" flag, but if you're like me and didn't read it carefully you probably missed it. In my defense, it is worded a bit confusingly: "--bflyCalculateCPU              :Calculate CPUs based on 80% of max_memory divided by maxbflyHeapSpaceMax"

You basically need to multiply the memory in "--bflyHeapSpaceInit" TIMES the number of CPUs "--bflyCPU" <--> "--CPU" to be EQUAL = "--max_memory"

For example, if you have, say, 128GB in a node with 16 cores at your HPC cluster to play with: --bflyHeapSpaceInit 4G * --CPU 16 = --max_memory 64G
In this example, even though you can use up to 128GB, what you request for Jellyfish and Butterfly is 64GB because 4*16 = 64.
NOTE: this is because memory in the Butterfly options is per core, not total. It's also important that "--bflyHeapSpaceMax" is set higher, in this case say 5G, which would still be under the maximum memory on your node (128GB). If you set "--bflyHeapSpaceMax" to also be 4G then Trinity will crash, because initial and max heap space cannot be the same. So don't be greedy! In this case you're not going to run out of memory because your node has 128GB of RAM. The max amount of memory per CPU with this setup would be 7G, so maxheap is 8G, because 8GB * 16 CPUs = 128GB. If you want to use more CPUs you need to reduce the amount of RAM for each (32 CPUs could only use 4GB maxheap in this example).
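
The arithmetic in this example can be written down as a small helper. This is a sketch of the rule of thumb described in this post only, not an official Trinity formula; `hernan_memory_plan` and the `heap_max_ceiling` key are made-up names for illustration.

```python
def hernan_memory_plan(node_ram_gb, cpus, heap_init_gb):
    """Apply the rule of thumb from this post: heap_init * CPUs = max_memory."""
    max_memory_gb = heap_init_gb * cpus
    if max_memory_gb > node_ram_gb:
        raise ValueError(
            f"{heap_init_gb}G x {cpus} CPUs = {max_memory_gb}G "
            f"exceeds the node's {node_ram_gb}G"
        )
    # Largest per-CPU max heap that still fits in the node (hypothetical helper value)
    heap_max_ceiling_gb = node_ram_gb // cpus
    return {
        "--CPU": cpus,
        "--bflyHeapSpaceInit": f"{heap_init_gb}G",
        "--max_memory": f"{max_memory_gb}G",
        "heap_max_ceiling": f"{heap_max_ceiling_gb}G",
    }

# The 128GB / 16-core node from the example above
plan = hernan_memory_plan(node_ram_gb=128, cpus=16, heap_init_gb=4)
print(plan["--max_memory"])      # 64G
print(plan["heap_max_ceiling"])  # 8G
```

With 32 CPUs on the same node the ceiling drops to 4G per CPU, which matches the numbers above.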

Now, THIS IS VITAL! Make sure you can access enough memory at your HPC cluster. Following the "1GB per million reads" rule in Trinity will give you a good ballpark of how much you'd need. That becomes problematic if you need 400GB of RAM but your cluster's limit per user or account is 128GB. If you tell Jellyfish and Butterfly to help themselves to 400GB when that's not available, it'll crash. If you're submitting your jobs through a script, make sure the script requests the appropriate resources, otherwise your analysis will die. If you have large limits you can be greedier, but you must follow the formula explained above.
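
A quick way to sanity-check this before submitting a job is to compare the ballpark estimate against your limit. This is a rough sketch: the function names are hypothetical, and the 1GB per million reads figure is only the rule of thumb quoted above, not a guarantee.

```python
def estimate_ram_gb(million_reads, gb_per_million=1.0):
    """Ballpark RAM need using the '1GB per million reads' rule of thumb."""
    return million_reads * gb_per_million

def fits_cluster_limit(million_reads, limit_gb):
    """True if the ballpark estimate is within the per-user or per-account limit."""
    return estimate_ram_gb(million_reads) <= limit_gb

# 400 M reads against a 128GB limit: the request would not fit
print(fits_cluster_limit(400, 128))  # False
# 60 M reads against the same limit
print(fits_cluster_limit(60, 128))   # True
```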

So the word is: figure out how much RAM you can access on your cluster, estimate how much you really need, do the math (initheap * #CPUs = max mem), and you'll likely avoid issues. BTW, I use a Mac to test out runs and then send them to the cluster. I will be posting how to compile Trinity and full Trinotate on your Mac after some tweaking. It works even with the latest version of Java (>=1.8). This issue won't pop up on a personal computer since the amount of RAM is small (32GB at most on Macs).

Happy Trinity-ing!

-- Hernán, Crustacean-bound Postdoc at FIU 

Brian Haas

Mar 22, 2015, 1:11:23 PM
to mira...@umn.edu, trinityrn...@googlegroups.com
Hi Hernan,

Thanks for the recommendations, but I think there are a couple of different things at play here.

First, the jellyfish RAM allocation should be entirely based on your --max_memory setting.  You can set this lower (say 10G), and it'll just generate additional files and take longer to run during this initial phase.  This setting is not absolutely deterministic... it's a ballpark recommendation to jellyfish, so if you set --max_memory 10G, it should be ~10G of max memory usage.

Now, there are a lot of other places where Trinity might crash due to hardware-related issues, and we have different recommendations depending on what the error is. In general, the default settings in Trinity work very well in almost all scenarios, and hardware-related issues have cropped up in certain edge cases that we've dealt with over time.

My key recommendation at the moment is to use v2.0.6 (latest release), set --max_memory to something less than the amount of physical memory on your machine (I typically just set it to 50G for any medium to large data set), and set the --CPU parameter to the number of concurrent threads to use where possible (some number less than the number of cores on your server).  If you have access to LSF, SGE, SLURM, or PBS, then definitely explore using --grid_conf for most efficient use of resources and shortest runtimes.  That's pretty much it.
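
As an illustration of that recommendation, the command line could be assembled like this. It is only a sketch: `trinity_command` is a hypothetical helper, using one core fewer than the machine has is just one reading of "some number less than the number of cores", and the read file names are placeholders. The function only builds the command string and does not run Trinity.

```python
import shlex

def trinity_command(left, right, physical_ram_gb, cores, max_memory_gb=50):
    """Build a Trinity command with --max_memory below physical RAM and --CPU below the core count."""
    if max_memory_gb >= physical_ram_gb:
        raise ValueError("--max_memory should be less than the physical memory on the machine")
    cpu = max(1, cores - 1)  # assumption: leave one core free
    args = [
        "Trinity", "--seqType", "fq",
        "--left", left, "--right", right,
        "--CPU", str(cpu),
        "--max_memory", f"{max_memory_gb}G",
    ]
    return " ".join(shlex.quote(a) for a in args)

# Placeholder inputs: a 64GB machine with 16 cores
cmd = trinity_command("reads_1.fq", "reads_2.fq", physical_ram_gb=64, cores=16)
print(cmd)
```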

best,

~brian

Hernan Vazquez Miranda

Mar 23, 2015, 1:24:49 AM
to Brian Haas, trinityrn...@googlegroups.com
Hello Brian,

I am just trying to post what I've found as an end user, in the hope that others with similar issues don't run into the same wall. Thank you for a more detailed explanation and advanced insight on how Trinity works on a cluster. My guess is that there are several novice users like myself who are looking for answers and could find this useful.

I will post how I was able to work around the clang issues on the Mac to get a fully functional Trinity-Trinotate combo.

Hernán
--
----
           __       __
          / <`     '> \
         (  / @   @ \  )
          \(_ _\_/_ _)/
        (\ `-/     \-' /)
         "===\     /==="
          .==')___(`==.    hjw
         ' .='     `=. 
Hernán Vázquez Miranda, PhD
Postdoctoral Research Associate
Florida International University
Bracken-Grissom Lab
http://www.brackengrissomlab.com

previous
           /\ .-"""-. /\
           \ `"`'v'`"` /
           { .=.   .=. }
           {( O ) ( O )}
           .\'=' V '='/.
          /.'`-'-'-'-`'.\
         |/   ^  ~  ^   \|
         \|    ^   ^    |/
          \\           //
           \\ _     _ //
       -----'(|)---(|)'-----
       -jgs--,-------,----- 
            / .' : '. \
           '-'._.-._.'-'
University of Minnesota
Dept. of Ecology, Evolution, and Behavior
1987 Upper Buford Cir
100 Ecology Building
St. Paul. MN 55108 USA
hernan[at]umn.edu


and

Bell Museum of Natural History
University of Minnesota

http://www.bellmuseum.umn.edu/

Brian Haas

Mar 23, 2015, 7:48:25 AM
to her...@umn.edu, trinityrn...@googlegroups.com
Sounds great.  Much appreciated!  :)

-Brian
(by iPhone)

Francois Seneca

Sep 18, 2015, 12:33:28 PM
to trinityrnaseq-users, mira...@umn.edu
Hi Hernan and Brian,

I am using a personal workstation at home to assemble my transcriptomes. I have 64G of RAM and 16 CPUs on my machine. I have launched Trinity and it seems to be running, but it's taking a long time. Right now the Jellyfish k-mer cataloging has been running for 18 hours on a test fastq file of 16M reads. I expected the job to be done by now based on the estimate of 1G of RAM per 1M reads in about one hour. Before I read the info in this thread, I set the Trinity command line with --max_memory 20G and --CPU 16 to process a fastq file of 16M reads.

Before I run a real assembly (using ~60M reads), I would like to make sure I am using Trinity settings in accord with my machine specs:

1) based on Brian's recommendation above:

Trinity --seqType fq --SS_lib_type FR --normalize_by_read_set --left reads_1.fq --right reads_2.fq --CPU 15 --max_memory 50G --monitoring --verbose
 
or

2) based on Hernan's experience - I calculated the following 64G RAM / 16 CPUs = 4G max mem --> so 3G for bflyHeapSpaceInit

Trinity --seqType fq --SS_lib_type FR --normalize_by_read_set --left reads_1.fq --right reads_2.fq --bflyHeapSpaceInit 3 --CPU 16 --monitoring --verbose
 

Your help would be greatly appreciated.

Cheers,

Francois