Should I give up assembling with Trinity? Please help!

Adriana Romero

May 15, 2017, 5:19:05 PM
to trinityrnaseq-users
Hi! 

I've been trying for months now to make a huge meta-transcriptome assembly with Trinity. I have 233 million paired-end reads. 
 
According to the Trinity documentation (roughly 1 GB of RAM per million read pairs), I have enough memory on the cluster I'm using. The cluster is pretty big: five 64-core nodes, each with 512 GB of memory.
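A minimal bash sketch of that rule-of-thumb arithmetic, assuming the ~1 GB per million read pairs guideline quoted above and assuming a single Trinity job can only use the RAM of one 512 GB node. The helper name `estimate_gb` is hypothetical, and the result is only a rough guide, not a guarantee.

```shell
#!/usr/bin/env bash
# Rule-of-thumb Trinity RAM estimate: ~1 GB per million read pairs.
# Assumption: one Trinity job can use the RAM of only one node.
set -euo pipefail

GB_PER_MILLION_PAIRS=1   # guideline as quoted in the post
NODE_RAM_GB=512          # RAM on a single cluster node

# Hypothetical helper: estimated RAM (GB) for N million read pairs.
estimate_gb() {
  echo $(( $1 * GB_PER_MILLION_PAIRS ))
}

for million_pairs in 233 160; do
  need=$(estimate_gb "$million_pairs")
  if (( need <= NODE_RAM_GB )); then
    verdict="fits within"
  else
    verdict="exceeds"
  fi
  echo "${million_pairs}M pairs: ~${need} GB (${verdict} one ${NODE_RAM_GB} GB node)"
done
```

By this estimate both the full and the reduced dataset fit on one node. The guideline is per read, though, while Inchworm's memory likely tracks the number of distinct k-mers it loads, which a diverse metatranscriptome can push far higher.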

In my script, I've specified 24, 32, and 60 CPUs and max_memory values of 480, 350, 250, and 235, in every possible combination of CPU count and max_memory.

At this point I've tried everything to reduce memory requirements, such as --normalize_by_read_set and --no_bowtie, and every single time I get the same error: "bad_alloc" in the Inchworm process.

I reduced the size of my dataset to 160 million paired-end reads and still get the same error (shown below).

What can I do? Is there anything that I can do to actually get this to work? I'm using the newest version of Trinity. 


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
done parsing 13216557902 Kmers, 13216557902 added, taking 43378 seconds.

TIMING KMER_DB_BUILDING 43378 s.
Pruning kmers (min_kmer_count=1 min_any_entropy=0 min_ratio_non_error=0.05)
Pruned 5525040 kmers from catalog.
Pruning time: 78298 seconds = 1304.97 minutes.

TIMING PRUNING 78298 s.
-populating the kmer seed candidate list.
Kcounter hash size: 13216557902
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Error, cmd: /data/apps/trinity/r2017-2.4.0/Inchworm/bin//inchworm --kmers jellyfish.kmers.fa --run_inchworm -K 25 -L 25 --monitor 1   --num_threads 6  --PARALLEL_IWORM  > /dfs1/bio/alromer1/Trinity_ALL_8/Trinity_ALL_f/inchworm.K25.L25.fa.tmp 2>tmp.63069.stderr died with ret 34304 at /data/apps/trinity/r2017-2.4.0/PerlLib/Pipeliner.pm line 166.
Pipeliner::run('Pipeliner=HASH(0x1db3530)') called at /data/apps/trinity/r2017-2.4.0/Trinity line 2289
eval {...} called at /data/apps/trinity/r2017-2.4.0/Trinity line 2284
main::run_inchworm('/dfs1/bio/alromer1/Trinity_ALL_8/Trinity_ALL_f/inchworm.K25.L...', '/dfs1/bio/alromer1/Trinity_ALL_8/Trinity_ALL_f/both.fa', 'RF', '') called at /data/apps/trinity/r2017-2.4.0/Trinity line 1537
main::run_Trinity() called at /data/apps/trinity/r2017-2.4.0/Trinity line 1263
eval {...} called at /data/apps/trinity/r2017-2.4.0/Trinity line 1262
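The counts printed in this log already hint at the problem. The small bash calculation below (a sketch; the variable names are illustrative) uses only the two numbers from the log to show how little of the k-mer catalog the default pruning (min_kmer_count=1) removed, which suggests Inchworm had to hold essentially all ~13.2 billion k-mers in memory.

```shell
#!/usr/bin/env bash
# Fraction of the k-mer catalog removed by pruning, using the counts
# printed in the Inchworm log above (min_kmer_count=1).
set -euo pipefail

total_kmers=13216557902   # "done parsing 13216557902 Kmers"
pruned_kmers=5525040      # "Pruned 5525040 kmers from catalog."

remaining=$(( total_kmers - pruned_kmers ))
# Integer parts-per-million, to avoid floating point in bash.
ppm=$(( pruned_kmers * 1000000 / total_kmers ))

echo "k-mers kept after pruning: ${remaining}"
# 10000 ppm = 1%, so split ppm into whole and fractional percent.
printf 'fraction pruned: %d.%04d%%\n' $(( ppm / 10000 )) $(( ppm % 10000 ))
```

This prints 13211032862 k-mers kept and a pruned fraction of 0.0418%.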



 

 

Ken Field

May 15, 2017, 7:34:13 PM
to Adriana Romero, trinityrnaseq-users
Adriana-
I think it is unlikely that the nodes share memory, so any one job only has access to 512 GB of memory. I successfully assembled a similarly sized dataset with the following command (Trinity 2.3.2) with only 150 GB allocated:


/home/accounts/facultystaff/k/kfield/software/trinityrnaseq-Trinity-v2.3.2/Trinity \
  --left left.fastq \
  --right right.fastq \
  --max_memory 8G \
  --CPU 12 --seqType fq --full_cleanup \
  --normalize_reads --SS_lib_type RF --output trinity_out_dir/ \
  > /home/accounts/facultystaff/k/kfield/trinity232.norm.${PBS_JOBID}.log

Note that the max_memory flag is per thread and leaves some overhead (8 x 12 = 96 GB). Also note that this was a normalized run, and you may not want to normalize a meta-transcriptome assembly.

Best of luck,
Ken





--
Ken Field, Ph.D.
Professor of Biology
Program in Cell Biology/Biochemistry
Bucknell University
Room 211 Biology Building

Brian Haas

May 16, 2017, 9:02:49 AM
to Ken Field, Adriana Romero, trinityrnaseq-users
Also, if Inchworm is throwing bad_alloc (out of memory), then the
only way to get around this (if you can't get onto a bigger machine)
is to set --min_kmer_cov to a value higher than the default (1).
Setting it to 2 should lead to a massive decrease in memory usage,
although lowly expressed transcripts will come out more fragmented.

~b
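A sketch of this suggestion as a rerun command. The only change that matters here is `--min_kmer_cov 2`; the input file names and the `--max_memory` and `--CPU` values are placeholder assumptions, and `RF` matches the library type visible in the log in the original post. The script prints the command by default and only launches it when `DRY_RUN=0` (a hypothetical convenience variable, not a Trinity option).

```shell
#!/usr/bin/env bash
# Rerun Trinity with Inchworm's minimum k-mer count raised from 1 to 2,
# so k-mers seen only once are not assembled by Inchworm.
# Input names and resource values below are placeholders (assumptions).
set -euo pipefail

LEFT=left.fastq    # hypothetical read files
RIGHT=right.fastq

cmd=(Trinity
  --seqType fq
  --left "$LEFT"
  --right "$RIGHT"
  --SS_lib_type RF
  --max_memory 400G
  --CPU 24
  --min_kmer_cov 2
  --output trinity_out_dir)

# Show the exact command; launch it only when DRY_RUN=0.
printf '%q ' "${cmd[@]}"
echo
if [[ "${DRY_RUN:-1}" == "0" ]]; then
  "${cmd[@]}"
fi
```

Building the command as a bash array keeps each argument intact when it is printed or executed, so the dry-run output is exactly what would be run.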

Philip Blood

May 18, 2017, 3:04:59 PM
to trinityrnaseq-users, kfi...@bucknell.edu, adrilu...@gmail.com
Hi Adriana,

We've had success running metatranscriptome assemblies with Trinity on our large-memory nodes on the Bridges system at the Pittsburgh Supercomputing Center. Bridges has 42 nodes with 3 TB of memory each and 4 nodes with 12 TB each. If you are at a US research institution, or have a collaborator at one, you can get a free allocation of compute time on Bridges through XSEDE. If you're interested, there is information about getting a Startup allocation here:


A Startup request requires just a paragraph about what you'd like to do, along with the CV of the person who will be PI on the allocation (any staff member at a US institution can be PI, but students cannot). If you'd like to discuss further, just let me know and I can email you off-list.

Best,
Philip Blood
Pittsburgh Supercomputing Center
Carnegie Mellon University