Should I give up assembling with Trinity? Please help!

Adriana Romero

May 15, 2017, 5:19:05 PM

to trinityrnaseq-users

Hi!

I've been trying for months now to make a huge meta-transcriptome assembly with Trinity. I have 233 million paired-end reads.

According to the Trinity documentation (about 1 GB of memory per million read pairs), I have enough memory on the cluster I'm using. The cluster is pretty big: five 64-core nodes, each with 512 GB of memory.
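
As a quick sanity check, the arithmetic behind that rule of thumb can be sketched in shell. The variable names are illustrative, and the 1 GB per million read pairs figure is only the guideline quoted above, not a guarantee for any particular dataset.

```shell
#!/usr/bin/env bash
# Rough memory estimate from the rule of thumb quoted above.
# All variable names are illustrative.
read_pairs_millions=233   # read pairs, in millions (from the post)
gb_per_million=1          # guideline: about 1 GB per million read pairs
node_mem_gb=512           # memory of a single node (from the post)

estimate_gb=$(( read_pairs_millions * gb_per_million ))
echo "Estimated need: ${estimate_gb} GB; one node has ${node_mem_gb} GB"

if (( estimate_gb <= node_mem_gb )); then
    verdict="fits on one node"
else
    verdict="exceeds one node"
fi
echo "By this guideline, the job ${verdict}."
```

The comparison is against a single node because a job on one node cannot use memory that belongs to the other nodes.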

In my script, I've specified 24, 32, and 60 CPUs and max_memory values of 480, 350, 250, and 235, in every possible combination of the two.

At this point I've tried everything to reduce memory requirements, such as --normalize_by_read_set and --no_bowtie, and every single time I get the same error: "bad_alloc" in the inchworm process.

I reduced the size of my dataset to 160 million paired-end reads and I still get the same error (see below).

What can I do? Is there anything that I can do to actually get this to work? I'm using the newest version of Trinity.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

done parsing 13216557902 Kmers, 13216557902 added, taking 43378 seconds.

TIMING KMER_DB_BUILDING 43378 s.

Pruning kmers (min_kmer_count=1 min_any_entropy=0 min_ratio_non_error=0.05)

Pruned 5525040 kmers from catalog.

Pruning time: 78298 seconds = 1304.97 minutes.

TIMING PRUNING 78298 s.

-populating the kmer seed candidate list.

Kcounter hash size: 13216557902

terminate called after throwing an instance of 'std::bad_alloc'

what(): std::bad_alloc

Error, cmd: /data/apps/trinity/r2017-2.4.0/Inchworm/bin//inchworm --kmers jellyfish.kmers.fa --run_inchworm -K 25 -L 25 --monitor 1 --num_threads 6 --PARALLEL_IWORM > /dfs1/bio/alromer1/Trinity_ALL_8/Trinity_ALL_f/inchworm.K25.L25.fa.tmp 2>tmp.63069.stderr died with ret 34304 at /data/apps/trinity/r2017-2.4.0/PerlLib/Pipeliner.pm line 166.

Pipeliner::run('Pipeliner=HASH(0x1db3530)') called at /data/apps/trinity/r2017-2.4.0/Trinity line 2289

eval {...} called at /data/apps/trinity/r2017-2.4.0/Trinity line 2284

main::run_inchworm('/dfs1/bio/alromer1/Trinity_ALL_8/Trinity_ALL_f/inchworm.K25.L...', '/dfs1/bio/alromer1/Trinity_ALL_8/Trinity_ALL_f/both.fa', 'RF', '') called at /data/apps/trinity/r2017-2.4.0/Trinity line 1537

main::run_Trinity() called at /data/apps/trinity/r2017-2.4.0/Trinity line 1263

eval {...} called at /data/apps/trinity/r2017-2.4.0/Trinity line 1262

Ken Field

May 15, 2017, 7:34:13 PM

to Adriana Romero, trinityrnaseq-users

Adriana-

I think it is unlikely that the nodes share memory, so any one job only has access to 512 GB of memory. I successfully assembled a similarly sized dataset with the following command (with Trinity 2.3.2) and only 150 GB allocated:

/home/accounts/facultystaff/k/kfield/software/trinityrnaseq-Trinity-v2.3.2/Trinity \
  --left left.fastq \
  --right right.fastq \
  --max_memory 8G \
  --CPU 12 --seqType fq --full_cleanup \
  --normalize_reads --SS_lib_type RF --output trinity_out_dir/ \
  > /home/accounts/facultystaff/k/kfield/trinity232.norm.${PBS_JOBID}.log

Note that the max_memory flag is per thread and leaves some overhead (8 x 12 = 96 GB). Also note that this was a normalized run, and you may not want to normalize for a meta-transcriptome assembly.

Best of luck,

Ken

--

Ken Field, Ph.D.

Professor of Biology

Program in Cell Biology/Biochemistry

Bucknell University

Room 211 Biology Building

Brian Haas

May 16, 2017, 9:02:49 AM

to Ken Field, Adriana Romero, trinityrnaseq-users

Also, if inchworm is throwing bad_alloc (out of memory), then the
only way around it (if you can't get on a bigger machine) is
to set --min_kmer_cov to a higher value than the default (1).
Setting it to 2 should lead to a massive decrease in memory usage,
although it will also lead to more fragmented transcripts for those
that are lowly expressed.
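
A minimal sketch of what such a run could look like, assuming the same paired-end, RF-stranded input as in Ken's command above. The file names, memory, and CPU values are placeholders taken from earlier in the thread, not recommendations. The script only prints the command so it can be reviewed before launching.

```shell
#!/usr/bin/env bash
# Sketch: Trinity invocation with a raised minimum k-mer coverage.
# File names, memory, and CPU values are placeholders.
# --min_kmer_cov 2 raises the threshold from its default of 1.
cmd=(
    Trinity
    --seqType fq
    --left left.fastq
    --right right.fastq
    --SS_lib_type RF
    --max_memory 480G
    --CPU 24
    --min_kmer_cov 2
    --output trinity_out_dir
)

# Dry run: print the assembled command for review.
printf '%s ' "${cmd[@]}"
echo

# To actually launch the assembly, run:  "${cmd[@]}"
```

Keeping the command in an array makes it easy to change one option at a time while comparing runs.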

~b

Philip Blood

May 18, 2017, 3:04:59 PM

to trinityrnaseq-users, kfi...@bucknell.edu, adrilu...@gmail.com

Hi Adriana,

We've had success running metatranscriptome assemblies with Trinity on our large-memory nodes on the Bridges system at the Pittsburgh Supercomputing Center. Bridges has 42 nodes with 3 TB of memory each and 4 nodes with 12 TB each. If you are at a US research institution, or have a collaborator at a US institution, you can get a free allocation of compute time on Bridges through XSEDE. If you're interested, there is information about getting a Startup allocation here:

https://portal.xsede.org/allocations/startup

The Startup requires just a paragraph about what you'd like to do, along with the CV of the person who will be PI on the allocation (any staff member at a US institution can be PI, but not students). If you'd like to discuss further, just let me know and I can email you off list.

Best,

Philip Blood

Pittsburgh Supercomputing Center

Carnegie Mellon University
