Disk quota exceeded during phase 2 assembly of clustered reads

Rob W.

Nov 22, 2023, 3:57:47 PM
to trinityrnaseq-users
Hi Brian,

My Step 4 run errored out about 66% of the way through, when I hit my scratch directory's inode limit of 1,000,000. This is the second time this has happened while assembling my dataset, which is 63 libraries at a depth of ~45 million reads per library. During "Step 4" (I've been running Trinity stepwise), more than 990,000 files were generated. I'm well aware that Trinity generates a ton of files, but is this unusual?
My script:
$TRINITY_HOME/Trinity --seqType fq \
--max_memory 500G \
--samples_file /storage/home/rpw5414/scratch/25Oct-iteration2/scripts/samples63.txt \
--CPU 20 \
--quality_trimming_params "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:6:5 LEADING:20 TRAILING:10 MINLEN:30" \
--monitoring \
--min_contig_length 150 \
--min_kmer_cov 2 \
--no_parallel_norm_stats \
--no_salmon \
> step4.log 2>&1 &

Is this because the outputs of chrysalis and prior steps cannot be compressed or deleted until the assembly completes? Is there any way I can delete files that are no longer needed? I'd hate to have to start over from scratch, since it took a lot of effort and troubleshooting just to get to "Step 4".

Any ideas on how to proceed?

Thanks as always, 
Robert
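
For anyone checking how close a run is to an inode quota like the one described above, here is a minimal sketch. It assumes a standard find and wc are available. The function names count_inodes and inode_usage_percent are made up for illustration, and the quota has to be supplied by hand because there is no portable way to read it from the shell.

```shell
#!/usr/bin/env bash
# Count every filesystem entry (files, directories, symlinks) at or below
# a directory. Each entry normally consumes one inode, so this approximates
# the directory's inode usage (hard links would be counted more than once).
count_inodes() {
    find "$1" -xdev | wc -l | tr -d ' '
}

# Hypothetical helper: usage as a whole-number percentage of a quota
# that the caller supplies as the second argument.
inode_usage_percent() {
    local used
    used=$(count_inodes "$1")
    echo $(( used * 100 / $2 ))
}

# Example calls (paths are placeholders):
#   count_inodes trinity_out_dir/read_partitions
#   inode_usage_percent "$HOME/scratch" 1000000
```

Running count_inodes on read_partitions/ while phase 2 is in progress should show whether that directory is where the files are accumulating.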

Brian Haas

Nov 22, 2023, 4:07:50 PM
to Rob W., trinityrnaseq-users
Hi Robert,

This setting:  --min_contig_length 150  is definitely an exacerbating factor here.  The default min contig length is 200, and I wouldn't be surprised if setting it to 150 drastically increases the file count.  I'd expect the number of files would go up exponentially with shorter min contig lengths.  

If you need to go after shorter contigs, maybe try assembling with the default parameters, then take the reads that don't align under liberal alignment settings, and then try to assemble the unmapped reads separately...?
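
One way the two-pass idea above could be sketched is shown here. This is an illustration under assumptions, not a procedure given in this thread: it assumes paired-end reads, uses bowtie2 in local mode as the "liberal" aligner, and the names run, assemble_unmapped, first_pass_index, unmapped.%.fq.gz and trinity_unmapped, as well as the 50G memory value, are placeholders.

```shell
#!/usr/bin/env bash
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# assemble_unmapped <first_pass_assembly.fasta> <left.fq> <right.fq> [threads]
# 1. index the first-pass (default-parameter) assembly
# 2. align the reads in local mode and keep pairs that fail to align
#    concordantly (bowtie2 replaces % in the --un-conc-gz path with 1 or 2)
# 3. assemble only those leftover pairs with the shorter contig threshold
assemble_unmapped() {
    local assembly=$1
    local left=$2
    local right=$3
    local threads=${4:-8}
    run bowtie2-build "$assembly" first_pass_index
    run bowtie2 --local --very-sensitive-local -p "$threads" \
        -x first_pass_index -1 "$left" -2 "$right" \
        --un-conc-gz unmapped.%.fq.gz -S /dev/null
    run Trinity --seqType fq \
        --left unmapped.1.fq.gz --right unmapped.2.fq.gz \
        --max_memory 50G --CPU "$threads" \
        --min_contig_length 150 --output trinity_unmapped
}
```

Setting DRY_RUN=1 before calling assemble_unmapped prints the three commands without running them, which is a cheap way to review them first.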

Increasing the --min_kmer_cov value to 3 could reduce the number of files too, but probably to a far lesser extent than the contig length threshold.

Hope this helps,

Brian





--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Rob W.

Nov 24, 2023, 12:18:58 PM
to trinityrnaseq-users
Hi Brian,

That makes sense. I set the minimum contig length to 150 because we did 150 bp sequencing; would setting it to 200 materially affect the assembly? We aren't looking for minute patterns or splice variants, so we don't need the absolute highest sensitivity for the assembly.

And if I wanted to change it to 200, would I have to restart the assembly from square one? I know I was hesitant to restart above, but I'd rather have a good finished assembly than cut corners. However, if there's a way to clear out the bloated files and restart "Step 4" with --min_contig_length 200, that would be nice.

--Robert

Brian Haas

Nov 24, 2023, 12:31:41 PM
to Rob W., trinityrnaseq-users
I think you can get away with removing these files from the trinity_out_dir, and then rerunning your original command with the modified min contig length setting of 200.

Remove the following directory recursively:

   rm -rf read_partitions/

then remove these individual files:

   partitioned_reads.files.list
   partitioned_reads.files.list.ok
   recursive_trinity.cmds
   recursive_trinity.cmds.ok
   recursive_trinity.cmds.completed

and in the chrysalis/ directory:

   GraphFromIwormFasta.out
   GraphFromIwormFasta.out.ok
   bundled_iworm_contigs.fasta
   bundled_iworm_contigs.fasta.ok
   readsToComponents.out
   readsToComponents.out.rcts.out
   readsToComponents.out.ok
   readsToComponents.out.sort
   readsToComponents.out.sort.ok


Once you restart Trinity, it should pick up at the chrysalis graphToFasta step and continue on from there, generating far fewer files.
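
The removals listed above could be wrapped in a small script along these lines. This is only a sketch: the function name clean_trinity_partitions and the --dry-run switch are made up for illustration, and it assumes the listed paths sit directly under the Trinity output directory passed as the first argument.

```shell
#!/usr/bin/env bash
# clean_trinity_partitions <trinity_out_dir> [--dry-run]
# Removes the read_partitions/ directory and the checkpoint files named
# in this message, leaving everything else in the output directory alone.
clean_trinity_partitions() {
    local out_dir=$1
    local mode=${2:-}
    local t
    [ -d "$out_dir" ] || { echo "no such directory: $out_dir" >&2; return 1; }

    for t in \
        read_partitions \
        partitioned_reads.files.list \
        partitioned_reads.files.list.ok \
        recursive_trinity.cmds \
        recursive_trinity.cmds.ok \
        recursive_trinity.cmds.completed \
        chrysalis/GraphFromIwormFasta.out \
        chrysalis/GraphFromIwormFasta.out.ok \
        chrysalis/bundled_iworm_contigs.fasta \
        chrysalis/bundled_iworm_contigs.fasta.ok \
        chrysalis/readsToComponents.out \
        chrysalis/readsToComponents.out.rcts.out \
        chrysalis/readsToComponents.out.ok \
        chrysalis/readsToComponents.out.sort \
        chrysalis/readsToComponents.out.sort.ok
    do
        if [ "$mode" = "--dry-run" ]; then
            # Only report what would be deleted.
            echo "would remove: $out_dir/$t"
        else
            rm -rf -- "$out_dir/$t"
        fi
    done
}
```

Calling it with --dry-run first (for example, clean_trinity_partitions trinity_out_dir --dry-run) prints the paths without deleting anything, which is worth doing before removing hours of work.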


Hope this works!

Brian

Rob W.

Nov 30, 2023, 2:23:07 PM
to trinityrnaseq-users
Brian,

That worked, and the assembly completed without issue. Thank you for your assistance! It only generated 800,000 files instead of >1,000,000.

Brian Haas

Nov 30, 2023, 2:24:29 PM
to Rob W., trinityrnaseq-users
Fantastic!  I always love hearing about when things work! :-)

all the best,

Brian
