Big data assembly on a compute cluster in a faster way

Priti Prasad

Aug 9, 2018, 10:36:10 AM

to trinityrnaseq-users

Dear Trinity Developer,

I am working on a big data analysis (200 to 400 Gb of data) in which I have grouped similar RNA-seq datasets together for assembly on a compute cluster (de novo and genome-guided, run independently). I am using Trinity v2.2.0, running each step separately with Butterfly in parallel mode, and the whole process takes a couple of days to complete.

Due to cluster policy, I cannot run a job for more than 2 days. So my question is how to speed up the assembly, or how to pause it partway through so that I can resubmit it under a new job ID and have it resume from where it stopped.

I hope I have made my problem clear. Kindly advise me on the best way to achieve this.

Brian Haas

Aug 9, 2018, 10:45:12 AM

to prasad....@gmail.com, trinityrnaseq-users

Here's one way to do it:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Running-Trinity#staged_exec

which should work as long as each phase-1 compute can complete within 2 days.
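Staged execution comes down to splitting the run into pieces that each fit inside one job's wall-time limit. The Python sketch below illustrates that idea generically. The stage names, the placeholder commands and the `.done` marker-file convention are hypothetical and are not Trinity's own options (the wiki page linked above documents those).

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Hypothetical stage list: placeholder commands standing in for the real
# phase-1 steps. Order matters; each entry is (stage_name, argv).
STAGES = [
    ("stage1", [sys.executable, "-c", "print('stage 1 placeholder')"]),
    ("stage2", [sys.executable, "-c", "print('stage 2 placeholder')"]),
    ("stage3", [sys.executable, "-c", "print('stage 3 placeholder')"]),
]


def run_next_stage(workdir: Path, stages: list[tuple[str, list[str]]]) -> str | None:
    """Run the first stage that has no '<name>.done' marker in workdir.

    Returns the name of the stage that was run, or None when every stage
    already has a marker. A failing command raises CalledProcessError before
    the marker is written, so the same stage is retried by the next job.
    """
    for name, argv in stages:
        marker = workdir / f"{name}.done"
        if marker.exists():
            continue  # completed by an earlier job submission
        subprocess.run(argv, check=True)  # raises on a non-zero exit status
        marker.touch()  # only reached if the stage succeeded
        return name
    return None
```

In this sketch each batch job would call `run_next_stage(Path("."), STAGES)` once and exit; resubmitting the same job script advances to the next stage until the function returns `None`.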

For the last stage, you can simply restart it, and each time it will continue where it left off. Eventually it should finish, even if not within a single 2-day run.
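A restart can continue where it left off only if completed work is recorded as it finishes, so that a rerun can skip it. A generic Python sketch of that pattern follows. It is not Trinity's implementation; the function name, the one-command-per-line log format and the optional `deadline` parameter are assumptions for illustration, and commands are assumed to be unique single-line shell strings.

```python
from __future__ import annotations

import subprocess
import time
from pathlib import Path


def run_commands_resumable(commands: list[str], done_log: Path,
                           deadline: float | None = None) -> int:
    """Run shell commands in order, skipping any already listed in done_log.

    Each command that exits with status 0 is appended to done_log right away,
    so a job killed at its wall-time limit loses at most the command in flight.
    If deadline (a time.time() value) is given, no new command starts after it.
    Returns the number of commands still unfinished (failed or not started).
    """
    done = set(done_log.read_text().splitlines()) if done_log.exists() else set()
    remaining = 0
    for cmd in commands:
        if cmd in done:
            continue  # finished in an earlier submission
        if deadline is not None and time.time() >= deadline:
            remaining += 1  # leave it for the next submission
            continue
        result = subprocess.run(cmd, shell=True)
        if result.returncode == 0:
            with done_log.open("a") as fh:
                fh.write(cmd + "\n")
            done.add(cmd)
        else:
            remaining += 1  # failed; retried on the next submission
    return remaining
```

Resubmitting a job that calls this with the same command list and log file reruns only the commands that have not yet succeeded; the optional deadline lets a job stop launching new work shortly before its wall-time limit rather than being killed mid-command.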

best,