Big data assembly on a compute cluster in a faster way


Priti Prasad

Aug 9, 2018, 10:36:10 AM
to trinityrnaseq-users
Dear Trinity Developer,

I am working on a big data analysis (200 to 400 Gb of data) in which I have grouped similar kinds of RNA-seq data to assemble on the compute cluster (de novo and genome-guided, independently). I am using Trinity v2.2.0 and running each step separately, with Butterfly in parallel mode; the whole process takes a couple of days to complete.

Due to some policy issues, I am unable to run jobs on the cluster for more than 2 days. So my question is how to speed up the assembly process, or how to pause it partway through so that I can resubmit the job with a new ID and have it resume from where it stopped.

I hope I have made my problem clear. Kindly guide me to the best possible method to achieve this.

Brian Haas

Aug 9, 2018, 10:45:12 AM
to prasad....@gmail.com, trinityrnaseq-users
Here's one way to do it:


which should work as long as the phase-1 computes can complete within 2 hours each.

For the last stage, you can just restart it and it'll continue where it left off each time. Eventually it should finish, even if not within the 2 days.
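The "restart and it continues where it left off" behavior rests on a general checkpoint-and-skip idea. Below is a minimal sketch of that idea in Python, assuming the remaining work is a list of independent single-line shell commands (as with per-cluster Butterfly jobs) and using a hypothetical log file of completed commands. It is an illustration of the technique only, not Trinity's actual code.

```python
import os
import subprocess


def run_resumable(commands, done_log):
    """Run shell commands in order, skipping any already recorded in done_log.

    done_log is a hypothetical checkpoint file: one completed command per line.
    Returns the list of commands actually executed in this invocation.
    """
    done = set()
    if os.path.exists(done_log):
        with open(done_log) as fh:
            done = {line.rstrip("\n") for line in fh}

    ran = []
    with open(done_log, "a") as log:
        for cmd in commands:
            if cmd in done:
                continue  # finished in an earlier job; skip it
            # check=True raises CalledProcessError on failure, so a failed
            # command is never recorded as done.
            subprocess.run(cmd, shell=True, check=True)
            # Record success only after the command exits cleanly; a job
            # killed mid-command will rerun that command on resubmission.
            log.write(cmd + "\n")
            log.flush()
            os.fsync(log.fileno())
            ran.append(cmd)
    return ran
```

Each resubmitted job calls `run_resumable` with the same command list and the same log path; commands that finished before the time limit are skipped, so every job makes forward progress until the list is exhausted.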

best,

~b


--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas
