Restart/Checkpoints

Voll Korrn

Sep 12, 2023, 8:33:33 AM
to corset-project
Hi there, 

I am using Corset to create superTranscripts for SNP calling and DTU (differential transcript usage) analysis.
I created Trinity assemblies for two species, mapped my reads with Bowtie2, and ran Corset 1.07 (newer versions do not seem to work). I started the run on around 100 RNA-seq datasets, with the corset-reads step done beforehand.
However, the program takes an extremely long time: one assembly with 190,000 transcripts took 23 days on a node with 60 GB of RAM. The second assembly, with 270,000 transcripts, has only been reduced to 230,000 clusters after 23 days.

Is there any way to speed up the process? Are there checkpoints, or any way to restart a run? And why is it so slow overall?

Thank you very much, best, 
Raphael 


Nadia Davidson

Sep 21, 2023, 1:23:32 AM
to corset-project
Hi Raphael,

You may need to do some filtering. Have a look at this thread for some tips: https://groups.google.com/g/corset-project/c/i9B10P8cLh0

100 RNA-Seq datasets is quite a lot, and beyond what we typically saw when first building Corset. As you plan to build superTranscripts, my suggestion is to subsample the RNA-Seq data (e.g. take a few percent of the reads from each of the 100 datasets) so that the clustering can be performed in a reasonable time. Once you have the clusters, you can build the superTranscripts and then map all of the reads (100%) back to them.
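A minimal Python sketch of the subsampling step suggested above, assuming plain or gzip-compressed 4-line FASTQ files taken before the Bowtie2 mapping. The helper names (`subsample_fastq`, `keep_read`, etc.) are hypothetical and not part of Corset. Instead of drawing random numbers, it hashes each read ID, so the R1 and R2 files of a paired-end library get the same keep/drop decision for each mate. Existing tools such as seqtk's `sample` subcommand do the same job and are probably the more practical choice.

```python
import gzip
import hashlib
from typing import Iterator, TextIO


def _open(path: str) -> TextIO:
    """Open a FASTQ file as text, handling .gz compression transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_id(header: str) -> str:
    """Read ID = first token of the header, without '@' and any /1 or /2 mate suffix."""
    rid = header[1:].split()[0]
    if rid.endswith(("/1", "/2")):
        rid = rid[:-2]
    return rid


def keep_read(rid: str, fraction: float, seed: str = "subsample") -> bool:
    """Deterministically keep roughly `fraction` of reads by hashing the read ID.

    The same ID (and seed) always gives the same answer, so mates stay paired.
    """
    digest = hashlib.md5((seed + rid).encode()).digest()
    # Use the top 53 bits so the value is an exact float in [0, 1).
    value = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return value < fraction


def subsample_records(lines: Iterator[str], fraction: float) -> Iterator[str]:
    """Yield the lines of every 4-line FASTQ record whose read ID is kept."""
    while True:
        record = [next(lines, None) for _ in range(4)]
        if record[0] is None:
            return  # clean end of file
        if None in record:
            raise ValueError("truncated FASTQ record")
        if keep_read(read_id(record[0]), fraction):
            yield from record


def subsample_fastq(in_path: str, out_path: str, fraction: float) -> int:
    """Write the subsampled reads to out_path and return how many were kept."""
    kept = 0
    with _open(in_path) as src, open(out_path, "w") as dst:
        for i, line in enumerate(subsample_records(iter(src), fraction)):
            dst.write(line)
            if i % 4 == 0:  # first line of each kept record
                kept += 1
    return kept
```

For example, `subsample_fastq("sample01_R1.fastq.gz", "sample01_R1.sub.fastq", 0.05)` followed by the same call on the R2 file would keep about 5% of the read pairs (file names and fraction are illustrative). The subsampled reads are then mapped and clustered as before, and all reads are mapped back to the resulting superTranscripts.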

Good luck with your analysis.

Cheers,
Nadia.

Voll Korrn

Sep 23, 2023, 8:21:21 AM
to corset-project
Hi Nadia, 

thanks so much for the response!
I will try your suggestion and post some updates.
Just out of interest, are there any plans to parallelise the clustering soon, as datasets are getting larger and larger?

Thanks and all the best, 
Raphael 
