I am running corset on my assembly and it appears to be stuck while clustering one specific cluster:
...
It took about 1 hour 20 minutes to get to that stage, and after 2.5 days nothing has changed. I have read previous posts in this group about corset taking a while to run and have tried to follow the advice in them. My transcriptome is a long-read Iso-Seq assembly with 49,981 transcripts; before running corset I ran cd-hit to remove some redundancy and TransDecoder to cut down the number of transcripts. I have 4 groups, each with 10 samples, and am using the equivalence class information from Salmon mapping. I have added '-l 10' to corset to try to speed things up. Here is my code:
g="3,1,3,1,4,2,4,2,3,1,3,1,3,1,4,2,4,2,4,2,3,1,3,1,4,2,4,2,3,1,3,1,3,1,4,2,4,2,4,2"
n="PS506_CNS,PS506_eyes,PS508_CNS,PS508_eyes,PS509_CNS,PS509_eyes,PS510_CNS,PS510_eyes,PS513_CNS,PS513_eyes,PS514_CNS,PS514_eyes,PS518_CNS,PS518_eyes,PS522_CNS,PS522_eyes,$
eq_classes="PS506_CNS_eq_classes.txt PS506_eyes_eq_classes.txt PS508_CNS_eq_classes.txt PS508_eyes_eq_classes.txt PS509_CNS_eq_classes.txt PS509_eyes_eq_classes.txt PS510_$
corset -I \
-g ${g} \
-n ${n} \
-i salmon_eq_classes ${eq_classes} \
-l 10 \
-p isoseq3_cdhit0.99_transdecoder_nrmollusca_genes_corset
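Since the -g, -n and equivalence class file lists each need one entry per sample, a quick consistency check before a long run may be useful. The following is only a minimal sketch, assuming bash: count_csv and count_words are hypothetical helper names, and the three lists are shortened placeholders built from the first few sample names, not the full lists above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: count the comma-separated fields in a string.
count_csv() {
    local IFS=','
    local -a fields
    read -r -a fields <<< "$1"
    echo "${#fields[@]}"
}

# Hypothetical helper: count the whitespace-separated words in a string.
count_words() {
    local -a words
    read -r -a words <<< "$1"
    echo "${#words[@]}"
}

# Shortened placeholder lists (assumption); substitute the full g, n and eq_classes strings.
g="3,1,4,2"
n="PS506_CNS,PS506_eyes,PS509_CNS,PS509_eyes"
eq_classes="PS506_CNS_eq_classes.txt PS506_eyes_eq_classes.txt PS509_CNS_eq_classes.txt PS509_eyes_eq_classes.txt"

n_groups=$(count_csv "$g")
n_names=$(count_csv "$n")
n_files=$(count_words "$eq_classes")

# All three counts should be equal (one entry per sample).
if [ "$n_groups" -eq "$n_names" ] && [ "$n_names" -eq "$n_files" ]; then
    echo "OK: $n_groups samples in all three lists"
else
    echo "MISMATCH: groups=$n_groups names=$n_names files=$n_files" >&2
fi
```

With the full lists from the command above, all three counts would be expected to come out as 40 (4 groups of 10 samples).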
Based on previous posts, I'm expecting it to take 1-2 weeks to run, but I wanted to check whether that seems realistic. Or, because it's been stuck at this stage for a couple of days, might it take much longer? Would it be worth adding the '-x' parameter? I haven't added '-x' yet because I wasn't sure whether it would remove any useful information, but I see that in a previous post you recommend '-x 100'. I am working on a squid, which I would expect to have high levels of RNA editing, so this may be a further complication.