So, with my regular (human) transcriptome, Salmon runs with 100 bootstraps in a very reasonable amount of time (30-60 minutes). I have about 10M fragments in this experiment.
For an experiment, I built a new transcriptome FASTA that also included all of the human transposable elements (TEs). Obviously this greatly increases the number of transcripts in the transcriptome, and also the overall sequence similarity within it (TEs are repetitive). I expected the quantification itself to be poor, and planned to then filter the transcriptome down to the TE transcripts that actually had coverage. But the experiment seems to have failed much earlier than I expected: days later, Salmon is still processing the first sample (of 36). It appears to be stuck after this stage:
[2017-01-17 14:49:54.193] [stderrLog] [info] Done loading index
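For context, the workflow was roughly along these lines (file names, sample names, and the library-type flag here are placeholders, not my exact invocation):

```shell
# Combine the reference transcriptome with the TE sequences
cat transcripts.fa te_sequences.fa > transcripts_plus_TEs.fa

# Build the Salmon index on the combined FASTA
salmon index -t transcripts_plus_TEs.fa -i salmon_index_TE

# Quantify one sample with 100 bootstraps
salmon quant -i salmon_index_TE -l IU \
    -1 sample1_R1.fastq.gz -2 sample1_R2.fastq.gz \
    --numBootstraps 100 -o quants/sample1
```

The hang happens during the quant step, right after the index finishes loading.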
It is also worth noting that I am using Salmon 0.6.1; I am not sure whether a newer version would make a difference.
Is this failing because I have simply made the transcriptome too complex in terms of the number of transcripts? Or is the sequence similarity among the many new transcripts the issue? I realise what I did was naive, but I am curious why it fails and whether there is a way around it.
Best,
James