Does Salmon run time scale linearly with transcriptome size?


James Lloyd

Jan 19, 2017, 5:31:07 PM
to Sailfish Users Group
So, with my regular (human) transcriptome, Salmon runs with 100 bootstraps in a very reasonable amount of time (30–60 minutes). I have about 10M fragments in this experiment.
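For reference, a typical invocation of this kind of run might look like the sketch below (file names and thread count are placeholders, not taken from the original post):

```shell
# Build the index from the reference transcriptome (paths are placeholders)
salmon index -t human_transcripts.fa -i human_index

# Quantify with 100 bootstrap samples; -l A auto-detects the library type
salmon quant -i human_index -l A \
    -1 sample_1.fq.gz -2 sample_2.fq.gz \
    --numBootstraps 100 -p 8 -o sample_quant
```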

For an experiment, I decided to make a new FASTA transcriptome that also contained all of the human transposable elements (TEs). Obviously, this greatly increases the number of transcripts in the transcriptome, as well as the overall self-similarity of the transcriptome (since TEs are repetitive sequences). I was expecting the output not to work well, in which case I would perhaps filter the transcriptome down to the TE transcripts that actually have coverage. But my experiment seems to have failed much earlier than I expected: Salmon is still processing the first sample (of 36) days later. It seems to have gotten stuck after this stage:
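The filtering idea mentioned above could be sketched roughly as follows, assuming the standard `quant.sf` output format (tab-separated with `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads` columns); the function names and thresholds here are hypothetical, not part of the original discussion:

```python
# Hypothetical sketch: keep only transcripts that received coverage in a
# Salmon run, then write a reduced FASTA containing just those records.
import csv

def covered_transcripts(quant_sf_path, min_reads=1.0):
    """Return the set of transcript names with at least `min_reads`
    estimated reads in a Salmon quant.sf file."""
    keep = set()
    with open(quant_sf_path) as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        for row in reader:
            if float(row["NumReads"]) >= min_reads:
                keep.add(row["Name"])
    return keep

def filter_fasta(fasta_path, out_path, keep):
    """Copy only the FASTA records whose name is in `keep`."""
    writing = False
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # FASTA record names end at the first whitespace
                name = line[1:].split()[0]
                writing = name in keep
            if writing:
                dst.write(line)
```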
[2017-01-17 14:49:54.193] [stderrLog] [info] Done loading index

It is also worth noting that I am using Salmon v0.6.1; I am not sure whether a newer version would make a difference.

Is this failing because I have simply made the transcriptome too complex in terms of the number of transcripts? Or maybe the sequence similarity of the many new transcripts is the issue. I realise what I did was foolish, but I am curious why it fails and whether there is a way around it.

Best,
James


James Lloyd

Jan 19, 2017, 5:36:58 PM
to Sailfish Users Group
Oh, after a little more searching, I found that someone else has tried something similar:

What should I do next? Is the bug fixed in the newest release and should I just try that out now?

Best,
James

Rob

Feb 21, 2017, 2:23:24 PM
to Sailfish Users Group
Hi James,

  The bug in the issue you mention was fixed a while ago (and, in fact, should already be fixed in the version you're using).  However, many changes and fixes have been made between v0.6.1 and v0.8.0 (the current release).  Could you check whether this problem persists with the current release, or whether it is resolved?  If it persists, I'd be happy to take a look and try to reproduce it on your data.  As a bonus, building large indices with v0.8.0 should be considerably more space- and time-efficient than before.

Best,
Rob