Fastest way to get transcript level counts


Eugene Bolotin

May 13, 2014, 7:24:54 PM
to rna-...@googlegroups.com
Hi All,
I am thinking about using STAR to align 1500+ RNA-seq samples to the human transcriptome and get reasonably accurate per-transcript levels, as this is a survey-type project. I only care about known transcripts; I do not want any novel transcripts. I am less concerned about getting the highest mapping quality and more concerned about speed. Is there a consensus on the fastest way to do this? Bowtie/RSEM seems to do well at about 1 hour per sample (40 million reads) in my trial run with 100 GB of RAM and 16 cores (sufficient). TopHat is crawling at 5-8+ hours per sample (infeasible). I was thinking about using STAR/RSEM (some people here report success with this) or STAR plus something else for additional speedup. Any suggestions?
Thanks, 
Eugene Bolotin

Shawn Driscoll

May 14, 2014, 4:12:49 PM
to rna-...@googlegroups.com
I like STAR/RSEM or STAR/eXpress. This is when I need alignments as well as expression, though. Now there is something new that can do what you need very quickly.

You get nearly identical output with a new tool called Sailfish, which can probably complete a 40 million read sample in under 15 minutes. I've benchmarked it against eXpress and RSEM; they are all very, very similar, and the speed improvement is significant. STAR can map very fast, but then RSEM or eXpress has to disambiguate 40 million reads. Sailfish's version of mapping counts k-mers from the reads against a k-mer index. It then groups k-mers into equivalence classes, which significantly reduces the number of "reads" that have to be disambiguated during the EM stage; that is why it runs so much faster than the other EM-based tools. Like I said, you probably wouldn't notice a difference in the output of these three tools in any kind of real functional application (like differential expression or clustering). Give it a try!
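A minimal sketch of the equivalence-class idea, assuming the read-to-transcript hits have already been computed and are supplied as hypothetical "read<TAB>transcript" lines (Sailfish itself builds its classes from k-mers in its own index, with different data structures). Reads that hit exactly the same set of transcripts collapse into one class with a count, so an EM step only has to visit each class rather than each read:

```shell
#!/bin/sh
# equiv_classes: read "read<TAB>transcript" hit lines on stdin (hypothetical
# input format) and print one line per equivalence class:
#   <comma-separated transcript set><TAB><number of reads in the class>
equiv_classes() {
  tab=$(printf '\t')
  # Sort by read, then transcript, dropping duplicate pairs, so each read's
  # transcript set comes out in one canonical order.
  LC_ALL=C sort -t "$tab" -k1,1 -k2,2 -u |
  awk -F'\t' '
    # A new read starts: close the previous read by counting its set.
    $1 != read { if (NR > 1) count[set]++; read = $1; set = $2; next }
    # Same read, another transcript: extend its set.
    { set = set "," $2 }
    END { if (NR > 0) count[set]++; for (s in count) print s "\t" count[s] }' |
  LC_ALL=C sort
}

# Example: r1 and r2 hit {A,B}, r3 hits only A, r4 hits only C.
printf 'r1\tA\nr1\tB\nr2\tB\nr2\tA\nr3\tA\nr4\tC\n' | equiv_classes
```

With that example input, six hit lines from four reads shrink to three classes: A (1 read), A,B (2 reads) and C (1 read).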

pbczyd

May 14, 2014, 5:02:24 PM
to rna-...@googlegroups.com

Dear all,

You may know this  " ... RNA-Skim uses less than 4% of the k-mers and less than 10% of the CPU time required by Sailfish. It is able to finish transcriptome quantification in less than 10 minutes per sample by using just a single thread on a commodity computer, which represents more than 100 speedup over the state of the art alignment based methods, while delivering comparable or higher accuracy." 


Regards,
Sheng

Shawn Driscoll

May 14, 2014, 5:22:45 PM
to rna-...@googlegroups.com
Hey neat!  Thanks for sharing.  Sounds like with RNA-Skim we could quantify 32 samples at a time instead of a single sample with 32 threads with Sailfish and maybe even have them all done in the same time it takes Sailfish to finish that single sample.

Alexander Dobin

May 16, 2014, 12:09:10 AM
to rna-...@googlegroups.com
Hi All,

within ENCODE we have assessed several quantification tools (Cufflinks, FLUX, eXpress, RSEM, Sailfish). From the accuracy standpoint, RSEM was the best and Sailfish was the worst. I personally believe that k-mer quantification cannot provide reliable results, especially for low-to-medium expressed genes. For ENCODE production we decided on the following protocol: transform genomic alignments to transcriptomic ones (the latest STAR patch has this option, but it has not been tested very thoroughly yet) and stream them into RSEM.

Cheers
Alex

Shawn Driscoll

May 16, 2014, 2:55:13 AM
to rna-...@googlegroups.com
Alex that's interesting, thanks for sharing.  In my own benchmarking I too have seen RSEM always come out on top in terms of accuracy usually by a noticeable margin.

If you don't mind I have a couple quick questions because what you are doing there is sort of backwards from what one would normally do with tools like these.  For these RSEM runs when you map to the genome what sort of multi-alignment allowance are you using?  Second, when you translate to transcriptome do you translate genome alignments that could be assigned to multiple transcripts into multiple alignments (i.e. one per transcript hit)?  Finally, why do you map to the genome and translate to transcriptome instead of mapping to the transcriptome in the first place?  Do you just generally feel that genome alignments are more reliable or was this empirically determined to work better?

Thanks and as always thanks for STAR.

Shawn Driscoll

May 21, 2014, 3:13:45 PM
to
Maybe another interesting observation. I just ran a 3 million read simulation and quantification with Sailfish vs RSEM. For the RSEM pipeline I align to the transcriptome with STAR. Sailfish took about 9 minutes and the RSEM-STAR pipeline took about 9 minutes and was also *slightly* more accurate. Additionally, if I convert the RSEM-assigned alignments to genomic coordinates and then count hits against a GTF, the result is also very accurate... essentially identical to the control counts and the RSEM estimated counts. The comparison was done with gene locus level count summaries (not isoform level counts).

The fact is with STAR around we don't actually need to speed up or avoid the mapping stage.

EDIT: revised statement about RSEM vs Sailfish accuracy in this benchmark.

pbczyd

May 19, 2014, 6:14:19 PM
to rna-...@googlegroups.com

Hi Alex,
could you please explain a little bit about how to set up STAR for RSEM?
Is it by adding --quantMode TranscriptomeSAM when doing genome mapping?

Are there any other settings in STAR we should consider if we want to use STAR+RSEM (for human and mouse data, 100bp paired-end)?


thank you for your reply.

Regards,
Sheng

Alexander Dobin

May 19, 2014, 10:14:25 PM
to rna-...@googlegroups.com
Hi Shawn,

thanks for this benchmark. We have not been concerned with speed, but it's good to know it's competitive.

To answer your questions:
>>> For these RSEM runs when you map to the genome what sort of multi-alignment allowance are you using?  Second, when you translate to transcriptome do you translate genome alignments that could be assigned to multiple transcripts into multiple alignments (i.e. one per transcript hit)?
We allow up to 20 genomic loci for multi-mappers. When genomic coordinates are translated into transcriptomic, there is no limit for multi-mappers - say if a unique alignment to the genome can be assigned to 100 transcripts, all of those will be reported.
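To make the genome-to-transcriptome translation concrete, here is a minimal sketch that projects a single genomic position onto every transcript whose exons cover it. It assumes a hypothetical tab-separated exon table (transcript, 1-based inclusive exon start, exon end, strand) and ignores everything a real converter such as STAR's must also handle (CIGAR strings, mates, splice-junction consistency). As described above, one genomic position can yield several transcriptomic coordinates, one per covering transcript:

```shell
#!/bin/sh
# project_position EXON_TABLE POS
# EXON_TABLE is a hypothetical tab-separated file with one exon per line:
#   transcript<TAB>start<TAB>end<TAB>strand   (1-based, inclusive)
# Prints "transcript<TAB>transcript_coordinate" for every transcript with an
# exon covering POS; intronic or intergenic positions print nothing.
project_position() {
  awk -F'\t' -v pos="$2" '
    { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; strand[$1] = $4 }
    END {
      p = pos + 0
      for (t in n) {
        before = 0; after = 0; hit = 0
        for (i = 1; i <= n[t]; i++) {
          st = s[t, i]; en = e[t, i]
          if (p >= st && p <= en) {
            # Exon covering POS: count its bases on either side of POS.
            hit = 1; before += p - st; after += en - p
          } else if (en < p) {
            before += en - st + 1   # whole exon lies left of POS in the genome
          } else {
            after += en - st + 1    # whole exon lies right of POS in the genome
          }
        }
        # Minus-strand transcripts start at their highest genomic coordinate.
        if (hit) print t "\t" (strand[t] == "-" ? after + 1 : before + 1)
      }
    }' "$1" | LC_ALL=C sort
}

tbl=$(mktemp)
printf 'T1\t100\t199\t+\nT1\t300\t399\t+\nT2\t300\t399\t+\nT3\t100\t199\t-\nT3\t300\t399\t-\n' > "$tbl"
project_position "$tbl" 350   # T1 151, T2 51, T3 50
rm -f "$tbl"
```

Here genomic position 350 falls in an exon shared by all three toy transcripts, so one genomic hit becomes three transcriptomic ones.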
>>> Finally, why do you map to the genome and translate to transcriptome instead of mapping to the transcriptome in the first place?  Do you just generally feel that genome alignments are more reliable or was this empirically determined to work better? 
The problem with mapping to the transcriptome is that we force alignments into a limited reference. Imagine a read that can map with a few mismatches to the transcriptome but without any mismatches to the genome: that would create a false gene expression call for transcriptome-only mapping. By comparing the two methods we found that there are many hundreds of genes which transcriptome-only mapping will call expressed and genomic mapping will assign no expression. Note that what we call mapping to the genome actually utilizes transcriptome information through the --sjdb option, so effectively we are mapping simultaneously to genome+transcriptome.
There are some minor details we still have to iron out. For example, it appears that genome+transcriptome mapping has ~100 more pseudogenes expressed; those are likely to be mis-mapping artifacts.

Cheers
Alex

On Friday, May 16, 2014 3:02:36 PM UTC-4, Shawn Driscoll wrote:
Maybe another interesting observation. I just ran a 3 million read simulation and quantification with Sailfish vs RSEM.  For the RSEM pipeline I align to the transcriptome with STAR. Sailfish took about 9 minutes and the RSEM-STAR pipeline took about 9 minutes and was also much more accurate. Additionally if I convert the RSEM assigned alignments to genomic and then count hits against a GTF the result is also very accurate...essentially identical to the control counts and the RSEM estimated counts.  The comparison was done with gene locus level count summaries (not isoform level counts).

The fact is with STAR around we don't actually need to speed up or avoid the mapping stage.


Rob Patro

May 20, 2014, 7:42:48 PM
to rna-...@googlegroups.com
Hi Alex,

This is interesting. Are you able to share some of this test data? We actually have two new modes in the development branch of Sailfish that we're testing. One that assigns kmers in groups to increase quantification accuracy with longer reads and a second that actually accepts alignments. I'd be very interested in seeing if either or both of these methods close the accuracy gap you're seeing on your data. The goal is for Sailfish to be both fast and accurate, and I'm interested in anything that can help us achieve that goal.

--Rob

Rob Patro

May 20, 2014, 7:47:24 PM
to rna-...@googlegroups.com
Hi Shawn,

Are you able to share this test data? As I mentioned above in my response to Alex, I'd like to run some of these tests with our current Sailfish improvements to see if we can close any accuracy gap. Also, our read-based inference should still be significantly faster than RSEM's.

--Rob

Rob Patro

May 20, 2014, 8:18:08 PM
to rna-...@googlegroups.com
One other point related to my previous post. For such small data sets (3M reads), one wouldn't expect to see much speed difference between methods. Where RSEM inference really becomes slow is when many alignments have to be processed repeatedly over many EM rounds. Larger read sets are where methods like eXpress and Sailfish will see the largest speed gains.
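The cost argument can be seen in a toy EM. The sketch below is a deliberately simplified model, not RSEM's: it assumes all transcripts have equal effective length and takes hypothetical "t1,t2,...<TAB>count" equivalence-class lines as input. Each round makes one pass over the input entries, so the work per round grows with the number of entries (classes here, or individual alignments without collapsing), multiplied by the number of rounds:

```shell
#!/bin/sh
# em_counts ROUNDS: read "t1,t2,...<TAB>read_count" equivalence classes on
# stdin and print the estimated read count per transcript after ROUNDS of EM.
# Simplifying assumption: every transcript has the same effective length.
em_counts() {
  awk -F'\t' -v rounds="$1" '
    {
      nc++; cls[nc] = $1; cnt[nc] = $2 + 0; total += $2
      k = split($1, ts, ","); for (i = 1; i <= k; i++) theta[ts[i]] = 1
    }
    END {
      m = 0; for (t in theta) m++
      if (m == 0 || total == 0) exit
      # Start from uniform abundances over all transcripts seen.
      for (t in theta) theta[t] = 1 / m
      for (r = 1; r <= rounds; r++) {
        for (t in theta) acc[t] = 0
        # E-step: one pass over the classes, splitting the reads of each class
        # across its transcripts in proportion to the current abundances.
        for (c = 1; c <= nc; c++) {
          k = split(cls[c], ts, ",")
          z = 0; for (i = 1; i <= k; i++) z += theta[ts[i]]
          if (z > 0) for (i = 1; i <= k; i++) acc[ts[i]] += cnt[c] * theta[ts[i]] / z
        }
        # M-step: the new abundances are the expected shares of all reads.
        for (t in theta) theta[t] = acc[t] / total
      }
      for (t in theta) printf "%s\t%.2f\n", t, theta[t] * total
    }' | LC_ALL=C sort
}

# Class {A}: 10 reads, {B}: 5 reads, {A,B}: 15 reads that could be either.
printf 'A\t10\nB\t5\nA,B\t15\n' | em_counts 50
```

With this input the 15 shared reads end up split 10 to A and 5 to B, so it prints 20.00 for A and 10.00 for B, keeping the 2:1 ratio of the unambiguous reads.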

Cheers,
Rob

Jonathan Keats

May 21, 2014, 2:49:39 AM
to rna-...@googlegroups.com
I was wondering about this issue. All our "real" datasets are >50 million paired-end reads and often over 100 million, hence our interest in STAR and, more recently, Sailfish. Long-term accuracy is a top priority, but computation time is a limitation for projects of the scale I run.

Shawn Driscoll

May 21, 2014, 3:15:48 PM
to rna-...@googlegroups.com
Hi Rob,

Sorry, I meant to reply to this thread earlier. The separation of the STAR+RSEM counts and the Sailfish counts in this case was not much; I revised my post above. I think in this case the speed difference was also small because I expressed almost all of the genes in the transcriptome, which I assume makes a lot more work for the EM algorithm. I can send you the data; it's only simulated reads, maybe similar to what you generate with the FLUX simulator? I always assumed those simulators would generate reads AND control counts for the transcripts they sampled from.

Alexander Dobin

May 22, 2014, 12:01:05 AM
to rna-...@googlegroups.com
Hi Rob,

these comparisons were done by other people, I will have to ask them if they would be willing to share the results - sorry about that. I will let you know once I hear from them.

Cheers
Alex

Alexander Dobin

May 22, 2014, 12:04:21 AM
to rna-...@googlegroups.com
Hi Sheng,

you can try STAR/RSEM combination as follows.

You will need to re-generate the STAR genome to run the transcriptome transformation. At the mapping stage, you need to add --quantMode TranscriptomeSAM. 

Note that this transformation happens simultaneously with mapping. The transcriptomic alignments are streamed into the AlignedToTranscriptome.out.bam file, in addition to the normal alignments in Aligned.out.sam. At the moment the transcriptomic alignments are geared towards RSEM: indels and soft-clipping are not allowed.

You can run STAR and RSEM at the same time through a fifo file like this:

mkfifo AlignedToTranscriptome.out.bam

STAR --genomeDir /path/to/genome/ --readFilesIn Read1.gz Read2.gz \
     --outSAMattributes NH HI \
     --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 \
     --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 \
     --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 \
     --quantMode TranscriptomeSAM --runThreadN 12 --readFilesCommand zcat &

rsem-calculate-expression -p 12 --bam --paired-end --no-bam-output --forward-prob 0 --estimate-rspd AlignedToTranscriptome.out.bam /path/to/RSEM/reference RSEM >& Log.rsem


This is still quite experimental and I need to do more thorough testing.
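The fifo recipe works because a named pipe lets one process write while another reads, with nothing stored on disk. The toy sketch below uses stand-in commands (printf as a hypothetical producer in place of STAR, awk as the consumer in place of rsem-calculate-expression) only to show the ordering: the producer is put in the background first, because opening a fifo for writing blocks until a reader opens it:

```shell
#!/bin/sh
# stream_through_fifo: toy producer/consumer over a named pipe.
# Prints the number of lines the consumer received.
stream_through_fifo() {
  dir=$(mktemp -d)
  fifo="$dir/stream"
  mkfifo "$fifo"
  # Producer (stands in for STAR): blocks until a reader opens the fifo.
  printf 'aln1\naln2\naln3\n' > "$fifo" &
  # Consumer (stands in for RSEM): reads until the producer closes the pipe.
  lines=$(awk 'END { print NR }' < "$fifo")
  wait            # make sure the background producer has exited
  rm -rf "$dir"
  echo "$lines"
}

stream_through_fifo   # prints 3
```

The reverse also holds: a reader started without any writer simply blocks on open, so a job that hangs at the RSEM step can be a sign that the fifo path and the file STAR writes to do not match.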


Cheers

Alex


Olga Botvinnik

Jul 15, 2014, 8:58:36 PM
to rna-...@googlegroups.com
Hi Alex,
Thank you for all your hard work in creating STAR and thoroughly testing RNA-Seq quantification. With the "--quantMode TranscriptomeSAM" option, is it also possible to output the genome alignment as well, for alternative splicing analysis?
Thanks,
Olga

Vladimir Kuryshev

Jul 16, 2014, 11:43:53 AM
to rna-...@googlegroups.com
It should be there:
"...The transcriptomic alignments are streamed into AlignedToTranscriptome.out.bam file, in addition to the normal alignments in Aligned.out.sam."

v.

Olga Botvinnik

Jul 16, 2014, 5:51:06 PM
to Vladimir Kuryshev, rna-...@googlegroups.com
Perfect, thanks Vladimir!

---
Olga Botvinnik
PhD Program in Bioinformatics and Systems Biology
Gene Yeo Laboratory | Sanford Consortium for Regenerative Medicine
University of California, San Diego

Alexander Dobin

Jul 18, 2014, 5:57:54 PM
to rna-...@googlegroups.com, vkur...@yahoo.com, obot...@ucsd.edu
Hi Olga, Vladimir,

Vladimir is right - STAR can output both the genomic and transcriptomic alignments at the same time.
By default STAR currently outputs genomic SAM; however, you can change that with
--outSAMtype BAM Unsorted 
or  
--outSAMtype BAM SortedByCoordinate

Cheers
Alex

Vladimir Kuryshev

Jul 21, 2014, 9:19:37 AM
to rna-...@googlegroups.com, vkur...@yahoo.com, obot...@ucsd.edu
Dear Alex,

thanks for introducing a new useful feature like "outSAMtype".

is there any way to see a short description of it, or to know all available options?
Unfortunately, your latest manual (2.3.01) is out of date and there is no command-line help (--help) built into STAR.
Or am I searching in the wrong places?

currently I'm testing STAR z13 patch and would like to challenge your great "toy" with different conditions ;)

thanks.

Vladimir

Tim Triche, Jr.

Jul 21, 2014, 2:48:55 PM
to rna-...@googlegroups.com, vkur...@yahoo.com, obot...@ucsd.edu
Look in parametersDefault for all of them and explanations.

my problem with the recent builds is just that they segfault... I would be so stoked to use these options... but alas I have regressed to the release :-(

Vladimir Kuryshev

Jul 22, 2014, 5:09:04 AM
to rna-...@googlegroups.com, vkur...@yahoo.com, obot...@ucsd.edu
thanks Tim!
Somehow I missed that file.

v.

Alexander Predeus

Jul 22, 2014, 3:05:45 PM
to rna-...@googlegroups.com, vkur...@yahoo.com, obot...@ucsd.edu


On Monday, July 21, 2014 2:48:55 PM UTC-4, Tim Triche, Jr. wrote:

my problem with the recent builds is just that they segfault... I would be so stoked to use these options... but alas I have regressed to the release :-(

Tim, did you try STARstatic executable? It might solve your problems. 

cheers 

-- Alex  

Tim Triche, Jr.

Jul 22, 2014, 4:44:19 PM
to Alexander Predeus, rna-...@googlegroups.com, vkur...@yahoo.com, obot...@ucsd.edu
I did; alas, it crashes as well. I need to try debugging the compilation.


Statistics is the grammar of science.



Alexander Predeus

Jul 24, 2014, 1:49:20 AM
to rna-...@googlegroups.com
Alex, so does one still need to use "end-to-end" alignment mode to produce the bam file compatible with RSEM? I'm asking because I don't see it in the command that you've provided for Sheng.

> STAR --genomeDir /path/to/genome/ --readFilesIn Read1.gz Read2.gz --outSAMattributes NH   HI      --outFilterMultimapNmax 20   --outFilterMismatchNmax 999   --outFilterMismatchNoverLmax 0.04   --alignIntronMin 20   --alignIntronMax 1000000   --alignMatesGapMax 1000000   --alignSJoverhangMin 8   --alignSJDBoverhangMin 1 --quantMode TranscriptomeSAM --runThreadN 12  --readFilesCommand zcat &

cheers

-- Alex

Alexander Predeus

Jul 24, 2014, 1:53:32 AM
to rna-...@googlegroups.com
And also, is it normal that BAM outputs aligned to genome and to transcriptome have a different number of lines? Not sure exactly how the process works...

Erik Aronesty

Jul 24, 2014, 9:24:38 AM
to rna-...@googlegroups.com
Yes, because transcriptome alignments will contain multiple alignments for the same exon in different splice-forms.   
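One way to see this effect is to count records per read name in the two outputs. A minimal sketch, assuming the BAM files have already been converted to SAM text (for example with samtools view); header lines starting with @ are skipped, and for paired-end data each mate is its own record, so the numbers are records rather than fragments:

```shell
#!/bin/sh
# alignments_per_read [SAM...]: print "read_name<TAB>number_of_records"
# for SAM text given as files or on stdin, skipping @ header lines.
alignments_per_read() {
  awk -F'\t' '!/^@/ { n[$1]++ } END { for (r in n) print r "\t" n[r] }' "$@" |
  LC_ALL=C sort
}

# r1 hits two transcripts (isoforms sharing an exon), r2 hits one.
printf '@HD\tVN:1.4\nr1\t0\tT1\t1\t255\t10M\t*\t0\t0\t*\t*\nr1\t256\tT2\t5\t255\t10M\t*\t0\t0\t*\t*\nr2\t0\tT1\t9\t255\t10M\t*\t0\t0\t*\t*\n' |
  alignments_per_read
```

Running it on the genomic and transcriptomic outputs of the same sample shows which reads were multiplied across isoforms.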

Alexander Dobin

Jul 24, 2014, 4:52:13 PM
to rna-...@googlegroups.com, pre...@gmail.com, vkur...@yahoo.com, obot...@ucsd.edu, ttr...@usc.edu
Hi Tim,

you need to re-generate the genome before running STAR with  --quantMode TranscriptomeSAM, otherwise STAR will seg-fault (I need to throw an error message of course).
If this does not help, please send me a small example where STAR seg-faults.

Cheers
Alex

Alexander Dobin

Jul 24, 2014, 5:05:06 PM
to rna-...@googlegroups.com
Hi Alex,

by default STAR produces an RSEM-compatible Aligned.toTranscriptome.out.bam with
(i) reads with indels discarded
(ii) soft-clips extended to the ends
I am planning to introduce an option to allow for soft-clipping and indels in the transcriptomic output.
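The two rules can be illustrated on CIGAR strings alone. This is a simplified sketch, not STAR's code: it drops any alignment whose CIGAR contains an insertion (I) or deletion (D), and rewrites soft-clips (S) as matches (M), merging adjacent M blocks. It does not shift the alignment start or check how many mismatches the extended bases introduce, both of which a real implementation has to do:

```shell
#!/bin/sh
# to_transcriptomic_cigar: one CIGAR per line on stdin. Prints DROP for
# CIGARs with indels, otherwise the CIGAR with soft-clips folded into M.
to_transcriptomic_cigar() {
  awk '{
    if ($0 ~ /[ID]/) { print "DROP"; next }
    out = ""; run = 0; s = $0
    # Consume one <length><operation> element at a time.
    while (match(s, /^[0-9]+[A-Z=]/)) {
      len = substr(s, 1, RLENGTH - 1) + 0
      op = substr(s, RLENGTH, 1)
      s = substr(s, RLENGTH + 1)
      if (op == "S") op = "M"          # extend the soft-clip as matched bases
      if (op == "M") { run += len; continue }
      if (run > 0) { out = out run "M"; run = 0 }
      out = out len op                 # any other operation is kept as-is
    }
    if (run > 0) out = out run "M"
    print out
  }'
}

printf '5S95M\n10M2D88M\n3S47M2S\n' | to_transcriptomic_cigar
```

For the three example CIGARs this prints 100M, DROP and 52M.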

Cheers
Alex

Alexander Dobin

Jul 24, 2014, 5:08:27 PM
to rna-...@googlegroups.com
Erik is right, this is the main effect that increases the number of alignments.
On the other hand, a few alignments will be dropped: those that contain indels, as well as those with soft-clipping that cannot be extended to the ends within a defined number of mismatches.

Erik Aronesty

Jul 25, 2014, 1:45:01 PM
to rna-...@googlegroups.com
Yeah, this option is useful for eXpress (http://bio.math.berkeley.edu/eXpress/overview.html), which makes good use of the indel/clipped alignments. I only wish eXpress would take a genomic BAM and a GTF file as input. There is no reason it can't.

Alexander Predeus

Jul 28, 2014, 7:01:11 PM
to rna-...@googlegroups.com
Hello all, 

I realize this is mostly an RSEM question, but nobody is answering it in the RSEM group; maybe somebody here will have an idea?

I was using the discussed pipeline (get transcriptome alignments with STAR, then calculate expression with RSEM) to evaluate expression from the individual reads of a strand-specific paired-end experiment.

Basically, I aligned R1 and R2 separately and then ran RSEM on both. The thing is, the read that coincides with the direction of the features (R2 in my case) is processed normally.

R1, however, does not generate meaningful counts no matter what strandedness options (--forward-prob) I use in rsem. Here are some data (I've tried three options: --forward-prob of 0, 0.5, and 1.0):


run                         # non-zero read counts   average read count   stdev read count
test_p0.5_R1.genes.results  1938                     8.97149              44.0204
test_p0.5_R2.genes.results  11552                    58.0327              311.64
test_p0_R1.genes.results    3                        4.66667              5.18545
test_p0_R2.genes.results    2                        1.5                  0.5
test_p1_R1.genes.results    1938                     8.97033              44.0334
test_p1_R2.genes.results    11565                    57.9754              311.48

In R2 alignment, most reads have aligned with the flag 256, with a small fraction being 272. 

In R1, the picture is exactly the opposite. 

When I switch flags (256/272) manually in aligned R1 file, the results are as expected (~ 12k expressed genes, with average and stdev similar to R2). 

Any ideas about how to handle this sort of issues? 

Thank you in advance!

Alexander Dobin

Aug 18, 2014, 3:21:47 PM
to rna-...@googlegroups.com
Hi Alex,

not sure if you have already solved this, but I could not reproduce this problem. When I align R1 only, and then run RSEM with  --forward-prob 0 or  --forward-prob 0.5, I get gene counts >= those in PE run.
Most of the reads have indeed FLAG=272, which means they are mapped on the antisense of the genes, as it should be for R1 in the PE protocol. If you send me your BAM file, I can try to look into it.
What are the mapping stats for the R1 vs R2? Is it possible that R1 sequencing quality is much poorer?
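For reference, the two FLAG values in this exchange differ by a single bit: 256 (0x100) marks a secondary alignment and 16 (0x10) marks a read aligned to the reverse strand of the reference (here, the transcript), so 272 = 256 + 16 is a secondary alignment on the antisense strand. A small sketch that decodes just those two bits with shell arithmetic:

```shell
#!/bin/sh
# describe_flag FLAG: report the SAM FLAG bits discussed in this thread.
#   0x100 (256) = secondary alignment
#   0x10  (16)  = read aligned to the reverse strand of the reference
describe_flag() {
  flag=$1
  desc=""
  if [ $((flag & 256)) -ne 0 ]; then desc="secondary "; fi
  if [ $((flag & 16)) -ne 0 ]; then
    desc="${desc}reverse"
  else
    desc="${desc}forward"
  fi
  echo "$desc"
}

for f in 256 272; do
  echo "$f: $(describe_flag "$f")"
done
# 256: secondary forward
# 272: secondary reverse
```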

Cheers
Alex

Alexander Predeus

Aug 19, 2014, 2:57:51 PM
to rna-...@googlegroups.com
Hello, 

no I have not solved it. I've checked the mapping stats with rsem/bowtie combo as well as STAR/htseq pipeline, and they are definitely similar between R1 and R2 (and fairly well correlated between each other, for that matter). But somehow the R1 output just fails to work with rsem. It's all about the flag too - if I manually switch 256/272, it works just fine. 

I can share the files for sure. What is a good way to do it? Should I upload them somewhere?

Thank you for your help again. 

-- Alex 

Alexander Dobin

Aug 20, 2014, 3:59:25 PM
to rna-...@googlegroups.com
Hi Alex,

you could post the BAM file for R1 on google drive or dropbox.com
I think 1M reads should be enough. Please also post the RSEM command line.

Cheers
Alex

Alexander Predeus

Aug 21, 2014, 1:43:02 AM
to rna-...@googlegroups.com
Alex, 

so here are files with 1M reads for R1 and R2:


commands I've used with rsem:

1. to generate the genome: 

rsem-prepare-reference -gtf ../Gencode/mouse_v2/gencode.vM2.all_exon.mm10G.gtf ../Gencode/mouse_v2/GRCm38.p2.genome.fa rsem_mm10G_gencode_vM2

2. to calculate coverage (as I mentioned above, I've tried three options, --forward-prob 0.0, 0.5, and 1.0):

rsem-calculate-expression -p 8 --bam --no-bam-output --forward-prob 0.0 --estimate-rspd <file>.bam <ref> <tag> 

Thank you in advance 

-- Alex 

Alexander Predeus

Sep 14, 2014, 6:31:06 PM
to rna-...@googlegroups.com
Alex, did you by any chance look at the files?

I've been working on something else but I'm still interested in getting this to work. 

Thank you in advance! 

-- Alex Predeus

Alexander Dobin

Sep 16, 2014, 9:14:02 AM
to rna-...@googlegroups.com
Hi Alex,

I think I have figured out what causes the problem. By default, RSEM adds poly-A tails to transcript sequences. This somehow messes up the calculations for reads that map to the antisense. I guess it has to do with different transcript lengths - I will ask Colin Dewey about it. When I generated RSEM reference with rsem-prepare-reference --no-polyA ..., the problem disappeared. Please give it a try and let me know whether it worked for you.

Cheers
Alex

Rob Patro

Sep 29, 2014, 9:40:42 PM
to rna-...@googlegroups.com
Hi Alex,

  I was just curious if there was any update on this.  The reason I ask is because we've finally gotten around to releasing (at least a beta of) the software I mentioned above.  In keeping with the fish theme, we're calling it Salmon, and the linux executable is available here (and the source is in this branch of the Sailfish repo).  We introduced a new "lightweight" alignment model that is different from that used in Sailfish (but potentially more accurate & faster), but, perhaps more relevant to this forum, the new software is also capable of quantifying isoform abundance using pre-computed alignments (from a sam or bam file).  Actually, the speed of STAR was a big motivator for us implementing this functionality (and STAR is the aligner I'm suggesting to people asking about using Salmon with alignments).  Anyway, having access to some of this data would be great for us, because we'd like to discover the cases where our new approach outperforms Sailfish as well as cases where we might improve accuracy even further.

Thanks!
Rob

Rob Patro

Sep 29, 2014, 9:45:53 PM
to rna-...@googlegroups.com
Hi Shawn,

  No problem.  I apologize for taking *even longer* to reply here.  I'd forgotten to click the "Email updates to me" button, and have been rather busy over the summer (moving, changing jobs, etc.).  Anyway, if you still have it around or can easily regenerate it, I'd be very interested in seeing your data.  In particular, I'd also be interested in testing it out with our new quantification program, Salmon, that I mention below.  Anyway, sorry again for the delayed reply.  I'm clicking to have updates e-mailed to me now so I don't miss any future responses ;P.

Best,
Rob


Alexander Predeus

Oct 15, 2014, 11:37:33 PM
to rna-...@googlegroups.com
Hello Alex, 

sorry for taking this long to reply. I just got around to testing it now. It did solve the problem, thank you very much! 