Rsem to small RNAseq?

631 views
Skip to first unread message

Xu Le

unread,
Nov 29, 2011, 10:53:12 PM11/29/11
to RSEM Users
the new vision is set to skip reads less than 25nt.but I want to
deal with small RNAseq data,ranging from 15~30nt.I want to use RSEM to
figure out gene expression.Can I do that?

Bo Li

unread,
Nov 29, 2011, 10:55:48 PM11/29/11
to rsem-...@googlegroups.com
Hi Le,

Is there any reason to have such short reads? For example, are you
studying about microRNAs?

Best,
Bo

Xu Le

unread,
Nov 29, 2011, 11:44:00 PM11/29/11
to RSEM Users
Hi,Bo
yes,you are right.I am studying plant miRNA and siRNA.I am trying
a method that I use RSEM to generate rsem.transcripts.fa and bowtie
index files(eg.rsem.1.ebwt···),next I use bowtie independently to
align these small reads to the index RSEM generates.Then put these
mapping results(.sam) to RSEM to continue the second step to calculate
gene expression.I am trying this method,wondering if it works.Do you
think it's right?

best regards,
Le

Bo Li

unread,
Nov 30, 2011, 12:30:23 AM11/30/11
to rsem-...@googlegroups.com
Hi Le,

No. RSEM will filter out all reads with length < 25bp...

So in your case, you will not add poly(A) tails, am I right?

However, you can do the following:

1) go to utils.h and change OLEN = 25 to OLEN = 15
2) go to 'rsem-calculate-expression' and replace this line

pod2usage(-msg => "Seed length should be at least 25!\n", -exitval => 2,
-verbose => 2) if ($L < 25);

with

#pod2usage(-msg => "Seed length should be at least 25!\n", -exitval =>
2, -verbose => 2) if ($L < 25);

3) type the following commands to recompile

make clean
make

4) run rsem-calculate-expression with bowtie aligner enabled. you need
to set --seed-length to 15

Please have a try. I'll consider allowing smaller seed length in the future.

Xu Le

unread,
Nov 30, 2011, 12:40:24 AM11/30/11
to RSEM Users
Hi Bo
thanks for your help in time.I have tried the method "invented" by
me and as you say,it failed.I will try your idea now.God bless me!
the result will come soon.^ ^

Le

Xu Le

unread,
Nov 30, 2011, 12:57:02 AM11/30/11
to RSEM Users
Hi Bo
very sorry to tell you that your modification somehow make a
difference,but not totally.this time the program skip reads<15,but the
following result comes out:

estimateFromReads, N1 finished.
rsem-run-em: SingleQModel.h:469: void SingleQModel::calcMW():
Assertion `seedLen >= OLEN && (mld == __null ? gld->getMinL() :
mld->getMinL()) >= seedLen' failed.
rsem-run-em failed! Please check if you provide correct parameters/
options for the pipeline!

as if there still a need to further change some script.Can you
give a hand?
Thanks.I think RSEM is a promising software dealing with RNAseq.
best,
Le

b...@cs.wisc.edu

unread,
Nov 30, 2011, 1:29:09 AM11/30/11
to rsem-...@googlegroups.com
Hi Le,

Have you set both OLEN & seedLength as 15?

Best,
Bo

Xu Le

unread,
Nov 30, 2011, 3:20:57 AM11/30/11
to RSEM Users
Hi Bo
something to do a moment ago,sorry to reply late.in the method I
"invented",I put the .sam file generated from Bowtie use -S
option,command as follow:

bowtie -v0 -S -t -p10 --best /data/apps/bowtie-0.12.7/indexes/
TAIR10 SRR013343.fastq.clipped SRR013343.sam

For I use a --sam to input the mapping result from independly
Bowtie(not the RSEM Bowtie),so I just change the OLEN scripts you
mentioned,regardless of seedLength.In fact,I set -v mode to align
plant small RNAseq by Bowtie.after all,I use the command:

rsem-calculate-expression -p5 --sam SRR013343.sam /data/luke/luke/
RSEM/rsem SRR013343

as listed just now,the wrong information came out.The reason I use
another Bowtie rather than RSEM Bowtie is that there are a little more
defult settings in RSEM,though less compared by other similar
software(rSeq eg.).
When I change the seedlength to 15 and use the Bowtie in
RSEM,everything goes smoothly.
Thank you Bo a lot,and I will feedback to you further in the use of
RSEM.

best regards,
Le

Colin Dewey

unread,
Dec 1, 2011, 1:43:23 PM12/1/11
to rsem-...@googlegroups.com
Hi Le,

Is this data from a small RNA sequencing protocol, such as the standard small RNA protocol from Illumina? If so, you should be careful in using RSEM for this application. The RSEM model assumes a protocol that involves a fragmentation step, which allows for reads to start at many positions within the RNA sequence. The small RNA protocol typically does not have a fragmentation step (because the RNAs are already very short) and thus you get reads that always come from one end or the other of each RNA. The abundances (tau values) given by RSEM will not be correct in this case, although the expected count values may be somewhat accurate.

Best,
Colin

Xu Le

unread,
Dec 2, 2011, 10:09:00 AM12/2/11
to RSEM Users
Hi Pro.Colin
Thank you for your remind.I have check my data and as you say,they
are illumina in deed with no fragmentation.So I decide to use the
read_count to calculate the RPKM next step.My adviser wants me to use
rseq and rsem to treat multireads and compare the known miRNA gene
expression under the control.We want to get a useful method for
multireads and maybe become the ubiquitous methodology in our lab.But
now I have a new question:Does the rsem is only useful tool for genes
that have konwn isoform?For your EM algorithm paper in
bioinformatics(2010) is focus on isoforms,then add all isofrom
expression i.e. gene expression and in the use of RSEM it needs users
to input knownisofrom.txt.What happens when come to genes that have no
konwn isoforms?
Thanks a lot.
Best,
Le

> >>>>>>> figure out gene expression.Can I do that?- 隐藏被引用文字 -
>
> - 显示引用的文字 -

Colin Dewey

unread,
Dec 13, 2011, 5:17:13 PM12/13/11
to rsem-...@googlegroups.com
Hi Le,

If there is no fragmentation step, you will not want to use a method that normalizes by transcript length. Similarly, you will not want to compute RPKM, which is a length-normalized measure. To handle multireads, you'll want a method like those used for DGE. Here is a related paper:
http://www.springerlink.com/content/q61271648821g33m/

I'm not entirely sure what you are asking about genes with no known isoforms. I think what you are asking is "can RSEM be used if you do not know which transcripts are isoforms of the same gene?" If so, then the answer is definitely "yes." If you do not provide provide a mapping from transcripts to genes, then RSEM just treats each transcript as belonging to its own gene.

Best,
Colin

Reply all
Reply to author
Forward
0 new messages