Short genes vs. long genes expression


Victor Ruotti

Jan 7, 2010, 11:23:30 AM
to solexa
From the other forum.
Looking for any ideas you might have.

Hi there.
Just wanted to post this to see if anyone else sees this interesting
pattern. When we take the average expression (RPKM) of short genes,
say 0.5 kb to 2.5 kb, the average is much higher than the average for
longer genes, say 10.5 kb to 50.5 kb. We have done some wet lab work
and were not able to find an explanation for this pattern.
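
A minimal sketch of the comparison described above, assuming the usual RPKM definition (reads per kilobase of transcript per million mapped reads). The function names, the bin edges in bp, and the toy values are hypothetical and only illustrate the calculation; they are not real data.

```python
from collections import defaultdict


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def mean_rpkm_by_length_bin(genes, bins, total_mapped_reads):
    """Average RPKM per length bin.

    genes: iterable of (length_bp, read_count) pairs
    bins:  list of (low_bp, high_bp) half-open intervals [low, high)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for length_bp, read_count in genes:
        for low, high in bins:
            if low <= length_bp < high:
                sums[(low, high)] += rpkm(read_count, length_bp, total_mapped_reads)
                counts[(low, high)] += 1
                break  # each gene falls into at most one bin
    # Skip bins that received no genes to avoid dividing by zero.
    return {b: sums[b] / counts[b] for b in bins if counts[b] > 0}


# Made-up toy values, for illustration only (not real data).
toy_genes = [(1000, 500), (2000, 1000), (20000, 2000), (40000, 4000)]
toy_bins = [(500, 2500), (10500, 50500)]  # the short and long ranges, in bp
toy_means = mean_rpkm_by_length_bin(toy_genes, toy_bins, total_mapped_reads=10_000_000)
```

Running the same binning on real per-gene counts, ideally with more than two bins, would show whether mean RPKM falls steadily with length or only at the extremes.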

We also took a few samples from the Mortazavi data and were
able to replicate the same pattern. This tells us that, for some
reason, across an entire RNA-Seq run we see a higher length-normalized
number of reads mapping to short transcripts and a lower number mapping
to longer transcripts. The question is why? You would expect to see
similar averages, especially when using the random-primed protocol.

Do you see the same bias towards short transcripts in your RNA-Seq
runs?
Thanks in advance.
Victor

cwse...@gmail.com

Jan 7, 2010, 1:48:55 PM
to Victor Ruotti
I think Elizabeth Purdom (UC Berkeley Stat department) presented a poster on this at the last MGED meeting in Phoenix. She saw the effect, and came up with a way to correct the RPKM values. I say "I think" because I remember talking to her about it but don't have the abstract book with me...and my memory...well never mind that. If I find the book and it has more info in it, I'll post it.

-Chris Seidel

Ravi Gupta

Jan 8, 2010, 3:09:42 PM
to sol...@googlegroups.com
Hi,

Check the following article.

Transcript length bias in RNA-seq data confounds systems biology

Alicia Oshlack and Matthew J Wakefield

Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Vic 3052, Australia


Biology Direct 2009, 4:14. doi:10.1186/1745-6150-4-14

The electronic version of this article is the complete one and can be found online at: http://www.biology-direct.com/content/4/1/14

Received: 9 April 2009
Accepted: 16 April 2009
Published: 16 April 2009

© 2009 Oshlack and Wakefield; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Several recent studies have demonstrated the effectiveness of deep sequencing for transcriptome analysis (RNA-seq) in mammals. As RNA-seq becomes more affordable, whole genome transcriptional profiling is likely to become the platform of choice for species with good genomic sequences. As yet, a rigorous analysis methodology has not been developed and we are still in the stages of exploring the features of the data.

Results

We investigated the effect of transcript length bias in RNA-seq data using three different published data sets. For standard analyses using aggregated tag counts for each gene, the ability to call differentially expressed genes between samples is strongly associated with the length of the transcript.

Conclusion

Transcript length bias for calling differentially expressed genes is a general feature of current protocols for RNA-seq technology. This has implications for the ranking of differentially expressed genes, and in particular may introduce bias in gene set testing for pathway analysis and other multi-gene systems biology analyses.
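
One way to see why length can affect differential expression calls is sketched below. It rests on a simplifying assumption that is not taken from the paper: read counts are Poisson with a mean proportional to transcript length times expression level. The function name and the `reads_per_bp_per_unit` scale factor are hypothetical.

```python
import math


def expected_z(length_bp, expr_a, expr_b, reads_per_bp_per_unit=0.01):
    """Approximate z-statistic for a two-sample Poisson count comparison.

    Assumes the expected count is scale * length * expression (a toy model).
    For Poisson counts, the variance of the difference is the sum of the means.
    """
    mu_a = reads_per_bp_per_unit * length_bp * expr_a
    mu_b = reads_per_bp_per_unit * length_bp * expr_b
    return (mu_a - mu_b) / math.sqrt(mu_a + mu_b)


# Same two-fold expression change, transcripts of different length.
z_short = expected_z(1000, 2.0, 1.0)
z_long = expected_z(4000, 2.0, 1.0)
```

Under this toy model the statistic grows with the square root of length, so a transcript four times longer gets twice the z-score for the same fold change. That fits the abstract's point that the ability to call differential expression is associated with transcript length.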


---
Regards

~Ravi

