Hi there.
I just wanted to post this to see if anyone else sees this interesting
pattern. When we take the average expression (RPKM) of short genes,
say 0.5kb to 2.5kb, the average is much higher than the average for
longer genes, say 10.5kb to 50.5kb. We have done some wet lab work
but were not able to find an explanation for this pattern.
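For anyone who wants to try the same comparison, here is a minimal sketch of it, assuming RPKM is computed the usual way (reads per kilobase of transcript per million mapped reads). The function names and the toy gene list are made up for illustration and are not our real data. In the toy data the read counts are exactly proportional to gene length, so both length bins give the same average, which is what we expected to see in the real runs.

```python
from statistics import mean


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def mean_rpkm_by_length(genes, total_mapped_reads, bins):
    """Average RPKM of the genes falling in each length bin.

    genes: iterable of (length_bp, read_count) pairs
    bins:  list of (low_bp, high_bp) pairs, low inclusive, high exclusive
    """
    result = {}
    for low, high in bins:
        values = [
            rpkm(count, length, total_mapped_reads)
            for length, count in genes
            if low <= length < high
        ]
        # None marks a bin that contains no genes.
        result[(low, high)] = mean(values) if values else None
    return result


# Hypothetical toy data: read counts are proportional to length,
# i.e. the "no length bias" situation.
toy_genes = [(1000, 50), (2000, 100), (20000, 1000), (40000, 2000)]
length_bins = [(500, 2500), (10500, 50500)]

averages = mean_rpkm_by_length(
    toy_genes, total_mapped_reads=1_000_000, bins=length_bins
)
for (low, high), value in averages.items():
    print(f"{low}-{high} bp: mean RPKM = {value}")
```

With real data, a clearly higher average in the short bin than in the long bin is the pattern described above.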
We also took a few samples from Mortazavi's data and were able to
replicate the same pattern. This tells us that, for some reason,
across an entire RNA-Seq run, more reads map to short transcripts
and fewer reads map to long transcripts, relative to their length.
The question is why? You would expect to see similar averages,
especially when you know that you are using the random-primed
protocol.
Do you see the same bias towards short transcripts in your RNA-Seq
runs?
Thanks in advance.
Victor
Alicia Oshlack and Matthew J Wakefield
Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Vic 3052, Australia
Biology Direct 2009, 4:14 doi:10.1186/1745-6150-4-14
The electronic version of this article is the complete one and can be found online at: http://www.biology-direct.com/content/4/1/14
Received: 9 April 2009
Accepted: 16 April 2009
Published: 16 April 2009
© 2009 Oshlack and Wakefield; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Several recent studies have demonstrated the effectiveness of deep sequencing for transcriptome analysis (RNA-seq) in mammals. As RNA-seq becomes more affordable, whole genome transcriptional profiling is likely to become the platform of choice for species with good genomic sequences. As yet, a rigorous analysis methodology has not been developed and we are still in the stages of exploring the features of the data.
We investigated the effect of transcript length bias in RNA-seq data using three different published data sets. For standard analyses using aggregated tag counts for each gene, the ability to call differentially expressed genes between samples is strongly associated with the length of the transcript.
Transcript length bias for calling differentially expressed genes is a general feature of current protocols for RNA-seq technology. This has implications for the ranking of differentially expressed genes, and in particular may introduce bias in gene set testing for pathway analysis and other multi-gene systems biology analyses.
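As a rough illustration of why longer transcripts are easier to call as differentially expressed, here is a simplified sketch. It assumes that read counts are Poisson distributed with a mean proportional to transcript length, and it uses a plain normal-approximation z statistic for the difference between two counts. These assumptions, the function name, and the per-base read rates are illustrative only and are not taken from the paper's analysis. Under them, the expected statistic for a fixed fold change grows with the square root of transcript length.

```python
import math


def expected_z(rate_a, rate_b, length_bp):
    """Approximate expected z statistic for a two-sample count comparison.

    rate_a, rate_b: expected reads per base in samples A and B (hypothetical)
    length_bp:      transcript length in bases

    Assumes Poisson counts, so the variance of each count equals its mean
    and the variance of the difference is the sum of the two means.
    """
    mu_a = rate_a * length_bp
    mu_b = rate_b * length_bp
    return (mu_b - mu_a) / math.sqrt(mu_a + mu_b)


# Same two-fold change in expression, two different transcript lengths.
z_short = expected_z(rate_a=0.01, rate_b=0.02, length_bp=1_000)
z_long = expected_z(rate_a=0.01, rate_b=0.02, length_bp=20_000)

print(f"short transcript: z = {z_short:.2f}")
print(f"long transcript:  z = {z_long:.2f}")
print(f"ratio = {z_long / z_short:.2f}, sqrt(length ratio) = {math.sqrt(20):.2f}")
```

The fold change is identical for both transcripts, yet the longer one gives the larger statistic, so with a fixed significance cutoff it is more likely to be called differentially expressed.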
--
You received this message because you are subscribed to the Google Groups "solexa" group.