Normalization

Melissa Wong

Sep 22, 2008, 2:16:27 AM
to solexa
Hi,

Has anyone done cDNA normalization for RNA-seq?
I want to send normalized cDNA for sequencing but I'm not sure how to
do it.
Normalization was successfully done for 454 sequencing. So how is it
different with RNA-seq?

The only kit available is the EVROGEN TRIMMER cDNA normalization kit.
It must be used together with a recommended cDNA synthesis kit that
synthesizes cDNA using oligo d(T).
The cDNA used for normalization must be flanked by specific adapter
sequences, which are necessary for subsequent PCR amplification.
I see two problems with getting an unbiased representation of the
sequence fragments.
1) cDNA synthesized using oligo d(T) will introduce a 3' bias.
2) PolyA+ RNA cannot be fragmented before 1st strand synthesis. mRNA
fragmentation is recommended by Mortazavi et al. 2008
(http://www.ncbi.nlm.nih.gov/pubmed/18516045) to reduce the 5' bias.

Does anyone have any suggestions? Thanks in advance.

Melissa

Ariel Paulson

Oct 28, 2008, 12:03:36 PM
to solexa
Melissa,

Your point #2 is not a problem. RNA fragmentation can be done prior
to 1st strand synthesis; the fragments are then primed with random
hexamers. It's been a month here, so you probably know by now that
Illumina's RNAseq protocol uses this technique. We have just done a
run in S. pombe, similar to Wilhelm et al.'s publication in June
(http://www.ncbi.nlm.nih.gov/pubmed/18488015), but we followed the
Illumina 05/22/08 RNAseq protocol. Still, we got an obvious 3' bias
in the read distributions, although it is diminished considerably
when viewed on a log scale. This may or may not be related to the
protocol, as our RNA samples were a little messy; perhaps there were
lots of degraded mRNAs that produced a faux 3'-enrichment after poly-A
selection.
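
One rough way to put a number on a 3' bias like this is to bin read start positions by their relative position along each transcript. The sketch below assumes reads have already been mapped to transcript coordinates (5' to 3'); the function names and the (position, transcript length) input format are hypothetical, not part of any Illumina pipeline.

```python
from typing import Iterable, List, Tuple


def positional_profile(reads: Iterable[Tuple[int, int]], n_bins: int = 10) -> List[float]:
    """Fraction of reads falling in each relative-position bin.

    Each read is (start_position, transcript_length), with positions
    0-based and measured from the 5' end, so bin 0 is the 5' end.
    """
    counts = [0] * n_bins
    for pos, length in reads:
        if length <= 0 or not 0 <= pos < length:
            continue  # skip malformed records
        counts[pos * n_bins // length] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def three_prime_fraction(profile: List[float]) -> float:
    """Share of reads in the 3' half of the transcript (0.5 means no bias)."""
    return sum(profile[len(profile) // 2:])


if __name__ == "__main__":
    # Toy data: four reads on a 100 nt transcript, three of them in the 3' half.
    toy = [(0, 100), (50, 100), (90, 100), (99, 100)]
    profile = positional_profile(toy)
    print(profile)
    print(three_prime_fraction(profile))
```

A value well above 0.5 across many transcripts would be consistent with 3' enrichment, though it cannot by itself distinguish a protocol effect from degraded input RNA.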

Also, the overall signal seemed low while background was rather high,
at least in comparison to a ChIP-seq IP (or even an input). This may
be because this was our first time using this protocol. Or it may
reflect what Wilhelm et al. found, that apparently >90% of the genome
is transcribed. But filtering the track files to remove this
background and viewing on a log scale makes for very nice, sharply
defined read piles across the length of the coding regions.
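
As an illustration of that kind of filtering, here is a minimal sketch that zeroes out low-coverage positions and log-transforms the rest. The threshold of 2 reads and the function name are assumptions chosen for the example, not the values used for the tracks described above.

```python
import math
from typing import List, Sequence


def filter_and_log(coverage: Sequence[int], min_count: int = 2) -> List[float]:
    """Suppress background and compress dynamic range.

    Positions with fewer than `min_count` reads are set to 0;
    the rest are transformed as log2(count + 1).
    """
    return [math.log2(c + 1) if c >= min_count else 0.0 for c in coverage]


if __name__ == "__main__":
    # Toy per-base coverage: the first two positions count as background.
    print(filter_and_log([0, 1, 3, 7]))
```

The +1 pseudocount keeps the transform defined at zero coverage; the right cutoff depends on sequencing depth and would need to be chosen per dataset.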

What is curious is that we picked up pronounced AT frequency artifacts
at the beginnings and ends of the reads, while the middles looked
normal. Even though our alignments look good, and the QC fraction was
low, the artifact is clearly detectable across lanes and alignment
codes (U012, R012, NM, QC) alike. We got 28M reads in 13M unique
sequences from 3 lanes. This was something of a "test" run on
non-phiX material since we've just upgraded to the GA2/IPAR/Paired-Ends
setup, so I'm not sure if the artifact is machine-based or related to
the protocol (e.g. the RNA fragmentation reaction having sequence
biases). I'd like to examine other datasets produced by this protocol
for artifacts.
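
For anyone wanting to check their own data for the same effect, a per-cycle base composition can be computed directly from the read sequences. This is only a sketch: the function name is hypothetical, reads are assumed to be plain strings already parsed from the sequence files, and ambiguous calls (N or ".") are simply ignored.

```python
from typing import Iterable, List


def per_cycle_at_fraction(reads: Iterable[str]) -> List[float]:
    """Fraction of A/T calls at each read position (cycle).

    Non-ACGT characters are not counted. A cycle with no valid
    calls yields NaN.
    """
    at: List[int] = []
    total: List[int] = []
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(at):  # first time this cycle is seen
                at.append(0)
                total.append(0)
            if base in "ACGT":
                total[i] += 1
                if base in "AT":
                    at[i] += 1
    return [a / t if t else float("nan") for a, t in zip(at, total)]


if __name__ == "__main__":
    # Toy reads: AT-rich at the first two cycles, GC-only afterwards.
    print(per_cycle_at_fraction(["ATGC", "AAGC", "TTGG"]))
```

Running this separately per lane and per alignment class, and comparing against a phiX control lane, should help separate a machine effect from a library-prep effect.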

Ariel

james@cancer

Oct 29, 2008, 4:37:57 PM
to solexa
Hi,
We also just ran the Illumina mRNAseq protocols as part of the beta
testing programme. The fragmentation of mRNA seemed to work fine, but I
would not want to comment too much on the quality and usability of the
data as we are still getting to grips with the analysis.
I do not think we saw the 3' bias.
We too see very nice data, which we have been looking at in GB: nice
piles of sequences across coding regions, splice-junction reads,
unannotated reads, etc.
We also saw some strange artefacts in the sequence data. I have
copied two pictures to the Files section of the group (mRNAseq IVC 1
(lanes 2-5) and 2 (lanes 1-7)) that seem to show that all the mRNAseq
runs we have done, on four samples across 13 sample preps, have a
low-complexity region at the start. This did not seem to affect
alignment, and we are not sure what is going on yet.
We got 7-10M reads per lane.
James.

Wilhelm Brian

Oct 29, 2008, 4:53:42 PM
to sol...@googlegroups.com
Hi Ariel, Melissa,

If we had had more time, we would have tried to fractionate the
RNA prior to RT in order to try to maintain strand specificity, but as
we were somewhat limited in terms of our access to the Solexa machine
at the time, we opted neither to normalize the cDNA source
(assuming the depth of coverage for something like S. pombe would
overcome any expression bias) nor to experiment with RNA fractionation.
With oligo dT priming you will definitely see a 3' bias; however,
there are kits (RiboMinus from Invitrogen, etc.) for removing rRNA to
make the use of random primers for RT more productive. Even with two
rounds of polyA enrichment and oligo dT priming, we still have 20-30%
of our reads corresponding to rRNA genes.

RNA quality will likely be an issue for seeing clear 5' ends, but this
is somewhat dependent on the conditions under which the RNA is
collected. Some growth conditions are just messier...

The background transcription is almost certainly real, but also probably
not biologically meaningful (at least in the short term), although
opinions vary on this. Filtering it out for visualization is probably
a good idea.

Brian Wilhelm