Melissa,
Your point #2 is not a problem: RNA fragmentation can be done prior
to first-strand synthesis, and the fragments are then primed with random
hexamers. It's been a month since your post, so you probably know by now that
Illumina's RNAseq protocol uses this technique. We have just done a
run in S. pombe, similar to Wilhelm et al.'s June publication
(http://www.ncbi.nlm.nih.gov/pubmed/18488015), but we followed the
Illumina 05/22/08 RNAseq protocol. Still, we got an obvious 3' bias
in the read distributions, although it is considerably diminished
when viewed on a log scale. This may or may not be related to the
protocol: our RNA samples were a little messy, so there may have been
many degraded mRNAs that produced a spurious 3' enrichment after poly-A
selection.
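
One way to put a number on a 3' bias like this is to bin each read by its relative position along its transcript and compare the 3'-most bins with the 5'-most ones. Below is a minimal Python sketch, assuming you already have each read's position in transcript coordinates (0 = 5' end, so minus-strand reads have already been flipped) plus the transcript length. The function names, the 10-bin default and the 20% tail are hypothetical choices for illustration, not part of the Illumina protocol.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def coverage_profile(reads: Iterable[Tuple[int, int]], n_bins: int = 10) -> List[float]:
    """Fraction of reads in each relative-position bin along their transcripts.

    Each read is (position, transcript_length), with position in transcript
    coordinates so that 0 is the 5' end (a hypothetical input format).
    """
    counts = Counter()
    total = 0
    for pos, length in reads:
        if length <= 0 or not 0 <= pos < length:
            continue  # skip malformed records
        counts[pos * n_bins // length] += 1  # integer math keeps bin edges exact
        total += 1
    if total == 0:
        return [0.0] * n_bins
    return [counts[i] / total for i in range(n_bins)]


def three_prime_bias(profile: List[float], tail: float = 0.2) -> float:
    """Ratio of reads in the 3'-most `tail` fraction of bins to the 5'-most."""
    k = max(1, round(len(profile) * tail))
    five = sum(profile[:k])
    three = sum(profile[-k:])
    return three / five if five > 0 else float("inf")
```

A ratio near 1 means roughly even coverage from 5' to 3'; values well above 1 indicate 3' enrichment. Computing it separately on raw and log-scaled counts would show how much the log view hides.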
Also, the overall signal seemed low while the background was rather high,
at least compared with a ChIP-seq IP (or even an input). This may
be because this was our first time using this protocol, or it may
reflect what Wilhelm et al. found: apparently >90% of the genome
is transcribed. But filtering the track files to remove this
background and viewing them on a log scale gives very nice, sharply
defined read piles across the length of the coding regions.
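
I don't spell out the exact background filter here. As a hypothetical sketch only, one simple rule is to zero out positions whose coverage is at or below some multiple of the median coverage and then log-transform what remains, using log2(x + 1) so zero-coverage positions stay defined. The `fold=3.0` cutoff and the function name are assumptions for illustration.

```python
import math
from statistics import median
from typing import List


def filter_and_log(coverage: List[int], fold: float = 3.0) -> List[float]:
    """Zero positions at or below `fold` x median coverage, log2(x + 1) the rest.

    The median-based cutoff is an illustrative assumption; if most positions
    have zero coverage the median is 0 and nothing is filtered out.
    """
    if not coverage:
        return []
    cutoff = fold * median(coverage)
    return [math.log2(c + 1) if c > cutoff else 0.0 for c in coverage]
```

For a genome with high, fairly uniform background, a median-relative cutoff adapts to each lane's depth better than a fixed count would, which is why it is used in this sketch.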
What is curious is that we picked up pronounced AT-frequency artifacts
at the beginnings and ends of the reads, while the middles looked
normal. Even though our alignments look good and the QC fraction was
low, the artifact is clearly detectable across lanes and across alignment
codes (U012, R012, NM, QC) alike. We got 28M reads (13M unique
sequences) from 3 lanes. This was something of a "test" run on non-phiX
material, since we've just upgraded to the GA2/IPAR/Paired-Ends
setup, so I'm not sure whether the artifact is machine-based or related to
the protocol (i.e. the RNA fragmentation reaction has sequence
biases). I'd like to examine other datasets produced by this protocol
for the same artifacts.
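
For anyone checking their own data, a common first look is per-cycle base composition: compute the AT fraction at each read position across all reads and see whether the ends deviate from the middle. Below is a minimal Python sketch, assuming uncompressed four-line FASTQ input with no wrapped sequence lines. It also counts distinct read sequences, comparable to the reads/unique figures above. The names are hypothetical, and keeping every distinct sequence in a set will take a lot of memory for a full run.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each record in a four-line FASTQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # line 2 of every 4-line record is the sequence
                yield line.strip()


def per_cycle_at(reads: Iterable[str]) -> Tuple[List[float], int, int]:
    """Return (AT fraction per cycle, total reads, distinct sequences).

    Only A/C/G/T calls count toward the denominator, so N calls are ignored.
    """
    at = Counter()      # cycle index -> number of A or T calls
    called = Counter()  # cycle index -> number of A/C/G/T calls
    seen = set()        # distinct read sequences (memory-hungry for full lanes)
    total = 0
    for seq in reads:
        seq = seq.upper()
        total += 1
        seen.add(seq)
        for i, base in enumerate(seq):
            if base in "ACGT":
                called[i] += 1
                if base in "AT":
                    at[i] += 1
    n_cycles = max(called) + 1 if called else 0
    profile = [at[i] / called[i] if called[i] else 0.0 for i in range(n_cycles)]
    return profile, total, len(seen)
```

Usage would be `per_cycle_at(fastq_sequences(path))` for one lane file; plotting the returned profile against cycle number for several lanes, or for datasets from other groups, would show whether the end effects are shared.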
Ariel