DNA vs RNA quality control

long le

Mar 25, 2015, 5:08:24 PM

to rseqc-...@googlegroups.com

Hi,

We are using RSeQC to get a sense of total nucleic acid quality for a targeted RNA-Seq clinical assay. We would like to know how to interpret the read distribution output to tell whether a sample is mostly DNA, mostly RNA, or a roughly equal mixture of RNA+DNA. It seems that the read distribution analysis preferentially tags reads that map to intron+cds_exon as CDS_Exons, while reads that map only to introns are tagged as Introns. Is that right? What about a paired-end read where one mate maps to an intron and the other maps to an exon? Also, does RSeQC respect the duplicate flag that duplicate marking sets on read alignments?

Any suggestions on which RSeQC output would help show that a sample is rich in RNA vs. DNA would be very helpful to us. Clinically, we would fail a case with mostly DNA but pass one with lots of RNA or a good mix of RNA+DNA.

Thanks,

Long

Liguo Wang

Mar 26, 2015, 6:24:05 PM

to rseqc-...@googlegroups.com

Hi Long,

Please read our online manual (http://rseqc.sourceforge.net/); some of your questions are already answered there.

The read_distribution.py script is not a good choice if your question is whether a sample is rich in RNA vs. DNA.

One simple solution is to map your reads with TopHat, STAR, etc. (but not BWA, as it does not support spliced alignment), then compute the percentage of spliced reads. The rationale is that for DNA samples you should NOT observe any spliced reads (at least theoretically), while for RNA-seq data, for a given read length, the percentage of spliced reads should fall within a certain range.
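
As a rough illustration of the counting step, here is a minimal sketch that works on SAM text (for example the output of `samtools view`). It assumes the aligner marks a splice with an N operation in the CIGAR string, as TopHat and STAR do, and it counts only mapped primary alignments. The function names and the demo records are made up for illustration; this is not how any RSeQC script is implemented.

```python
import re

# One CIGAR operation, e.g. "76M" or "1200N".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def is_spliced(cigar):
    """True if the CIGAR string contains an N (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))

def spliced_fraction(sam_lines):
    """Return (spliced, total) over mapped primary alignments in SAM text."""
    spliced = total = 0
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if is_spliced(fields[5]):
            spliced += 1
    return spliced, total

# Synthetic records, for illustration only.
demo = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t60\t20M500N30M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
spliced, total = spliced_fraction(demo)
pct = 100.0 * spliced / total if total else 0.0
print(f"{spliced}/{total} spliced ({pct:.1f}%)")
```

What percentage counts as "RNA-like" is not something this sketch can tell you; it depends on read length and on the targeted panel, so it would have to be calibrated on known RNA and DNA samples.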

Liguo

long le

Mar 30, 2015, 2:22:45 PM

to rseqc-...@googlegroups.com

Hi Liguo,

Thanks for your response. We are using BWA-MEM, which can in fact map split/spliced reads. So would you recommend bam_stat.py to distinguish RNA reads from DNA reads? I also noticed that tin.py may be useful for assessing RNA quality.

Thanks,

Long

long le

Mar 30, 2015, 4:29:03 PM

to rseqc-...@googlegroups.com

We just checked the results of junction_annotation.py and bam_stat.py. It seems that RSeQC is not compatible with BAM files generated by BWA-MEM: the results show no spliced reads when there should be some.
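
One possible explanation, stated as an assumption rather than something confirmed in this thread: BWA-MEM typically reports a junction-spanning read as a clipped primary alignment plus a supplementary alignment (flag 0x800, with an SA tag), not as a single record with an N operation in the CIGAR. A tool that detects splicing from N operations would then find none. The sketch below tallies the two representations in SAM text so the difference can be checked on a few records; the category names and demo records are hypothetical.

```python
from collections import Counter

def classify_sam_record(line):
    """Label one SAM record by how a split/spliced alignment is encoded."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    cigar = fields[5]
    tags = fields[11:]                    # optional fields, e.g. "SA:Z:..."
    if flag & 0x4:
        return "unmapped"
    if "N" in cigar:                      # splice-aware style: skipped region
        return "spliced_N"
    if flag & 0x800 or any(t.startswith("SA:Z:") for t in tags):
        return "split_SA"                 # chimeric/split style: SA tag
    return "contiguous"

def tally(sam_lines):
    """Count records per category, ignoring header lines."""
    return Counter(
        classify_sam_record(line)
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )

# Synthetic records, for illustration only.
demo = [
    "r1\t0\tchr1\t100\t60\t20M500N30M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t100\t60\t20M30S\t*\t0\t0\t*\t*\tSA:Z:chr1,620,+,20S30M,60,0;",
    "r2\t2048\tchr1\t620\t60\t20H30M\t*\t0\t0\t*\t*\tSA:Z:chr1,100,+,20M30S,60,0;",
    "r3\t0\tchr1\t300\t60\t50M\t*\t0\t0\t*\t*",
]
print(dict(tally(demo)))
```

If a BAM file shows many split_SA records and no spliced_N records, that would be consistent with the missing spliced reads, though split alignments can also come from structural variants or chimeras, so the count is not a direct substitute.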

Long