Did I sequence mostly DNA?

jbo...@tgen.org

Apr 3, 2018, 5:12:28 PM

to rna-star

Hi,

I ran some pretty degraded samples through STAR to get an idea of the quality of my data. We'd done a DNase treatment, but had to do an RNA amplification to get enough material to sequence. Before amplification, RNA was not visible on the Bioanalyzer (nor was DNA). I'm afraid that if the DNase treatment wasn't 100% efficient, I may have mostly sequenced DNA. Can I get an idea of this from the STAR output files? Is the number of splices so low that this must be DNA sequence data?

Here's a typical Log.final.out output:

Started job on | Mar 26 09:48:37

Started mapping on | Mar 26 09:50:07

Finished on | Mar 26 10:01:21

Mapping speed, Million of reads per hour | 78.58

Number of input reads | 14712597

Average input read length | 186

UNIQUE READS:

Uniquely mapped reads number | 10085545

Uniquely mapped reads % | 68.55%

Average mapped length | 182.80

Number of splices: Total | 7485

Number of splices: Annotated (sjdb) | 2614

Number of splices: GT/AG | 6920

Number of splices: GC/AG | 525

Number of splices: AT/AC | 23

Number of splices: Non-canonical | 17

Mismatch rate per base, % | 1.54%

Deletion rate per base | 0.02%

Deletion average length | 1.53

Insertion rate per base | 0.01%

Insertion average length | 1.65

MULTI-MAPPING READS:

Number of reads mapped to multiple loci | 572921

% of reads mapped to multiple loci | 3.89%

Number of reads mapped to too many loci | 128356

% of reads mapped to too many loci | 0.87%

UNMAPPED READS:

% of reads unmapped: too many mismatches | 0.00%

% of reads unmapped: too short | 25.95%

% of reads unmapped: other | 0.74%

CHIMERIC READS:

Number of chimeric reads | 111968

% of chimeric reads | 0.76%

And the number of lines in the SJ.out.tab file is 9501.

Thanks - I'm very new to STAR - this is my first run so I really have nothing to compare to. Thank you!

Alexander Dobin

Apr 3, 2018, 6:15:10 PM

to rna-star

Hi @jbowers,

the number of splices in your library indeed seems very low and points to DNA.

However, the proportion of spliced reads depends on the way the library was made.

For a typical polyA+ or total RNA library with random hexamer RT, for 2x100b reads, you would expect the number of splices to be at least 1/2 of the unique mappers.

On the other hand, if you are sequencing just the 3' end of the RNAs (say, with an anchored poly-dT primer), you may get mostly unspliced reads.
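As a rough way to apply the rule of thumb above, the splice count can be compared to the number of unique mappers straight from Log.final.out. This is only a minimal Python sketch, not part of STAR: the function names are made up for illustration, and it assumes the "key | value" line layout seen in the log posted earlier in this thread.

```python
def parse_log_final(text):
    """Parse the 'key | value' lines of a STAR Log.final.out into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def splices_per_unique_read(stats):
    """Total splices divided by the number of uniquely mapped reads."""
    splices = int(stats["Number of splices: Total"])
    unique = int(stats["Uniquely mapped reads number"])
    return splices / unique


# The two relevant lines from the log posted in this thread.
example = """\
Uniquely mapped reads number | 10085545
Number of splices: Total | 7485
"""
ratio = splices_per_unique_read(parse_log_final(example))
print(f"{ratio:.5f} splices per uniquely mapped read")
```

For the log in this thread that works out to roughly 0.0007 splices per uniquely mapped read, far below the 1/2 expected for a typical polyA+ or total RNA library.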

I think the easiest way to check whether this is DNA or RNA is to make a wiggle browser track (e.g. use STAR options --outSAMtype BAM SortedByCoordinate --outWigType bedGraph) and look at the signal in the browser: RNA signal should concentrate around annotated genes, while DNA signal should spread uniformly through the genome.
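The same comparison can also be put into a number instead of judged by eye. The sketch below is a hypothetical Python helper, not anything STAR provides: it takes bedGraph lines (chrom, start, end, value) and gene intervals per chromosome, and returns the fraction of total signal that falls inside genes. It assumes the gene intervals have already been pulled out of an annotation file as (start, end) tuples in the same 0-based, half-open coordinates as the bedGraph; the toy data is made up purely to show the arithmetic.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals so no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_length(start, end, merged):
    """Number of bases of [start, end) covered by the merged intervals."""
    return sum(max(0, min(end, g_end) - max(start, g_start))
               for g_start, g_end in merged)


def genic_signal_fraction(bedgraph_lines, genes):
    """Fraction of total bedGraph signal (value * length) that lies within genes."""
    merged = {chrom: merge_intervals(iv) for chrom, iv in genes.items()}
    genic = total = 0.0
    for line in bedgraph_lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue  # skip blank lines, track headers and comments
        chrom, start, end, value = line.split()[:4]
        start, end, value = int(start), int(end), float(value)
        total += value * (end - start)
        genic += value * overlap_length(start, end, merged.get(chrom, []))
    return genic / total if total else 0.0


# Toy data: genes cover 200 of 1000 bases (20%) on a made-up chromosome.
toy_genes = {"chr1": [(100, 200), (150, 300)]}
toy_bedgraph = [
    "chr1\t0\t100\t1.0",
    "chr1\t100\t300\t9.0",
    "chr1\t300\t1000\t1.0",
]
fraction = genic_signal_fraction(toy_bedgraph, toy_genes)
print(f"{fraction:.2f} of the signal lies within genes")
```

If the genic fraction of the signal is close to the fraction of the genome that genes occupy, the signal is spread uniformly, which is the DNA-like picture; a much higher genic fraction is what an RNA library should show. In the toy data, genes occupy 20% of the bases but hold about 69% of the signal.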

Cheers

Alex

jbo...@tgen.org

Apr 4, 2018, 6:55:00 PM

to rna-star

Okay great, this is really useful information. Unfortunately it's not good news for me but I really appreciate the answers! I'm attempting to set up so I can visualize the RNA signal in a genome browser. Thanks for that suggestion; I've never done that before.