CircRNA false positives?

Zoe Ward


Mar 26, 2018, 6:10:17 PM

to rna-star

Hi Alex,

I have created a script to run STAR on my samples looking at both mRNAs and circRNAs.

I ran the script on some samples whose libraries were polyA-enriched (as I was only looking at the mRNAs), but I had left the option '--chimSegmentMin 10' in my script for the second mapping.

I noticed from the Log.final.out stats that about 5% of the reads were coming out as chimeric reads, e.g.:

Number of input reads | 64320253

Average input read length | 177

UNIQUE READS:

Uniquely mapped reads number | 55240784

Uniquely mapped reads % | 85.88%

Average mapped length | 175.14

Number of splices: Total | 20634301

Number of splices: Annotated (sjdb) | 20464285

Number of splices: GT/AG | 20429756

Number of splices: GC/AG | 180974

Number of splices: AT/AC | 14807

Number of splices: Non-canonical | 8764

Mismatch rate per base, % | 0.30%

Deletion rate per base | 0.01%

Deletion average length | 1.67

Insertion rate per base | 0.00%

Insertion average length | 1.44

MULTI-MAPPING READS:

Number of reads mapped to multiple loci | 4467520

% of reads mapped to multiple loci | 6.95%

Number of reads mapped to too many loci | 22404

% of reads mapped to too many loci | 0.03%

UNMAPPED READS:

% of reads unmapped: too many mismatches | 0.00%

% of reads unmapped: too short | 7.06%

% of reads unmapped: other | 0.07%

CHIMERIC READS:

Number of chimeric reads | 3362030

% of chimeric reads | 5.23%

When I ran your 'filterCirc.awk' script, around 45000 came out. These are all false positives, right? We wouldn't expect any circRNAs in a polyA library.

I'm worried about how high this number is for when I actually run the circRNA analysis.

Many thanks, Zoe

Alexander Dobin


Mar 27, 2018, 5:20:30 PM

to rna-star

Hi Zoe,

Even in a polyA+ library, circRNAs may get cloned if they contain a sufficiently long stretch of As.

If you compare A+ and total libraries from the same sample, you should see many more circular reads in the latter.

Also, --chimSegmentMin 10 is on the small side and might increase false positives.

You can filter the chimeric output with different min segments, and plot the number of circular reads as a function of the min segment - this will give you an idea for the cutoff to use.

Cheers

Alex
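
The cutoff scan suggested above could be sketched in Python roughly as follows. This is only a sketch under assumptions that are not stated in the thread: the input lines are Chimeric.out.junction records already restricted to circular candidates (for example by filterCirc.awk), the two segment CIGARs sit in columns 12 and 14, and the shortest aligned piece in either CIGAR is taken as a stand-in for the minimum chimeric segment. The function names and the second demo line are hypothetical.

```python
import re

# One CIGAR operation: a length (possibly negative for the mate gap) and an op code.
CIGAR_OP = re.compile(r"(-?\d+)([MIDNSHPp=X])")

def shortest_piece(cigar):
    """Shortest aligned stretch (read bases) among the mate pieces of one CIGAR.

    The lowercase 'p' operation is treated as the gap between mates.
    """
    pieces = [0]
    for length, op in CIGAR_OP.findall(cigar):
        if op == "p":
            pieces.append(0)           # start counting the next mate
        elif op in "MI=X":
            pieces[-1] += int(length)  # read bases covered by the alignment
    return min(pieces)

def counts_by_cutoff(lines, cutoffs):
    """Count junction records whose shortest aligned piece is >= each cutoff."""
    counts = {c: 0 for c in cutoffs}
    for line in lines:
        fields = line.split()
        # Defensive assumption: skip comment, header, or truncated lines.
        if line.startswith("#") or len(fields) < 14 or not fields[1].isdigit():
            continue
        seg = min(shortest_piece(fields[11]), shortest_piece(fields[13]))
        for c in cutoffs:
            if seg >= c:
                counts[c] += 1
    return counts

demo = [
    # Example record quoted later in this thread.
    "chr22 32875263 + chr22 32874967 + 1 1 0 "
    "D2FC08P1:235:C26F5ACXX:3:1108:13087:39997 32875177 86M15S 32874968 86S15M18p99M2S",
    # Hypothetical record, for illustration only.
    "chr1 1000 + chr1 500 + 1 0 0 read2 900 60M41S 501 60S41M",
]

if __name__ == "__main__":
    for cutoff, n in counts_by_cutoff(demo, [10, 15, 20, 30]).items():
        print(cutoff, n)
```

The per-cutoff counts can then be plotted with any plotting tool to look for the point where the number of circular reads levels off. Because the unsplit mate is also considered, this proxy can be stricter than a re-mapping with a larger --chimSegmentMin.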

Zoe Ward


Apr 18, 2018, 7:32:48 PM

to rna-star

Hi Alex,

As I already have the chimeric.sam files from --chimSegmentMin 10, is there a way of filtering these (to save me re-running the mapping) so that only reads with at least a 20-base alignment are kept, i.e. as if I had done the mapping with --chimSegmentMin 20?

Alexander Dobin


Apr 19, 2018, 5:58:27 PM

to rna-star

Hi Zoe,

This can be done by checking the alignment CIGARs in the Chimeric.out.junction file, e.g.:

chr22 32875263 + chr22 32874967 + 1 1 0 D2FC08P1:235:C26F5ACXX:3:1108:13087:39997 32875177 86M15S 32874968 86S15M18p99M2S

You can see that one of the mates is split 86/15, so the minimum chimeric segment is 15.
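
The CIGAR check above could be automated along these lines. This is a hedged sketch, not part of STAR: it assumes the two segment CIGARs are in columns 12 and 14 of each record, treats the lowercase 'p' operation as the gap between mates, and uses the shortest aligned piece across both CIGARs as the minimum chimeric segment. That matches the example (15), but it may be stricter than re-mapping with --chimSegmentMin 20 if the unsplit mate happens to align over a short stretch. The function names are hypothetical.

```python
import re

# One CIGAR operation: a length (possibly negative for the mate gap) and an op code.
CIGAR_OP = re.compile(r"(-?\d+)([MIDNSHPp=X])")

def aligned_pieces(cigar):
    """Aligned read bases for each mate piece of a CIGAR ('p' separates mates)."""
    pieces = [0]
    for length, op in CIGAR_OP.findall(cigar):
        if op == "p":
            pieces.append(0)           # gap between mates: start a new piece
        elif op in "MI=X":
            pieces[-1] += int(length)  # read bases covered by the alignment
    return pieces

def min_chimeric_segment(junction_line):
    """Shortest aligned piece over both segment CIGARs (columns 12 and 14)."""
    fields = junction_line.split()
    return min(aligned_pieces(fields[11]) + aligned_pieces(fields[13]))

def passes(junction_line, min_segment=20):
    """True if the record would be kept at the given minimum segment length."""
    return min_chimeric_segment(junction_line) >= min_segment

example = (
    "chr22 32875263 + chr22 32874967 + 1 1 0 "
    "D2FC08P1:235:C26F5ACXX:3:1108:13087:39997 "
    "32875177 86M15S 32874968 86S15M18p99M2S"
)

if __name__ == "__main__":
    print(min_chimeric_segment(example))  # 15, as read off the CIGARs above
    print(passes(example, 20))            # False: 15 is below the cutoff of 20
```

Applied line by line to Chimeric.out.junction, `passes` keeps only records whose shortest piece reaches the chosen cutoff; the read names of the kept records could then be used to subset the chimeric SAM file.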