CircRNA false positives?

Zoe Ward

Mar 26, 2018, 6:10:17 PM
to rna-star
Hi Alex,

I have created a script to run STAR on my samples looking at both mRNAs and circRNAs. 
I ran the script on some samples where the libraries were polyA enriched (as I was only looking at the mRNAs), but I had left the option '--chimSegmentMin 10' in my script during the second mapping.
I noticed from the Log.final.out stats that about 5% of the reads were reported as chimeric reads, e.g.:

                         Number of input reads |       64320253
                      Average input read length |       177
                                    UNIQUE READS:
                   Uniquely mapped reads number |       55240784
                        Uniquely mapped reads % |       85.88%
                          Average mapped length |       175.14
                       Number of splices: Total |       20634301
            Number of splices: Annotated (sjdb) |       20464285
                       Number of splices: GT/AG |       20429756
                       Number of splices: GC/AG |       180974
                       Number of splices: AT/AC |       14807
               Number of splices: Non-canonical |       8764
                      Mismatch rate per base, % |       0.30%
                         Deletion rate per base |       0.01%
                        Deletion average length |       1.67
                        Insertion rate per base |       0.00%
                       Insertion average length |       1.44
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       4467520
             % of reads mapped to multiple loci |       6.95%
        Number of reads mapped to too many loci |       22404
             % of reads mapped to too many loci |       0.03%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |       0.00%
                 % of reads unmapped: too short |       7.06%
                     % of reads unmapped: other |       0.07%
                                  CHIMERIC READS:
                       Number of chimeric reads |       3362030
                            % of chimeric reads |       5.23%



When I ran your 'filterCirc.awk' script, around 45,000 reads passed the filter. These are all false positives, right, since we wouldn't expect any circRNAs in a polyA library?
I'm worried about how high this number is for when I actually run the circRNA analysis.

Many thanks,  Zoe

Alexander Dobin

Mar 27, 2018, 5:20:30 PM
to rna-star
Hi Zoe,

Even in a polyA+ library, circRNAs may get cloned if they contain a sufficiently long stretch of As.
If you compare A+ and total libraries from the same sample, you should see many more circular reads in the latter.
Also, --chimSegmentMin 10 is on the small side and might increase false positives.
You can filter the chimeric output with different min segments, and plot the number of circular reads as a function of the min segment - this will give you an idea for the cutoff to use.

Cheers
Alex
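
The cutoff scan suggested above can be sketched in a few lines of Python. This is only an illustration, not part of STAR or of filterCirc.awk: it assumes you have already collected, for each candidate circular read, the length of its shorter chimeric segment, and the function name `counts_by_cutoff` and the example numbers are made up.

```python
from bisect import bisect_left

def counts_by_cutoff(min_segments, cutoffs):
    """For each cutoff, count reads whose shorter chimeric segment is >= cutoff."""
    ordered = sorted(min_segments)
    total = len(ordered)
    # bisect_left gives the number of values strictly below the cutoff
    return {c: total - bisect_left(ordered, c) for c in cutoffs}

# Hypothetical per-read minimum segment lengths, for illustration only
example = [10, 11, 12, 15, 15, 20, 25, 40]

for cutoff, n_reads in counts_by_cutoff(example, range(10, 31, 5)).items():
    print(cutoff, n_reads)
```

Plotting the second column against the first should show where the count stops dropping steeply, which is one reasonable place to put the cutoff.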

Zoe Ward

Apr 18, 2018, 7:32:48 PM
to rna-star
Hi Alex,

I already have the chimeric.sam files generated with --chimSegmentMin 10. To save me re-running the mapping, is there a way of filtering these to keep only reads with at least a 20-base alignment, i.e. as if I had done the mapping with --chimSegmentMin 20?

Alexander Dobin

Apr 19, 2018, 5:58:27 PM
to rna-star
Hi Zoe,

This can be done by checking the alignment CIGARs in the Chimeric.out.junction file, e.g.:
chr22   32875263        +       chr22   32874967        +       1       1       0       D2FC08P1:235:C26F5ACXX:3:1108:13087:39997       32875177        86M15S  32874968        86S15M18p99M2S

You can see that one of the mates is split 86/15, so the minimum chimeric segment is 15.

Cheers
Alex
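
A minimal Python sketch of the CIGAR check described above, for anyone who wants to script it. It is a simplification and rests on several assumptions: the two CIGARs are taken from columns 12 and 14 as in the example line, `p` is treated as the gap between mates (with a possibly negative length), M, I, = and X are counted as aligned read bases, and the shortest aligned mate piece in either CIGAR is used as a conservative stand-in for the minimum chimeric segment. The function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(-?\d+)([MIDNSHP=Xp])")
VALID_CIGAR = re.compile(r"(?:-?\d+[MIDNSHP=Xp])+")

def mate_aligned_lengths(cigar):
    """Aligned read bases for each mate in a chimeric CIGAR string."""
    lengths = [0]
    for count, op in CIGAR_OP.findall(cigar):
        if op == "p":            # gap between mates: start counting the next mate
            lengths.append(0)
        elif op in "MI=X":       # operations that consume aligned read bases
            lengths[-1] += int(count)
    return lengths

def min_chimeric_segment(cigar1, cigar2):
    """Shortest aligned mate piece across both CIGARs (a conservative proxy)."""
    return min(mate_aligned_lengths(cigar1) + mate_aligned_lengths(cigar2))

def keep_junction_line(line, cutoff):
    """True if a Chimeric.out.junction line passes the minimum-segment cutoff."""
    fields = line.split()
    if len(fields) < 14:
        return False
    cigar1, cigar2 = fields[11], fields[13]
    # Skip header or comment lines whose CIGAR columns do not parse
    if not (VALID_CIGAR.fullmatch(cigar1) and VALID_CIGAR.fullmatch(cigar2)):
        return False
    return min_chimeric_segment(cigar1, cigar2) >= cutoff

line = ("chr22 32875263 + chr22 32874967 + 1 1 0 "
        "D2FC08P1:235:C26F5ACXX:3:1108:13087:39997 "
        "32875177 86M15S 32874968 86S15M18p99M2S")
print(min_chimeric_segment("86M15S", "86S15M18p99M2S"))
print(keep_junction_line(line, 20))
```

For the example line this gives a minimum segment of 15, so the read is dropped at a cutoff of 20. Because every mate piece is considered, a short alignment of the non-split mate would also lower the value, so this may filter slightly more than re-mapping with a larger --chimSegmentMin would.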