circular RNA

Varun Gupta

Mar 3, 2015, 3:30:13 PM

Hi Alex
Hope you are doing well.
I am interested in detecting circRNAs and have read the initial postings about them.

Can you explain the following columns from Chimeric.out.junction?

Column 7 (junction type): what does a value of 0 mean? I couldn't find it in the manual.
Columns 8 and 9: what does a value of 1 or 0 mean for them? Basically, what does repeat length mean?

Also, I wrote this to find circRNAs within Chimeric.out.junction:

awk '$7 >= 0' Chimeric.out.junction | awk '$1 == $4' | awk '$3 == $6' | awk '($3=="-" && $5>$2 && $5-$2<1000000) || ($3=="+" && $2>$5 && $2-$5<1000000)'

How is it different from your filterCirc.awk script for circRNA? Can you explain the script and its output? I am pasting the output of the script:

chr2R 4479797 5153637 - 0 0 0
chrUextra 22922236 22922262 - 0 0 0
chr3R 18610660 18612896 - 0 0 0
chrUextra 28292248 28318441 - 0 0 2

I also need to count the number of reads supporting each circular RNA. How can I do that? I am comparing two samples, and I know one should have fewer circRNAs than the other.

Thanks a lot

Regards
Varun

Alexander Dobin

Mar 5, 2015, 10:40:08 AM

to rna-...@googlegroups.com

Hi Varun,

Column 7: junction type=0 means that the junction is non-canonical (non GT/AG intron motif).
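To see how the junction types are distributed in a given file, a tally of column 7 can help. The following is a minimal sketch; the function name `junction_type_counts` is made up for illustration, and the labels in the comment are my reading of the STAR manual (which, to my knowledge, also lists -1 for junctions that fall between the mates).

```shell
# Tally column 7 (junction type) of a Chimeric.out.junction stream.
# Assumed labels: -1 = junction between the mates, 0 = non-canonical,
# 1 = GT/AG, 2 = CT/AC.
junction_type_counts() {
    awk '{ n[$7]++ } END { for (t in n) print t, n[t] }' | sort -k1,1n
}
```

Usage would be along the lines of `junction_type_counts < Chimeric.out.junction`, which prints one "type count" pair per line.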

Columns 8 (9): micro-repeat length on the left (right): the number of bases by which the junction donor and acceptor sites can be shifted to the left (right) without changing the spliced product, i.e. the ambiguity in the junction position.

My script checks whether the other mate's alignment is consistent with the circular junction. It also takes into account the strandedness of the dUTP protocol.

To get the number of reads per circular junction, you can `sort` the list and collapse it with `uniq -c`, e.g.

$ sort -k1,1 -k2,2n -k3,3n -k4,4 List.txt | uniq -c
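A variation on the same idea, sketched here as an assumption rather than as part of the original script: count on the first four columns only (chromosome, start, end, strand) and list the best-supported junctions first. Unlike `uniq -c`, which compares whole lines, this ignores any differences in the remaining columns. The function name `circ_read_counts` is hypothetical.

```shell
# Count reads per junction, keyed on columns 1-4 of the filtered list
# (chromosome, start, end, strand), most-supported junctions first.
circ_read_counts() {
    awk '{ n[$1 " " $2 " " $3 " " $4]++ } END { for (k in n) print n[k], k }' |
        sort -k1,1nr -k2,2 -k3,3n
}
```

Something like `circ_read_counts < List.txt` would then print the read count followed by the junction coordinates, which makes it easier to compare two samples side by side.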

Cheers

Alex

Varun Gupta

Mar 5, 2015, 12:04:59 PM

Hi Alex,
So will this give me the number of reads per circular RNA?

awk -f circ.awk ADAR_15_Chimeric.out.junction | sort -k1,1 -k2,2n -k3,3n -k4,4 | uniq -c | less
where circ.awk is your script.

function cigarGenomicDist(cig)
{
        n=split(cig,L,/[A-Z]/)-1;
        split(cig,C,/[0-9]*/);
        g=0;
        for (ii=1;ii<=n;ii++) { # scan through CIGAR operations
                if (C[ii+1]!="S" && C[ii+1]!="I") {
                        g+=L[ii];
                };
        };
        return g;
};
BEGIN {
        endTol=5;
};
{
if ( $7>=0 && $1==$4 && $3==$6 && (($3=="-" && $5>$2 && $5-$2<1000000) || ($3=="+" && $2>$5 && $2-$5<1000000)) )
{
    #print $1,$2,$5,$3,$7,$8,$9;
    #print $11,$11+cigarGenomicDist($12),$13,$13+cigarGenomicDist($14);
        if ( ($3=="+" && $11+endTol>$5 && $13+cigarGenomicDist($14)-endTol<=$2) \
          || ($3=="-" && $13+endTol>$2 && $11+cigarGenomicDist($12)-endTol<=$5) ) {
               print $1,($3=="+"?$5:$2),($3=="+"?$2:$5),($3=="+"?"-":"+"),($7==0?0:3-$7),$8,$9;
    };
};
};
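As a worked example of what `cigarGenomicDist` is computing: the genomic span of an alignment is the sum of all CIGAR operation lengths except soft clips (S) and insertions (I). The sketch below is a re-implementation for illustration only, not the script's own function; the name `cigar_genomic_dist` is made up, and it assumes plain CIGAR strings built from the standard single-letter operations, so any STAR-specific extras are rejected.

```shell
# Genomic distance covered by a CIGAR string, given as the first argument.
cigar_genomic_dist() {
    awk -v cig="$1" 'BEGIN {
        g = 0
        # Peel off one <length><operation> pair at a time.
        while (match(cig, /^[0-9]+[A-Za-z=]/)) {
            len = substr(cig, 1, RLENGTH - 1) + 0
            op  = substr(cig, RLENGTH, 1)
            # Mirror the script above: everything except S and I adds to the span.
            if (op != "S" && op != "I") g += len
            cig = substr(cig, RLENGTH + 1)
        }
        # Anything left over is outside what this sketch understands.
        if (cig != "") { print "unparsed CIGAR tail: " cig > "/dev/stderr"; exit 1 }
        print g
    }'
}
```

For instance, `cigar_genomic_dist 10S40M1000N50M` prints 1090: the 10 soft-clipped bases are skipped, and 40 + 1000 + 50 reference bases are counted.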

Also, now that I understand column 7 and column 8, this means the sum of these two columns should be low. Is that already included in your script?

Also, Alex, after running the script on the Chimeric.out.junction file I got this (showing some lines):

chrUextra 4508643 4619131 - 2 2 0
chrUextra 4508643 4619131 + 1 0 2

Can you tell me what the 5th column above is? I guess the 6th and 7th columns are the same as the 8th and 9th columns in the junction file.

Regards
Varun

Alexander Dobin

Mar 10, 2015, 7:06:58 PM

to rna-...@googlegroups.com

Hi Varun,

This script was designed for the Illumina stranded TruSeq PE protocol, i.e. the strand of the 1st read is opposite to that of the original RNA.

>>>Also now that I understand column 7 and column 8, this means the sum of these 2 columns should be low. Is that already included as a part of your script??

The total repeat length, i.e. the sum of columns $8 and $9 in Chimeric.out.junction, should be low. These are the last two columns that the awk script outputs.
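If you want to apply that as an explicit filter on the script's output, one possible sketch follows. The function name `filter_low_repeat` and the default cutoff of 2 are assumptions made up for illustration; the thread does not state a recommended threshold.

```shell
# Keep junctions whose total micro-repeat length (columns 6 + 7 of the
# filter script output) does not exceed a cutoff. The default of 2 is an
# arbitrary illustration, not a recommended value.
filter_low_repeat() {
    awk -v max="${1:-2}" '($6 + $7) <= max'
}
```

It could be slotted in before the counting step, e.g. `awk -f circ.awk ADAR_15_Chimeric.out.junction | filter_low_repeat 2 | sort -k1,1 -k2,2n -k3,3n -k4,4 | uniq -c`.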

>>>Can you tell me what is 5th column above??

This is column $7 from the Chimeric.out.junction file with the strand reversed (because of the protocol), i.e. 1 for GT/AG, 2 for the reverse-complementary CT/AC, and 0 for non-canonical (all other motifs).
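To make the script's output easier to read, the 5th column can be turned into a motif label using the mapping just described. This is only a sketch: the function name `label_motif` is hypothetical, and any value other than 0, 1 or 2 is reported as "unknown".

```shell
# Append a motif label to each line of the filter script output,
# based on its 5th column: 0 = non-canonical, 1 = GT/AG, 2 = CT/AC.
label_motif() {
    awk 'BEGIN { name[0] = "non-canonical"; name[1] = "GT/AG"; name[2] = "CT/AC" }
         { print $0, (($5 in name) ? name[$5] : "unknown") }'
}
```

For the two example lines above, `label_motif` would append CT/AC to the first and GT/AG to the second.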