Hi Alex,
Will this command give me the number of reads per circular RNA?
awk -f circ.awk ADAR_15_Chimeric.out.junction | sort -k1,1 -k2,2n -k3,3n -k4,4 | uniq -c | less
where circ.awk is your script (copied below).
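To check the counting part of that pipeline on its own, here is a minimal sketch that runs the same sort and uniq -c steps on a few lines shaped like the script's output. The sample is made up for illustration (one line is repeated to mimic two reads on the same junction), and LC_ALL=C is added only to make the sort order reproducible; neither is part of the original command.

```shell
# Made-up sample standing in for the awk script's output: the first line is
# repeated to mimic two reads that support the same junction.
sample='chrUextra 4508643 4619131 - 2 2 0
chrUextra 4508643 4619131 + 1 0 2
chrUextra 4508643 4619131 - 2 2 0'

# Same sort | uniq -c steps as in the command above. LC_ALL=C is an added
# assumption for a reproducible order; the final awk only strips the padding
# that uniq -c puts in front of the count.
counts=$(printf '%s\n' "$sample" \
  | LC_ALL=C sort -k1,1 -k2,2n -k3,3n -k4,4 \
  | uniq -c \
  | awk '{$1=$1; print}')

printf '%s\n' "$counts"
```

Identical output lines end up adjacent after sorting, so uniq -c reports one count per distinct line: here 2 for the "-" line and 1 for the "+" line. Lines that differ in any printed field, including the strand, are counted separately.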
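function cigarGenomicDist(cig)
{
    # Genomic span of a CIGAR string: sum all operation lengths except
    # soft clips (S) and insertions (I), which consume no reference bases.
    n=split(cig,L,/[A-Z]/)-1;   # L[1..n]: operation lengths
    split(cig,C,/[0-9]*/);      # C[ii+1]: operation letter that goes with L[ii]
    g=0;
    for (ii=1;ii<=n;ii++) { # scan through CIGAR operations
        if (C[ii+1]!="S" && C[ii+1]!="I") {
            g+=L[ii];
        }
    }
    return g;
}

BEGIN {
    endTol=5; # slack (in bases) allowed at the segment ends
}

{
    # Keep junctions with $7>=0 whose two sides are on the same chromosome
    # and strand, less than 1,000,000 bases apart, in back-splice order.
    if ( $7>=0 && $1==$4 && $3==$6 && (($3=="-" && $5>$2 && $5-$2<1000000) || ($3=="+" && $2>$5 && $2-$5<1000000)) )
    {
        #print $1,$2,$5,$3,$7,$8,$9;
        #print $11,$11+cigarGenomicDist($12),$13,$13+cigarGenomicDist($14);
        # Require the aligned segments to stay inside the junction interval,
        # within endTol bases.
        if ( ($3=="+" && $11+endTol>$5 && $13+cigarGenomicDist($14)-endTol<=$2) \
          || ($3=="-" && $13+endTol>$2 && $11+cigarGenomicDist($12)-endTol<=$5) ) {
            # Output: chr, start, end (start<end), flipped strand,
            # column 7 with 1 and 2 swapped (0 kept), then columns 8 and 9.
            print $1,($3=="+"?$5:$2),($3=="+"?$2:$5),($3=="+"?"-":"+"),($7==0?0:3-$7),$8,$9;
        }
    }
}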
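To see what cigarGenomicDist computes, here is a small standalone sketch. It is a re-implementation for illustration, not the script's own code: it walks the CIGAR string with match() and substr() instead of the two split() calls, and the wrapper name cigar_dist and the example CIGAR strings are made up.

```shell
# cigar_dist is a hypothetical helper: it prints the genomic span of one
# CIGAR string, skipping soft clips (S) and insertions (I) like the script.
cigar_dist() {
  awk -v cig="$1" '
    function cigarGenomicDist(cig,    g, len, op) {
      g = 0
      # Peel off one "<length><operation>" pair at a time from the front.
      while (match(cig, /^[0-9]+[A-Z]/)) {
        len = substr(cig, 1, RLENGTH - 1) + 0
        op  = substr(cig, RLENGTH, 1)
        if (op != "S" && op != "I") g += len
        cig = substr(cig, RLENGTH + 1)
      }
      return g
    }
    BEGIN { print cigarGenomicDist(cig) }'
}

d1=$(cigar_dist 10S40M1000N30M)   # 40 + 1000 + 30
d2=$(cigar_dist 5S20M2I10M3D)     # 20 + 10 + 3
printf '%s\n%s\n' "$d1" "$d2"
```

The script adds this span to the segment start ($11 or $13) to get the segment's end coordinate, which is then compared with the junction position using endTol.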
Also, now that I understand column 7 and column 8, this means the sum of these two columns should be low. Is that already included as part of your script?
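Going only by the script text, columns 8 and 9 of the junction file are printed but never tested in a condition. A filter on their sum could be sketched as a separate step on the script's output, where they appear as fields 6 and 7. This assumes those two repeat-length columns are the ones meant; the cutoff maxRep and the second sample line are made up for illustration.

```shell
# maxRep is a hypothetical cutoff, not a value recommended in this thread.
maxRep=5

# First line is taken from the script output shown in this post; the second
# is invented so that something gets filtered out.
sample='chrUextra 4508643 4619131 - 2 2 0
chrUextra 100 200 + 1 10 12'

# Keep only lines whose last two fields sum to at most maxRep.
kept=$(printf '%s\n' "$sample" | awk -v maxRep="$maxRep" '$6 + $7 <= maxRep')

printf '%s\n' "$kept"
```

Placing this awk step between circ.awk and sort in the pipeline would apply the cutoff before the reads are counted.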
Also, Alex, after running the script on the Chimeric.out.junction file I got this (showing a few lines):
chrUextra 4508643 4619131 - 2 2 0
chrUextra 4508643 4619131 + 1 0 2
Can you tell me what the 5th column above is? I guess the 6th and 7th columns are the same as the 8th and 9th columns in the junction file.
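From the script's print statement, the 5th output column is the expression ($7==0?0:3-$7) applied to column 7 of the junction file, printed next to the flipped strand. A minimal sketch of that mapping for the values that pass the $7>=0 test, assuming column 7 only takes the values 0, 1 and 2 there:

```shell
# For each possible column-7 value, print it next to what the script
# would emit as its 5th output column.
flipped=$(printf '0\n1\n2\n' | awk '{ print $1, ($1==0 ? 0 : 3-$1) }')

printf '%s\n' "$flipped"
```

So 0 stays 0 while 1 and 2 are swapped. If column 7 is the junction motif type, as the STAR manual describes it, the swap would match reporting the junction on the opposite strand.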
Regards
Varun