Re: Using STAR to detect circular RNA from RNASeq library

Alexander Dobin

Apr 2, 2013, 10:55:33 PM
to rna-...@googlegroups.com
Dear Bo,

this is a good and timely question. As you guessed, circular RNAs are considered chimeric and are output into the Chimeric.out.sam/junction files.
However, STAR had a problem detecting circRNAs, as well as "same-strand proximal chimeras" (the latter problem was reported by Nicolas Stransky).
This problem is fixed in the latest alpha-release:
I was able to detect what looks like circRNA, and I will be testing it further with the data from the Nature paper. Please let me know if it works for you.

There is a good discussion about chimeric detection and output in this thread:

Cheers
Alex



On Monday, April 1, 2013 10:17:09 PM UTC-4, Bo Han wrote:
Hi, Dear STAR developer, 

Thanks for developing this awesome tool. 

I am wondering whether STAR is (or in the future will be) able to detect circular RNAs.

The algorithm used in this paper
is elegant and very similar to the idea in TopHat. 
However, I believe it would be more sensitive and faster if STAR allowed that kind of splicing pattern. 

(PS: Might STAR's chimeric output contain such information?)

Thanks in advance, 
Sincerely, 
Bo

wenjie zhu

Aug 8, 2013, 9:31:19 PM
to rna-...@googlegroups.com
Dear Alex
  
Thanks for developing this awesome tool.
I want to use STAR to detect circular RNAs in RNA-seq data. Can you give me some suggestions, or point me to a guide on detecting circular RNAs?
I have downloaded the data from the Nature paper. When I map it with STAR, the mapping rate is about 48%, and I find only ~500 candidate circular RNAs. What is different about the Nature paper's method? Thanks.


On Wednesday, April 3, 2013 at 10:55:33 AM UTC+8, Alexander Dobin wrote:

Alexander Dobin

Aug 13, 2013, 12:20:13 PM
to rna-...@googlegroups.com
Hi Wenjie,

to detect circular RNA you would need to switch on the chimeric output. You can use, for example, --chimSegmentMin 15   --chimJunctionOverhangMin 15.
You can extract circular RNA from Chimeric.out.junction file. I would filter it in the following way:
col7>=0: for junction spanning reads
AND
col1==col4: chimeric segments on the same chromosome
AND
col3==col6: chimeric segments on the same strand
AND
(col3=="-" && col5>col2 && col5-col2<1000000) || (col3=="+" && col2>col5 && col2-col5<1000000): for circular RNAs on the + or - strand, with donor and acceptor within 1 Mb (1000000 bp) of each other.
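The filtering rules above can be sketched as a bash function wrapping awk. This is a minimal illustration, not Alex's filterCirc.awk. It assumes the Chimeric.out.junction column layout given in the STAR manual (1: donor chromosome, 2: first intron base on the donor side, 3: donor strand, 4 to 6: the same for the acceptor, 7: junction type, where -1 marks a junction falling between the mates) and tests for the same chromosome by comparing columns 1 and 4, since column 3 holds the strand. The name filter_circ and the 1 Mb default window are illustrative choices.

```shell
#!/usr/bin/env bash
set -euo pipefail

# filter_circ FILE [MAXDIST]
# Print rows of a STAR Chimeric.out.junction file that look like back-splice
# (circular RNA) junctions. MAXDIST is the maximum donor/acceptor distance in bp
# (default 1000000, i.e. 1 Mb). Matching rows are printed unchanged.
filter_circ() {
  local file=$1 maxdist=${2:-1000000}
  awk -F'\t' -v maxd="$maxdist" '
    {
      spanning = ($7 >= 0)     # drop type -1: junction lies between the mates
      same_chr = ($1 == $4)    # donor and acceptor on one chromosome
      same_str = ($3 == $6)    # donor and acceptor on one strand
      # back-splice: the acceptor lies upstream of the donor on the transcript strand
      minus_bs = ($3 == "-" && $5 > $2 && $5 - $2 < maxd)
      plus_bs  = ($3 == "+" && $2 > $5 && $2 - $5 < maxd)
      if (spanning && same_chr && same_str && (minus_bs || plus_bs)) print
    }' "$file"
}
```

Typical use would be `filter_circ Chimeric.out.junction > circ_candidates.junction`; passing a second argument such as 100000 narrows the window to 100 kb.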

For total RNA samples, a low mapping rate may be caused by high rRNA content. I would recommend mapping their data to a genome containing scaffolds; that typically improves mapping rates, though mostly by increasing the number of multi-mappers.

Cheers
Alex

Kipp A

Jul 9, 2014, 11:56:22 AM
to rna-...@googlegroups.com
I'm having trouble using STAR to detect circular RNA in the data from Memczak et al. (Nature 2013), specifically the HEK cell RNA. Using the filters posted above, I don't get any hits in my Chimeric.out.junction file. Would anyone be willing to post the STAR run commands they used on that dataset? Any help would be greatly appreciated.
My command:

STAR --genomeDir /path/hg19_Gencode14.overhang75 --sjdbGTFfile /path/Homo_sapiens.GRCh37.74.gtf --readFilesIn ../../SRR650317_1.fastq ../../SRR650317_2.fastq --runThreadN 25 --outSAMunmapped Within --chimSegmentMin 15  --chimJunctionOverhangMin 15 --outSAMstrandField intronMotif --outStd SAM | samtools view -bS - > aligned.bam

Thanks,
Kipp

Alexander Dobin

Jul 14, 2014, 2:48:55 PM
to rna-...@googlegroups.com
Hi Kipp,

which STAR version have you used? Please use one of the latest patches from  http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARreleases/ 
An example of a script to extract circRNAs from the Chimeric.out.junction file is posted here: filterCirc.awk
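The contents of filterCirc.awk are not shown in this thread. As a hypothetical follow-on step (count_circ is an illustrative name, not part of STAR), filtered rows can be collapsed into one line per back-splice junction with a supporting-read count and an optional minimum-read cutoff. The sketch assumes each row of Chimeric.out.junction represents one read or read pair.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_circ FILE [MIN_READS]
# Collapse already-filtered Chimeric.out.junction rows into one line per
# back-splice junction: chr, donor (col2), acceptor (col5), strand, row count.
# Assumes one row per supporting read (or read pair). Junctions with fewer
# than MIN_READS (default 1) rows are dropped. Output is sorted by position.
count_circ() {
  local file=$1 min=${2:-1}
  awk -F'\t' -v OFS='\t' -v min="$min" '
    { key = $1 OFS $2 OFS $5 OFS $3; n[key]++ }
    END { for (k in n) if (n[k] >= min) print k, n[k] }
  ' "$file" | sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n
}
```

For example, `count_circ circ_candidates.junction 2` keeps only junctions supported by at least two rows.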

Cheers
Alex

Kipp A

Jul 14, 2014, 11:08:09 PM
to rna-...@googlegroups.com
That did it!  I was on version 2.3.0, but updating to 2.3.1z13 did the trick.
Thanks for your help!!

Nick Schurch

Feb 13, 2015, 11:57:29 AM
to rna-...@googlegroups.com
Did you succeed in finding circular RNAs with STAR in this data? I'd be interested to know what you found...

Kipp A

Feb 13, 2015, 3:48:08 PM
to rna-...@googlegroups.com
Yes, I have found STAR to be pretty good for circRNA. Discovering circRNAs with 50 bp single-end data was impossible, but it worked with 100 bp single-end reads. Once circRNAs are discovered, you can really boost your detection by creating artificial chromosomes of the known circRNA junctions, indexing them for STAR, and realigning. I have some scripts to do the filtering of STAR's chimeric output that I can PM to you if you're interested. 
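The realignment trick can be sketched as follows, under stated assumptions: circles are given as chr, start, end in 1-based inclusive genomic coordinates (converting Chimeric.out.junction columns into such coordinates is not shown), and each back-splice junction is represented by the last FLANK bases of the circle joined to its first FLANK bases. Because the junction is written in genomic + orientation, the same sequence serves circles on either strand. make_bsj_fasta is a hypothetical helper; it loads the whole FASTA into memory, which only makes sense for small references (for a full genome, region extraction with `samtools faidx` is the usual alternative). The resulting FASTA would then be indexed with `STAR --runMode genomeGenerate` (small references typically need a reduced `--genomeSAindexNbases`) before realigning.

```shell
#!/usr/bin/env bash
set -euo pipefail

# make_bsj_fasta GENOME_FA CIRCS_TSV [FLANK]
# For each circle (chr <TAB> start <TAB> end, 1-based inclusive; an assumed input
# format), print a FASTA record whose sequence is the last FLANK bases of the
# circle followed by its first FLANK bases, i.e. the back-splice junction in
# genomic + orientation. FLANK defaults to 100 and is capped at the circle length.
make_bsj_fasta() {
  local genome=$1 circs=$2 flank=${3:-100}
  awk -F'\t' -v flank="$flank" '
    # First file: load the FASTA, joining sequence lines per chromosome.
    FNR == NR {
      if (/^>/) { name = substr($1, 2); sub(/[ \t].*/, "", name); next }
      seq[name] = seq[name] $0
      next
    }
    # Second file: one junction record per circle on a known chromosome.
    ($1 in seq) {
      cstart = $2 + 0; cend = $3 + 0
      f = cend - cstart + 1
      if (f > flank) f = flank
      tail_part = substr(seq[$1], cend - f + 1, f)   # 3-prime end of the circle
      head_part = substr(seq[$1], cstart, f)         # 5-prime start of the circle
      print ">" $1 ":" cstart "-" cend
      print tail_part head_part
    }
  ' "$genome" "$circs"
}
```

For example, `make_bsj_fasta genome.fa known_circles.tsv 100 > bsj.fa` writes one 200 bp junction sequence per circle longer than 100 bp.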

António Miguel de Jesus Domingues

Apr 1, 2015, 10:49:16 AM
to rna-...@googlegroups.com
Hi all,

also on the subject of circular RNAs, do any of you normalize the junction counts? I am starting to look at my data now, and since I have multiple samples, it makes sense to normalize before comparing. One strategy I am contemplating is that of Zhang et al. (Cell 2014), in which circular RNA junction reads are normalized to millions of mapped reads. But how should the number of mapped reads be calculated for STAR? Would something like
 
Number of input reads * (100% - (% of reads unmapped: too many mismatches + % of reads unmapped: too short + % of reads unmapped: other))

also include the reads in Chimeric.out.junction, which are ultimately the ones used for circular RNA detection? Is there another way of getting the number of mapped reads?
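The normalization described above is simple arithmetic: a junction's read count divided by the number of mapped reads in millions, i.e. RPM = count * 1000000 / mapped reads. A minimal sketch, assuming a tab-separated table whose last column is the raw junction count; rpm_normalize is a hypothetical name, and the mapped-read total is supplied by the caller, since how best to compute it is exactly the open question here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# rpm_normalize COUNTS_TSV MAPPED_READS
# Append a column with reads per million mapped reads to each row of a
# tab-separated table whose last column is a raw junction read count:
#   RPM = count * 1000000 / MAPPED_READS
rpm_normalize() {
  local counts=$1 mapped=$2
  if [ "$mapped" -le 0 ]; then
    echo "rpm_normalize: MAPPED_READS must be positive" >&2
    return 1
  fi
  awk -F'\t' -v mapped="$mapped" '
    { printf "%s\t%.4f\n", $0, $NF * 1000000 / mapped }
  ' "$counts"
}
```

For example, with 25 million mapped reads, `rpm_normalize circ_counts.tsv 25000000` turns a raw count of 50 into 2.0000 RPM.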

Any suggestions are welcome!

António

Alexander Dobin

Apr 3, 2015, 11:34:14 AM
to rna-...@googlegroups.com
Hi António,

for the "basic" number of mapped reads, I would simply use Log.final.out entries:
Uniquely mapped reads number + Number of reads mapped to multiple loci + Number of reads mapped to too many loci
This number will include some of the chimeric/circular reads - those which can be mapped with short enough soft-clipping.
The number of chimeric reads should be very small compared to the total number of reads, so, in principle, it will not affect the normalization factor significantly.
However, if you want to be rigorous, to avoid double counting you would need to extract read names from Aligned.out.sam and Chimeric.out.sam, and count the unique names, e.g.:
cut -f1 Aligned.out.sam Chimeric.out.sam | grep -v "^@" | sort -u | wc -l
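The "basic" count can be pulled from Log.final.out with a short awk sketch. It assumes the usual `label |<TAB>value` layout of that file and matches the three labels quoted above verbatim; mapped_reads is a hypothetical helper name, and it refuses to print a total if any of the three lines is missing, since label wording may differ between STAR versions. Separately, note that if STAR was run with `--outSAMunmapped Within`, Aligned.out.sam also contains the unmapped reads, so the rigorous read-name count above would then include them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# mapped_reads LOG_FINAL_OUT
# Sum the three Log.final.out entries: unique + multi-mapped + mapped to too
# many loci. Assumes "label |<TAB>value" lines; fails if any label is missing.
mapped_reads() {
  awk -F'|' '
    {
      label = $1; gsub(/^[ \t]+|[ \t]+$/, "", label)
      value = $2; gsub(/^[ \t]+|[ \t]+$/, "", value)
    }
    label == "Uniquely mapped reads number" ||
    label == "Number of reads mapped to multiple loci" ||
    label == "Number of reads mapped to too many loci" {
      total += value; found++
    }
    END {
      if (found != 3) {
        printf "mapped_reads: expected 3 matching lines, found %d\n", found > "/dev/stderr"
        exit 1
      }
      printf "%.0f\n", total
    }
  ' "$1"
}
```

For example, `mapped_reads Log.final.out` prints a single integer that can serve as the denominator for per-million normalization.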

Cheers
Alex

behin

Dec 14, 2015, 12:58:59 PM
to rna-star
Dear Kipp A,
Thanks for the nice question and answers. I have started to work on circular RNAs. You said you have some scripts to filter STAR's chimeric output. Is it possible to send me those scripts?
thanks in advance,
behin

Kipp A

Dec 14, 2015, 1:14:59 PM
to rna-star
Here's a link to my github for this project.  I haven't cleaned it up for massive consumption so apologies in advance for anything unclear.  Please post any bugs/issues/ideas you find to the github, it would be a great help. 
https://github.com/kippakers/starchimp 

Paolo Kunderfranco

May 20, 2016, 11:42:27 AM
to rna-star
Hi Kipp
I tried to use your Perl script with STAR output, but I don't know why it is not working. I modified the Parameters.txt file like this:

##Parameters for starchimp-circles
readsCutoff = 5
minSubjectLimit = 10
cpus = 10
do_splice = true 
cpmCutoff = 0
subjectCPMcutoff = 0 
annotate = true
refbed = /home/pkunder/data/Metadata/Metadata_GSE69637_SMC/hg19_UCSC_gtf.bed
starprefix = Control_1
IDstepsback = 1 ## this is the position from the right of your path of the name of your files.  
##for example: /path/to/sample1/star/2.4.2/output/Chimeric.out.junction 
##sample1 is 4 steps back.
##or /path/to/star/2.4.2/sample1/Chimeric.out.junction
#sample1 is 1 step back.  
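The IDstepsback convention in the comments above (count path components leftwards from the file name) can be expressed portably with awk. This is a hypothetical helper, not starchimp's own code; it uses only printf and awk, so it behaves the same under bash and a plain POSIX sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sample_id PATH STEPS_BACK
# Print the path component STEPS_BACK places to the left of the last component
# (0 = the file name itself), mirroring the IDstepsback description.
# Hypothetical helper; not taken from starchimp.
sample_id() {
  printf '%s\n' "$1" | awk -F/ -v n="$2" '
    n < 0 || n >= NF { print "sample_id: STEPS_BACK out of range" > "/dev/stderr"; exit 1 }
    { print $(NF - n) }'
}
```

With the first example path from the comments, `sample_id /path/to/sample1/star/2.4.2/output/Chimeric.out.junction 4` prints sample1.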

and I got the following output:

/home/pkunder/bin/starchimp-master/scripts/circles/circle_star.sh 5 10 STARdirs.txt true 10 0 0 true /home/pkunder/data/Metadata/Metadata_GSE69637_SMC/hg19_UCSC_gtf.bed Control_1 1
Circular RNA species must have at least 5 reads in at least 10 subjects/output files.  Using 10 CPUs.
For CPMs: Rscript must be callable.  Must be 0 subjects/outputs with 0 Counts per million circular reads to count a given circular RNA
/home/pkunder/bin/starchimp-master/scripts/circles/circle_star.sh: 35: /home/pkunder/bin/starchimp-master/scripts/circles/circle_star.sh: Bad substitution
/home/pkunder/bin/starchimp-master/scripts/circles/circle_star.sh: 41: /home/pkunder/bin/starchimp-master/scripts/circles/circle_star.sh: Bad substitution


Any suggestion?

Kipp A

May 20, 2016, 12:29:43 PM
to rna-star
Hi Paolo,

Thanks for the feedback! What kind of system are you running? A Mac or Linux? If Linux, which distribution?

Thanks,
Kipp