Chimeric.out.junction file column 7 mostly 0

Jun Cheng

Aug 14, 2014, 12:45:55 PM

to rna-...@googlegroups.com

Dear all,

I want to use STAR for circRNA detection. I have single-end 50 bp RNA-seq data.

After running STAR, I found that column 7 of "Chimeric.out.junction" is mostly "0", which means "0 - any other motif".

To detect circRNAs, I ran the filterCirc.awk script on the "Chimeric.out.junction" files. Among the ~15000 candidate circRNA sites, ~11000 have junction type "0".
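To see how the junction types are distributed in one file, a quick tally of column 7 is enough. This is a minimal sketch, assuming a tab-separated Chimeric.out.junction with one chimeric read per line; only the meaning of "0" comes from this thread, while the other codes (-1, 1, 2) are taken from the STAR manual and are worth double-checking against the manual for your STAR version.

```python
from collections import Counter

# Column 7 junction-type codes. "0" is quoted above; -1, 1 and 2 follow
# the STAR manual's description (verify against your STAR version).
JUNCTION_TYPES = {
    "-1": "encompassing junction (between mates)",
    "0": "any other motif",
    "1": "GT/AG",
    "2": "CT/AC",
}


def count_junction_types(lines):
    """Tally column 7 (1-based) of Chimeric.out.junction lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and any comment/header lines
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1  # column 7 is index 6
    return counts
```

Usage: `with open("Chimeric.out.junction") as fh: print(count_junction_types(fh))`, then look up each code in `JUNCTION_TYPES`.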

The Log.final.out file looks like this:

Number of input reads | 56057901

Average input read length | 50

UNIQUE READS:

Uniquely mapped reads number | 45233195

Uniquely mapped reads % | 80.69%

Average mapped length | 49.70

Number of splices: Total | 2034482

Number of splices: Annotated (sjdb) | 1785112

Number of splices: GT/AG | 1987663

Number of splices: GC/AG | 16753

Number of splices: AT/AC | 2109

Number of splices: Non-canonical | 27957
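To put the splice counts above on the same footing as the chimeric counts, they can be converted to fractions. A minimal sketch, assuming the "name | value" line layout shown above; the function names are illustrative only.

```python
def parse_log_final(lines):
    """Parse 'name | value' lines of a STAR Log.final.out into a dict of strings."""
    stats = {}
    for line in lines:
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" have no '|'
        name, value = line.split("|", 1)
        stats[name.strip()] = value.strip()
    return stats


def splice_fractions(stats):
    """Share of each splice motif among all spliced alignments."""
    total = int(stats["Number of splices: Total"])
    motifs = ["GT/AG", "GC/AG", "AT/AC", "Non-canonical"]
    return {m: int(stats[f"Number of splices: {m}"]) / total for m in motifs}
```

With the numbers above this gives about 97.7% GT/AG and about 1.4% non-canonical for normal splices, compared with roughly 73% (~11000 of ~15000) type-0 chimeric candidates.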

"GT/AG" looks dominant. Could anyone tell me why I got mostly "0" in column 7 of the "Chimeric.out.junction" file?

I ran STAR like this:

/software/STAR_2.3.1z15/STAR --runThreadN 10 --genomeDir /data/Indices/STAR/ref --genomeLoad LoadAndKeep --readFilesIn /data/projects/Sample187_1.fastq.gz --readFilesCommand zcat --outFileNamePrefix Sample187 --outReadsUnmapped Fastx --seedSearchStartLmax 30 --outFilterMultimapNmax 20 --outFilterScoreMin 1 --outFilterMatchNmin 1 --chimSegmentMin 15 --chimScoreMin 15 --chimScoreSeparation 10 --chimJunctionOverhangMin 15 --alignSJoverhangMin 8 --outFilterMismatchNmax 10 --outTmpDir /tmp;
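With 50 bp reads, `--chimSegmentMin 15` leaves limited room for a chimeric split, since each of the two segments must cover at least 15 bases. The arithmetic can be sketched as below; this is a simplified model that ignores mismatches, soft-clipping and the separate `--chimJunctionOverhangMin` check, and the function name is hypothetical.

```python
def chimeric_split_positions(read_length, min_segment):
    """Count break points that leave both segments >= min_segment bases.

    Simplified model of an ungapped single-end read split into exactly
    two segments; it does not reproduce STAR's full chimeric scoring.
    """
    if read_length < 2 * min_segment:
        return 0
    # The first segment's length k can range from min_segment
    # to read_length - min_segment, inclusive.
    return read_length - 2 * min_segment + 1
```

For a 50 bp read and a 15-base minimum, this gives 21 possible break positions (after read bases 15 through 35).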

Another question is:

I have several groups of data, each with 4 biological replicates. Two replicates, both in the same group, have a very low uniquely mapped rate (<50%), compared with ~80% for all the other replicates and groups.

As suggested, I added the parameters --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 and remapped the two low-mapping-rate samples; their mapping rates increased to 75%.

The question is whether it is necessary to remap all the other samples with the same parameters (adding --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0).

Thanks,

Jun

Alexander Dobin

Aug 15, 2014, 4:49:40 PM

Hi Jun,

the normal splice junctions are heavily filtered against non-canonical junctions, while the chimeric (including circular) junctions are not.

You would need to think about filtering the circular junctions (e.g. remove those from mitochondria, remove very short ones, etc.).
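The filters suggested above could be sketched roughly as follows. This is not the filterCirc.awk script itself; it assumes a back-splice is a junction whose donor and acceptor are on the same chromosome and strand with the acceptor upstream of the donor in transcript orientation. The mitochondrial names (`chrM`, `MT`) and the 100 bp minimum span are hypothetical defaults to adapt to your annotation and data.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom_donor: str
    pos_donor: int
    strand_donor: str
    chrom_acceptor: str
    pos_acceptor: int
    strand_acceptor: str
    junction_type: int


def parse_junction(line):
    """Read the first 7 columns of a Chimeric.out.junction line."""
    f = line.rstrip("\n").split("\t")
    return Junction(f[0], int(f[1]), f[2], f[3], int(f[4]), f[5], int(f[6]))


def is_backsplice(j):
    """Assumed back-splice rule: same chromosome and strand, acceptor upstream
    of donor (lower coordinate on '+', higher coordinate on '-')."""
    if j.chrom_donor != j.chrom_acceptor or j.strand_donor != j.strand_acceptor:
        return False
    if j.strand_donor == "+":
        return j.pos_acceptor < j.pos_donor
    return j.pos_acceptor > j.pos_donor


def keep_circular(j, excluded_chroms=("chrM", "MT"), min_span=100):
    """Hypothetical filter: drop mitochondrial and very short circular junctions."""
    if not is_backsplice(j):
        return False
    if j.chrom_donor in excluded_chroms:
        return False
    return abs(j.pos_donor - j.pos_acceptor) >= min_span
```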

I think it's fine that you see a large number of non-canonical circular junctions; most of them are probably not very abundant.
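One way to check whether the non-canonical junctions are low-abundance is to count supporting reads per junction. A minimal sketch, assuming one chimeric read per line and using columns 1 to 7 as the junction key; the `min_reads=2` cutoff is a hypothetical example, not a recommended value.

```python
from collections import Counter


def junction_support(lines):
    """Count supporting reads per junction, keyed by columns 1-7."""
    support = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        support[tuple(fields[:7])] += 1
    return support


def abundant(support, min_reads=2):
    """Keep junctions supported by at least min_reads reads (hypothetical cutoff)."""
    return {key: n for key, n in support.items() if n >= min_reads}
```

Comparing the junction-type tally before and after `abundant()` shows whether the type-0 share drops once single-read junctions are removed.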

--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 is a very relaxed mapping requirement: it means that an alignment of any mapped length is accepted. I would not recommend using it, since you will get a lot of false-positive alignments.
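The effect of setting the relative thresholds to 0 can be seen with a little arithmetic. This is a simplified model, assuming the minimum matched length is the larger of the absolute threshold (`--outFilterMatchNmin`) and the read-length fraction (`--outFilterMatchNminOverLread`, documented default 0.66); STAR's internal rounding may differ slightly, and `--outFilterScoreMinOverLread` behaves analogously for the alignment score.

```python
def min_mapped_length(read_length, over_lread=0.66, abs_min=0):
    """Approximate minimum matched bases required for an alignment.

    Simplified model: max of the absolute threshold and the fraction
    of read length. Defaults mirror STAR's documented defaults.
    """
    return max(abs_min, over_lread * read_length)
```

For a 50 bp read, the defaults require about 33 matched bases, while the settings in the command above (relative thresholds 0, `--outFilterMatchNmin 1`) require only 1.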

You need to figure out why the mapping rate is lower on some samples. Comparing mapping statistics in the Log.final.out files can help. The most common causes are (i) poor sequencing quality, (ii) incomplete rRNA depletion, and (iii) contamination.
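Comparing Log.final.out files across replicates can be partly automated. A minimal sketch, assuming each log is available as text and using a hypothetical 10-percentage-point drop below the median as the flagging cutoff; `flag_outliers` is an illustrative name, not part of STAR.

```python
from statistics import median


def parse_log_final(text):
    """Parse the 'name | value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" in line:
            name, value = line.split("|", 1)
            stats[name.strip()] = value.strip()
    return stats


def percent(stats, field):
    """Convert a value such as '80.69%' to a float."""
    return float(stats[field].rstrip("%"))


def flag_outliers(logs, field="Uniquely mapped reads %", max_drop=10.0):
    """Return samples whose `field` is more than max_drop points below the median.

    `logs` maps sample name -> Log.final.out text; max_drop is a hypothetical cutoff.
    """
    values = {name: percent(parse_log_final(text), field) for name, text in logs.items()}
    center = median(values.values())
    return sorted(name for name, v in values.items() if center - v > max_drop)
```

For flagged samples, the same `percent()` helper can be applied to other percentage fields present in the log (for example the unmapped or multi-mapped read percentages) to see where the missing reads went.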

Cheers

Alex
