Chimeric.out.junction file column 7 mostly 0


Jun Cheng

Aug 14, 2014, 12:45:55 PM
to rna-...@googlegroups.com
Dear all,

I want to use STAR for circRNA detection. I have single-end 50 bp RNA-seq data.
After running STAR, I found that column 7 of "Chimeric.out.junction" is mostly "0", which means "0 - any other motif".

To detect circRNAs, I ran the filterCirc.awk script on the "Chimeric.out.junction" files. Of the ~15000 candidate circRNA sites, ~11000 are labeled with junction type "0".

The Log.final.out file looks like this:

                          Number of input reads |       56057901
                      Average input read length |       50
                                    UNIQUE READS:
                   Uniquely mapped reads number |       45233195
                        Uniquely mapped reads % |       80.69%
                          Average mapped length |       49.70
                       Number of splices: Total |       2034482
            Number of splices: Annotated (sjdb) |       1785112
                       Number of splices: GT/AG |       1987663
                       Number of splices: GC/AG |       16753
                       Number of splices: AT/AC |       2109
               Number of splices: Non-canonical |       27957

It looks like "GT/AG" is dominant among the normal splices. Could anyone tell me why I get mostly 0 in column 7 of the "Chimeric.out.junction" file?

I ran STAR like this:
/software/STAR_2.3.1z15/STAR  --runThreadN 10   --genomeDir /data/Indices/STAR/ref   --genomeLoad LoadAndKeep   --readFilesIn /data/projects/Sample187_1.fastq.gz   --readFilesCommand zcat   --outFileNamePrefix Sample187 --outReadsUnmapped Fastx   --seedSearchStartLmax 30 --outFilterMultimapNmax 20   --outFilterScoreMin 1   --outFilterMatchNmin 1   --chimSegmentMin 15    --chimScoreMin 15   --chimScoreSeparation 10 --chimJunctionOverhangMin 15 --alignSJoverhangMin 8 --outFilterMismatchNmax 10    --outTmpDir /tmp;


I have another question:

I have several groups of samples, each with 4 biological replicates. In only 1 group, 2 of the replicates have a very low uniquely mapped rate (<50%), compared with ~80% for all the other replicates and groups.

As suggested, I added the parameters --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 and remapped the two samples with low mapping rates; their mapping rates increased to 75%.

My question is whether it is necessary to remap all the other samples with the same parameters (adding --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0).

Thanks,
Jun

Alexander Dobin

Aug 15, 2014, 4:49:40 PM
to rna-...@googlegroups.com
Hi Jun,

Normal splice junctions are heavily filtered against non-canonical junctions, while the chimeric (including circular) ones are not.
You will need to think about filtering the circular junctions (e.g. removing those from mitochondria, removing very short ones, etc.).
I think it's fine that you see a large number of non-canonical circular junctions; most of them are probably not very abundant.
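
To make the filtering idea above concrete, here is a minimal Python sketch that drops mitochondrial and very short same-chromosome junctions and then counts the column 7 values among what is kept and what is dropped. It is not the filterCirc.awk script and does not reproduce its logic. Everything specific in it is an assumption to adjust for your data: the column order (the first six columns are taken to be donor chromosome, position, strand, then acceptor chromosome, position, strand), the mitochondrial names "chrM" and "MT", the 200-base minimum span, and the helper names themselves.

```python
from collections import Counter

# Assumed 0-based column order of Chimeric.out.junction; verify against your STAR manual.
CHR_DONOR, POS_DONOR, STRAND_DONOR = 0, 1, 2
CHR_ACCEPTOR, POS_ACCEPTOR, STRAND_ACCEPTOR = 3, 4, 5
JUNCTION_TYPE = 6  # the "column 7" discussed in this thread


def keep_junction(fields, mito_names=("chrM", "MT"), min_span=200):
    """Return True if a junction passes the illustrative filters.

    mito_names and min_span are hypothetical defaults, not recommended values.
    """
    # Only consider junctions with both segments on the same chromosome and strand.
    if fields[CHR_DONOR] != fields[CHR_ACCEPTOR]:
        return False
    if fields[STRAND_DONOR] != fields[STRAND_ACCEPTOR]:
        return False
    # Remove mitochondrial junctions.
    if fields[CHR_DONOR] in mito_names:
        return False
    # Remove very short junctions.
    span = abs(int(fields[POS_DONOR]) - int(fields[POS_ACCEPTOR]))
    return span >= min_span


def tally_junction_types(lines, **filter_options):
    """Count column 7 values separately for kept and dropped junctions."""
    kept, dropped = Counter(), Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank, header, or malformed lines.
        if len(fields) <= JUNCTION_TYPE:
            continue
        if not (fields[POS_DONOR].isdigit() and fields[POS_ACCEPTOR].isdigit()):
            continue
        target = kept if keep_junction(fields, **filter_options) else dropped
        target[fields[JUNCTION_TYPE]] += 1
    return kept, dropped
```

Passing an open file handle for Chimeric.out.junction as `lines` returns two counters keyed by the column 7 value, which shows how many "0" junctions survive the filters compared with the canonical types.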

--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 is a very relaxed mapping requirement: it means that an alignment of any mapped length is accepted. I would not recommend using it, since you will get a lot of false positive alignments.
You need to figure out why the mapping rate is lower in some samples. Comparing the mapping statistics in the Log.final.out files can help. The most common causes are (i) poor sequencing quality, (ii) incomplete rRNA depletion, (iii) contamination.
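
One way to compare the Log.final.out statistics across samples is sketched below in Python. It assumes only the "name | value" line layout visible in the excerpt earlier in this thread; the function names and the sample labels are hypothetical, and the keys you request must match the text in your own files exactly.

```python
def parse_log_final(text):
    """Turn the 'name | value' lines of a Log.final.out file into a dict."""
    stats = {}
    for line in text.splitlines():
        # Section headers such as 'UNIQUE READS:' have no '|' and are skipped.
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def compare_logs(logs, keys):
    """Build {statistic: {sample: value}} for the requested statistics.

    logs maps a sample label (hypothetical, chosen by you) to the file's text.
    A statistic missing from a sample is reported as 'NA'.
    """
    parsed = {sample: parse_log_final(text) for sample, text in logs.items()}
    return {
        key: {sample: stats.get(key, "NA") for sample, stats in parsed.items()}
        for key in keys
    }
```

Reading each file with `open(path).read()` and calling `compare_logs` with keys such as "Uniquely mapped reads %" and "Average mapped length" puts the good and the poorly mapping replicates side by side, which makes it easier to see where they diverge.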

Cheers
Alex