Issue with chimeric junctions mapping SMART-seq2 reads

Pierre-emmanuel Bonté

Mar 15, 2019, 12:14:22 PM

to rna-star

Hi,

I'm using STAR to map SMART-Seq2 reads to hg19, and the results contain something I didn't expect: I always get a lot of chimeric alignments and very few junctions. For example, here are the file sizes for one sample:

Aligned.sortedByCoord.out.bam: 62.5 MB

Chimeric.out.junction: 49.9 MB

SJ.out.tab: 19 kB

Is this usual? My reads have different lengths (from 60 to 130 bp, with the majority between 120 and 130 bp). Is it related?

Also, I was wondering how STAR treats SMART-Seq2 reads by default, given that they are unstranded. Does it affect the mapping results?

Thank you for your time.

Pierre-Emmanuel Bonté

Command I use (read_length = max read length found in fastq):

STAR \
    --sjdbOverhang $(($read_length -1)) \
    --quantMode GeneCounts \
    --twopassMode Basic \
    --runThreadN 8 \
    --genomeDir $genome_dir \
    --sjdbGTFfile $gtf_file \
    --alignSJDBoverhangMin 1 \
    --readFilesIn $fastq \
    --outFileNamePrefix "$out_dir/" \
    --outSAMtype BAM SortedByCoordinate \
    --outTmpDir $TMPDIR \
    --outSAMunmapped Within \
    --bamRemoveDuplicatesType UniqueIdentical \
    --outMultimapperOrder Random \
    --outFilterMismatchNoverLmax 0.04 \
    --outFilterMatchNminOverLread 0.33 \
    --outFilterScoreMinOverLread 0.33 \
    --outFilterMultimapNmax 1000 \
    --winAnchorMultimapNmax 1000 \
    --chimOutType WithinBAM \
    --chimSegmentMin 10 \
    --chimJunctionOverhangMin 10;
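
The command takes `read_length` as the maximum read length found in the FASTQ, but the post does not show how it was obtained. A minimal sketch of one way to compute it, assuming an uncompressed FASTQ with exactly four lines per record; the function name `max_read_length` is made up for illustration:

```shell
# Print the length of the longest read in an uncompressed 4-line FASTQ.
max_read_length() {
    # Sequence lines are lines 2, 6, 10, ... of the file
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { print max + 0 }' "$1"
}

# Example use (not run here):
#   read_length=$(max_read_length "$fastq")
#   STAR --sjdbOverhang $((read_length - 1)) ...
```

For gzipped input, the file would need to be decompressed first (for example with `gzip -dc`) before it reaches awk.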

Alexander Dobin

Mar 15, 2019, 3:43:04 PM

to rna-star

Hi Pierre-Emmanuel

SJ.out.tab contains collapsed (i.e. unique) junctions, while Chimeric.out.junction contains all reads overlapping chimeric junctions, which may explain why the latter is bigger.
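
To compare the two files on equal terms, the per-read records in Chimeric.out.junction can be collapsed into unique junctions. The following is a minimal sketch, not part of STAR: it assumes the tab-separated layout in which columns 1 to 6 hold the donor chromosome, position and strand followed by the acceptor chromosome, position and strand, and the function name `summarize_chimeric_junctions` is made up for illustration.

```shell
# Print "<chimeric read records><TAB><unique chimeric junctions>" for a
# Chimeric.out.junction file.
summarize_chimeric_junctions() {
    awk -F '\t' '
        # Skip comment and header lines, which newer STAR releases may write
        /^#/ || $1 == "chr_donorA" { next }
        {
            reads++
            # A junction is identified by donor and acceptor chr/position/strand
            key = $1 FS $2 FS $3 FS $4 FS $5 FS $6
            if (!(key in seen)) { seen[key] = 1; unique++ }
        }
        END { printf "%d\t%d\n", reads, unique }
    ' "$1"
}

# Example use (not run here):
#   summarize_chimeric_junctions Chimeric.out.junction
#   wc -l < SJ.out.tab    # collapsed linear junctions, one per line
```

The second number is the one comparable to the line count of SJ.out.tab; the first reflects how many reads support those junctions.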

What is strange is the small size of the Aligned BAM.

Please send me the Log.final.out file.

Cheers

Alex

Pierre-emmanuel Bonté

Mar 18, 2019, 6:30:09 AM

to rna-star

Hi Alexander,

The .fastq file wasn't that large (462 MB), which may explain the small BAM. I tried both the default parameters and the parameters I sent you last time. With the default parameters I get fewer uniquely mapped reads than with the modified parameters, but either way the proportion of chimeric reads seems very high and I don't know whether that is expected.

Thanks again for your time.

Pierre-Emmanuel

Using STAR 2.5.3a

Default parameters:

--sjdbOverhang $(($read_length -1)) \
--runThreadN 8 \
--genomeDir $genome_dir \
--sjdbGTFfile $gtf_file \
--alignSJDBoverhangMin 3 \
--readFilesIn $fastq \
--outFileNamePrefix "$out_dir/" \
--outSAMtype BAM SortedByCoordinate \
--outTmpDir $TMPDIR \
--chimOutType WithinBAM \
--chimSegmentMin 10

Modified parameters:

--sjdbOverhang $(($read_length -1)) \
--quantMode GeneCounts \
--twopassMode Basic \
--runThreadN 8 \
--genomeDir $genome_dir \
--sjdbGTFfile $gtf_file \
--alignSJDBoverhangMin 1 \
--readFilesIn $fastq \
--outFileNamePrefix "$out_dir/" \
--outSAMtype BAM SortedByCoordinate \
--outTmpDir $TMPDIR \
--outSAMunmapped Within \
--bamRemoveDuplicatesType UniqueIdentical \
--outMultimapperOrder Random \
--outFilterMismatchNoverLmax 0.04 \
--outFilterMatchNminOverLread 0.33 \
--outFilterScoreMinOverLread 0.33 \
--outFilterMultimapNmax 1000 \
--winAnchorMultimapNmax 1000 \
--chimOutType WithinBAM \
--chimSegmentMin 10 \
--chimJunctionOverhangMin 10;

Default_parameters.Log.final.out

Modified_parameters.Log.final.out

Alexander Dobin

Mar 19, 2019, 4:46:43 PM

to rna-star

Hi Pierre-Emmanuel,

Is the data paired-end or single-end? If it is paired-end, the large percentage of chimeric reads may be explained by a problem with the ordering of read1/read2.

Trimming sometimes causes this, and it needs to be set up to preserve the order of read1/read2.
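
One way to test for this kind of ordering problem is to compare the read names of the two mate files record by record. This is a minimal sketch under stated assumptions: both files are uncompressed 4-line FASTQ, mate names differ at most by a trailing `/1` or `/2` or by a comment after the first whitespace, and the function name `check_pair_order` is made up for illustration. It keeps all read1 names in memory, so it suits spot checks on small files or subsets rather than full runs.

```shell
# Return 0 if read1 and read2 list the same read names in the same order,
# otherwise print the first problem found and return 1.
check_pair_order() {
    awk '
        # Reduce a header line to the bare read name
        function read_id(line) {
            sub(/^@/, "", line)        # leading "@"
            sub(/[ \t].*$/, "", line)  # comment after the first whitespace
            sub(/\/[12]$/, "", line)   # trailing /1 or /2 mate suffix
            return line
        }
        FNR % 4 != 1 { next }                               # header lines only
        NR == FNR { id[FNR] = read_id($0); n1++; next }     # first file: remember names
        {
            n2++
            if (id[FNR] != read_id($0)) {
                printf "mismatch at record %d: %s vs %s\n", (FNR + 3) / 4, id[FNR], read_id($0)
                bad = 1
                exit
            }
        }
        END {
            if (!bad && n1 != n2) {
                printf "record counts differ: %d vs %d\n", n1, n2
                bad = 1
            }
            exit (bad ? 1 : 0)
        }
    ' "$1" "$2"
}

# Example use (not run here):
#   check_pair_order sample_1.fastq sample_2.fastq && echo "mates are in sync"
```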

Cheers

Alex

Pierre-emmanuel Bonté

Mar 21, 2019, 9:38:58 AM

to rna-star

Thank you so much for your suggestion, Alexander!

It was indeed an issue with the data. I'm working on a published SRA dataset, and the conversion from SRA to FASTQ gave me the paired-end reads merged into a single file instead of two separate FASTQ files (1 and 2). Mapping the two separate files instead of the merged file solved the problem, and now I have more junctions and very few chimeric alignments.
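
For readers hitting the same problem, here is a minimal sketch of the fix described above. The thread does not say which tool did the SRA conversion, so this assumes sra-tools' `fastq-dump`, whose `--split-files` option writes the mates to `<accession>_1.fastq` and `<accession>_2.fastq`. The function names, the `DRY_RUN` switch and the reduced STAR option set are illustrative only and are not the poster's actual command.

```shell
# Run a command, or just print it when DRY_RUN=1 (useful for checking the
# command line before launching a long job).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Extract an SRA run as two mate files, then map them as a pair.
# Arguments: accession, FASTQ directory, STAR genome directory, output directory
split_and_map() {
    accession=$1
    fastq_dir=$2
    genome_dir=$3
    out_dir=$4

    # One file per mate instead of a single merged file
    run fastq-dump --split-files --outdir "$fastq_dir" "$accession" || return 1

    # Passing two files to --readFilesIn makes STAR treat the reads as pairs
    run STAR --runThreadN 8 \
        --genomeDir "$genome_dir" \
        --readFilesIn "$fastq_dir/${accession}_1.fastq" "$fastq_dir/${accession}_2.fastq" \
        --outFileNamePrefix "$out_dir/" \
        --outSAMtype BAM SortedByCoordinate
}

# Example use (not run here):
#   DRY_RUN=1 split_and_map <accession> fastq "$genome_dir" "$out_dir"
```

With a single merged file, STAR has no way to know that the reads are mates and maps them as single-end, which is consistent with the inflated chimeric counts reported earlier in the thread.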