Hi,
I'm using STAR to map SMART-Seq2 reads to hg19, and the results contain something I didn't expect: I always get a lot of chimeric alignments and very few junctions. For example, here are the file sizes for one sample:
Aligned.sortedByCoord.out.bam: 62.5 MB
Chimeric.out.junction: 49.9 MB
SJ.out.tab: 19 kB
Is this usual? My reads have different lengths (from 60 to 130 bp, with the majority between 120 and 130 bp). Could that be related?
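File sizes may not be the fairest comparison here, since SJ.out.tab lists each collapsed junction once while Chimeric.out.junction has one line per chimeric read. A minimal sketch for comparing record counts instead is given here; `count_records` is a hypothetical helper name, and it assumes plain-text, tab-delimited files. Lines starting with `#` are skipped in case comment lines are present; if your STAR version writes a column header line, it would still be counted as one record.

```shell
# Hypothetical helper: count non-empty, non-comment lines in a text file.
count_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Example usage (assumes $out_dir is the STAR output directory):
# count_records "$out_dir/SJ.out.tab"
# count_records "$out_dir/Chimeric.out.junction"
```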
I was also wondering how STAR treats SMART-Seq2 reads by default, given that they are unstranded. Does this affect the mapping results?
Thank you for your time.
Pierre-Emmanuel Bonté
Command I use (read_length = maximum read length found in the FASTQ file):
STAR \
--sjdbOverhang $(($read_length -1)) \
--quantMode GeneCounts \
--twopassMode Basic \
--runThreadN 8 \
--genomeDir $genome_dir \
--sjdbGTFfile $gtf_file \
--alignSJDBoverhangMin 1 \
--readFilesIn $fastq \
--outFileNamePrefix "$out_dir/" \
--outSAMtype BAM SortedByCoordinate \
--outTmpDir $TMPDIR \
--outSAMunmapped Within \
--bamRemoveDuplicatesType UniqueIdentical \
--outMultimapperOrder Random \
--outFilterMismatchNoverLmax 0.04 \
--outFilterMatchNminOverLread 0.33 \
--outFilterScoreMinOverLread 0.33 \
--outFilterMultimapNmax 1000 \
--winAnchorMultimapNmax 1000 \
--chimOutType WithinBAM \
--chimSegmentMin 10 \
--chimJunctionOverhangMin 10;
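
For reference, one way `read_length` could be derived is sketched below. This is not necessarily the exact command used for the run above: `max_read_length` is a hypothetical helper name, and the sketch assumes an uncompressed FASTQ file with exactly four lines per record (a gzipped file would need to be decompressed first).

```shell
# Hypothetical helper: print the longest read length in an uncompressed
# FASTQ file with four lines per record (sequence is line 2 of each record).
max_read_length() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { print max + 0 }' "$1"
}

# Example usage (assumes $fastq points at an uncompressed FASTQ file):
# read_length=$(max_read_length "$fastq")
```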