STAR --runThreadN 8 --runMode genomeGenerate --genomeDir pass1/Ref --genomeFastaFiles Reference/reference.fasta --sjdbGTFfile Reference/reference.gtf
After that, I filter the SJ.out.tab files like this:
gawk '$6==1 || ($6==0 && ($7>2))' pass1/*SJ.out.tab > SJ.filt.tab
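For reference, in STAR's SJ.out.tab column 6 flags whether a junction is annotated (1) or not (0), and column 7 is the number of uniquely mapping reads crossing it. The filter therefore keeps every annotated junction plus novel junctions supported by more than 2 unique reads. A minimal sketch on toy rows; the function names `filter_sj` and `toy_sj` and all values in the rows are invented for illustration:

```shell
# Same filter as above, wrapped in a function (plain awk behaves like gawk here).
filter_sj() {
  awk '$6 == 1 || ($6 == 0 && $7 > 2)'
}

# Toy rows with invented values. Columns: chr, intron start, intron end,
# strand, motif, annotated (0/1), unique reads, multi-mapped reads, max overhang.
toy_sj() {
  printf 'Chr1\t100\t200\t1\t1\t1\t0\t0\t30\n'   # annotated, 0 unique reads: kept
  printf 'Chr1\t300\t400\t1\t1\t0\t5\t0\t40\n'   # novel, 5 unique reads: kept
  printf 'Chr1\t500\t600\t2\t2\t0\t2\t1\t25\n'   # novel, only 2 unique reads: dropped
}

toy_sj | filter_sj
```

Only the first two toy rows come through, so a novel junction needs at least 3 unique reads to reach the second pass.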
nohup STAR --runThreadN 8 --genomeDir genomeForPass2/ \
--sjdbGTFfile Reference/reference.gtf \
--outFileNamePrefix pass2/Col_3 \
--readFilesIn TrimmingHS/Col_3_R1.trim.adapt.fastq.gz TrimmingHS/Col_3_R2.trim.adapt.fastq.gz \
--alignEndsType EndToEnd --sjdbOverhang 114 --readFilesCommand "gunzip -c" --outSAMtype BAM SortedByCoordinate &
My paired-end reads are 115 bp long (I trimmed them all to the same length because rMATS requires a fixed read length), so is "--sjdbOverhang 114" the right value? (I don't fully understand this parameter, but I have read that the best value is read length - 1.)
I have also read that it is better to use "--alignEndsType EndToEnd". I don't understand why, but my mapping stats are good (about 97% uniquely mapped reads).
If you have any suggestions, thanks in advance for your help.
You do not need the --sjdbGTFfile option in the second pass, since the annotation was already included when the pass1/Ref genome was generated:
$ STAR --runThreadN 8 --genomeDir pass1/Ref --sjdbFileChrStartEnd SJ.filt.tab \
--outFileNamePrefix pass2/Col_3 \
--readFilesIn TrimmingHS/Col_3_R1.trim.adapt.fastq.gz TrimmingHS/Col_3_R2.trim.adapt.fastq.gz \
--alignEndsType EndToEnd --sjdbOverhang 114 --readFilesCommand "gunzip -c" --outSAMtype BAM SortedByCoordinate
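On the --sjdbOverhang question: the STAR manual gives the ideal value as max(ReadLength) - 1, where ReadLength is the length of one mate, so 114 is consistent with 115 bp reads. If you want to double-check the trimmed length, here is a rough sketch. The helpers `max_read_len` and `sjdb_overhang` are hypothetical names of my own, not part of STAR, and the check only looks at the first reads of the file:

```shell
# Hypothetical helper: longest sequence among the first N reads of a gzipped FASTQ.
# $1 = path to the .fastq.gz file, $2 = number of reads to inspect (default 10000)
max_read_len() {
  gunzip -c < "$1" | head -n "$(( ${2:-10000} * 4 ))" |
    awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) } END { print m + 0 }'
}

# Hypothetical helper: sjdbOverhang is the (maximum) read length minus 1.
sjdb_overhang() {
  echo "$(( $1 - 1 ))"
}

sjdb_overhang 115   # prints 114
```

With your files this would be something like `sjdb_overhang "$(max_read_len TrimmingHS/Col_3_R1.trim.adapt.fastq.gz)"`, which should give 114 if the trimming left all reads at 115 bp.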