Dear all,
I am relatively new to the field and am trying to use STAR for read alignment and gene-level read counting in Arabidopsis thaliana. The range of options used in the literature has been overwhelming.
Below is the list of options I am considering. Do they seem reasonable? In particular, is there any reason to change the following three options from their defaults for Arabidopsis?
- alignSJDBoverhangMin
- outFilterMismatchNmax
- alignMatesGapMax
Many thanks in advance for your help and replies.
# STAR [create genome indices]
# --sjdbOverhang 74: my read length (75 bp) minus 1
# --genomeSAindexNbases 12: min(14, log2(GenomeLength)/2 - 1) = 12.41, rounded down
STAR \
--runMode genomeGenerate \
--genomeDir ... \
--genomeFastaFiles ./Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
--sjdbGTFfile ./Arabidopsis_thaliana.TAIR10.50.gtf \
--sjdbOverhang 74 \
--genomeSAindexNbases 12
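Both index parameters can also be derived from the input files rather than by hand, which helps if the read length or assembly changes. The following is a minimal sketch, not part of STAR: `sa_index_nbases` and `sjdb_overhang` are hypothetical helper names, and it assumes an uncompressed FASTA and a gzip-compressed FASTQ. It applies the STAR manual's recommendations of min(14, log2(GenomeLength)/2 - 1) for `--genomeSAindexNbases` and (maximum read length - 1) for `--sjdbOverhang`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of STAR) for two genomeGenerate parameters.

# --genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1), rounded down.
# GenomeLength is the total number of bases on all non-header FASTA lines.
sa_index_nbases() {
  local fasta="$1"
  awk '!/^>/ { n += length($0) }
       END {
         v = log(n) / log(2) / 2 - 1
         if (v > 14) v = 14
         printf "%d\n", v   # %d truncates, i.e. rounds down for positive v
       }' "$fasta"
}

# --sjdbOverhang = (maximum read length) - 1.
# The sequence is the 2nd line of every 4-line FASTQ record.
# Note: this scans the whole file, which can be slow for large libraries.
sjdb_overhang() {
  local fastq_gz="$1"
  gzip -dc "$fastq_gz" |
    awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) }
         END { print m - 1 }'
}
```

For example, `sa_index_nbases ./Arabidopsis_thaliana.TAIR10.dna.toplevel.fa` should agree with the hand-calculated value when run on the full toplevel assembly, and `sjdb_overhang` on one of the input FASTQ files should give the read length minus 1.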
# STAR [alignment]
# Notes on the chosen values:
# --outFilterMultimapNmax 20: not the default (STAR default is 10)
# --alignSJoverhangMin 8: not the default (STAR default is 5)
# --alignSJDBoverhangMin 3: default (alternative: 8?)
# --outFilterMismatchNmax 10: default (alternative: 8?)
# --alignIntronMax 2000: 99.3% of introns in A. thaliana are below this size, based on doi:10.3390/genes8080200
# --alignMatesGapMax 0: default (alternative: 100,000?)
STAR \
--quantMode GeneCounts \
--genomeDir ./ATGenoIndices \
--readFilesIn .... \
--outFileNamePrefix ... \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 3 \
--outFilterMismatchNmax 10 \
--alignIntronMin 35 \
--alignIntronMax 2000 \
--alignMatesGapMax 0 \
--readFilesCommand zcat
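To sanity-check `--alignIntronMin 35` and `--alignIntronMax 2000` against the annotation used to build the index, the intron lengths implied by the GTF can be tabulated. This is a minimal sketch assuming an Ensembl-style GTF in which every exon line carries a `transcript_id` attribute; `intron_lengths` and `intron_summary` are hypothetical helper names. Introns shared by several isoforms are counted once per transcript, so the fraction reported need not match the 99.3% figure from the cited paper.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of STAR) for checking intron-length cutoffs.

# Print one intron length per line: next exon start - previous exon end - 1,
# computed within each transcript after sorting its exons by start coordinate.
intron_lengths() {
  local gtf="$1"
  awk -F'\t' '$3 == "exon" {
      match($9, /transcript_id "[^"]+"/)
      tid = substr($9, RSTART + 15, RLENGTH - 16)   # text between the quotes
      print tid "\t" $4 "\t" $5
    }' "$gtf" |
    sort -k1,1 -k2,2n |
    awk -F'\t' '$1 == prev { print $2 - prev_end - 1 }
                { prev = $1; prev_end = $3 }'
}

# One-line summary: number of introns, shortest, longest, and the fraction
# no longer than a candidate --alignIntronMax value.
intron_summary() {
  local gtf="$1" cutoff="$2"
  intron_lengths "$gtf" | awk -v c="$cutoff" '
    { n++; if ($1 <= c) k++; if (n == 1 || $1 < lo) lo = $1; if ($1 > hi) hi = $1 }
    END {
      if (n == 0) { print "no introns found"; exit }
      printf "introns=%d min=%d max=%d frac_le_%d=%.4f\n", n, lo, hi, c, k / n
    }'
}
```

For example, `intron_summary ./Arabidopsis_thaliana.TAIR10.50.gtf 2000` reports the share of annotated introns a 2,000 bp cap would admit, and `intron_lengths ./Arabidopsis_thaliana.TAIR10.50.gtf | sort -n | head` lists the shortest annotated introns to compare with `--alignIntronMin`.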
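With `--quantMode GeneCounts`, STAR writes `<prefix>ReadsPerGene.out.tab`: four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) followed by one row per gene, with column 2 holding unstranded counts, column 3 counts matching htseq-count `-s yes`, and column 4 counts matching `-s reverse`. Which column to use depends on the library preparation. A minimal sketch for comparing the column totals follows; `gene_count_totals` is a hypothetical helper name and the interpretation below is a rule of thumb, not a STAR feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR): total the three count columns of
# ReadsPerGene.out.tab, skipping the four N_* summary rows at the top.
gene_count_totals() {
  local tab="$1"
  awk -F'\t' 'NR > 4 { u += $2; y += $3; r += $4 }
              END { printf "unstranded=%d strand_yes=%d strand_reverse=%d\n", u, y, r }' "$tab"
}
```

If one of the stranded totals is close to the unstranded total and the other is small, the library is probably stranded and that column is the one to keep; if the two stranded totals are roughly equal, the library is probably unstranded and column 2 applies.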