Hi, I am trying to map some sequence data and am running into a problem. We want every read to be mapped, allowing no more than 1 mismatch and permitting multi-mapping. As an example, we have this sequence:
'GGCGTCTACGGCCATACCACCCTGAACGCGCCCGATCTCGTCTGATCTCGGAAGCTAAGCAGGGTCGGGCCTGGTT'
Using BLAT, we know that this sequence occurs exactly in the genome (multiple times), but it doesn't show up in our mapping. My mapping command is:
$STAR \
--runThreadN 60 \
--genomeDir $STARGENOMEDIR \
--readFilesIn $INPUTFILE \
--readFilesCommand zcat \
--outFileNamePrefix Mapped.out. \
--outSAMmultNmax -1 \
--outSAMtype BAM SortedByCoordinate \
--quantMode TranscriptomeSAM \
--outFilterMismatchNmax 1 \
--sjdbGTFfile $GENEANNOTATION \
--outFilterMultimapNmax 1000 \
--outSAMattributes Standard \
--chimSegmentMin 20 \
--twopassMode Basic \
--twopass1readsN -1 \
--outTmpKeep none \
--scoreDelOpen 0 \
--scoreDelBase 0 \
--scoreInsOpen 0 \
--scoreInsBase 0 \
--outReadsUnmapped Fastx
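In case it helps with diagnosis, this is roughly how the read can be looked up in the unmapped output. It is only a sketch: it assumes STAR wrote its usual files next to the prefix above (Mapped.out.Unmapped.out.mate1 and Mapped.out.Log.final.out), and count_read is a helper name made up for illustration.

```shell
# Count the lines of a plain-text FASTQ/FASTA file that contain the
# given sequence exactly (fixed-string match, forward orientation only).
count_read() {
    # $1 = read sequence, $2 = file to search
    grep -c -F -- "$1" "$2"
}

READ='GGCGTCTACGGCCATACCACCCTGAACGCGCCCGATCTCGTCTGATCTCGGAAGCTAAGCAGGGTCGGGCCTGGTT'

# Assumed file names: --outFileNamePrefix plus STAR's default suffixes.
UNMAPPED='Mapped.out.Unmapped.out.mate1'
LOGFINAL='Mapped.out.Log.final.out'

if [ -f "$UNMAPPED" ]; then
    echo "copies in unmapped output: $(count_read "$READ" "$UNMAPPED")"
fi

if [ -f "$LOGFINAL" ]; then
    # Summary lines showing why reads were left unmapped
    # (too many loci, too many mismatches, too short, other).
    grep -E 'too many|unmapped' "$LOGFINAL"
fi
```

A non-zero count in the first check would mean the read was discarded rather than missing from the input, and the Log.final.out lines should hint at which filter removed it.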
These are single-end 76 bp reads. I'm not sure whether the issue is with the specific options or with the way STAR handles highly repetitive regions of the genome. Thank you for any help or insight.
Best,
Eric