Hello Alex,
I am trying to align exome reads (100 bp long) to the genome.
I get gapped hits that span an intron and parts of the two adjacent exons, but I would also like to find the corresponding retroelements in the same run.
To accomplish this, I use the following parameters:
--outSAMattributes All
--outSJfilterCountUniqueMin 10 2 2 2
--outFilterMultimapNmax 50
--outFilterMultimapScoreRange 15
--outFilterMismatchNoverLmax 0.15
Unfortunately, in many cases I get only the gapped hit and none of the related retroelements.
Here is an example from my STAR run (it is the only hit I get for this read):
HWI-EAS90_102619232:3:9:9833:10306#0 163 chr5 115230801 255 55M7726N45M = 115238625 18293 AGACAAATGTTTTGAAAATGTCTGTGAGCTGGATTTGATTTTCCATGTAGACAAGGTTCACAATATTCTTGCAGAAATGGTGATGGGGGGAATGGTATTG GGGGGFGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGEFGFGGGGGFGGFGGGGGGGGGGGFGGG?GDEEFFFCE:EEBDEDDCCEEEACBCE NH:i:1 HI:i:1 AS:i:194 nM:i:1 jM:B:c,1 jI:B:i,115230856,115238581
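To make the record easier to read, the junction in the jI tag can be reproduced from POS and CIGAR, and the strand can be read from the flag. This is only a minimal Python sketch and not part of my STAR run: the helper name `introns_from_cigar` is my own, and it assumes POS is 115230801, which agrees with the jI tag and the BLAT start below.

```python
import re

# Values copied from the SAM record above.
# POS is assumed to be 115230801 (consistent with jI and the BLAT start).
flag = 163
pos = 115230801
cigar = "55M7726N45M"


def introns_from_cigar(pos, cigar):
    """Return the 1-based (start, end) of every N gap in the alignment."""
    introns = []
    ref = pos  # 1-based reference coordinate of the next aligned base
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that consume the reference
            ref += length
    return introns


is_reverse = bool(flag & 0x10)   # SAM bit 0x10: read maps to the reverse strand
is_second = bool(flag & 0x80)    # SAM bit 0x80: second mate of the pair

print(introns_from_cigar(pos, cigar))  # [(115230856, 115238581)]
print(is_reverse, is_second)           # False True
```

So this record is mate 2, aligned to the forward strand, with a single 7726-base gap whose coordinates match the jI tag.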
When I run this read through BLAT on the UCSC site, I get a couple more hits, which should fall within my STAR parameters, but STAR does not report them.
BLAT Search Results
SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
-------------------------------------------------------------------
99 1 100 100 100.0% 5 + 115230801 115238626 7826
92 1 100 100 94.0% 1 - 214656342 214656440 99
90 1 100 100 93.0% 12 - 12605234 12605332 99
In the UCSC browser, the last two hits are retroelements of the gene covered by the first hit.
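Here is the rough arithmetic behind my expectation that these hits should pass the mismatch filter, as a small Python sketch. It rests on assumptions that may not match what STAR does internally: I treat the BLAT identity as the fraction of matching bases over the full 100 bp read, I ignore gaps, and I apply --outFilterMismatchNoverLmax to this mate alone, whereas STAR may evaluate paired-end reads as a pair. The function name is my own.

```python
READ_LENGTH = 100
MISMATCH_N_OVER_L_MAX = 0.15  # value of --outFilterMismatchNoverLmax in my run

# (chromosome, strand, BLAT identity in percent) copied from the table above
blat_hits = [("5", "+", 100.0), ("1", "-", 94.0), ("12", "-", 93.0)]


def passes_mismatch_filter(identity_pct,
                           read_length=READ_LENGTH,
                           max_ratio=MISMATCH_N_OVER_L_MAX):
    """Estimate mismatches from BLAT identity and compare to the ratio cutoff.

    Assumption: every non-identical base is a mismatch (no indels).
    """
    mismatches = round(read_length * (100.0 - identity_pct) / 100.0)
    return mismatches, mismatches / read_length <= max_ratio


for chrom, strand, identity in blat_hits:
    mismatches, ok = passes_mismatch_filter(identity)
    print(chrom, strand, mismatches, ok)
```

Under these assumptions the two retroelement hits carry about 6 and 7 mismatches, well below the 15 allowed by a ratio of 0.15 on a 100 bp read. This check covers the mismatch filter only; it says nothing about --outFilterMultimapScoreRange.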
I have many examples of this scenario in my results.
Could you please tell me what I am missing here?
The only explanation I can think of is that STAR does not mix + and - strand alignments when it reports multimappers (the retroelements are on the opposite strand). If this is the case, how can I overcome it?
Thank you in advance,
Stas