multiple hits in the program


Varun Gupta

Nov 18, 2013, 4:42:51 PM
to rna-...@googlegroups.com
Hi Alex,
If STAR is run allowing multiple hits, e.g. up to 10 (the default), a single read can map to as many as 10 different locations. Say a read maps to 5 different locations. Does that mean the read has the same alignment score at all 5 locations?
I will look into the BAM files you asked me to send you, but could this explain why I get the same number of counts even when the maximum number of hits was increased from 20 to 30 and then to 40?

Regards
Varun

Alexander Dobin

Nov 20, 2013, 11:54:10 PM
to rna-...@googlegroups.com
Hi Varun,

if a read maps to 5 loci with scores that are within --outFilterMultimapScoreRange (=1 by default) of the best score, all 5 will be reported. The alignment scores can differ by no more than 1.
If a read maps to more than --outFilterMultimapNmax (=10 by default) loci within this score range, the read will not be written to Aligned.out.sam. The number of these reads is reported in the Log.final.out file as
        Number of reads mapped to too many loci |       15403
             % of reads mapped to too many loci |       0.04%
Typically the number of these reads is very small, so you would not get many more reads mapped as you increase --outFilterMultimapNmax.
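
The rule described above can be sketched in a few lines of Python. This is a simplified model for illustration only, not STAR's actual code; the function name `select_multimappers` and its arguments are hypothetical, with defaults mirroring the two parameter defaults quoted above.

```python
def select_multimappers(scores, score_range=1, nmax=10):
    """Simplified model of the multimapper rule (not STAR's implementation).

    scores: alignment scores of all candidate loci for one read.
    Returns the scores of the loci that would be reported.
    """
    if not scores:
        return []
    best = max(scores)
    # Keep loci whose score is within score_range of the best score.
    kept = [s for s in scores if best - s <= score_range]
    # Too many loci within the range: the read is not reported at all.
    if len(kept) > nmax:
        return []
    return kept


print(select_multimappers([98, 98, 97, 97, 97]))  # all five loci are kept
print(select_multimappers([98, 90, 85]))          # only the best locus is kept
print(select_multimappers([98] * 11))             # 11 loci in range: nothing kept
```

Under this model, raising `nmax` only changes the outcome for reads that have more than `nmax` loci inside the score range, which is consistent with counts staying the same when few such reads exist.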

Cheers
Alex

Stas

Jan 13, 2014, 8:19:30 AM
to rna-...@googlegroups.com

Hello Alex,


I am trying to align exome reads (100 bp long) to the genome.

I get gapped hits that span an intron and parts of the two adjacent exons, but I would also like to find the corresponding retroelements in the same run.

To accomplish this I use the following parameters:

--outSAMattributes All
--outSJfilterCountUniqueMin 10 2 2 2
--outFilterMultimapNmax 50
--outFilterMultimapScoreRange 15
--outFilterMismatchNoverLmax 0.15

Unfortunately, in many cases I get only the gapped hit and no related retroelements.

Here is an example from my STAR run (it is the only hit I get):

HWI-EAS90_102619232:3:9:9833:10306#0    163    chr5    115230801    255    55M7726N45M    =    115238625    18293    AGACAAATGTTTTGAAAATGTCTGTGAGCTGGATTTGATTTTCCATGTAGACAAGGTTCACAATATTCTTGCAGAAATGGTGATGGGGGGAATGGTATTG    GGGGGFGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGEFGFGGGGGFGGFGGGGGGGGGGGFGGG?GDEEFFFCE:EEBDEDDCCEEEACBCE    NH:i:1    HI:i:1    AS:i:194    nM:i:1    jM:B:c,1    jI:B:i,115230856,115238581

If I check this read on the UCSC site, I get a couple more hits, which should fit within my STAR parameters, but I still don't get them.

BLAT Search Results

SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
-------------------------------------------------------------------
99     1   100   100 100.0%     5   +  115230801 115238626   7826
92     1   100   100  94.0%     1   -  214656342 214656440     99
90     1   100   100  93.0%    12   -   12605234  12605332     99
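
As a quick cross-check that the first BLAT hit and the STAR hit are the same alignment, the reference span can be computed from the CIGAR string 55M7726N45M and the start position 115230801 (the BLAT START of the first hit). The helper names below are hypothetical; the sketch only relies on the SAM convention that M, D, N, = and X operations consume reference bases.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by a CIGAR string."""
    # M, D, N, = and X consume reference bases; I, S, H and P do not.
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def intron_intervals(start, cigar):
    """1-based (first, last) reference coordinates of each N (gap) operation."""
    pos = start
    introns = []
    for n, op in CIGAR_OP.findall(cigar):
        n = int(n)
        if op == "N":
            introns.append((pos, pos + n - 1))
        if op in "MDN=X":
            pos += n
    return introns


start = 115230801
span = reference_span("55M7726N45M")
print(span, start + span - 1)                      # prints 7826 115238626
print(intron_intervals(start, "55M7726N45M"))      # prints [(115230856, 115238581)]
```

The span and end coordinate match the SPAN and END columns of the first BLAT row, and the gap coordinates match the jI attribute of the STAR record.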

 

If I check in the UCSC browser, the last two are retroelements of the gene from the first hit.

The thing is, I have many examples of this scenario in my results.

Could you please tell me what I am missing here?

The only thing I can think of is that STAR does not mix + and - strands when reporting multimappers (the retroelements are on the opposite strand). If this is the case, how could I overcome it?


Thank you in advance,

Stas

Alexander Dobin

Jan 15, 2014, 6:38:39 PM
to rna-...@googlegroups.com
Hi Stas,

the second alignment from BLAT contains 3 mismatches and one insertion, so you would need to increase STAR sensitivity with --seedSearchStartLmax 15:

1       0       chr5    115230801       3       55M7726N45M     *       0       0       AGACAAATGTTTTGAAAATGTCTGTGAGCTGGATTTGATTTTCCATGTAGACAAGGTTCACAATATTCTTGCAGAAATGGTGATGGGGGGAATGGTATTG    *       NH:i:2  HI:i:1  AS:i:99 nM:i:0
1       272     chr1    214656342       3       18M1I81M        *       0       0       CAATACCATTCCCCCCATCACCATTTCTGCAAGAATATTGTGAACCTTGTCTACATGGAAAATCAAATCCAGCTCACAGACATTTTCAAAACATTTGTCT    *       NH:i:2  HI:i:2  AS:i:87 nM:i:3

The second alignment now agrees with BLAT. Note that its score is 87, which is 12 less than the best score of 99 and therefore close to your limit of --outFilterMultimapScoreRange 15, so I would probably increase it to ~30, though this will generate a lot of sub-optimal alignments. Note that STAR is not designed for effective search of suboptimal alignments. If a short deletion or insertion is very close to the end of a read, it will be soft-clipped, yielding an even lower score.
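
To see how much headroom a secondary alignment has under a given score range, the AS:i tags can be read from the SAM records and compared with the best score. The sketch below uses abbreviated copies of the two records above (sequence and quality replaced by `*`); the helper name `alignment_score` is hypothetical, and the comparison is a simplified reading of the score-range rule rather than STAR's own code.

```python
def alignment_score(sam_line):
    """Return the AS:i value of a SAM record, or None if the tag is absent."""
    for field in sam_line.split():
        if field.startswith("AS:i:"):
            return int(field[len("AS:i:"):])
    return None


# Abbreviated versions of the two records above (SEQ and QUAL shown as '*').
records = [
    "1 0 chr5 115230801 3 55M7726N45M * 0 0 * * NH:i:2 HI:i:1 AS:i:99 nM:i:0",
    "1 272 chr1 214656342 3 18M1I81M * 0 0 * * NH:i:2 HI:i:2 AS:i:87 nM:i:3",
]

scores = [alignment_score(r) for r in records]
best = max(scores)
for score_range in (1, 15, 30):
    # Gap of each alignment from the best score, and whether it is in range.
    in_range = [s for s in scores if best - s <= score_range]
    print(score_range, in_range)
```

With a range of 15 the second alignment (gap 12) is kept with only 3 points to spare, so one extra mismatch or a soft-clipped indel near a read end could push a similar hit out of range.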

Cheers
Alex

Stas

Feb 18, 2014, 7:43:06 AM
to
Hi Alex,

thank you for your answer!

I have another little problem:
I ran a STAR alignment and stopped it with Ctrl-C.
Since then it fails to run again (after 2 seconds it reports that the alignment finished).
Here is what I get in the 'nohup.out' file:

Feb 16 15:53:11 ..... Started STAR run
Feb 16 15:53:11 ..... Started mapping
Feb 16 15:53:12 ..... Finished successfully
rm: cannot remove `outSamPrefix_tmp//.nfs0000000102fcb9a10000020a': Device or resource busy
rm: cannot remove `outSamPrefix_tmp//.nfs0000000102fcb99e00000205': Device or resource busy
rm: cannot remove `outSamPrefix_tmp//.nfs0000000102fcb9a000000206': Device or resource busy

any idea?
thanks in advance,
Stas

 

Alexander Dobin

Feb 18, 2014, 11:47:08 AM
to rna-...@googlegroups.com
Hi Stas,

this is not a STAR problem, but rather a Linux/NFS idiosyncrasy. These temporary files were created by NFS when the STAR run was aborted, and you (or STAR) may not be able to delete them for a while. The best thing would be to run STAR with a different output path. After a few minutes you should be able to delete the files and use the old directory again.
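
One way to automate this workaround is to check the old temporary directory for leftover `.nfs*` placeholder files before starting a run and fall back to a fresh output path if any remain. This is a minimal sketch, assuming the placeholders always start with `.nfs` as in the log above; the function names and directory names are hypothetical.

```python
import os
import tempfile


def stale_nfs_files(directory):
    """List NFS placeholder files (names starting with '.nfs') under directory."""
    found = []
    for root, _dirs, files in os.walk(directory):
        found.extend(os.path.join(root, f) for f in files if f.startswith(".nfs"))
    return found


def choose_output_dir(preferred, fallback):
    """Use the preferred directory unless it still holds .nfs placeholders."""
    return fallback if stale_nfs_files(preferred) else preferred


# Demonstration with a throwaway directory standing in for the old run.
with tempfile.TemporaryDirectory() as base:
    old_dir = os.path.join(base, "run1_tmp")
    new_dir = os.path.join(base, "run2_tmp")
    os.makedirs(old_dir)
    open(os.path.join(old_dir, ".nfs0000000102fcb9a10000020a"), "w").close()
    print(choose_output_dir(old_dir, new_dir) == new_dir)  # prints True
```

A directory that does not exist or contains no placeholders is returned unchanged, so the old path is picked up again once NFS has released the files.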

Cheers
Alex

Stas

Feb 19, 2014, 6:47:18 AM
to rna-...@googlegroups.com
Thank you
Stas