Using STARlong for Oxford Nanopore DRS reads....

Nick Schurch

Aug 3, 2017, 9:29:04 AM

to rna-star

Has anyone tried and had success with this?

I've tried it with STAR 2.5 and the recommended settings for PacBio Iso-Seq data, but with only marginal success. These are the options I used:

--runMode alignReads
--readNameSeparator space
--outFilterMultimapScoreRange 1
--outFilterMismatchNmax 2000
--scoreGapNoncan -20
--scoreGapGCAG -4
--scoreGapATAC -8
--scoreDelOpen -1
--scoreDelBase -1
--scoreInsOpen -1
--scoreInsBase -1
--alignEndsType Local
--seedSearchStartLmax 50
--seedPerReadNmax 100000
--seedPerWindowNmax 1000
--alignTranscriptsPerReadNmax 10000
--alignTranscriptsPerWindowNmax 10000
--genomeDir STAR_2.5_sjdbOverhang75

The result is

> more Log.final.out
                                 Started job on | Aug 01 17:22:47
                             Started mapping on | Aug 01 17:23:32
                                    Finished on | Aug 01 21:02:18
       Mapping speed, Million of reads per hour | 0.03


                          Number of input reads | 116617
                      Average input read length | 900
                                    UNIQUE READS:
                   Uniquely mapped reads number | 532
                        Uniquely mapped reads % | 0.46%
                          Average mapped length | 19.71
                       Number of splices: Total | 194
            Number of splices: Annotated (sjdb) | 34
                       Number of splices: GT/AG | 193
                       Number of splices: GC/AG | 1
                       Number of splices: AT/AC | 0
               Number of splices: Non-canonical | 0
                      Mismatch rate per base, % | 3.73%
                         Deletion rate per base | 0.40%
                        Deletion average length | 1.91
                        Insertion rate per base | 0.10%
                       Insertion average length | 2.00
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci | 1226
             % of reads mapped to multiple loci | 1.05%
        Number of reads mapped to too many loci | 71
             % of reads mapped to too many loci | 0.06%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches | 0.00%
                 % of reads unmapped: too short | 93.84%
                     % of reads unmapped: other | 4.59%
                                  CHIMERIC READS:
                       Number of chimeric reads | 0
                            % of chimeric reads | 0.00%
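
When comparing several parameter sets, the headline numbers can be pulled out of Log.final.out with a small helper. This is only a sketch: the function name final_out_pct is made up for illustration, and it assumes the "label | value" layout shown in the log above.

```shell
#!/bin/sh
# Sketch: pull one value out of a STAR Log.final.out file.
# final_out_pct is a hypothetical helper name, not part of STAR.
# Usage: final_out_pct "label text" path/to/Log.final.out
final_out_pct() {
    # Lines look like "   <label> | <value>"; split on "|",
    # then strip blanks, tabs and "%" from the value field.
    awk -F'|' -v label="$1" '
        index($1, label) { gsub(/[ \t%]/, "", $2); print $2; exit }
    ' "$2"
}

# Only runs if a Log.final.out exists in the current directory.
if [ -f Log.final.out ]; then
    echo "uniquely mapped %: $(final_out_pct 'Uniquely mapped reads %' Log.final.out)"
    echo "unmapped, too short %: $(final_out_pct '% of reads unmapped: too short' Log.final.out)"
fi
```

Run it in the STAR output directory after each attempt to see whether the "too short" fraction is actually moving.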

Clearly it is trying to map the reads but it seems it can't find good seeds to begin with. I'm trying again with:

--seedSearchStartLmax 20

but I'm also wondering if my genome index isn't helping. I'm using an index built for 75bp Illumina reads...

Thoughts?

Alexander Dobin

Aug 4, 2017, 12:43:42 PM

to rna-star

Hi Nick,

STAR will likely not work well for reads with a high rate of indels.

The parameters you may want to tweak aggressively are:

--seedSearchStartLmax - reduce to 10 or even below

--winAnchorMultimapNmax - increase to 1000 or more

--seedMultimapNmax - increase to 100000 or more
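
The three suggestions above could be combined into a single invocation along these lines. This is a sketch, not a tested recipe: reads.fastq is a placeholder file name, the genome directory is simply the index name from the original post, and the values are the lower bounds mentioned above.

```shell
#!/bin/sh
# Sketch only: combines the three seed-related overrides suggested above.
# READS is a placeholder; GENOME_DIR reuses the index name from the first post.
READS="reads.fastq"
GENOME_DIR="STAR_2.5_sjdbOverhang75"

# Starting values taken from the suggestions ("10 or even below", "1000 or more", ...).
SEED_OPTS="--seedSearchStartLmax 10 --winAnchorMultimapNmax 1000 --seedMultimapNmax 100000"

CMD="STARlong --runMode alignReads --genomeDir $GENOME_DIR --readFilesIn $READS $SEED_OPTS"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
    # Unquoted on purpose so the string splits into separate arguments.
    $CMD
else
    echo "$CMD"
fi
```

The other options from the original post (scoring, --alignEndsType and so on) would be appended to the same command.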

If you have a few reads that you can share, I can play with parameters to see what is the best the current algorithm can do.

Cheers

Alex

Nick Schurch

Aug 8, 2017, 5:04:01 AM

to rna-star

Thanks Alex, I'll give that a try. I'll check with my PI and make sure he's happy before I start splashing the reads around ;)

Nick Schurch

Aug 9, 2017, 10:57:53 AM

to rna-star

I tried these parameters but it resulted in a Fatal Error during the alignment:

EXITING because of FATAL error: too many pieces pere read
SOLUTION: increase input parameter --seedPerReadNmax
Aug 08 18:09:03 ...... FATAL ERROR, exiting

Seems like it really doesn't like --seedSearchStartLmax 10. I'll try it with 20 and see if that goes through...

Alexander Dobin

Aug 9, 2017, 12:31:36 PM

to rna-star

Hi Nick,

It seems you will need to increase --seedPerReadNmax further, until this error disappears.
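
One way to automate that is to retry with a doubled limit until the error message stops appearing. The loop below is a rough sketch under several assumptions: reads.fastq, the try_ output prefixes and the 1600000 ceiling are placeholders, the helper names are made up, and it assumes the error text reaches the captured stdout/stderr (if it only shows up in Log.out, grep that file instead).

```shell
#!/bin/sh
# Sketch only: retry STARlong with a doubled --seedPerReadNmax until the
# "too many pieces" error stops appearing. READS, the output prefixes and
# MAX_LIMIT are placeholders, not values from this thread.
READS="reads.fastq"
GENOME_DIR="STAR_2.5_sjdbOverhang75"
MAX_LIMIT=1600000

# Double the current limit (hypothetical helper).
next_limit() { echo $(( $1 * 2 )); }

# True if the captured output contains STAR's suggested fix for this error.
hit_seed_limit() { grep -q -e "increase input parameter --seedPerReadNmax" "$1"; }

if command -v STARlong >/dev/null 2>&1; then
    limit=100000
    while [ "$limit" -le "$MAX_LIMIT" ]; do
        out="try_${limit}.log"
        STARlong --runMode alignReads --genomeDir "$GENOME_DIR" \
            --readFilesIn "$READS" --seedSearchStartLmax 10 \
            --seedPerReadNmax "$limit" \
            --outFileNamePrefix "try_${limit}_" > "$out" 2>&1
        if ! hit_seed_limit "$out"; then
            echo "no seedPerReadNmax error with --seedPerReadNmax $limit"
            break
        fi
        limit=$(next_limit "$limit")
    done
else
    echo "STARlong not found on PATH; nothing run"
fi
```

Running this on a small subset of reads first keeps each attempt short, since the full run in the first post took several hours.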