Dear Alex,
Thanks for the wonderful software.
I am running STAR using the internal 2-pass method to discover splice junctions with "--twopass1readsN -1".
I am using "--outFilterType BySJout".
There is no sjdb in the target genome.
Per a previous post of yours on this forum regarding the "--outSJfilter*" parameters:
These four integers actually work for all 7 motif types - I need to clarify it in my documentation.
1st int is for non-canonical (i.e. none of the motifs below)
2nd number is for GT/AG and CT/AC. Note that CT/AC is reverse complementary to GT/AG, which means that the junction is transcribed from the (-) DNA strand - that is actually how strand (col. 4) is determined.
3rd: GC/AG and CT/GC
4th: AT/AC and GT/AT

Therefore, if I set "--outSJfilter*" to favor GT/AG junctions, then CT/AC junctions are also favored.
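This is undesirable because, when the library strand is known, CT/AC junctions in reads from (+) strand transcripts are actually non-canonical (and likewise for CT/GC and GT/AT).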
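To make the grouping concrete, here is a minimal Python sketch of how I understand the description above. The names (`SENSE_SLOT`, `revcomp_motif`, `classify`) are mine, for illustration only, and are not part of STAR; the slot numbers are simply 0-based positions in the four-integer "--outSJfilter*" vectors.

```python
# Illustrative only: maps an intron motif (as read on the + genomic strand)
# to the 0-based position in a four-integer --outSJfilter* vector and to the
# transcript strand that the motif implies.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Motifs as seen on the strand the transcript comes from.
SENSE_SLOT = {"GT/AG": 1, "GC/AG": 2, "AT/AC": 3}


def revcomp_motif(motif):
    """Return the motif as it appears on the opposite DNA strand."""
    start, end = motif.split("/")

    def rc(seq):
        return seq.translate(_COMPLEMENT)[::-1]

    # The intron end becomes the start when read from the other strand.
    return f"{rc(end)}/{rc(start)}"


def classify(motif):
    """Return (slot, strand); strand is None for non-canonical motifs."""
    if motif in SENSE_SLOT:
        return SENSE_SLOT[motif], "+"
    flipped = revcomp_motif(motif)
    if flipped in SENSE_SLOT:
        return SENSE_SLOT[flipped], "-"
    return 0, None


if __name__ == "__main__":
    for m in ("GT/AG", "CT/AC", "GC/AG", "CT/GC", "AT/AC", "GT/AT", "AA/TT"):
        print(m, classify(m))
```

GT/AG and CT/AC land in the same slot and differ only in the implied strand, which is why a threshold set for one necessarily applies to the other.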
To avoid this, STAR would need to be informed, for strand-specific libraries,
of the strand of read1 and read2 relative to the transcript (for paired-end reads). There are 4 possibilities
(2 each for read1 and read2), plus the unstranded possibility.
As you are probably aware, TopHat has a parameter "--library-type", which it uses for just this purpose (filtering splice junctions).
Have you considered adding such parameters? Or have I missed something in the existing parameters?
I am aware that I could post-process the BAM files, removing the reads whose splices are the reverse complement
of what the library strand implies, but this is much less desirable than having a "clean" BAM output by STAR.
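For reference, the post-processing I have in mind would look roughly like the Python sketch below, which works on SAM text lines (e.g. piped through samtools view -h). It assumes that spliced alignments carry an XS:A strand tag derived from the intron motif (which STAR can write with "--outSAMstrandField intronMotif") and borrows TopHat's library-type names. The function names are hypothetical, and I have not run this on real data.

```python
# Illustrative only: keep a SAM record if the strand implied by its intron
# motif (XS:A tag) agrees with the strand implied by the library protocol.

FLAG_REVERSE = 0x10  # read is aligned to the reverse strand
FLAG_READ2 = 0x80    # read is the second mate of a pair


def transcript_strand(flag, library_type):
    """Strand of the source transcript implied by the SAM flag, or None."""
    if library_type == "fr-unstranded":
        return None
    read_strand = "-" if flag & FLAG_REVERSE else "+"
    # fr-secondstrand: read1 matches the transcript, read2 is opposite.
    # fr-firststrand: the other way round. Single-end reads count as read1.
    same_as_transcript = not (flag & FLAG_READ2)
    if library_type == "fr-firststrand":
        same_as_transcript = not same_as_transcript
    if same_as_transcript:
        return read_strand
    return "+" if read_strand == "-" else "-"


def keep_sam_line(line, library_type):
    """Return True if the record should stay in the output."""
    if line.startswith("@"):
        return True  # header lines are always kept
    fields = line.rstrip("\n").split("\t")
    expected = transcript_strand(int(fields[1]), library_type)
    motif_strand = None
    for tag in fields[11:]:
        if tag.startswith("XS:A:"):
            motif_strand = tag[5:]
    if expected is None or motif_strand is None:
        return True  # unstranded library or unspliced read: nothing to check
    return motif_strand == expected
```

Note that this drops individual records, so a pair can lose one mate while the other is kept; a real filter would also have to handle that, which is part of why I would rather not do it downstream.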
Thanks again for the software.
/Sol Katzman