Hi Alex,
STAR is not running the way I expected.
These are the parameters I am using:
--outFilterMultimapNmax 10
--outFilterMismatchNoverLmax 0.4
--alignIntronMax 100000
--alignSJoverhangMin 8
--alignSJDBoverhangMin 5
--outFilterType BySJout
--outSJfilterCountTotalMin 8 5 5 5
--outSJfilterCountUniqueMin -1 -1 -1 -1
--outSAMtype BAM SortedByCoordinate
--outWigType wiggle read1_5p
--outReadsUnmapped Fastx
--outSAMstrandField intronMotif
--outSAMattributes All
--quantMode TranscriptomeSAM GeneCounts
(The important ones for this question are in bold.)
I want my SJ.out.tab file to contain only those splice junctions that are supported by at least 5 reads (or 8 for the non-canonical motifs).
The command runs and the output looks quite good, but not good enough. I still get rows like this:
awk 'FNR==5442 {print}' STAR.SJ.out.tab
2L 11438179 11487284 1 1 0 1 0 22
This junction is unannotated (column 6) and has "only" one unique read and no multi-mapped reads crossing it. I don't want such junctions in my SJ.out.tab file.
I know I can use awk to filter out all the junctions with fewer than 5 reads, but how can I discard them upfront?
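For reference, this is roughly the awk post-filter I would like to avoid. It is only a sketch and assumes the usual SJ.out.tab layout (column 5 = intron motif, with 0 meaning non-canonical; column 7 = uniquely mapped reads; column 8 = multi-mapped reads). The file names are placeholders, and apart from the first row (the one shown above) the sample rows are made up for illustration:

```shell
# Build a small tab-separated sample in SJ.out.tab layout.
# Row 1 is the junction from my real output; rows 2-4 are hypothetical.
printf '2L\t11438179\t11487284\t1\t1\t0\t1\t0\t22\n'  > example.SJ.out.tab
printf '2L\t100\t200\t1\t1\t0\t6\t0\t30\n'           >> example.SJ.out.tab
printf '2L\t300\t400\t0\t0\t0\t6\t1\t25\n'           >> example.SJ.out.tab
printf '2L\t500\t600\t2\t2\t0\t4\t4\t40\n'           >> example.SJ.out.tab

# Keep a junction only if unique + multi-mapped reads reach the threshold:
# 8 for non-canonical motifs (column 5 == 0), 5 for all other motifs.
awk -F'\t' '{
    min = ($5 == 0) ? 8 : 5
    if ($7 + $8 >= min) print
}' example.SJ.out.tab > example.SJ.filtered.tab
```

This drops the single-read junction, but only after the mapping is finished, which is why I would prefer to have STAR apply the thresholds itself.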
thanks
Assa