Hi Alex,
Thanks for your prompt reply. I tested concatenating all junctions and filtering out mitochondrial and non-canonical ones using the shell script you posted before, then letting STAR collapse identical junctions. I also concatenated, filtered, and collapsed all SJ.out.tab files manually (cat, grep, sort -u, awk). Both approaches produced almost the same number of junctions (minor differences only). One thing I noticed is that in both cases, when generating the genome index with --sjdbFileChrStartEnd, no sjdbList.out.tab file was generated; a sjdbInfo.txt was generated instead. By contrast, sjdbList.out.tab was generated when I used a GTF file with the GTF annotation parameters: --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript transcript_id.
Is it normal for STAR not to generate sjdbList.out.tab when --sjdbFileChrStartEnd is given a
Chr \tab\ Start \tab\ End \tab\ Strand (+ or -)
file like the one produced by 2-pass mapping?
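
In case it helps to see what I mean by the manual collapse, here is a minimal Python sketch of that step. It is not the exact commands I ran, only an illustration. It assumes the usual SJ.out.tab column layout (column 4 is the strand coded 0/1/2, column 5 is the intron motif with 0 meaning non-canonical). The function name collapse_junctions and the mitochondrial chromosome names are my own assumptions and would need adjusting to the genome in use.

```python
# Map STAR's numeric strand codes in SJ.out.tab to the symbols used
# in a Chr/Start/End/Strand junction list.
STRAND = {"0": ".", "1": "+", "2": "-"}


def collapse_junctions(lines, mito_names=("chrM", "MT", "M")):
    """Filter and de-duplicate junctions from SJ.out.tab lines.

    `lines` is any iterable of tab-separated SJ.out.tab rows, e.g. the
    concatenation of several files. `mito_names` is an assumed set of
    mitochondrial chromosome names; change it to match your genome.
    Returns sorted "chr<TAB>start<TAB>end<TAB>strand" strings.
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        chrom, start, end, strand, motif = fields[:5]
        if chrom in mito_names:
            continue  # drop mitochondrial junctions
        if motif == "0":
            continue  # drop non-canonical junctions
        # A set collapses identical junctions seen in several samples.
        seen.add((chrom, int(start), int(end), STRAND.get(strand, ".")))
    return ["\t".join(map(str, junction)) for junction in sorted(seen)]


if __name__ == "__main__":
    import sys

    # Usage sketch: python collapse_sj.py *.SJ.out.tab > junctions.tab
    rows = (row for path in sys.argv[1:] for row in open(path))
    for out in collapse_junctions(rows):
        print(out)
```

The resulting four-column file is the kind of list I then passed to --sjdbFileChrStartEnd.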