Hello,
I am trying to use STAR 2-pass mode to map multiple paired-end RNA-seq samples (i.e. RNA from organ1, organ2, organ3, etc.) to one reference genome. I cleaned my sequences with Trimmomatic, so for each sample I have 2 files of paired-end reads, plus one file of single-end reads (I merged all the unpaired survivors from Trimmomatic into one file).
My goal is to map all reads, both paired-end and single-end, from all organs, to the same reference genome. I want to use STAR 2-pass mode, making sure that the second pass for every read uses the junctions detected in the first pass over all reads.
I searched extensively in previous topics. Other users have asked similar questions before, but they were using old versions of STAR, where the 2pass mode was implemented in a different manner. I gathered useful information from those topics, but I didn’t find an exact answer. Hence I open this new topic.
This is an outline of what I am thinking of doing:
1) Run STAR 2-pass mode for all paired-end files from all organs (i.e. --readFilesIn organ1F, organ1R, organ2F, organ2R, organ3F, organ3R, etc), using my preferred --outSJfilter options, --twopassMode Basic --sjdbInsertSave All
I hope that this will map all paired-end reads using filtered junctions from all organs and, most importantly, save the reference genome with all these junctions inserted.
2) Run STAR 2-pass mode with the same options as before, but for all single-end reads from all organs (i.e. --readFilesIn organ1SE, organ2SE, organ3SE, etc).
I expect that this will map all single-end reads using filtered junctions from both the paired-end (step 1) and single-end (this step) mapped reads, and add the newly detected junctions to the reference genome.
3) Run STAR 1-pass mode for all paired-end files from all organs, with the same input files as in step 1.
What I want here is to map all paired-end reads using the junctions from both paired-end and single-end mapped reads. I also want to keep the reference genome with all the junctions, which I could use later, for example to analyze each organ individually.
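As a rough sketch of how steps 1 and 2 could look on the command line: STAR takes multiple input files as comma-separated lists with no spaces, and for paired-end data the read-1 list and the read-2 list are separated by a single space. The script below only builds and prints the commands (a dry run). The file names, the `genome_index` directory and the `pe_`/`se_` output prefixes are hypothetical, `--outSAMtype BAM SortedByCoordinate` is added on the assumption that sorted BAM output is wanted, and the `--outSJfilter*` options are left out because they depend on the chosen thresholds.

```shell
#!/bin/sh
# Dry run: builds and prints STAR command lines; STAR itself is not executed.
# All file names, the genome directory and the output prefixes are hypothetical.

fwd=""; rev=""; se=""
for organ in organ1 organ2 organ3; do
  # Append to each comma-separated list (no spaces around the commas).
  fwd="${fwd:+$fwd,}${organ}F.fq"
  rev="${rev:+$rev,}${organ}R.fq"
  se="${se:+$se,}${organ}SE.fq"
done

# Options shared by steps 1 and 2; the preferred --outSJfilter* settings would go here.
common="--genomeDir genome_index --twopassMode Basic --sjdbInsertSave All --outSAMtype BAM SortedByCoordinate"

# Step 1: paired-end reads, read-1 list, then a space, then read-2 list.
pe_cmd="STAR $common --readFilesIn $fwd $rev --outFileNamePrefix pe_"

# Step 2: single-end reads, one comma-separated list.
se_cmd="STAR $common --readFilesIn $se --outFileNamePrefix se_"

echo "$pe_cmd"
echo "$se_cmd"
```

Both commands here point at the same `genome_index`; whether step 2 can instead start from the genome saved by step 1 is part of the open question and is not settled by this sketch.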
Lastly, I want to create a single sorted BAM file as the final output. I want this file to contain both paired-end and single-end reads. Naturally, I want to keep all the original information for each read, such as paired or single-end, mapped or unmapped, etc. My idea is to use this unified BAM file in Trinity.
Maybe naively, I am thinking of simply merging the sorted BAM files generated in steps 2 and 3:
4) cat SortedBamPE SortedBamSE > SortedBamAllReads
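Plain `cat` on two BAM files does not give a valid coordinate-sorted BAM, since each file carries its own header and its own sort order. One possible alternative for step 4 is `samtools merge`, which combines already-sorted BAM files and keeps the result sorted. The sketch below assumes samtools is installed and that the inputs follow STAR's `Aligned.sortedByCoord.out.bam` naming with hypothetical `pe_` and `se_` prefixes; the output name is also hypothetical. If samtools or the inputs are missing, it just prints the commands.

```shell
#!/bin/sh
# Hypothetical input and output names; adjust to the actual STAR output prefixes.
pe_bam="pe_Aligned.sortedByCoord.out.bam"
se_bam="se_Aligned.sortedByCoord.out.bam"
merged="all_reads.sorted.bam"

# samtools merge takes the output file first, then the sorted inputs.
merge_cmd="samtools merge $merged $pe_bam $se_bam"
index_cmd="samtools index $merged"

if command -v samtools >/dev/null 2>&1 && [ -f "$pe_bam" ] && [ -f "$se_bam" ]; then
  # Run the merge, then index the merged file.
  $merge_cmd && $index_cmd
else
  # Dry run: show what would be executed.
  echo "$merge_cmd"
  echo "$index_cmd"
fi
```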
Is this how I should proceed? Please comment/correct/suggest.
I am using STAR 2.4.1d, but I plan to move to the newly released 2.4.2a in a few days.
Many thanks,
Mau
>HS31_21758:1:1212:12019:41489/1
GGGAGCTCCCTGGACTGAAGGAGACGCGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGC
>HS31_21758:1:1108:9330:41982/1
GAAGGAGACGCGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGGTGGGAGT
>HS31_21758:1:1316:11018:39202/1
ACGCGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGGTGGGAGTGGGGATG
>HS31_21758:1:1302:5188:24725/1
GCGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGGTGGGAGTGGGGATGCA
>HS31_21758:1:1206:5924:68173/1
GCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGGTGGGAGTGGGGATGCACT