I went back to my command history and confirmed how the “unique-only” BAM files were generated. I filtered the original STAR BAM files by retaining only alignments with the NH:i:1 tag (reads with a single reported alignment). The command was essentially:
samtools view -h input.bam |
awk ‘BEGIN{OFS=”\t”} /^@/ {print; next} $0 ~ /NH:i:1/ {print}’ |
samtools view -b -o output.unique.bam
The filtered BAMs were then sorted and indexed.
From these BAMs I generated the splice junction files using:
majiq-v3 sj BAM annotation.sg.zarr sample.sj
Then I built a single cohort splicegraph from all SJ files:
majiq-v3 build annotation.sg.zarr sg.zarr –sj *.sj
After that, I ran psi-coverage for each sample using the same splicegraph:
majiq-v3 psi-coverage sg.zarr sample.psicov sample.sj
Finally, I generated the TSV outputs with majiq-v3 quantify.
So the difference between the two analyses is only in the BAM filtering step (original BAMs vs BAMs filtered with NH:i:1), while the rest of the MAJIQ workflow was the same.
From your explanation, I understand that this filtering is stricter than MAJIQ’s internal handling, since MAJIQ excludes unmapped and secondary alignments but may still retain the primary alignment of reads that have multiple reported mappings.
This may explain the differences observed between the analyses.