Hi San,
Thank you for your prompt reply.
Let me clarify the question we are trying to answer.
Should we restrict the analysis to uniquely aligned reads, or should we also include multimapping reads for MAJIQ quantification?
Right now, we are using MAJIQ v3 on BAM files produced by STAR in two-pass mode. We did not use the flag --outFilterIntronMotifs RemoveNoncanonical, in order to increase the sensitivity of the analysis.
We then ran the MAJIQ pipeline in two configurations:
1. Using the full BAM file, thus including multimapping reads.
2. Using a BAM where multimapping reads were removed (NH:i:1 only).
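For illustration, the NH:i:1 filter in configuration 2 can be sketched as below. This is a minimal sketch working on SAM text records, not the exact command used to produce the filtered BAM. The helper names is_uniquely_mapped and filter_unique and the two example records are hypothetical, and treating records without an NH tag as non-unique is an assumption.

```python
def is_uniquely_mapped(sam_line: str) -> bool:
    """Return True if a SAM alignment record carries the tag NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    # The SAM format has 11 mandatory columns; optional tags start at index 11.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    # Assumption: a record without an NH tag is not counted as unique.
    return False


def filter_unique(sam_lines):
    """Yield header lines unchanged, plus alignment records with NH:i:1."""
    for line in sam_lines:
        if line.startswith("@") or is_uniquely_mapped(line):
            yield line


# Hypothetical example records: one unique read, one read with three alignments.
header = "@HD\tVN:1.6"
unique_read = "\t".join(
    ["read1", "0", "chr1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII", "NH:i:1", "HI:i:1"]
)
multi_read = "\t".join(
    ["read2", "0", "chr1", "200", "1", "4M", "*", "0", "0", "ACGT", "IIII", "NH:i:3", "HI:i:1"]
)

kept = list(filter_unique([header, unique_read, multi_read]))
print(len(kept))
```

In this sketch only the header and read1 are kept; read2 is dropped because its NH value is greater than 1.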
We compared the number of de novo events with coverage > 0 detected by MAJIQ.
Our results are the following:
FULL BAM (multimapping included)
Single sample: 53,966
Cohort build: 141,575
UNIQUE BAM (NH:i:1 only)
Single sample: 27,197
Cohort build: 82,376
This corresponds roughly to:
Single: −50% events when removing multimapping reads
Cohort: −42% events
When comparing the sets of detected events:
Total de novo events (coverage > 0):
Multimap: 141,575 events
Unique: 82,376 events
Overlap: 73,480 events
Events only detected with multimapping reads: 68,095
Events only detected with unique reads: 8,896
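The counts above can be cross-checked with a few lines of arithmetic. This is only a sketch that reuses the numbers reported in this message; the variable names are illustrative.

```python
# Event counts reported above (de novo events with coverage > 0).
single_full, single_unique = 53_966, 27_197
cohort_full, cohort_unique = 141_575, 82_376
overlap = 73_480

# Events found in only one of the two cohort builds.
multimap_only = cohort_full - overlap
unique_only = cohort_unique - overlap

# Relative loss of events when multimapping reads are removed.
single_drop = 1 - single_unique / single_full
cohort_drop = 1 - cohort_unique / cohort_full

print(f"multimap only: {multimap_only}, unique only: {unique_only}")
print(f"single drop: {single_drop:.1%}, cohort drop: {cohort_drop:.1%}")
# Share of the unique-BAM events that are absent from the full-BAM build.
print(f"unique-only share of unique build: {unique_only / cohort_unique:.1%}")
```

The last figure is the part we find hardest to explain: events that appear only after removing reads.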
This result is puzzling on multiple levels. However, all our questions lead to the same point:
How does MAJIQ handle reads with multiple alignments? And, consequently, what do you suggest we do in order to obtain reliable LSVs?
Thank you very much for your help.
Best regards
Majkena