Dear Wan-Ping and Mosaik users,
I am working on a highly repetitive plant genome with 150 bp paired-end Illumina reads. I aligned them to the genome with Mosaik in paired-end mode, setting the median fragment length and search radius to values appropriate for my library. Could you please help me interpret the paired-end results correctly? My main goal is to identify reads that align to a single location after paired-end resolution.
Part 1: interpreting paired-end output
######### Here is an output example with a few points highlighted:
Alignment statistics (mates):
================================================
# too many N's: 14 ( 0.0 %)
# failed hash: 150423 ( 8.2 %)
------------------------------------------------
# unaligned mates(X): 150437 ( 8.2 %)
# filtered out(F): 120653 ( 6.6 %)
# uniquely aligned mates(U): 766148 ( 41.7 %)
# multiply aligned mates(M): 799778 ( 43.5 %)
================================================
total aligned: 1565926 ( 85.2 %)
total: 1837016
Alignment statistics (pairs):
===================================================================
Local rescues Frag. consistency
Completely aligned pairs
-------------------------------------------------------------------
# U-U pairs: 281424 ( 30.6 %) 9577 270786
# U-M pairs: 130500 ( 14.2 %) 19291 108729
# M-M pairs: 306285 ( 33.3 %) 8461 295511
SUM: 718209
Partially aligned pairs
-------------------------------------------------------------------
# U-F pairs: 49941 ( 5.4 %)
# U-X pairs: 22859 ( 2.5 %)
# M-F pairs: 30456 ( 3.3 %)
# M-X pairs: 26252 ( 2.9 %)
Both ends unaliged pairs
-------------------------------------------------------------------
# F-F pairs: 9147 ( 1.0 %)
# F-X pairs: 21961 ( 2.4 %)
# X-X pairs: 39682 ( 4.3 %)
===================================================================
total aligned: 847717 ( 92.3 %) 37329 675026
total: 918508
MosaikAligner CPU time: 9196.610 s, wall time: 3315.385 s
######### End of example
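To make sure I am reading the pair table consistently, I added up the three "Completely aligned pairs" rows with plain shell arithmetic. This is only a sanity-check sketch: the variable names are mine, the numbers are copied from the output above, and I am assuming that the two unlabeled trailing columns are "Local rescues" and "Frag. consistency", in that order.

```shell
#!/usr/bin/env bash
# Pair counts copied from the "Completely aligned pairs" rows above.
uu=281424; um=130500; mm=306285
complete_pairs=$((uu + um + mm))
echo "completely aligned pairs: $complete_pairs"   # compare with the SUM line

# First trailing column, assumed to be "Local rescues".
uu_lr=9577; um_lr=19291; mm_lr=8461
rescued_pairs=$((uu_lr + um_lr + mm_lr))
echo "local rescues:            $rescued_pairs"    # compare with the totals row

# Second trailing column, assumed to be "Frag. consistency".
uu_fc=270786; um_fc=108729; mm_fc=295511
consistent_pairs=$((uu_fc + um_fc + mm_fc))
echo "fragment-consistent:      $consistent_pairs" # compare with the totals row

# M-M pairs that are not counted in the assumed "Frag. consistency" column.
echo "M-M outside that column:  $((mm - mm_fc))"
```

The three sums reproduce the SUM line and the two trailing numbers on the "total aligned" row, so the columns are at least internally consistent. What I cannot work out from the numbers alone is what they mean.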
As you can see, I have quite a lot of M-M pairs even with 2x150 bp reads. My specific questions are:
- How do I know which pairs were resolved during paired-end resolution? For example, out of the 306285 multiply aligned (M-M) pairs, how many resulted in at most one combination of alignments that fits the confidence interval search?
- What do "Local rescues" and "Frag. consistency" mean?
Part 2: linking results with bamtools filter
The following results were obtained with bamtools filter for the sample above:
# FILE is the BAM file produced from the Mosaik alignment above
TOT=$(bamtools count -in "$FILE")
MAPPED=$(bamtools filter -in "$FILE" -isMapped true | bamtools count)
MATE_PAIRED=$(bamtools filter -in "$FILE" -isMapped true -isMateMapped true | bamtools count)
MATE_PROPER_PAIRED=$(bamtools filter -in "$FILE" -isMapped true -isMateMapped true -isProperPair true | bamtools count)
|       | TOT     | MAPPED  | MATE_PAIRED | MATE_PROPER_PAIRED |
| Mates | 1837016 | 1565926 | 1436418     | 1424670            |
| Pairs | 918508  | 782963  | 718209      | 712335             |
Note that many of these numbers match the Mosaik output above, as expected.
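For reference, this is how the bamtools counts line up with the Mosaik report. It is a small sketch with my own variable names: the mate counts are the bamtools results, and the "Pairs" values are simply the mate counts divided by two.

```shell
#!/usr/bin/env bash
# Mate counts as reported by bamtools count for each filter.
mates_tot=1837016
mates_mapped=1565926        # equals "total aligned" in the Mosaik mate statistics
mates_paired=1436418
mates_proper=1424670

# Each pair contributes two mates, so halve the mate counts.
pairs_tot=$((mates_tot / 2))
pairs_paired=$((mates_paired / 2))   # equals SUM of completely aligned pairs
pairs_proper=$((mates_proper / 2))

echo "pairs total:          $pairs_tot"
echo "pairs, both mapped:   $pairs_paired"
echo "pairs, proper:        $pairs_proper"

# Pairs with both mates mapped that are NOT flagged as proper pairs.
pairs_not_proper=$((pairs_paired - pairs_proper))
echo "both mapped, not proper: $pairs_not_proper"
```

So the "both mates mapped" count equals the U-U + U-M + M-M sum, while the proper-pair count (712335) does not equal any single number in the Mosaik pair table that I can identify, which is what prompts the questions below.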
My specific questions are:
- What does isProperPair mean, and how does it relate to the Mosaik output?
- Is there a way to use bamtools filter to select the pairs that were resolved (that is, pairs that resulted in at most one combination of alignments that fits the confidence interval search)?
Thank you so much for your help on this.
Regards
Leandro