Interpreting Paired-end output and linking it with bamtools filter

Leandro Neves

Nov 21, 2014, 4:59:34 PM
to mosaik-...@googlegroups.com
Dear Wan-Ping and Mosaik users.

I am working on a highly repetitive plant genome with 150 bp paired-end Illumina reads. I used Mosaik to align them to the genome in paired-end mode, with an appropriate median fragment length and search radius set. Could you please help me interpret the paired-end results correctly? My main goal is to identify reads that align to a single location after paired-end resolution.

Part 1: interpreting paired-end output
######### Here is an output example with a few points highlighted:
Alignment statistics (mates):
================================================
   # too many N's:                  14 (  0.0 %)
   # failed hash:               150423 (  8.2 %)
------------------------------------------------
# unaligned mates(X):           150437 (  8.2 %)
# filtered out(F):              120653 (  6.6 %)
# uniquely aligned mates(U):    766148 ( 41.7 %)
# multiply aligned mates(M):    799778 ( 43.5 %)
================================================
total aligned:                 1565926 ( 85.2 %)
total:                         1837016

Alignment statistics (pairs):
===================================================================
                                  Local rescues   Frag. consistency
Completely aligned pairs
-------------------------------------------------------------------
# U-U pairs:     281424 ( 30.6 %)          9577              270786
# U-M pairs:     130500 ( 14.2 %)         19291              108729
# M-M pairs:     306285 ( 33.3 %)          8461              295511
                SUM: 718209
Partially aligned pairs
-------------------------------------------------------------------
# U-F pairs:      49941 (  5.4 %)
# U-X pairs:      22859 (  2.5 %)
# M-F pairs:      30456 (  3.3 %)
# M-X pairs:      26252 (  2.9 %)

Both ends unaliged pairs
-------------------------------------------------------------------
# F-F pairs:       9147 (  1.0 %)
# F-X pairs:      21961 (  2.4 %)
# X-X pairs:      39682 (  4.3 %)
===================================================================
total aligned:   847717 ( 92.3 %)         37329              675026
total:           918508

MosaikAligner CPU time: 9196.610 s, wall time: 3315.385 s
######### End of example
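
As a quick sanity check on the pairs table above, the per-category counts can be summed and compared with the SUM and "total aligned" rows. This is only an arithmetic sketch using numbers copied from the output; the variable names are mine and are not part of Mosaik.

```shell
# Counts copied from the "Alignment statistics (pairs)" table above.
UU=281424; UM=130500; MM=306285           # completely aligned pairs
UF=49941; UX=22859; MF=30456; MX=26252    # partially aligned pairs

COMPLETE=$((UU + UM + MM))                # compare with the SUM row
PARTIAL=$((UF + UX + MF + MX))
ALIGNED=$((COMPLETE + PARTIAL))           # compare with "total aligned"

# Column sums over U-U, U-M and M-M, to compare with the totals row.
RESCUES=$((9577 + 19291 + 8461))
CONSISTENT=$((270786 + 108729 + 295511))

echo "completely aligned pairs: $COMPLETE"
echo "total aligned pairs:      $ALIGNED"
echo "local rescues:            $RESCUES"
echo "frag. consistency:        $CONSISTENT"
```

The four sums agree with the SUM row (718209) and with the totals row (847717, 37329 and 675026), so the "Local rescues" and "Frag. consistency" totals appear to be counted over the completely aligned pairs only.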

As you can see, I have quite a lot of M-M pairs even with 2x150bp. My specific questions are:

- How do I know which pairs were resolved during paired-end resolution? For example, out of the 306285 multiply aligned (M-M) pairs, how many resulted in at most one combination of alignments that fits the confidence interval search?

- What do "Local rescues" and "Frag. consistency" mean?

Part 2: linking results with bamtools filter
I obtained the following results with bamtools filter for the sample above:
# Each count is in mates (individual alignments), not pairs.
TOT=$(bamtools count -in "$FILE")
MAPPED=$(bamtools filter -in "$FILE" -isMapped true | bamtools count)
MATE_PAIRED=$(bamtools filter -in "$FILE" -isMapped true -isMateMapped true | bamtools count)
MATE_PROPER_PAIRED=$(bamtools filter -in "$FILE" -isMapped true -isMateMapped true -isProperPair true | bamtools count)

        TOT      MAPPED   MATE_PAIRED  MATE_PROPER_PAIRED
Mates   1837016  1565926  1436418      1424670
Pairs    918508   782963   718209       712335

Note that many of these numbers match the Mosaik output above, as expected.
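
To make the correspondence explicit, here is a small arithmetic sketch relating the bamtools counts to the Mosaik statistics. It assumes that each pair contributes exactly two mates, so the pair counts are the mate counts divided by two; all values are copied from the outputs above.

```shell
# bamtools counts (in mates), copied from the table above.
TOT=1837016
MAPPED=1565926
MATE_PAIRED=1436418
MATE_PROPER_PAIRED=1424670

# Mosaik mate statistics: uniquely (U) plus multiply (M) aligned mates.
MOSAIK_ALIGNED_MATES=$((766148 + 799778))

# Mosaik pair statistics: U-U + U-M + M-M (completely aligned pairs).
MOSAIK_COMPLETE_PAIRS=$((281424 + 130500 + 306285))

echo "MAPPED mates:      $MAPPED (Mosaik U + M mates: $MOSAIK_ALIGNED_MATES)"
echo "MATE_PAIRED pairs: $((MATE_PAIRED / 2)) (Mosaik complete pairs: $MOSAIK_COMPLETE_PAIRS)"
echo "PROPER pairs:      $((MATE_PROPER_PAIRED / 2))"
```

MAPPED equals the "total aligned" mates, and MATE_PAIRED / 2 equals the sum of the completely aligned pairs (718209). The proper-pair count (712335) does not equal the "Frag. consistency" total (675026), which is part of what I would like to understand.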

My specific questions are:
- What is isProperPair, and how does it relate to the Mosaik output?

- Is there a way to use bamtools filter to select pairs that were resolved (i.e., that resulted in at most one combination of alignments fitting the confidence interval search)?

Thank you so much for your help on this.

Regards
Leandro

