Vphaser2 error when running on bam file created by Stampy

AA

Dec 10, 2014, 4:33:47 AM
to viral-to...@googlegroups.com
Hi There,

I am trying to use Vphaser2. When I run it on a bam file created by Stampy it gives me the following error. (I tried it on sorted and unsorted bam files with the same result.)

--------------------------------------------------------
Program runs with the following Parameter setting:

        input BAM file  =       wtchgD00002501_closestRef_sorted.bam
        output Directory        =       test2
        errModel                =       pileup + phase
        alpha           =       0.05
        ignoreBases     =       0
        (var_matepair, var_cycle, var_dt, var_qt)       =       1,1,1,20
        pSample         =       30%
        windowSz        =       500
        delta   =       2

--------------------------------------------------------


        1 bam file(s) found:
                wtchgD00002501_closestRef_sorted.bam


Parse bam header: get refSeq info & sanity check

        3a_D17763 len =9629

        1 ref sequence(s) found:
                Name: 3a_D17763
                        BamfileID = 0   RefID = 0


        0 platform(s) found:


Get maxQ, minQ, maxReadLen, avgFragSz, stdFragSz from bam files ...

        Total Reads = 357762
        # Mapped Reads = 356221
        # Reads used for checking Q scores = 107169
        minQ = 35       maxQ=73         maxRL = 150
        (avgfragSz, std) = 86   138

Generate qual -> quantile map ...


Set up paired read map arrays ...

        # total mapped reads: 356221
        # mapped mate-pairs = 177540

Prepare aln columns file...

 Ref: 3a_D17763 , len = 9629


                create file: test2/3a_D17763.0.499.region
[EXIT]: get_column_x: cigar idx b4 I undefined






I have checked and there are no 'P' in the cigar column. What do you think could be the reason for the error?
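A check like the one described above can be sketched in a few lines of Python. This is only an illustration, not part of Vphaser2: the function name cigar_op_counts is made up here, and the input is assumed to be SAM text lines such as those printed by samtools view. It counts how many reads use each CIGAR operation at least once, so a zero for N and P confirms that neither is present.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_op_counts(sam_lines):
    """Count how many reads use each CIGAR operation at least once.

    Hypothetical helper for illustration; reads with no CIGAR are
    counted under "*".
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        # SAM is tab-delimited; split() also copes with records whose
        # tabs were turned into spaces when pasted into a message.
        fields = line.split()
        if len(fields) < 6:
            continue
        cigar = fields[5]
        if cigar == "*":
            counts["*"] += 1
            continue
        counts.update({op for _, op in CIGAR_ELEMENT.findall(cigar)})
    return counts
```

With a SAM stream opened as a file object, cigar_op_counts(stream) returns a Counter; looking up an operation that never occurs, such as counts["P"], gives 0.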

One other question: does Vphaser2 assume that PCR duplicates have already been removed from the BAM file, or is marking them enough? I had a quick try, and the calls do not seem to differ whether or not I mark PCR duplicates.

Many thanks.

AA


Xiao Yang

Dec 16, 2014, 8:01:05 PM
to viral-to...@googlegroups.com
Hi, Vphaser2 assumes duplicates have been removed from the alignment; it does not use the marked-duplicate flag. If there is no "P", the error "[EXIT]: get_column_x: cigar idx b4 I undefined" is most likely caused by an "N" in the CIGAR string.

--
You received this message because you are subscribed to the Google Groups "Broad Viral Tool Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to viral-tool-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
- Xiao
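Since the reply above says that marking duplicates is not enough, the marked reads have to be dropped before running Vphaser2. A minimal sketch of that filtering step follows, assuming SAM text as input; the function name drop_marked_duplicates is hypothetical. The only fact it relies on is that the SAM FLAG bit 0x400 (decimal 1024) means "PCR or optical duplicate".

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def drop_marked_duplicates(sam_lines):
    """Yield header lines and every alignment whose duplicate bit is not set.

    Hypothetical helper for illustration only.
    """
    for line in sam_lines:
        # Pass header lines through unchanged.
        if line.startswith("@"):
            yield line
            continue
        # Only the first two columns (QNAME, FLAG) are needed.
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # blank or malformed line
        if int(fields[1]) & DUPLICATE_FLAG:
            continue  # marked as duplicate: drop it
        yield line
```

In practice the same filter is usually applied directly to the BAM file, for example with samtools view -b -F 1024, which excludes alignments that have that flag bit set.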

AA

Dec 18, 2014, 12:49:56 PM
to viral-to...@googlegroups.com
Hi Xiao.

Many thanks for your reply. I have looked and there are no "N" or "P" operations in the CIGAR strings. I stepped through your code in gdb and it fails while processing the first read, at line 659 of bam_manip.cpp. I don't understand the logic of the program well enough to say why it fails, but perhaps you can. Here are the first few lines of the BAM file:

MISEQ01:192:000000000-AB09Y:1:1103:4294:19959   163     3a_X76918       1       66      6I17M1D1M2D89M  =       1       110     GGGGGGGGGGTCCTGGAGGCTGCACGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGC       BBBBBBBBBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF       PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:17 SM:i:66 MQ:i:66 PQ:i:604        UQ:i:412        XQ:i:465

MISEQ01:192:000000000-AB09Y:1:1105:24251:6402   99      3a_X76918       1       99      9I121M  =       1       120     GACCCCCCCTCCCACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGAA      BBBBBBBBBBBBGGFGGGGGFGHFHGAFEECEGGGCFHHFHHHFFHFHHHBHHGEGCFHHFHHFHHHHFHH@BGHFHHHHGGCE/FHGGGGGFHHGHHHHGGGGGHFGGGGGHHHGGGGHHHEHHHHHHH      PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:15 SM:i:96 MQ:i:96 PQ:i:580        UQ:i:479        XQ:i:496

MISEQ01:192:000000000-AB09Y:1:1110:28103:19789  163     3a_X76918       1       99      18I128M3I1M     =       1       128     GCAGCCTCCAGGACCCCCCCACACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCCCAGAT  CCCCCCFFCFFFGGGGGGGGGGGEGFHHFFG3F5FEGDFCEAC0EEGFHFGAG3FFHHFHHHHHGHEGB4F1FCFEGGHHHGHHHGHFHGGGGGGGGFGGGHHHH2FGHGFCGFGFHGEGGHHHGGGGHEHHHHGHHHHH.AACFCAAEF  PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:26 SM:i:67 MQ:i:96 PQ:i:939        UQ:i:834        XQ:i:845

MISEQ01:192:000000000-AB09Y:1:1113:23415:16227  163     3a_X76918       1       99      17I107M =       1       107     GGAGGGGGGGTCCTGGAGGCTACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCG    BCBBCCCCBCDBGGGGGGGGGGHHGHHHHHHHHHGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:20 SM:i:96 MQ:i:96 PQ:i:399        UQ:i:328        XQ:i:389

MISEQ01:192:000000000-AB09Y:1:2103:21858:20680  99      3a_X76918       1       99      7I132M  =       1       131     CCCCCCTCCCGACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCCCTCCA     CDDCDEEEEEDEGGGGGGGGGGHHGGGGGGGGGGHHHHHHHHHHHHHHHHHHHGHGHHHHHHHHHHHHHHHHHHHHHHGGGGGHHGGGGGHHHHHHHHGGGGGHHHGGGGHHHGGGGHHHHHHHHHHHHHGGGGGGGGH     PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:12 SM:i:96 MQ:i:96 PQ:i:479        UQ:i:431        XQ:i:404

MISEQ01:192:000000000-AB09Y:1:2106:13075:15638  163     3a_X76918       1       67      5I11M1I5M1D110M =       1       127     CCGGCCGGGAGGGGGGGTCCTGGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCC    CCCCCCCCCCCCGGGGGGGGGGHGHGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:16 SM:i:67 MQ:i:67 PQ:i:679        UQ:i:516        XQ:i:574

MISEQ01:192:000000000-AB09Y:1:2113:26579:13854  163     3a_X76918       1       66      2I17M1D1M2D116M =       1       137     GGGGGGTCCTGGAGGCTGCACGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCCCTCCCGGGAG        CCCCCCCDCFFFGGGGGGGGGGAGGGHHHCHHHHHHCHHHHHGHHGHGHHFHGFHHHHHHHHHHHHHHHHGGGFGHHGGGGGHGHHHHHHGGGGGHHHGEGGHHHGGGGGGGGGGGGGGGGGGGGFFFFFFFFFDF        PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:13 SM:i:66 MQ:i:66 PQ:i:599        UQ:i:420        XQ:i:453

MISEQ01:192:000000000-AB09Y:1:2107:23686:18728  99      3a_X76918       1       25      6I17M1D1M2D107M1I18M    =       17      160     GGAGGGGGGGGCCTGGAGGCTGTACGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACTGTCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCCCCCCCCCTCCCGGGAGAGCCATAGT  CCCCCDCCDCCCFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:25 SM:i:6  MQ:i:67 PQ:i:868        UQ:i:751        XQ:i:566

MISEQ01:192:000000000-AB09Y:1:1103:8191:26513   99      3a_X76918       1       99      138M2I10M       =       1       135     CGAGACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCCCTCCCGAGAGATCGGCAGAGACC  BBBBBBBFFFFFGGGGGGGGGGGGGGGHHHHHHHHHHHGHHHHHHHGHGHHHHGHHHHHHHHHHHHHHHHHGGGGGHHGGGGGGHHHHHHHGGGGGHHHGGGGHHHGGGGHHHHHHHHHHHHHGGGGGGGGGGGGGGGHHHGG-::;/=.  PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:11 SM:i:67 MQ:i:96 PQ:i:425        UQ:i:351        XQ:i:336

MISEQ01:192:000000000-AB09Y:1:2107:7901:4408    99      3a_X76918       1       99      9I136M2I1M1I1M  =       1       133     CCCCCCCTCCCGAACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTCCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCCCTCCCGAGATCGGA  ABAB?DBBBBBBGGGGGGGGGGHHHHGGGGGGGGGGHHHFHHHHGHHGHGHHHHHGHGGHHHHGHHHHHHHHHHHEHHGGGGGGGHHGGGGGHGHHHHHHGDGGFGHHGGGGHHGGGGGHHHHHHHHHHHGHGGGGGGGGGGGGGGGGGG  PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:18 SM:i:67 MQ:i:96 PQ:i:817        UQ:i:649        XQ:i:760

MISEQ01:192:000000000-AB09Y:1:2109:18015:13473  99      3a_X76918       1       99      6I144M  =       42      191     GTGACCGGGTACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCCCTCCCGGGAGAGCCATA  CCCDDFDCDBBFGGGGGGGGGGHGGGFGGGGGGHHHHHHHHHHHHHHHHHHHGHGHHHHHHHHHHHHHHHHHHHHHHGGGGGHHGGGGGHHHHHHHHGGGGGHHHGGGGHHHGGGGHHHHHHHHHHHHHGGGGGGGGGGGGGGGGGGGGG  PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:8  SM:i:96 MQ:i:96 PQ:i:159        UQ:i:213        XQ:i:37

MISEQ01:192:000000000-AB09Y:1:2105:20421:23216  163     3a_X76918       1       99      34I116M =       52      201     CCGGTGTACTCACCGGTTCCGCAGACCACTATGGCTCTCCCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTC  BBABABBFFFFFGGGGEEGGGGGFCHHHHHGHHHHHHHCHHHHHHHHGHGHGGGGGGCGGF3EHGHEHHHFHFHHHHHGHGFEFGFHGGFHHHHHHHHGHHHHHHGGDCCHHGGGGDHHGBFCFGCDFGGHFFDFFGHHHCE?EHHHHGG  PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:40 SM:i:96 MQ:i:96 PQ:i:684        UQ:i:1151       XQ:i:74

MISEQ01:192:000000000-AB09Y:1:1103:4294:19959   83      3a_X76918       1       66      7I17M1D1M2D89M  =       1       -110    TGGGGGGGGGGTCCTGGAGGCTGCACGACACTCCACCATGGATCACTCCCCTGTGAGGAACTTCTGTCTTCACGCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGC      CGGGGGGGGHHHHHHHHHHHHHGGGGHHHHHHGHHHHHHHHHHHGGGGGHHHHHHHHHHHHHHHHHHHGGGGGGGHHGGGGGHHHHHHHHGGGGGGGGGGGGDFFDDDDDDDDD      PG:Z:MarkDuplicates     RG:Z:WTCHG_157867       NM:i:18 SM:i:66 MQ:i:66 PQ:i:604        UQ:i:465        XQ:i:412






Many many thanks.


AA
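One pattern visible in the records quoted above is that nearly all of the CIGAR strings begin with an insertion (6I17M..., 9I121M, 18I128M3I1M and so on), with no aligned operation before it. The error text "cigar idx b4 I undefined" could be read as referring to exactly that situation, but this is a guess from the message wording and is not confirmed anywhere in this thread. The sketch below, with hypothetical function names, lists the reads whose first operation after any leading clips is an insertion, so they can be inspected or set aside to test the idea.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def first_aligned_op(cigar):
    """Return the first CIGAR operation after any leading S/H clips, or None."""
    for _, op in CIGAR_ELEMENT.findall(cigar):
        if op not in ("S", "H"):
            return op
    return None  # unmapped read ("*") or clips only


def reads_starting_with_insertion(sam_lines):
    """Return the names of reads whose alignment starts with an insertion.

    Hypothetical helper for illustration; expects SAM text lines.
    """
    names = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue
        if first_aligned_op(fields[5]) == "I":
            names.append(fields[0])
    return names
```

If Vphaser2 runs cleanly once these reads are removed or trimmed, that would support the guess; if it still fails, the cause lies elsewhere.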

Javier Perez Florido

Jan 22, 2015, 10:43:24 AM
to viral-to...@googlegroups.com
Dear AA and Xiao,

Have you fixed this problem? I've run vphaser-2 on a set of several BAM files, and it ran fine on all but one. I've checked the CIGAR string of every read in that BAM file, and none contains 'N' or 'P'. The only characters that appear are I, S, *, M and D.

Any suggestions?

Thanks!
Javier