Error using splitBam in bamUtils Version: 1.0.10

Robert Sicko

Jan 16, 2014, 11:48:33 AM
to bamu...@googlegroups.com
Hi,

I'm trying to use splitBam to split a BAM containing multiple subjects (each with its own RG) from an IonTorrent resequencing run. Initially it appears that splitBam is working:

"The following ReadGroup IDs are identified. Splitting into 33 BAM files..
1: N8FU5.IonXpress_011
2: N8FU5.IonXpress_013
        ...
       33: N8FU5.nomatch"

However, the next line is an error:

"parsing BAM - Unknown custom field of type :�
Exiting due to ERROR:
FAIL_PARSE: Unknown tag type."

I then used validate which passed and output the following:

"Number of records read = 4095638
Number of valid records = 4095638

TotalReads(e6) 4.10
MappedReads(e6) 3.86
PairedReads(e6) 0.00
ProperPair(e6) 0.00
DuplicateReads(e6) 0.00
QCFailureReads(e6) 0.00

MappingRate(%) 94.25
PairedReads(%) 0.00
ProperPair(%) 0.00
DupRate(%) 0.00
QCFailRate(%) 0.00

TotalBases(e6) 472.75
BasesInMappedReads(e6) 461.02
Returning: 0 (SUCCESS)"

Also of note, the log file ends after "33: N8FU5.nomatch" so it does not include the error. Let me know if you need more information to figure out what is going on.
Thanks,
Bob

Mary Kate Wing

Jan 16, 2014, 1:39:01 PM
to bamutils
Unfortunately, bam validate does not currently check the tags.  Since it doesn't try to parse the tags, it doesn't encounter the problem.

That error means the code found a tag type that is not A, c, C, s, S, i, I, Z, B, f.
It should print the name of the tag and the type, but it looks like it is printing garbage, so the parser may be misaligned and treating bytes that are not a tag as one.

Note: the 'H' type is not currently supported.

Another potential issue could be an incorrect parsing of one of the supported types.  

If it does not take too long before you encounter the error, I would suggest narrowing it down to find out which read is causing the error.

You can do this by doing something like:
bin/bam yourBam.bam   -   > /dev/null

You should see the same error message, but at the end it should say something like:
Number of records read = 12
Number of records written = 11

The number of records read will tell you the record number that is failing since bamUtil stops reading records after it encounters the error.

Since bamUtil is having trouble parsing this record, I would see if samtools can do better.
You can try pulling out the failed record, in this case record 12:
    samtools view yourBam.bam | head -n 12 | tail -n 1

This will help us see the tag types.
Unfortunately, the error might be in the binary BAM representation of the tag, so it may not be obvious when we look at the tags as text.
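
For anyone who prefers a script to the head/tail pipeline, here is a minimal sketch of the same record lookup in Python. It assumes the SAM text arrives without header lines (as with plain `samtools view`), so line N is record N. The function name `nth_record` is made up for illustration and is not part of bamUtil or samtools.

```python
from itertools import islice

def nth_record(lines, n):
    """Return the n-th item (1-based) from an iterable of SAM lines, or None."""
    if n < 1:
        raise ValueError("record numbers are 1-based")
    # islice skips the first n-1 lines without loading the whole file.
    return next(islice(lines, n - 1, n), None)

# Stand-in for the lines produced by `samtools view yourBam.bam`.
demo = ["record%d" % i for i in range(1, 21)]
print(nth_record(demo, 12))
```

In practice `lines` could be `sys.stdin` with the samtools output piped in, and `n` the "Number of records read" value that bamUtil reports.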


Are you familiar with a debugger?  That is how I would typically debug this situation.
If not, I can probably give you some updated code that will print out additional information or the entire BAM record when it hits this error.



Robert Sicko

Jan 16, 2014, 4:38:58 PM
to bamu...@googlegroups.com
Hm... maybe TorrentServer uses H type tags in the BAM? I tried grep ':H:' my.sam, and it looks like the only place :H: occurs is in the QUAL field, but I could have missed one scrolling past in the tags. I ended up getting it split with "bamtools split -tag RG".
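
A plain grep for ':H:' also matches the QUAL column, which is why the hits above were misleading. A minimal sketch of a check that looks only at the optional fields (columns 12 onward, each formatted as TAG:TYPE:VALUE) might look like the following. The function name `tag_type_counts` and the example record are made up for illustration.

```python
from collections import Counter

def tag_type_counts(sam_lines):
    """Count the TYPE codes used in the optional TAG:TYPE:VALUE fields."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line, not an alignment
            continue
        fields = line.rstrip("\n").split("\t")
        for field in fields[11:]:         # columns 12+ hold the optional tags
            parts = field.split(":", 2)   # VALUE itself may contain ':'
            if len(parts) == 3:
                counts[parts[1]] += 1
    return counts

# Hypothetical record: QUAL contains ":H:" but no tag has type H.
example = "\t".join([
    "read1", "0", "chr1", "100", "60", "4M", "*", "0", "0",
    "ACGT", "I:H:", "RG:Z:N8FU5.IonXpress_011", "NM:i:0", "ZP:B:f,0.1,0.2",
])
counts = tag_type_counts([example])
print(sorted(counts.items()))
```

Running it over the output of `samtools view` would list every tag type present in the file, so an unexpected type stands out without scrolling through the records.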

I can compile and run additional code to print out the BAM file when it crashes, but setting up a debugger and getting it to work would probably take me a while.

Thanks,
Bob

Mary Kate Wing

Jan 16, 2014, 4:49:38 PM
to bamutils
Attached is an updated SamRecord.cpp file that you can put in libStatGen/bam/SamRecord.cpp

If you are able to, download the file and recompile libStatGen & bamUtil.

With this code change, just rerun your split.  It will print out more details on your failed record.

For example, I manually replaced the 'Z' type of RG with 'Q':
parsing BAM - Unknown custom field of type RG:Q
BAM Tags: 
AM(414d):S(53):1000(e803)
MD(4d44):Z(5a):30A0C5(33304130)
NM(4e4d):I(49):2000000(80841e00)
Failed Tag: RG(5247):Q(51)
5247320058544152

414d53e8034d445a333041304335004e4d4980841e005247515247320058544152
Exiting due to ERROR:
FAIL_PARSE: Unknown tag type.

It will print each successfully parsed tag as name:type:value, with each part followed by its hex representation:
  TAG(hex):TYPE(hex):VALUE(hex)
It then prints the failed tag and type.
The line after the failed tag is the hex representation of the rest of the tag buffer.

At the end is the hex representation of the entire tag buffer.

As I said, my guess is that either you have a tag type that I am not expecting or something is going wrong in parsing a previous tag.
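
To make the dump format concrete, here is a minimal Python sketch that walks the example tag buffer shown above. It is not the bamUtil code. It assumes the layout given in the SAM/BAM specification: a 2-byte tag name, a 1-byte type, then a little-endian value, with 'Z' strings NUL-terminated and 'B' arrays stored as subtype, 32-bit count, then elements. The names `FIXED` and `parse_tags` are made up for illustration.

```python
import struct

# Fixed-size value types: struct format and size in bytes.
FIXED = {
    "A": ("<c", 1), "c": ("<b", 1), "C": ("<B", 1),
    "s": ("<h", 2), "S": ("<H", 2),
    "i": ("<i", 4), "I": ("<I", 4), "f": ("<f", 4),
}

def parse_tags(buf):
    """Yield (tag, type, value) for each tag in a raw BAM tag buffer."""
    pos = 0
    while pos < len(buf):
        tag = buf[pos:pos + 2].decode("ascii", "replace")
        typ = chr(buf[pos + 2])
        start, pos = pos, pos + 3
        if typ in FIXED:
            fmt, size = FIXED[typ]
            (value,) = struct.unpack_from(fmt, buf, pos)
            pos += size
        elif typ == "Z":
            end = buf.index(b"\x00", pos)      # string ends at the NUL byte
            value = buf[pos:end].decode("ascii")
            pos = end + 1
        elif typ == "B":
            sub = chr(buf[pos])
            if sub not in "cCsSiIf":
                raise ValueError("unknown array subtype %r in %s" % (sub, tag))
            (count,) = struct.unpack_from("<I", buf, pos + 1)
            fmt, size = FIXED[sub]
            value = list(struct.unpack_from("<%d%s" % (count, fmt[1]), buf, pos + 5))
            pos += 5 + count * size            # skip subtype, count, and all elements
        else:
            raise ValueError("unknown tag type %s:%s at byte %d" % (tag, typ, start))
        yield tag, typ, value

# Entire tag buffer from the example above (RG was given the bogus type 'Q').
buf = bytes.fromhex(
    "414d53e8034d445a333041304335004e4d4980841e005247515247320058544152")
parsed, failure = [], None
try:
    for item in parse_tags(buf):
        parsed.append(item)
except ValueError as err:
    failure = str(err)
print(parsed)
print(failure)
```

The three tags before the failure decode to the same values the debug output shows (AM 1000, MD 30A0C5, NM 2000000), and the walk stops at RG:Q.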

-------------------
After I had the above (and attached code) written, I got your email that you had found another solution.  

I understand about the debugger issue, and figured it would be easier for me to write the attached debug code to get the info I'd need.  Sorry I didn't have it ready sooner.

If you are willing and have time to try this new version, I'd appreciate knowing information about the Tag prior to your failed tag as well as the failed tag and some of the following hex values.  
This would allow me to identify the problem with my code and fix it so others would not encounter the same issue.

If you do not have time, I understand, and I am glad you found something that worked.

Either way, let me know if you have any other questions on any of our tools,

Mary Kate


SamRecord.cpp

Robert Sicko

Jan 16, 2014, 5:36:50 PM
to bamu...@googlegroups.com
I ran splitBam again after recompiling. Results are below:

"parsing BAM - Unknown custom field of type :�
BAM Tags: 
ZP(5a50):B(42):f (66030000)
Failed Tag: (0000):�(80)
3cb03b0053aa3b6a7f213a5a4d42735d000000000100000e0100000000f00002000e019a00d60010000000c6000000fe001600d4000001ce01e200f20002012601fc00020108000600dc002a000200b80200009001fa0000008802bc01f00000000603da010800000000000c01120300000400f40100000a00f60006000800fe021600f6010c001000e400120002000a030c00ea01fa00fcffee02cc01ee000e00a002b601fafffcfffeff000182023e000a00b0010a000e00fe000a004000880234008c011c00feffcc0062005a46691b00000052475a4e384655352e496f6e5870726573735f3036310050475a746d6170004d445a3530004e4d69000000004153693200000058415a6d6170342d310058536932000000

5a50426603000000803cb03b0053aa3b6a7f213a5a4d42735d000000000100000e0100000000f00002000e019a00d60010000000c6000000fe001600d4000001ce01e200f20002012601fc00020108000600dc002a000200b80200009001fa0000008802bc01f00000000603da010800000000000c01120300000400f40100000a00f60006000800fe021600f6010c001000e400120002000a030c00ea01fa00fcffee02cc01ee000e00a002b601fafffcfffeff000182023e000a00b0010a000e00fe000a004000880234008c011c00feffcc0062005a46691b00000052475a4e384655352e496f6e5870726573735f3036310050475a746d6170004d445a3530004e4d69000000004153693200000058415a6d6170342d310058536932000000
Exiting due to ERROR:
FAIL_PARSE: Unknown tag type."


-Bob 

Mary Kate Wing

Jan 17, 2014, 9:55:09 AM
to bamutils
It looks like I am not parsing the 'B' type properly in a BAM file.
I will work to fix that.  
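
For readers following along, here is a minimal Python sketch of how a 'B' array is laid out, applied to the first 24 bytes of the tag buffer posted above. It is not the bamUtil fix. It assumes the SAM/BAM specification's layout (1-byte subtype, 32-bit little-endian count, then the elements), and the names `ELEM` and `read_b_array` are made up for illustration.

```python
import struct

# Element struct code and size in bytes for each 'B' array subtype.
ELEM = {"c": ("b", 1), "C": ("B", 1), "s": ("h", 2), "S": ("H", 2),
        "i": ("i", 4), "I": ("I", 4), "f": ("f", 4)}

def read_b_array(buf, pos):
    """Decode a 'B' value starting at pos (just past TAG and 'B').

    Returns (values, next_pos), where next_pos is the start of the next tag.
    """
    sub = chr(buf[pos])
    (count,) = struct.unpack_from("<I", buf, pos + 1)
    code, size = ELEM[sub]
    values = struct.unpack_from("<%d%s" % (count, code), buf, pos + 5)
    return list(values), pos + 5 + count * size

# First 24 bytes of the full tag buffer from the failing record.
buf = bytes.fromhex("5a50426603000000803cb03b0053aa3b6a7f213a5a4d4273")
values, next_pos = read_b_array(buf, 3)
print(buf[0:2], len(values), values)
print(next_pos, buf[next_pos:next_pos + 4])
```

Read this way, ZP holds three floats and the next tag (ZM, another 'B' array, of 16-bit integers) begins at byte 20. The failed tag in the dump, name 0000 and type 80, corresponds to bytes 6 through 8, which sit inside the ZP count and array data. That is consistent with the array not being skipped in full, though the thread does not show the actual code path.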

Thank you so much for this information.  It will be a huge help in ensuring I handle it properly.

Mary Kate


Robert Sicko

Jan 17, 2014, 10:03:10 AM
to bamu...@googlegroups.com
No problem, thanks for looking into it. If you need me to test anything further let me know.

-Bob

mkt...@umich.edu

Feb 28, 2014, 4:27:32 PM
to bamu...@googlegroups.com
Sorry for the delay in responding - I fixed it, but then didn't have a chance to do the official release until today (although I should have pointed you to the updated GitHub version - sorry about that).

bamUtil should now properly handle 'B' tags.
If you have time and would like to further use/test bamUtil, please give the updated version a try.

You will need to upgrade both libStatGen & bamUtil to the latest versions - you can either use the release or the current GitHub version - they are the same.

Mary Kate