Hi all,
I have been using the biogo library, and it works really well for what I need. However, when I write output BAM files using hts.bam.Writer, I run into some errors. I may be doing something wrong here, but when I run an output BAM file through a checker like this:
$ sambamba_v0.6.6 index -c small-mine-dups.bam
I often get an error like this about the indexing bin:
sambamba-index: Bin in read with name 'SN1279:510:C8EL3ANXX:4:1315:17239:8943' is set incorrectly (4680 instead of expected 4900)
And when I then read the same BAM file back in through biogo's Reader, some of the records do not come through.
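For reference on the two bin numbers in the sambamba message, here is a minimal sketch of the bin calculation as I understand it from the reg2bin function in the SAM spec. The coordinates in main are made up for illustration (I have not checked them against the actual read). With this function, 4680 is the value for a record with no position (beg = -1, end = 0), while 4900 is a 16 kb bin a few megabases into the reference, so it may be that the bin is being computed as if the read had no position. That is only a guess on my part.

```go
package main

import "fmt"

// reg2bin returns the BAM indexing bin for the zero-based, half-open
// interval [beg, end), following the reg2bin function in the SAM spec.
func reg2bin(beg, end int) int {
	end-- // make the end coordinate inclusive
	switch {
	case beg>>14 == end>>14:
		return ((1<<15)-1)/7 + (beg >> 14) // 16 kb bins start at 4681
	case beg>>17 == end>>17:
		return ((1<<12)-1)/7 + (beg >> 17) // 128 kb bins start at 585
	case beg>>20 == end>>20:
		return ((1<<9)-1)/7 + (beg >> 20) // 1 Mb bins start at 73
	case beg>>23 == end>>23:
		return ((1<<6)-1)/7 + (beg >> 23) // 8 Mb bins start at 9
	case beg>>26 == end>>26:
		return ((1<<3)-1)/7 + (beg >> 26) // 64 Mb bins start at 1
	}
	return 0 // bin 0 covers the whole 512 Mb range
}

func main() {
	// A record with no position: beg = -1, end = 0.
	fmt.Println(reg2bin(-1, 0)) // 4680

	// Hypothetical 100 bp read starting in the 220th 16 kb window.
	const beg = 219 << 14
	fmt.Println(reg2bin(beg, beg+100)) // 4900
}
```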
I then tested the same BAM file using picard, and it also reports errors, like this:
$ picard ValidateSamFile I=small-mine-dups.bam MODE=SUMMARY
[Mon Jul 24 14:02:48 PDT 2017] picard.sam.ValidateSamFile INPUT=small-mine-dups.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Mon Jul 24 14:02:48 PDT 2017] Executing as ayip@ayip on Linux 4.10.0-27-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11; Picard version: 2.10.3-SNAPSHOT
## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_INDEXING_BIN 941
ERROR:INVALID_INDEX_FILE_POINTER 1
ERROR:MISSING_PLATFORM_VALUE 155
[Mon Jul 24 14:02:52 PDT 2017] picard.sam.ValidateSamFile done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=1242038272
Does this seem like a bug in biogo, or could I be using the library incorrectly?
Thanks, Alex