I work with genomes that have chromosomes larger than 512 Mbp. When generating a BAM index with samtools, I get the following error:
```
$ samtools index SRR12616776-Aligned.sortedByCoord.out.bam
[E::hts_idx_check_range] Region 536812869..536918402 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'SRR12616776.10897579.1' with ref_name='gi|2020862564|gb|CM030321.1|', ref_length=1002637973, flags=147, pos=536812870 cannot be indexed
samtools index: failed to create index for "SRR12616776-Aligned.sortedByCoord.out.bam": Numerical result out of range
```
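The numbers in this message line up with the fixed BAI binning scheme. BAI uses the `reg2bin()` scheme from the SAM/BAM specification with bins at shifts 14, 17, 20, 23 and 26 plus one whole-reference bin, so it can only address positions below 2^29 = 536,870,912 bp; the failing region 536812869..536918402 straddles that boundary. The sketch below is a Python transcription of the spec's `reg2bin()` plus a range check; `fits_in_bai` is a hypothetical helper name, not part of samtools or htslib.

```python
# Sketch of BAI binning, transcribed from reg2bin() in the SAM/BAM spec.
# BAI fixes min_shift = 14 and depth = 5, so it addresses 2**29 positions.

BAI_MIN_SHIFT = 14
BAI_DEPTH = 5
BAI_MAX_POS = 1 << (BAI_MIN_SHIFT + 3 * BAI_DEPTH)  # 536,870,912
BAI_MAX_BIN = ((1 << 18) - 1) // 7 - 1              # 37448, last regular bin id


def reg2bin(beg: int, end: int) -> int:
    """Bin for the 0-based half-open interval [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0


def fits_in_bai(beg: int, end: int) -> bool:
    """Hypothetical helper: can [beg, end) be stored in a BAI index?"""
    return 0 <= beg < end <= BAI_MAX_POS


if __name__ == "__main__":
    beg, end = 536812869, 536918402            # region from the samtools error
    print(fits_in_bai(beg, end))               # False: end is past 2**29
    print(reg2bin(beg, end))                   # 0: silently lands in the top bin
    print(reg2bin(600_000_000, 600_000_100))   # 41302, beyond BAI_MAX_BIN
```

Note that `reg2bin()` itself does not fail on such regions: it returns bin 0 for a region crossing 2^29 and bin ids above 37448 for regions entirely past it, which is why an explicit range check like htslib's `hts_idx_check_range` is needed.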
sambamba creates an index for files like these without any errors. However, when I run `sambamba slice` on a region that includes coordinates beyond the 512 Mbp limit, I get the following error:
```
$ sambamba slice SRR12616776-Aligned.sortedByCoord.out.bam 'gi|2020872190|gb|CM030322.1|:659039846-790847814' -o /dev/null
sambamba 0.8.0
by Artem Tarasov and Pjotr Prins (C) 2012-2020
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
sambamba-slice: bio.core.bgzf.inputstream.BgzfException@BioD/bio/core/bgzf/inputstream.d(42): Error reading BGZF block starting from offset 2140425286: wrong BGZF magic
----------------
??:? [0x5c250e]
??:? [0x5c9e6a]
??:? [0x5b2e6d]
inputstream.d:63 [0x4b53ff]
inputstream.d:307 [0x4d3bac]
inputstream.d:287 [0x4d3d19]
inputstream.d:390 [0x4d3f00]
inputstream.d:477 [0x4b0307]
randomaccessmanager.d:346 [0x4b6715]
randomaccessmanager.d:312 [0x4ad3ef]
reference.d:80 [0x454064]
slice.d:361 [0x455ffd]
??:? [0x5b2acf]
??:? [0x5b29c5]
??:? [0x5d9894]
??:? [0x400c59]
```
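"Wrong BGZF magic" means the bytes at the reported offset do not begin with the fixed BGZF block header defined in the SAM/BAM specification, which suggests the offset sambamba derived from its index is not a block boundary. One way to check is to inspect the BAM at that offset directly. The sketch below uses hypothetical helper names (`bgzf_block_size`, `split_virtual_offset`); it checks the four magic bytes and reads the block size from the `BC` extra subfield, and it assumes the number in the error is a compressed file offset (the upper 48 bits of a BGZF virtual offset).

```python
import io
import struct

# Fixed leading bytes of every BGZF block (SAM/BAM spec, BGZF section):
# gzip ID1=31, ID2=139, CM=8 (deflate), FLG=4 (FEXTRA set).
BGZF_MAGIC = b"\x1f\x8b\x08\x04"

# The standard 28-byte empty BGZF block that terminates a BAM file.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def split_virtual_offset(voffset):
    """Split a BGZF virtual offset into (block file offset, offset within block)."""
    return voffset >> 16, voffset & 0xFFFF


def bgzf_block_size(fh, offset):
    """Total size of the BGZF block starting at `offset` in binary file `fh`,
    or None if no valid BGZF header starts there."""
    fh.seek(offset)
    fixed = fh.read(12)
    if len(fixed) < 12 or fixed[:4] != BGZF_MAGIC:
        return None  # the condition reported as "wrong BGZF magic"
    (xlen,) = struct.unpack_from("<H", fixed, 10)
    extra = fh.read(xlen)
    pos = 0
    # Walk the gzip extra subfields looking for 'BC', which carries BSIZE.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(extra):
            (bsize,) = struct.unpack_from("<H", extra, pos + 4)
            return bsize + 1  # BSIZE stores the total block size minus 1
        pos += 4 + slen
    return None


if __name__ == "__main__":
    buf = io.BytesIO(BGZF_EOF)
    print(bgzf_block_size(buf, 0))  # 28: a valid block starts at offset 0
    print(bgzf_block_size(buf, 1))  # None: offset 1 is not a block boundary
```

For the file above, `bgzf_block_size(open("SRR12616776-Aligned.sortedByCoord.out.bam", "rb"), 2140425286)` returning `None` would confirm that the offset does not start a block.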
Is there a way to deal with files like these? I don't see an option in sambamba to either generate or use a CSI index. An old GitHub issue (https://github.com/biod/sambamba/issues/284) mentions this, but if I understand correctly, it was never implemented in sambamba. Several genomes I work with would benefit from CSI index support, and the performance difference between `samtools view` and `sambamba slice` is large enough that I would prefer to use sambamba.
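CSI generalizes the BAI scheme by making the minimum shift and the number of levels parameters, so the addressable length becomes 2^(min_shift + 3*depth). samtools can already write such an index with `samtools index -c` (with `-m` to change the minimum shift). The sketch below transcribes the generalized `reg2bin()` from the CSI specification and computes the smallest depth that covers the 1,002,637,973 bp CM030321.1 chromosome from the samtools error; `csi_max_len` and `min_csi_depth` are hypothetical helpers, and real indexers may pad the reference length slightly before choosing the depth.

```python
# Sketch of CSI binning, transcribed from the generalized reg2bin() in the
# CSI specification. Unlike BAI, min_shift and depth are parameters.


def csi_max_len(min_shift: int, depth: int) -> int:
    """Largest reference length addressable with these CSI parameters."""
    return 1 << (min_shift + 3 * depth)


def min_csi_depth(ref_len: int, min_shift: int = 14) -> int:
    """Hypothetical helper: smallest depth whose bins cover ref_len bases."""
    depth = 0
    while csi_max_len(min_shift, depth) < ref_len:
        depth += 1
    return depth


def csi_reg2bin(beg: int, end: int, min_shift: int, depth: int) -> int:
    """Bin for the 0-based half-open interval [beg, end)."""
    level = depth
    shift = min_shift
    first_bin = ((1 << (depth * 3)) - 1) // 7  # first bin id on the finest level
    end -= 1
    while level > 0:
        if beg >> shift == end >> shift:
            return first_bin + (beg >> shift)
        level -= 1
        shift += 3
        first_bin -= 1 << (level * 3)
    return 0  # region only fits the single top-level bin


if __name__ == "__main__":
    ref_len = 1_002_637_973               # CM030321.1, from the samtools error
    depth = min_csi_depth(ref_len)
    print(depth, csi_max_len(14, depth))  # 6 4294967296
    print(csi_max_len(14, 5))             # 536870912, the BAI limit
```

With the default minimum shift of 14, one extra level beyond BAI's five raises the limit from 2^29 to 2^32 bp, which comfortably covers a 1 Gbp chromosome; with depth 5 and min_shift 14, `csi_reg2bin` reduces to the BAI `reg2bin()`.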
Thank you,
Vamsi.