Using CSI index

Vamsi Kodali

Sep 9, 2021, 3:27:31 PM
to sambamba-discussion
I work with genomes whose chromosomes are larger than 512 Mbp. When generating a BAM index file with samtools, I see the following error:

$ samtools index SRR12616776-Aligned.sortedByCoord.out.bam
[E::hts_idx_check_range] Region 536812869..536918402 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'SRR12616776.10897579.1' with ref_name='gi|2020862564|gb|CM030321.1|', ref_length=1002637973, flags=147, pos=536812870 cannot be indexed
samtools index: failed to create index for "SRR12616776-Aligned.sortedByCoord.out.bam": Numerical result out of range
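
For context, my understanding is that the 512 Mbp ceiling follows from the fixed BAI binning layout (16 kbp leaf bins, i.e. a minimum shift of 14, and 5 levels), which gives 2^(14 + 3*5) = 2^29 addressable positions. CSI makes both values configurable. The sketch below only illustrates that arithmetic for the numbers in the error above; the function names are made up for this post and are not part of samtools, htslib, or sambamba.

```python
# Illustrative arithmetic only; names here are hypothetical helpers,
# not functions from samtools, htslib, or sambamba.

BAI_MIN_SHIFT = 14  # leaf bins span 2**14 = 16,384 bp
BAI_DEPTH = 5       # number of levels below the root bin


def max_indexable_coordinate(min_shift: int = BAI_MIN_SHIFT,
                             depth: int = BAI_DEPTH) -> int:
    """Largest end coordinate a binning scheme with these parameters can hold."""
    return 1 << (min_shift + 3 * depth)


def fits_in_index(end: int,
                  min_shift: int = BAI_MIN_SHIFT,
                  depth: int = BAI_DEPTH) -> bool:
    """True if an alignment ending at `end` stays within the addressable range."""
    return end <= max_indexable_coordinate(min_shift, depth)


def min_depth_for_length(ref_length: int, min_shift: int = BAI_MIN_SHIFT) -> int:
    """Smallest depth whose addressable range covers a reference of this length."""
    depth = 0
    while max_indexable_coordinate(min_shift, depth) < ref_length:
        depth += 1
    return depth


if __name__ == "__main__":
    print("BAI limit:", max_indexable_coordinate())
    # End of the region reported by samtools in the error above.
    print("region end fits in BAI:", fits_in_index(536918402))
    # ref_length reported for CM030321.1 in the error above.
    print("depth needed for ref_length:", min_depth_for_length(1002637973))
```

The region in the error starts just below 2^29 and ends just above it, which would explain why this particular read is the first one that cannot be indexed.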

sambamba creates an index for files like these without any errors. However, when I try to use slice with a region that includes coordinates beyond the 512 Mbp limit, I see the following error:
$ sambamba slice SRR12616776-Aligned.sortedByCoord.out.bam 'gi|2020872190|gb|CM030322.1|:659039846-790847814' -o /dev/null

sambamba 0.8.0
 by Artem Tarasov and Pjotr Prins (C) 2012-2020
    LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

sambamba-slice: bio.core.bgzf.inputstream.BgzfException@BioD/bio/core/bgzf/inputstream.d(42): Error reading BGZF block starting from offset 2140425286: wrong BGZF magic
----------------
??:? [0x5c250e]
??:? [0x5c9e6a]
??:? [0x5b2e6d]
inputstream.d:63 [0x4b53ff]
inputstream.d:307 [0x4d3bac]
inputstream.d:287 [0x4d3d19]
inputstream.d:390 [0x4d3f00]
inputstream.d:477 [0x4b0307]
randomaccessmanager.d:346 [0x4b6715]
randomaccessmanager.d:312 [0x4ad3ef]
reference.d:80 [0x454064]
slice.d:361 [0x455ffd]
??:? [0x5b2acf]
??:? [0x5b29c5]
??:? [0x5d9894]
??:? [0x400c59]


Is there a way to deal with files like these? I don't see an option in sambamba to either generate or use a CSI index. An old GitHub issue, https://github.com/biod/sambamba/issues/284, mentions this, but if I understand correctly it was never implemented in sambamba. Several genomes I work with would benefit from CSI index support. The performance difference between samtools view and sambamba slice is large enough that I would prefer to keep using sambamba.

Thank you,
Vamsi.