seek to random record in bam file

Brent Pedersen

May 8, 2017, 12:58:53 PM
to biogo...@googlegroups.com
Hi,

I want to sample some characteristics of a bam file, something like
`samtools flagstat` but computed from, e.g.,
$n reads at each of $m random sites in the bam.

Is there a simple way to do this without using the index?


With the index, I guess I could just call `bam.Index.Chunks` with a
random genomic region. Is that the best way? I was hoping I could seek
to a random location, find the next bgzf block start, and have the
bam.Reader start from there, but that doesn't seem feasible.

thanks,
-Brent
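
A minimal Go sketch of the "random genomic region" half of the index-based idea above: it picks $m sites uniformly across the genome, so each reference is weighted by its length. The `site` type, the `randomSites` function, and the `intn` parameter are hypothetical names and not part of biogo; `intn` is assumed to behave like `rand.Intn` from `math/rand`, and the reference lengths are assumed to come from the bam header.

```go
package main

// site is a hypothetical type: a reference index (in header order)
// and a 0-based position on that reference.
type site struct {
	ref int
	pos int
}

// randomSites picks m positions uniformly across the whole genome, so
// longer references are hit proportionally more often. lengths holds
// the reference lengths and must sum to a positive value. intn must
// return a value in [0, n), as rand.Intn does.
func randomSites(lengths []int, m int, intn func(n int) int) []site {
	total := 0
	for _, l := range lengths {
		total += l
	}
	sites := make([]site, 0, m)
	for i := 0; i < m; i++ {
		// Draw an offset into the concatenated genome, then walk the
		// references to find the one that contains it.
		off := intn(total)
		for ref, l := range lengths {
			if off < l {
				sites = append(sites, site{ref: ref, pos: off})
				break
			}
			off -= l
		}
	}
	return sites
}
```

The intended call is something like `randomSites(lengths, m, rand.Intn)`; each returned site could then be widened into a small region and looked up through the index.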

Dan Kortschak

May 8, 2017, 8:39:00 PM
to Brent Pedersen, biogo...@googlegroups.com
You can't seek to a random location in a bgzf stream and expect to be
able to read valid data. You also can't seek to a random position in
the uncompressed data of a bam file and expect to be able to parse a
valid bam record, since nothing marks where a record starts (this
precludes starting at a random file offset and looking for the next
bgzf header).

Either use an index, or read sequentially and flick the reader's Omit
setting on and off depending on what you need.
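
To illustrate why a random position in the uncompressed data is not a usable starting point, here is a rough Go sketch of how record boundaries are found. Per the SAM/BAM specification, each alignment record begins with a little-endian int32 `block_size` giving the length of the rest of the record, and there is no sync marker. The `recordOffsets` function is a hypothetical helper, not biogo's API, and it treats the record body as opaque bytes.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// recordOffsets walks uncompressed BAM alignment data and returns the
// offset of every record. data must begin exactly on a record boundary:
// the only way to find the next record is to read the current record's
// block_size and skip that many bytes.
func recordOffsets(data []byte) ([]int, error) {
	var offsets []int
	for off := 0; off < len(data); {
		if len(data)-off < 4 {
			return offsets, errors.New("truncated block_size")
		}
		// block_size is a little-endian int32 counting the bytes that
		// follow it in this record.
		size := int(int32(binary.LittleEndian.Uint32(data[off:])))
		if size < 0 || size > len(data)-off-4 {
			return offsets, errors.New("record extends past end of data")
		}
		offsets = append(offsets, off)
		off += 4 + size
	}
	return offsets, nil
}
```

Starting the same walk one byte late reads four unrelated bytes as `block_size`, which either fails a sanity check like the one above or, worse, silently yields garbage records.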

Brent Pedersen

May 8, 2017, 8:44:21 PM
to Dan Kortschak, biogo...@googlegroups.com
I'd done this with a tabix/bgzf file before by seeking to a random
location, finding the next bgzf magic, decompressing, discarding until
the first newline, and reading from there.
I guess there's no way to do this with bam since there's no end-of-record marker.

I'll use the index.
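
For reference, the "find the next bgzf magic" step of the tabix trick could look roughly like the Go sketch below. `nextBGZFHeader` is a hypothetical helper, not a biogo function. It assumes the `BC` extra subfield comes first in the gzip extra field, which is the usual layout but not the only one the specification allows, and a match is only a candidate: the same bytes can occur by chance inside compressed data.

```go
package main

import "bytes"

// bgzfMagic is the fixed start of a BGZF block header:
// gzip ID1, ID2, CM = deflate, FLG = FEXTRA.
var bgzfMagic = []byte{0x1f, 0x8b, 0x08, 0x04}

// nextBGZFHeader returns the offset of the first candidate BGZF block
// header at or after start, or -1 if there is none in buf.
func nextBGZFHeader(buf []byte, start int) int {
	for start >= 0 && start < len(buf) {
		i := bytes.Index(buf[start:], bgzfMagic)
		if i < 0 {
			return -1
		}
		off := start + i
		// Bytes 12-15 of the header hold the extra subfield id 'B','C'
		// and its length (2, little-endian) when BC is the first subfield.
		if off+16 <= len(buf) &&
			buf[off+12] == 'B' && buf[off+13] == 'C' &&
			buf[off+14] == 2 && buf[off+15] == 0 {
			return off
		}
		// Not a BGZF header; keep scanning from the next byte.
		start = off + 1
	}
	return -1
}
```

For a line-oriented tabix file the block found this way can be decompressed and the partial first line discarded; for bam there is nothing equivalent to discard up to, which is why the index is the practical route.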