Return value of sam.Record.End()

ay...@grailbio.com

Sep 19, 2017, 12:29:47 PM
to biogo-user
Hi all,
I am looking for clarification about the return value of sam.Record.End():

// End returns the highest query-consuming coordinate end of the alignment.
// The position returned by End is not valid if r.Cigar.IsValid(r.Seq.Length)
// is false.
func (r *Record) End() int {
    pos := r.Pos
    end := pos
    for _, co := range r.Cigar {
        pos += co.Len() * co.Type().Consumes().Reference
        end = max(end, pos)
    }
    return end
}


The comment says that End() returns the highest query-consuming coordinate of the alignment.
I'm guessing that means the End() return value points to an actual base in the read (as opposed
to the position right after the last base).  I assume that Start() is also supposed to point to an
actual base.

My question is this: the code adds the number of reference-consuming positions from the cigar
to the start, so if I have a read with Start() == 10 and the cigar is 1M, then End() will return 11.

But that seems wrong because 11 doesn't actually point to part of the read.  Is this the intended
behavior of End()?  Or is End() supposed to return 10 in this case?

Thanks, Alex



Dan Kortschak

Sep 19, 2017, 7:00:53 PM
to ay...@grailbio.com, biogo-user
All of the biogo code uses zero-based half-open indexing. Unfortunately
many file formats (including SAM) use 1-based indexing.
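
To make the difference between the two conventions concrete, here is a minimal sketch of the conversion. The helper name toOneBased is hypothetical and is not part of biogo; it only assumes that the 1-based form is a closed interval, as in SAM.

```go
package main

import "fmt"

// toOneBased converts a zero-based half-open interval [start, end)
// into the equivalent 1-based closed interval [first, last].
// Hypothetical helper for illustration only; not a biogo function.
func toOneBased(start, end int) (first, last int) {
	// The first base moves up by one; the exclusive end already
	// equals the 1-based position of the last base.
	return start + 1, end
}

func main() {
	first, last := toOneBased(10, 11)
	fmt.Println(first, last) // prints "11 11"
}
```

So the single-base alignment from the question, [10, 11) in zero-based half-open terms, is position 11 to 11 in 1-based closed terms.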

Zero-based half-open has the extremely satisfying property that
end - start = length. And it matches the semantics of nearly every
programming language's slicing operations.
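
As a rough sketch of that property applied to the example in the question (start 10, cigar 1M): the op type and the refEnd function below are simplified stand-ins invented for illustration, not biogo's sam.CigarOp or Record.End, and the reference string is made up.

```go
package main

import "fmt"

// op is a simplified stand-in for a CIGAR operation.
// Hypothetical type for illustration; not biogo's sam.CigarOp.
type op struct {
	length      int
	consumesRef bool // true for operations such as M that consume reference
}

// refEnd returns the exclusive end of an alignment that begins at
// the zero-based position start, mirroring the half-open convention.
func refEnd(start int, cigar []op) int {
	end := start
	for _, o := range cigar {
		if o.consumesRef {
			end += o.length
		}
	}
	return end
}

func main() {
	start := 10
	end := refEnd(start, []op{{length: 1, consumesRef: true}}) // cigar 1M

	fmt.Println(end, end-start) // prints "11 1"

	// The same pair of numbers works directly as Go slice bounds.
	ref := "ACGTACGTACGTACGT" // made-up reference sequence
	fmt.Println(ref[start:end]) // prints "G"
}
```

The end value 11 is one past the last aligned base, so it does not itself point at part of the read, but ref[start:end] selects exactly the one aligned position.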

Maybe the docs could be improved. Please send a PR if you have better
wording.

Dan