Type confusion - how to retrieve alphabet.Letters from alphabet.Slice


Robert Syme

Dec 16, 2016, 1:19:20 AM
to biogo-user
Hi all

I'd like to read in a FASTA file and perform operations on sections of unambiguous sequence (contigs). I can open a FASTA file with `r := fasta.NewReader(myreader, mytemplate)` and iterate with `myseq, err = r.Read()`, where `myseq` is a `seq.Sequence`.
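
One way to find the unambiguous sections is to scan the letters once and record each maximal run of A/C/G/T. The sketch below assumes that anything other than A, C, G or T (in either case) counts as ambiguous; the `contigs` and `unambiguous` helpers are hypothetical, not part of biogo. The `~byte` constraint is there so the same function can take a `[]byte` or, assuming `alphabet.Letter` is a byte-based type, an `alphabet.Letters`.

```go
package main

import "fmt"

// unambiguous reports whether b is one of the four DNA bases, in either case.
// Assumption: anything else (N, other IUPAC codes, gaps) counts as ambiguous.
func unambiguous(b byte) bool {
	switch b {
	case 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't':
		return true
	}
	return false
}

// contigs returns half-open [start, end) intervals, one per maximal run of
// unambiguous bases in s. The ~byte constraint accepts []byte and any slice
// whose element type has byte as its underlying type.
func contigs[T ~byte](s []T) [][2]int {
	var out [][2]int
	start := -1 // -1 means "not currently inside a contig"
	for i, b := range s {
		switch {
		case unambiguous(byte(b)):
			if start < 0 {
				start = i
			}
		case start >= 0:
			out = append(out, [2]int{start, i})
			start = -1
		}
	}
	if start >= 0 { // close a contig that runs to the end of s
		out = append(out, [2]int{start, len(s)})
	}
	return out
}

func main() {
	fmt.Println(contigs([]byte("NNACGTNNNggaTN"))) // [[2 6] [9 13]]
}
```

Each interval can then be used to sub-slice the concrete letter slice directly rather than going through `Slice()`.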

I pull out the contigs with `myseq.Slice().Slice(contigStart, contigEnd)`, which returns an alphabet.Slice, but I'm not sure how to get from the Slice to the underlying sequence data, as none of the methods (Append, Cap, Copy, Len, Make, or Slice) seem to return the sequence (I'm expecting something like []byte or alphabet.Letters).

How do I get access to the bases from an alphabet.Slice?

For context, I'd like to calculate some basic statistics such as the dinucleotide frequencies from the slice.
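
For the dinucleotide statistic, a minimal sketch is to count overlapping pairs (ACGT gives AC, CG, GT), fold to upper case, and skip any pair containing a non-ACGT letter. The `dinucleotides` and `upperBase` names are hypothetical; as with any such helper, the `~byte` constraint is an assumption that lets it accept `alphabet.Letters` as well as `[]byte`.

```go
package main

import "fmt"

// upperBase maps a, c, g, t to upper case and reports whether b is a base.
func upperBase(b byte) (byte, bool) {
	switch b {
	case 'A', 'C', 'G', 'T':
		return b, true
	case 'a', 'c', 'g', 't':
		return b - 'a' + 'A', true
	}
	return 0, false
}

// dinucleotides counts overlapping dinucleotides in s and returns the counts
// plus the number of pairs counted. Pairs with a non-ACGT letter are skipped.
func dinucleotides[T ~byte](s []T) (map[string]int, int) {
	counts := make(map[string]int)
	total := 0
	for i := 0; i+1 < len(s); i++ {
		x, ok1 := upperBase(byte(s[i]))
		y, ok2 := upperBase(byte(s[i+1]))
		if !ok1 || !ok2 {
			continue
		}
		counts[string([]byte{x, y})]++
		total++
	}
	return counts, total
}

func main() {
	counts, total := dinucleotides([]byte("ACGTacgN"))
	// Frequency of each pair is its count divided by the number of pairs counted.
	for _, k := range []string{"AC", "CG", "GT", "TA"} {
		fmt.Printf("%s %d %.3f\n", k, counts[k], float64(counts[k])/float64(total))
	}
}
```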

Thanks!

Rob Syme
Curtin University
Western Australia

Dan Kortschak

Dec 16, 2016, 4:59:47 AM
to Robert Syme, biogo-user
The alphabet.Slice interface is really an internal type used by the
sequence containers to be able to perform manipulations of the sequence
data. Unless you are writing a new sequence container or a new sequence
data type, you should not need to interact with it.
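
The reason none of the interface's methods hand back the bases is that a Go interface only exposes its method set; the data lives in whatever concrete type is behind it. The sketch below uses hypothetical stand-ins (`Slice`, `Letters`), not biogo's own types, to show that a sub-slice taken through the interface is still only usable through `Len` and `Slice` until it is asserted back to the concrete type.

```go
package main

import "fmt"

// Slice is a hypothetical, cut-down stand-in for an interface like
// alphabet.Slice: it can measure and re-slice data but never returns elements.
type Slice interface {
	Len() int
	Slice(start, end int) Slice
}

// Letters is a stand-in concrete type that stores the bases and satisfies Slice.
type Letters []byte

func (l Letters) Len() int                   { return len(l) }
func (l Letters) Slice(start, end int) Slice { return l[start:end] }

func main() {
	var s Slice = Letters("ACGTNNACGT")
	sub := s.Slice(0, 4)   // static type is Slice: only Len and Slice are callable
	fmt.Println(sub.Len()) // 4
	if l, ok := sub.(Letters); ok {
		fmt.Println(string(l)) // ACGT: the bases are reachable via the concrete type
	}
}
```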

To address your question of what to do: you have given the fasta.Reader
a template type, so you know the concrete type of the seq.Sequence
returned by a Read call. Type assert to that concrete type and then you
can iterate over the sequence. Say you are using a *linear.Seq:

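// imports: biogo's alphabet, io/seqio, io/seqio/fasta and seq/linear packages
f := fasta.NewReader(r, linear.NewSeq("", nil, alphabet.DNA))
sc := seqio.NewScanner(f)
for sc.Next() {
	s := sc.Seq().(*linear.Seq) // the template fixes the concrete type
	for _, l := range s.Seq {   // s.Seq is an alphabet.Letters
		_ = l // interesting things
	}
}
if err := sc.Error(); err != nil {
	// handle the read error
}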
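
The single-value assertion `sc.Seq().(*linear.Seq)` panics if the value ever has a different concrete type (for example, if the template passed to the reader is later changed). A minimal sketch of the comma-ok form, using hypothetical stand-ins `Sequence`, `LinearSeq` and `otherSeq` in place of seq.Sequence and *linear.Seq so that it runs without biogo:

```go
package main

import "fmt"

// Sequence and LinearSeq are hypothetical stand-ins for seq.Sequence and
// *linear.Seq; only the shape of the type assertion matters here.
type Sequence interface{ Name() string }

type LinearSeq struct {
	ID  string
	Seq []byte
}

func (s *LinearSeq) Name() string { return s.ID }

// otherSeq is a second stand-in type used to show the failure path.
type otherSeq struct{}

func (otherSeq) Name() string { return "other" }

// lettersOf returns the letters held by s, or an error if s is not the
// concrete type we expected. The comma-ok form reports instead of panicking.
func lettersOf(s Sequence) ([]byte, error) {
	ls, ok := s.(*LinearSeq)
	if !ok {
		return nil, fmt.Errorf("unexpected sequence type %T", s)
	}
	return ls.Seq, nil
}

func main() {
	if l, err := lettersOf(&LinearSeq{ID: "chr1", Seq: []byte("ACGT")}); err == nil {
		fmt.Println(string(l)) // ACGT
	}
	_, err := lettersOf(otherSeq{})
	fmt.Println(err) // unexpected sequence type main.otherSeq
}
```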

Rob Syme

Dec 16, 2016, 6:32:12 AM
to Dan Kortschak, biogo-user

Perfect. That makes sense. Thanks Dan!

-r
