migrating example to biogo


Sebastien Binet

Sep 13, 2018, 12:23:24 PM
to biogo-user
hi there,

in a couple of days, I am giving an introductory Go tutorial to (mostly) biologists.

in the morning we go through most of what Go is (types, funcs, vars, interfaces, goroutines/channels).
in the afternoon, the idea is to go through a bio-related exercise.

as I am not a biologist, I got the organizers to give me a Python-based exercise that I would translate to Go.
The original exercise wasn't using any BioPython module (just the stdlib) so it was easy to translate.
(The original exercise was: given chromosomes in a FASTA file and the associated GFF3 annotation file, extract the nucleotide sequences of the CDS.)
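to make the task concrete, here is a minimal stdlib-only sketch of the core step. it is not the program from this thread: the `cds` type and the `parseCDS` and `extract` helpers are hypothetical names, and it assumes the chromosomes are already loaded into a map from sequence ID to nucleotide string. GFF3 coordinates are 1-based and inclusive, hence the `start-1` when slicing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cds holds the fields of a GFF3 CDS record needed for extraction.
type cds struct {
	seqID      string
	start, end int    // 1-based, inclusive, as in GFF3
	strand     string // "+" or "-"
}

// parseCDS parses one tab-separated GFF3 line and reports whether it
// describes a well-formed CDS feature.
func parseCDS(line string) (cds, bool) {
	if strings.HasPrefix(line, "#") {
		return cds{}, false // comment or directive
	}
	f := strings.Split(line, "\t")
	if len(f) != 9 || f[2] != "CDS" {
		return cds{}, false
	}
	start, err1 := strconv.Atoi(f[3])
	end, err2 := strconv.Atoi(f[4])
	if err1 != nil || err2 != nil || start < 1 || end < start {
		return cds{}, false
	}
	return cds{seqID: f[0], start: start, end: end, strand: f[6]}, true
}

// extract returns the forward-strand nucleotides covered by c.
// Features on the "-" strand still need to be reverse-complemented.
func extract(chroms map[string]string, c cds) (string, bool) {
	s, ok := chroms[c.seqID]
	if !ok || c.end > len(s) {
		return "", false
	}
	return s[c.start-1 : c.end], true
}

func main() {
	chroms := map[string]string{"chr1": "ACGTACGTAC"}
	line := "chr1\t.\tCDS\t3\t6\t.\t+\t0\tID=cds1"
	if c, ok := parseCDS(line); ok {
		if sub, ok := extract(chroms, c); ok {
			fmt.Println(sub) // GTAC
		}
	}
}
```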

here is my attempt:

the data files are available at:

(I checked that I get the same answer as the original Python code, only faster.)

so far so good, but I figured it would be interesting to also show how biogo could be leveraged.
so I came up with:

any ideas on how to improve it?
especially interested in:
- improving the handling of the reverse-complement
- dealing with the GFF3 input file
- producing the final CDS output file.
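for reference on the first point, this is roughly what a hand-rolled reverse-complement looks like with only the stdlib. it is a sketch, not biogo's API: `complementBase` and `reverseComplement` are hypothetical helpers, and only A, C, G, T (upper or lower case) are complemented; any other byte, such as N, is passed through unchanged.

```go
package main

import "fmt"

// complementBase returns the Watson-Crick partner of b.
// Bytes other than A, C, G, T (either case) are returned unchanged.
func complementBase(b byte) byte {
	switch b {
	case 'A':
		return 'T'
	case 'T':
		return 'A'
	case 'C':
		return 'G'
	case 'G':
		return 'C'
	case 'a':
		return 't'
	case 't':
		return 'a'
	case 'c':
		return 'g'
	case 'g':
		return 'c'
	}
	return b
}

// reverseComplement returns the reverse complement of s, i.e. the
// sequence read 5'->3' on the opposite strand.
func reverseComplement(s string) string {
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		out[len(s)-1-i] = complementBase(s[i])
	}
	return string(out)
}

func main() {
	fmt.Println(reverseComplement("AACG")) // CGTT
}
```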

thanks!
-s

Dan Kortschak

Sep 13, 2018, 6:16:30 PM
to Sebastien Binet, biogo-user
Hi,

You don't need to Clone the sequence during the read loop in the first
program. I take it the input GFF is GFF3 not GTF? (biogo does handle
GTF/GFF2). Yes, just checked. It is. Sorry about that, I don't use them
enough to warrant adding support, though I have started on occasion and
then dropped it again in favour of things that are more useful to me
(PRs welcome).

For output of fasta sequence you can either use a seqio/fasta writer or
for what you're doing just fmt.Fprintf(o, "%60a", s) where s is the
seq.Sequence.
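
To show the kind of output meant here (a description line followed by the sequence wrapped at a fixed width), this is a minimal stdlib-only sketch. It does not use the biogo writer or the sequence's formatter; `formatFasta` is a hypothetical helper, and defaulting to 60 columns is an assumption taken from the width in the format string above.

```go
package main

import (
	"fmt"
	"strings"
)

// formatFasta renders one FASTA record: a ">" description line followed
// by the sequence wrapped at width columns (60 if width is not positive).
func formatFasta(id, seq string, width int) string {
	if width <= 0 {
		width = 60
	}
	var b strings.Builder
	b.WriteString(">" + id + "\n")
	for len(seq) > 0 {
		n := width
		if n > len(seq) {
			n = len(seq)
		}
		b.WriteString(seq[:n])
		b.WriteByte('\n')
		seq = seq[n:]
	}
	return b.String()
}

func main() {
	// A width of 4 is used only to keep the example output short.
	fmt.Print(formatFasta("cds1", "ACGTACGTAC", 4))
}
```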

The code you have does similar work to what I have in part of one of
our pipelines. That code is here:
https://github.com/biogo/examples/blob/master/igor/seqer/seqer.go

Dan

Sebastien Binet

Sep 14, 2018, 8:23:58 AM
to Dan Kortschak, biogo...@googlegroups.com
On Fri, Sep 14, 2018 at 12:16 AM Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:
> Hi,
>
> You don't need to Clone the sequence during the read loop in the first
> program. I take it the input GFF is GFF3 not GTF? (biogo does handle
> GTF/GFF2). Yes, just checked. It is. Sorry about that, I don't use them
> enough to warrant adding support, though I have started on occasion and
> then dropped it again in favour of things that are more useful to me
> (PRs welcome).

thanks.
this will do. I hope the people I'll inflict this tutorial on will stick around and send the PRs :)
 

> For output of fasta sequence you can either use a seqio/fasta writer or
> for what you're doing just fmt.Fprintf(o, "%60a", s) where s is the
> seq.Sequence.
>
> The code you have does similar work to what I have in part of one of
> our pipelines. That code is here:
> https://github.com/biogo/examples/blob/master/igor/seqer/seqer.go

thanks.

I might point at this code for the "homework assignment" (I have created a very crude GFF3 -> GFF2 converter; the resulting file could be used in this kind of pipeline).
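
to give an idea of what such a conversion involves, here is a sketch of one piece of it: rewriting the attribute column. this is not the converter mentioned above. `gff3AttrsToGFF2` is a hypothetical helper, and it assumes the only change needed is turning GFF3 `tag=value;tag=value` pairs into GFF2-style `tag "value"; tag "value"` pairs. it ignores percent-encoding and comma-separated multi-values.

```go
package main

import (
	"fmt"
	"strings"
)

// gff3AttrsToGFF2 rewrites a GFF3 attribute column ("tag=value;...")
// into GFF2-style quoted tag/value pairs ("tag \"value\"; ...").
// Entries without an "=" are dropped; values are copied verbatim.
func gff3AttrsToGFF2(attrs string) string {
	var out []string
	for _, kv := range strings.Split(attrs, ";") {
		kv = strings.TrimSpace(kv)
		if kv == "" {
			continue
		}
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			continue
		}
		out = append(out, parts[0]+" \""+parts[1]+"\"")
	}
	return strings.Join(out, "; ")
}

func main() {
	fmt.Println(gff3AttrsToGFF2("ID=cds1;Parent=mRNA1"))
}
```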

thanks a lot.
I must say using the biogo packages was quite straightforward, even for somebody with only vague pre-University memories of gene-biology concepts.

-s