migrating example to biogo

Sebastien Binet

Sep 13, 2018, 12:23:24 PM

to biogo-user

hi there,

in a couple of days, I am giving an introductory Go tutorial to (mostly) biologists.

in the morning we go through most of what Go is (types, funcs, vars, interfaces, goroutines/channels).

in the afternoon, the idea is to go through a bio-related exercise.

as I am not a biologist, I got the organizers to give me a Python-based exercise that I would translate to Go.

The original exercise wasn't using any BioPython module (just the stdlib) so it was easy to translate.

(The original exercise was: given chromosomes in a FASTA file and the associated GFF3 annotation file, extract the nucleotide sequences of the CDS.)

here is my attempt:

- https://gist.github.com/sbinet/514187049e48dfd8fef12d8dbad36650

the data files are available at:

- https://cern.ch/binet/tuto-biogo/extract-gff

(I checked that I get the same answer as the original Python code, but faster.)
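
For readers who cannot open the gist, here is a minimal stdlib-only sketch of the exercise as described above: read a FASTA file, read the CDS rows of a GFF3 file, and slice out the corresponding nucleotides. It is not the code from the gist. The names `cds`, `parseFASTA`, `parseCDS` and `extractCDS` are hypothetical, inputs are passed as strings to keep the example short, and strand is recorded but not applied (minus-strand features would still need a reverse-complement).

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cds is one CDS feature from a GFF3 file (1-based, inclusive coordinates).
type cds struct {
	seqID      string
	start, end int
	strand     string
}

// parseFASTA maps each sequence ID (first word of the header line)
// to its concatenated residues.
func parseFASTA(text string) map[string]string {
	seqs := make(map[string]string)
	id := ""
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, ">"):
			id = ""
			if f := strings.Fields(line[1:]); len(f) > 0 {
				id = f[0]
			}
		case id != "":
			// Simple but quadratic; fine for a sketch, not for whole genomes.
			seqs[id] += line
		}
	}
	return seqs
}

// parseCDS keeps only the rows whose type column (column 3) is "CDS".
func parseCDS(text string) ([]cds, error) {
	var out []cds
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines, comments and directives
		}
		f := strings.Split(line, "\t")
		if len(f) < 9 || f[2] != "CDS" {
			continue
		}
		start, err := strconv.Atoi(f[3])
		if err != nil {
			return nil, err
		}
		end, err := strconv.Atoi(f[4])
		if err != nil {
			return nil, err
		}
		out = append(out, cds{seqID: f[0], start: start, end: end, strand: f[6]})
	}
	return out, sc.Err()
}

// extractCDS returns the forward-strand slice for every CDS feature.
func extractCDS(fasta, gff string) ([]string, error) {
	seqs := parseFASTA(fasta)
	feats, err := parseCDS(gff)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, c := range feats {
		s, ok := seqs[c.seqID]
		if !ok || c.start < 1 || c.start > c.end || c.end > len(s) {
			return nil, fmt.Errorf("bad feature %s:%d-%d", c.seqID, c.start, c.end)
		}
		out = append(out, s[c.start-1:c.end]) // GFF3 is 1-based, inclusive
	}
	return out, nil
}

func main() {
	fasta := ">chr1 toy chromosome\nACGTACGTAC\nGGTT\n"
	gff := "##gff-version 3\nchr1\t.\tCDS\t3\t6\t.\t+\t0\tID=c1\n"
	seqs, err := extractCDS(fasta, gff)
	if err != nil {
		panic(err)
	}
	fmt.Println(seqs)
}
```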

so far so good, but I figured it would be interesting to also show how biogo could be leveraged.

so I came up with:

- https://gist.github.com/sbinet/16435349aa7097f9f78170eb5a2b4363

any ideas on how to improve it?

especially interested in:

- improving the handling of the reverse-complement

- dealing with the GFF3 input file

- producing the final CDS output file.
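
As a baseline for the first point, a stdlib-only reverse-complement might look like the sketch below. This is an illustration, not the code from either gist and not biogo's API; `revComp` and the `complement` table are hypothetical names, and any letter outside the table is copied through unchanged, which is an assumption about how ambiguous bases should be treated.

```go
package main

import "fmt"

// complement maps each nucleotide to its Watson-Crick partner, preserving case.
var complement = map[byte]byte{
	'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N',
	'a': 't', 't': 'a', 'c': 'g', 'g': 'c', 'n': 'n',
}

// revComp returns the reverse-complement of s.
// Letters not in the table are copied unchanged (an assumption of this sketch).
func revComp(s string) string {
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		c, ok := complement[s[i]]
		if !ok {
			c = s[i]
		}
		out[len(s)-1-i] = c
	}
	return string(out)
}

func main() {
	// A feature on the '-' strand is read from the opposite strand,
	// so its forward-strand slice has to be reverse-complemented.
	fmt.Println(revComp("AACG"))
}
```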

thanks!

-s

Dan Kortschak

Sep 13, 2018, 6:16:30 PM

to Sebastien Binet, biogo-user

Hi,

You don't need to Clone the sequence during the read loop in the first
program. I take it the input GFF is GFF3, not GTF? (biogo handles
GTF/GFF2, but not GFF3.) Yes, just checked: it is GFF3. Sorry about that;
I don't use GFF3 files enough to warrant adding support, though I have
started on occasion and then dropped it again in favour of things that
are more useful to me (PRs welcome).

To output FASTA sequence, you can either use a seqio/fasta writer or,
for what you're doing, just fmt.Fprintf(o, "%60a", s) where s is the
seq.Sequence.
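
For comparison, here is a stdlib-only sketch of the output that a 60-column FASTA writer is expected to produce: a `>` header line followed by the residues wrapped at a fixed width. It does not use biogo; `writeFASTA` and `formatFASTA` are hypothetical helper names, and the fallback to 60 columns for a non-positive width is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// writeFASTA writes one record: a ">id" header, then seq wrapped at width columns.
func writeFASTA(w io.Writer, id, seq string, width int) error {
	if width <= 0 {
		width = 60 // assumed default, matching the width used in the thread
	}
	if _, err := fmt.Fprintf(w, ">%s\n", id); err != nil {
		return err
	}
	for i := 0; i < len(seq); i += width {
		end := i + width
		if end > len(seq) {
			end = len(seq)
		}
		if _, err := fmt.Fprintln(w, seq[i:end]); err != nil {
			return err
		}
	}
	return nil
}

// formatFASTA is a convenience wrapper returning the record as a string.
func formatFASTA(id, seq string, width int) string {
	var sb strings.Builder
	_ = writeFASTA(&sb, id, seq, width) // writes to a strings.Builder cannot fail
	return sb.String()
}

func main() {
	if err := writeFASTA(os.Stdout, "cds1", strings.Repeat("ACGT", 20), 60); err != nil {
		panic(err)
	}
}
```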

The code you have does similar work to what I have in part of one of
our pipelines. That code is here:
https://github.com/biogo/examples/blob/master/igor/seqer/seqer.go

Dan

Sebastien Binet

Sep 14, 2018, 8:23:58 AM

to Dan Kortschak, biogo...@googlegroups.com

On Fri, Sep 14, 2018 at 12:16 AM Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:

> Hi,
>
> You don't need to Clone the sequence during the read loop in the first
> program. I take it the input GFF is GFF3 not GTF? (biogo does handle
> GTF/GFF2). Yes, just checked. It is. Sorry about that, I don't use them
> enough to warrant adding support, though I have started on occasion and
> then dropped it again in favour of things that are more useful to me
> (PRs welcome).

thanks.

this will do. I hope the people I'll inflict this tutorial on will stick around and send the PRs :)

> For output of fasta sequence you can either use a seqio/fasta writer or
> for what you're doing just fmt.Fprintf(o, "%60a", s) where s is the
> seq.Sequence.
>
> The code you have does similar work to what I have in part of one of
> our pipelines. That code is here
> https://github.com/biogo/examples/blob/master/igor/seqer/seqer.go

thanks.

I might point at this code for the "homework assignment" (I have created a very crude GFF3 -> GFF2 converter; the resulting file could be used in this kind of pipeline).
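
The converter itself is not shown in the thread. As a rough illustration of what such a conversion involves, the sketch below rewrites only the attribute column (column 9), turning GFF3 `key=value;key=value` pairs into GFF2-style `key "value"; key "value"` pairs. The function names are hypothetical, and the sketch assumes the first eight columns can be copied as-is and that values need no unescaping, which a real converter would have to check.

```go
package main

import (
	"fmt"
	"strings"
)

// gff3AttrsToGFF2 rewrites `k=v;k=v` as `k "v"; k "v"`.
// Entries without '=' are dropped (an assumption of this sketch).
func gff3AttrsToGFF2(attrs string) string {
	var out []string
	for _, kv := range strings.Split(attrs, ";") {
		kv = strings.TrimSpace(kv)
		i := strings.Index(kv, "=")
		if i <= 0 {
			continue
		}
		out = append(out, kv[:i]+` "`+kv[i+1:]+`"`)
	}
	return strings.Join(out, "; ")
}

// convertLine converts one tab-separated feature line.
// Comments and lines without nine columns are returned unchanged.
func convertLine(line string) string {
	if strings.HasPrefix(line, "#") {
		return line
	}
	f := strings.Split(line, "\t")
	if len(f) != 9 {
		return line
	}
	f[8] = gff3AttrsToGFF2(f[8])
	return strings.Join(f, "\t")
}

func main() {
	fmt.Println(convertLine("chr1\t.\tCDS\t3\t6\t.\t+\t0\tID=cds1;Parent=mRNA1"))
}
```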

thanks a lot.

I must say using the biogo packages was quite straightforward, even for somebody with only vague pre-University memories of gene-biology concepts.

-s
