Implement seqfeature for biogo

31 views
Skip to first unread message

Manolis Maragkakis

unread,
Dec 18, 2014, 4:10:06 PM12/18/14
to biog...@googlegroups.com
I implemented the seqfeature package for biogo. It contains code for gene transcripts, exons, introns etc. Got some ideas from BioPerl and GenOO (a module in Perl I have authored in the past). I have the code for the seqfeature module in my develop branch in github https://github.com/mnsmar/biogo/. Please check it out for suggestions and improvements and let me know if you want to merge it to the main branch.

Dan Kortschak

unread,
Jan 5, 2015, 6:32:59 PM1/5/15
to Manolis Maragkakis, biog...@googlegroups.com
Sorry this has taken so long to get to. Holidays.

It seems to me that the "factory/gff" implementation would be better
done using an API model similar to the approach I use for BAI
construction in biogo.bam. Briefly, first you make a new transcript of
the appropriate type (or open to automatic determination) and then you
Add(feat.Feature) until it tells you it's done. This makes the
transcript construction independent of the feature reader type.

(Please don't add the word factory to Go's lexicon).

I would then flatten the whole thing into one package.

You have a comment that you don't know what the Location() feat.Feature
should return - for a transcript, this would be the chromosome the gene
is on or the gene the exon/intron is in. See biogo/feat/genome for an
example of this. (essentially what you use GenomicRegion for).

As it stands it does not feel idiomatically like biogo code, so I won't
pull it in. If you feel the need for this kind of feature I can add it
to my list (if you would like this, please file an issue).

Dan

Emmanouil Maragkakis

unread,
Jan 6, 2015, 6:14:56 PM1/6/15
to Dan Kortschak, biog...@googlegroups.com
I was thinking about the "factory/gff" too. I think I should make it a separate package and remove it from biogo for now. I like your idea about the Add(feat.Feature). I will think how this could be implemented. It seems it's gonna be kinda complicated as it would have to do type checking on the input feat.Feature to see if it is CDS, 3'UTR, 5'UTR, exon, ... and act accordingly. Does it sound about right?

Just to be sure. So the feat.Feature.Location() is the reference entity on which feat.Feature.Start() and feat.Feature.End() measure from? 

For the following case
  • Chromosome X with coordinates {Start: 0, End: 10000}
  • Transcript A with coordinates relative to X {Start: 100, End: 1000}
  • Region A1 with coordinates relative to A {Start: 10, End: 20}
we have two equally correct implementations for A1
  1. {Start: 10, End: 20, Location: A}
  2. {Start: 110, End: 120, Location: X}
right?

I will work on your suggestions and open an issue when it's done.

PS is there any specific reason why you moved from github to google code? I mean it seems to me that github is way better, especially for managing code contributions. This discussion for example would be comments on a pull request. Maybe you could rethink this. Go itself is moving to Github :)

Dan Kortschak

unread,
Jan 6, 2015, 6:23:50 PM1/6/15
to Emmanouil Maragkakis, biog...@googlegroups.com
On Tue, 2015-01-06 at 18:10 -0500, Emmanouil Maragkakis wrote:
> I was thinking about the "factory/gff" too. I think I should make it a
> separate package and remove it from biogo for now. I like your idea
> about the Add(feat.Feature). I will think how this could be
> implemented. It seems it's gonna be kinda complicated as it would have
> to do type checking on the input feat.Feature to see if it is CDS,
> 3'UTR, 5'UTR, exon, ... and act accordingly. Does it sound about
> right?

Yes. Though I don't see how it's any more complicated than wrapping
around a gff.Reader - gff.Features don't distinguish what their feature
type is via type. You currently grab a feature from the gff.Reader and
process it. In the model I'm suggesting grabing the feature is the
responsibility of the client. You then just need to define the semantic
interface between feat.Features and transcript components.

> Just to be sure. So the feat.Feature.Location() is the reference
> entity on which feat.Feature.Start() and feat.Feature.End() measure
> from?
>
> For the following case
>
> - Chromosome X with coordinates {Start: 0, End: 10000}
> - Transcript A with coordinates relative to X {Start: 100, End:
> 1000}
> - Region A1 with coordinates relative to A {Start: 10, End: 20}
>
> we have two equally correct implementations for A1
>
> 1. {Start: 10, End: 20, Location: A}
> 2. {Start: 110, End: 120, Location: X}
>
> right?

Yes. Though the choice depends on what you are trying to do, I would
link to the level immediately above the feature you are describing (as
necessary). So the first would be appropriate when considering the
region in the context of the transcript and the second would be
appropriate if you don't care about that. You are building a tree
describing structural relationships.
>
> I will work on your suggestions and open an issue when it's done.
>
> PS is there any specific reason why you moved from github to google
> code?

I moved it from my personal name to an individual agnostic location.

> I mean it seems to me that github is way better, especially for
> managing code contributions. This discussion for example would be
> comments on a pull request. Maybe you could rethink this. Go itself is
> moving to Github :)

That may happen - there is a biogo user on github that I can use if I
decide to make that move. Personally, I'm not that keen on the github
review process. biogo-dev list is for this kind of discussion.

The move is not trivial.

Dan Kortschak

unread,
Jan 18, 2015, 5:43:24 PM1/18/15
to Manolis Maragkakis, biog...@googlegroups.com
Thanks for the pull request at [1].

It's not ideal to do reviews over an issue tracker, so we'll discuss
here and if you'll play along, at github.

The first comment is that I think this should be a single package and it
should be placed within biogo/feat.

There are other more detailed comments that I have, but I'd prefer to
make them as part of a github PR, so if you could make a feature branch
that includes the changes I mention above and squashes all the commits
into one, I can comment on that PR more easily.

When we've done all that, I'll ask you to squash comment addressing
commits and I'll pull it in.

thanks
Dan

[1]https://code.google.com/p/biogo/issues/detail?id=19

Emmanouil Maragkakis

unread,
Jan 19, 2015, 12:55:08 AM1/19/15
to Dan Kortschak, biog...@googlegroups.com
The first comment is that I think this should be a single package and it
should be placed within biogo/feat.

Sounds good. One thing though. I was thinking that maybe most of the types (maybe except gene) should also have a seq.Sequence field. Do you think so? In this case do you still think that they would suit under biogo/feat?
 
There are other more detailed comments that I have, but I'd prefer to
make them as part of a github PR, so if you could make a feature branch
that includes the changes I mention above and squashes all the commits
into one, I can comment on that PR more easily.

Will do. Did you mean though that you are going to comment on the "squash" commit? Cause otherwise what should be the upstream branch to start a PR you could comment on? Or did you mean I should start a PR from the new feature branch to my master branch (not master itself - a clone of it)?

Thanks,
Manolis

Dan Kortschak

unread,
Jan 19, 2015, 1:15:54 AM1/19/15
to Emmanouil Maragkakis, biog...@googlegroups.com
On Mon, 2015-01-19 at 00:54 -0500, Emmanouil Maragkakis wrote:
> Sounds good. One thing though. I was thinking that *maybe* most of the
> types (maybe except gene) should also have a seq.Sequence field. Do
> you think so? In this case do you still think that they would suit
> under biogo/feat?
>
seq.Sequence satisfies feat.Feature, I'd prefer it in feat. If you can
come up with a way for the types to reasonably satisfy seq.Sequence in a
way that doesn't require it that's fine though. Probably make the types
interfaces and provide private implementations, though I haven't put any
real thought into that.


> Will do. Did you mean though that you are going to comment on the
> "squash" commit?

Yes please. There's a lot of commits there, and I'd like to be able to
review and comment on it as a single unit.

> Cause otherwise what should be the upstream branch to start a PR
> you could comment on? Or did you mean I should start a PR from the new
> feature branch to my master branch (not master itself - a clone of
> it)?

If you rebase it onto the current biogo master and then squash we'll
work from there.

One other thing I forgot before, I'm afraid the tests will need to be
rewritten to use gocheck for consistency.

Dan

Emmanouil Maragkakis

unread,
Jan 19, 2015, 3:23:09 PM1/19/15
to Dan Kortschak, biog...@googlegroups.com
I squashed everything into a single commit. It's in my gene branch https://github.com/mnsmar/biogo/tree/gene

Manolis


--
You received this message because you are subscribed to the Google Groups "biogo-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biogo-dev+...@googlegroups.com.
To post to this group, send an email to biog...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Kortschak

unread,
Jan 19, 2015, 5:42:29 PM1/19/15
to Emmanouil Maragkakis, biog...@googlegroups.com
On Mon, 2015-01-19 at 15:22 -0500, Emmanouil Maragkakis wrote:
> I squashed everything into a single commit. It's in my gene branch
> https://github.com/mnsmar/biogo/tree/gene
>
I've made a code mirror at https://github.com/biogo/biogo. Open a PR
against that.

Dan

Emmanouil Maragkakis

unread,
Jan 19, 2015, 6:18:57 PM1/19/15
to Dan Kortschak, biog...@googlegroups.com
Great. 
I opened a PR.
I forgot to pull your latest edits from master before the squash so there are 2 commits in the pull request. It shouldn't be a problem though for your review. I will eliminate them on the final squash. Sorry for this.

Manolis



Dan

Dan Kortschak

unread,
Jan 19, 2015, 6:22:50 PM1/19/15
to Emmanouil Maragkakis, biog...@googlegroups.com
On Mon, 2015-01-19 at 18:18 -0500, Emmanouil Maragkakis wrote:
> Great.
> I opened a PR.
> I forgot to pull your latest edits from master before the squash so
> there
> are 2 commits in the pull request. It shouldn't be a problem though
> for
> your review. I will eliminate them on the final squash. Sorry for
> this.
>
Thanks. That'll be no problem. I'm just working on a helper to handle
orientation in that will simplify a lot of your code and constrains the
way locations and orientations are used to the way that matches my
design. I'll point this out where it's relevant in the review.

Dan

Reply all
Reply to author
Forward
0 new messages