[Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method

6 views
Skip to first unread message

Florent Angly

unread,
Mar 5, 2012, 2:39:08 AM3/5/12
to Bioperl
Hi all,

I have just been burned by a problem with the Bio::Seq add_SeqFeature()
method. Bio::Seq is a class which implements Bio::SeqI, which itselt
implements Bio::FeatureHolderI, which defines an add_SeqFeature() method as:
> Usage : $feat->add_SeqFeature($subfeat);
> $feat->add_SeqFeature($subfeat,'EXPAND')
> Function: adds a SeqFeature into the subSeqFeature array.
> with no 'EXPAND' qualifer, subfeat will be tested
> as to whether it lies inside the parent, and throw
> an exception if not.
>
> If EXPAND is used, the parent''s start/end/strand will
> be adjusted so that it grows to accommodate the new
> subFeature
> Example :
> Returns : nothing
> Args : a Bio::SeqFeatureI object

In comparison, the add_SeqFeature method implemented by Bio::Seq is:
> Title : add_SeqFeature
> Usage : $seq->add_SeqFeature($feat);
> $seq->add_SeqFeature(@feat);
> Function: Adds the given feature object (or each of an array of feature
> objects to the feature array of this
> sequence. The object passed is required to implement the
> Bio::SeqFeatureI interface.
> Returns : 1 on success
> Args : A Bio::SeqFeatureI implementing object, or an array of such
> objects.

As you can see, there is a discrepancy. While the Bio::Seq method takes
an array of features, Bio::FeatureHolderI states that it should take a
single feature and the optional 'EXPAND' scalar.

It would not be very hard to modify Bio::Seq so that it complies with
Bio::FeatureHolderI. One would have to make sure that the Bio::Seq
feature takes the 'EXPAND' option and to have a deprecation message for
any call with more than one feature.

First, I am missing something here or does Bio::Seq need to comply with
Bio::FeatureHolderI?
Then, if Bio::Seq needs to be changed, I wanted to have some feedback
from other wise Bioperl-ers to see if the course of action I described
is adapted.

Best,

Florent
_______________________________________________
Bioperl-l mailing list
Biop...@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Hilmar Lapp

unread,
Mar 4, 2012, 2:27:34 PM3/4/12
to Florent Angly, Bioperl
It does need to comply - the interface is the contract. That being said, any implementation can go *beyond* the contract in what it supports - it just can't fall short of it. So an implementation can always implement more than the contract requires, but never less or it is out of compliance.

It seems from the documentation you quote that Bio::Seq does support a single feature to be added, so that part is fine. However, there is no mention of the EXPAND option, so if that's indeed not supported (can't look at the code right now) then it is not compliant, and that should be fixed.

-hilmar

Sent with a tap.

Fields, Christopher J

unread,
Mar 4, 2012, 3:30:40 PM3/4/12
to Hilmar Lapp, Bioperl, Florent Angly
What do other FeatureHolderI do? What does the current set of tests expect? That would give a consensus on expected behavior, then the decision could be made. For instance, with SeqFeature and Seq the behavior is for a single feature as the arg with a n optional expansion arg in some cases. Based on that, I might argue that the interface itself wasn't updated when the implementations were set up. The name of the method seems to imply a single feature is the arg (otherwise why not call it add_SeqFeatures).

Regardless, it could be fixed so that all implementations accept a list of features, which is what I think Florent means. Dealing with the additional optional 'EXPAND' Is kind of a pain, which includes a need to check type of the last arg and DTRT, but it is possible.

Chris

Hilmar Lapp

unread,
Mar 4, 2012, 4:44:41 PM3/4/12
to Fields, Christopher J, Bioperl, Florent Angly

On Mar 4, 2012, at 3:30 PM, Fields, Christopher J wrote:

> Dealing with the additional optional 'EXPAND' Is kind of a pain

I know, but if we don't want it required we should take it out of the interface.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

Florent Angly

unread,
Mar 5, 2012, 6:04:54 PM3/5/12
to Fields, Christopher J, Bioperl
On 05/03/12 06:30, Fields, Christopher J wrote:
> What do other FeatureHolderI do?

Bio::SeqFeature::Generic does it as defined in Bio::FeatureHolderI, i.e.:
> Usage : $feat->add_SeqFeature($subfeat);
> $feat->add_SeqFeature($subfeat,'EXPAND');

Note that it does not simply accept 'EXPAND', but it actually, acts on it.

Fields, Christopher J

unread,
Mar 4, 2012, 9:13:38 PM3/4/12
to Hilmar Lapp, Bioperl, Florent Angly
On Mar 4, 2012, at 3:44 PM, Hilmar Lapp wrote:

> On Mar 4, 2012, at 3:30 PM, Fields, Christopher J wrote:
>
>> Dealing with the additional optional 'EXPAND' Is kind of a pain
>
> I know, but if we don't want it required we should take it out of the interface.
>
> -hilmar

Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI.

So I think the FeatureHolderI interface is really too specific in this case, and the RangeI-specific requirements (e.g. for 'EXPAND') should be relaxed or completely removed as Hilmar indicates. Implementations can freely go above and beyond what the interface requires. I think we do need some alternative method in FeatureHolderI for safely adding multiple features in one go, this could simply use the plural, e.g. add_SeqFeatures().

Re: Bio::Seq::add_SeqFeature as currently implemented, this behavior has been around for a long, long time. 'git blame' has Ewan adding this back in 2000, so I expect this to potentially break something even if it doesn't show up in tests. Doesn't mean we shouldn't fix it, just that we need to indicate the problem with it (maybe as easy as pointing to this thread), and suggest a possible alternative such as the way mentioned above.

chris

Fields, Christopher J

unread,
Mar 4, 2012, 9:15:59 PM3/4/12
to Florent Angly, Bioperl
On Mar 5, 2012, at 5:04 PM, Florent Angly wrote:

> On 05/03/12 06:30, Fields, Christopher J wrote:
>> What do other FeatureHolderI do?
>
> Bio::SeqFeature::Generic does it as defined in Bio::FeatureHolderI, i.e.:
>> Usage : $feat->add_SeqFeature($subfeat);
>> $feat->add_SeqFeature($subfeat,'EXPAND');
>
> Note that it does not simply accept 'EXPAND', but it actually, acts on it.
>
> Florent

Yes, that matches up. My guess is the interface was developed originally with this in mind, but was (ab)used for Bio::Seq w/o changing the method.

chris

Florent Angly

unread,
Mar 6, 2012, 8:31:18 PM3/6/12
to Fields, Christopher J, Bioperl
On 05/03/12 12:13, Fields, Christopher J wrote:
> Re: Bio::Seq::add_SeqFeature as currently implemented, this behavior
> has been around for a long, long time.

Ok, I figured this was something like that. I think it is fine to leave
this behavior in, as long as it is documented, which I have just done:
https://github.com/bioperl/bioperl-live/commit/5f115e23c09c0c72ec3af2436193c0a6d60aeb54

Florent

Florent Angly

unread,
Mar 7, 2012, 12:00:50 AM3/7/12
to Fields, Christopher J, Bioperl
On 05/03/12 12:13, Fields, Christopher J wrote:
> Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI.
Ok, I have:
1/ clarified in Bio::FeatureHolderI that there is no guarantee that
'EXPAND' will be honored
2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND'
keyword (but do nothing about it)
3/ deprecated the use of passing multiple features to add_SeqFeature()
in Bio::Seq
4/ updated documentation and code that relied on passing multiple features

That should take care of the issue at hand. See this commit:
https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8

Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store,
Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite,
Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an
add_SeqFeature() method that accepts an array of features but they are
not Bio::FeatureHolderI, so that's ok. Maybe they should be
Bio::FeatureHolderI but that's another story.

However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and
is not compliant. I fixed it here:
https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380

Florent

Dave Messina

unread,
Mar 6, 2012, 5:17:22 AM3/6/12
to Florent Angly, Fields, Christopher J, Bioperl
Very nice! Thanks, Florent!

Dave

On Wed, Mar 7, 2012 at 06:00, Florent Angly <floren...@gmail.com> wrote:

> On 05/03/12 12:13, Fields, Christopher J wrote:
>
>> Actually, I misread Florent's original post, I was thinking that
>> FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think
>> Bio::Seq is abusing the FeatureHolderI interface, it should just be for a
>> single feature (it can safely ignore the 'EXPAND' option). However, use of
>> 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a
>> start and end to expand), something that is not mentioned in the interface
>> as a requirement and is not guaranteed, for instance Bio::Seq is not
>> Bio::RangeI.
>>
> Ok, I have:
> 1/ clarified in Bio::FeatureHolderI that there is no guarantee that
> 'EXPAND' will be honored
> 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND'
> keyword (but do nothing about it)
> 3/ deprecated the use of passing multiple features to add_SeqFeature() in
> Bio::Seq
> 4/ updated documentation and code that relied on passing multiple features
>
> That should take care of the issue at hand. See this commit:

> https://github.com/bioperl/**bioperl-live/commit/**
> a5bebe00c505fbf5279f5d717790ed**36eefcc2b8<https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8>


>
> Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store,

> Bio::DB::SeqFeature::**NormalizedFeature, Bio::SeqFeature::Lite,
> Bio::DB::SeqFeature, Bio::Search::Tiling::**MapTileUtils) that have an


> add_SeqFeature() method that accepts an array of features but they are not
> Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI
> but that's another story.
>
> However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and

> is not compliant. I fixed it here: https://github.com/bioperl/**
> bioperl-live/commit/**29e0449d05f37c9c748aaaff8cfe59**6ca7c3d380<https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380>
>
>
> Florent
> ______________________________**_________________
> Bioperl-l mailing list
> Biop...@lists.open-bio.org
> http://lists.open-bio.org/**mailman/listinfo/bioperl-l<http://lists.open-bio.org/mailman/listinfo/bioperl-l>

Fields, Christopher J

unread,
Mar 6, 2012, 8:45:29 AM3/6/12
to Dave Messina, Bioperl, Florent Angly
Yeah, I think that should work. We should probably clarify the reasoning a bit in FeatureHolderI if you haven't already updated it.

Might be interesting to see what breaks (if anything…)

chris

On Mar 6, 2012, at 4:17 AM, Dave Messina wrote:

> Very nice! Thanks, Florent!
>
> Dave
>
>
>
> On Wed, Mar 7, 2012 at 06:00, Florent Angly <floren...@gmail.com> wrote:
> On 05/03/12 12:13, Fields, Christopher J wrote:
> Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI.
> Ok, I have:
> 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored
> 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it)
> 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq
> 4/ updated documentation and code that relied on passing multiple features
>

> That should take care of the issue at hand. See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8
>
> Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story.
>
> However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380
>
>
> Florent

Hilmar Lapp

unread,
Mar 6, 2012, 11:51:49 PM3/6/12
to Florent Angly, Fields, Christopher J, Bioperl
Accepting an array is not in violation so long as the method also accepts a single feature (i.e., a single object ref), and so long as passing an array isn't going to choke if it is followed by 'EXPAND'. I wouldn't deprecate this otherwise.

-hilmar

--

===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

_______________________________________________

Fields, Christopher J

unread,
Mar 7, 2012, 12:12:00 AM3/7/12
to Hilmar Lapp, Bioperl, Florent Angly
Beyond the grammatical incorrectness of passing multiple features to a method named add_SeqFeature(), I'm fairly neutral on it, actually, as long as the behavior is (1) consistent and (2) documented (seems the interface itself would need some clarification with that in mind).

chris

Juan Jovel

unread,
Mar 9, 2012, 12:24:37 PM3/9/12
to floren...@gmail.com, ChrisField, bioperl

Hello All!
No clear to me what generally speaking is the advantage of filtering out reads when we are interested in de novo assembly of specific taxonomic groups (i.e. bacteria, viruses, fungi, etc). More specifically, my questions are:
1. In a metagenomics library, if I am interested in de novo assembly of virus genomes, should I remove bacterial and human reads (that's what I do), and leave in phages and known viral sequences.
2. Or should I remove everything what is known and leave only "unknown" reads for assembly??
What are the advantages and pitfalls of each scenario? Thinking in terms of de Bruijn graphs...
Any comment would be highly appreciated and my apologies again.
Cheers,
JUAN

Peter Cock

unread,
Mar 9, 2012, 12:33:50 PM3/9/12
to Juan Jovel, ChrisField, bioperl, floren...@gmail.com
On Fri, Mar 9, 2012 at 5:24 PM, Juan Jovel <jovel...@hotmail.com> wrote:
>
> Hello All!
> No clear to me what generally speaking is the advantage of filtering out reads
> when we are interested in de novo assembly of specific taxonomic groups
> (i.e. bacteria, viruses, fungi, etc). More specifically, my questions are:
> 1. In a metagenomics library, if I am interested in de novo assembly of virus
> genomes, should I remove bacterial and human reads (that's what I do), and
> leave in phages and known viral sequences.

That might work - but you may remove virus reads mapping onto integrated
prophage inside the bacterial etc references you use. Be careful.

Peter

Thomas Sharpton

unread,
Mar 9, 2012, 4:49:04 PM3/9/12
to Peter Cock, Juan Jovel, ChrisField, floren...@gmail.com, bioperl
Hi Juan,

If I understand you correctly, you want to assemble viral genomes from
metagenomic reads. While assembly of metagenomic data can be
straightforward in some situations (e.g., low complexity communities), it
is generally difficult and can often result in chimeras. By mapping
sequences to reference genomes (i.e., fragment recruitment), you can
effectively reduce the complexity of the community in silico and
subsequently reduce the possibility of chimeric errors.

That said, reference genomes frequently represent a relatively small subset
of the total diversity in a community, so you might have to adopt liberal
mapping parameters if you want to minimized the amount of bacterial,
archaeal and eukaryotic DNA in your metagenome. This could, of course,
result in the spurious filtering of viral reads that happen to share some
similiarity with a reference genome. Personally, I would prefer to lose
some of the viral reads and produce incomplete assemblies if I was
confident that it would decrease the chance of chimeric assemblies. Then
again, I personally try to avoid assembly from metagenomic data when
possible, so I may be lending biased advice.

The DeRisi lab has done some great work on the subject of viral genome
assembly from metagenomic data. I recommend you take a look at PRICE, which
you can download from the link below:

http://derisilab.ucsf.edu/

I'm not sure I specifically answered your question. I do hope this helps
and would be happy to talk more, if you like. But I'm certainly no
metagenomics assembly expert. And we might move this conversation off the
list given that it isn't quite on topic.

Good luck,
Tom

Peter Cock

unread,
Mar 9, 2012, 6:00:17 PM3/9/12
to Thomas Sharpton, Juan Jovel, ChrisField, floren...@gmail.com, bioperl
On Fri, Mar 9, 2012 at 9:49 PM, Thomas Sharpton
<thomas....@gmail.com> wrote:
> Hi Juan,
>
> ... And we might move this conversation off the list given that

> it isn't quite on topic.
>
> Good luck,
> Tom

I should have said in my reply the forum on SEQanswers.com would
be a good place to ask for advice.

Florent Angly

unread,
Mar 10, 2012, 9:47:11 PM3/10/12
to Hilmar Lapp, Fields, Christopher J, Bioperl
I realize that Hilmar. However, to me, it makes it clearer to add a
single feature and it leaves room to implement a add_SeqFeatures()
method in FeatureHolderI to add multiple features at once, based on the
implementation-specific add_SeqFeature() method.

I also realize that deprecation should be used with caution. Feel free
to revert and let add_SeqFeature() accept an array of features if you
think that the downsides (deprecating arrays) outweights the advantages.

Florent


On 07/03/12 14:51, Hilmar Lapp wrote:
> Accepting an array is not in violation so long as the method also accepts a single feature (i.e., a single object ref), and so long as passing an array isn't going to choke if it is followed by 'EXPAND'. I wouldn't deprecate this otherwise.
>
> -hilmar
>
> On Mar 7, 2012, at 12:00 AM, Florent Angly wrote:
>
>> On 05/03/12 12:13, Fields, Christopher J wrote:
>>> Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI.
>> Ok, I have:
>> 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored
>> 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it)
>> 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq
>> 4/ updated documentation and code that relied on passing multiple features
>>
>> That should take care of the issue at hand. See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8
>>
>> Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story.
>>
>> However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380
>>
>> Florent

_______________________________________________

Reply all
Reply to author
Forward
0 new messages