I have just been burned by a problem with the Bio::Seq add_SeqFeature()
method. Bio::Seq is a class which implements Bio::SeqI, which itselt
implements Bio::FeatureHolderI, which defines an add_SeqFeature() method as:
> Usage : $feat->add_SeqFeature($subfeat);
> $feat->add_SeqFeature($subfeat,'EXPAND')
> Function: adds a SeqFeature into the subSeqFeature array.
> with no 'EXPAND' qualifer, subfeat will be tested
> as to whether it lies inside the parent, and throw
> an exception if not.
>
> If EXPAND is used, the parent''s start/end/strand will
> be adjusted so that it grows to accommodate the new
> subFeature
> Example :
> Returns : nothing
> Args : a Bio::SeqFeatureI object
In comparison, the add_SeqFeature method implemented by Bio::Seq is:
> Title : add_SeqFeature
> Usage : $seq->add_SeqFeature($feat);
> $seq->add_SeqFeature(@feat);
> Function: Adds the given feature object (or each of an array of feature
> objects to the feature array of this
> sequence. The object passed is required to implement the
> Bio::SeqFeatureI interface.
> Returns : 1 on success
> Args : A Bio::SeqFeatureI implementing object, or an array of such
> objects.
As you can see, there is a discrepancy. While the Bio::Seq method takes
an array of features, Bio::FeatureHolderI states that it should take a
single feature and the optional 'EXPAND' scalar.
It would not be very hard to modify Bio::Seq so that it complies with
Bio::FeatureHolderI. One would have to make sure that the Bio::Seq
feature takes the 'EXPAND' option and to have a deprecation message for
any call with more than one feature.
First, I am missing something here or does Bio::Seq need to comply with
Bio::FeatureHolderI?
Then, if Bio::Seq needs to be changed, I wanted to have some feedback
from other wise Bioperl-ers to see if the course of action I described
is adapted.
Best,
Florent
_______________________________________________
Bioperl-l mailing list
Biop...@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
It seems from the documentation you quote that Bio::Seq does support a single feature to be added, so that part is fine. However, there is no mention of the EXPAND option, so if that's indeed not supported (can't look at the code right now) then it is not compliant, and that should be fixed.
-hilmar
Sent with a tap.
Regardless, it could be fixed so that all implementations accept a list of features, which is what I think Florent means. Dealing with the additional optional 'EXPAND' Is kind of a pain, which includes a need to check type of the last arg and DTRT, but it is possible.
Chris
> Dealing with the additional optional 'EXPAND' Is kind of a pain
I know, but if we don't want it required we should take it out of the interface.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================
Bio::SeqFeature::Generic does it as defined in Bio::FeatureHolderI, i.e.:
> Usage : $feat->add_SeqFeature($subfeat);
> $feat->add_SeqFeature($subfeat,'EXPAND');
Note that it does not simply accept 'EXPAND', but it actually, acts on it.
> On Mar 4, 2012, at 3:30 PM, Fields, Christopher J wrote:
>
>> Dealing with the additional optional 'EXPAND' Is kind of a pain
>
> I know, but if we don't want it required we should take it out of the interface.
>
> -hilmar
Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI.
So I think the FeatureHolderI interface is really too specific in this case, and the RangeI-specific requirements (e.g. for 'EXPAND') should be relaxed or completely removed as Hilmar indicates. Implementations can freely go above and beyond what the interface requires. I think we do need some alternative method in FeatureHolderI for safely adding multiple features in one go, this could simply use the plural, e.g. add_SeqFeatures().
Re: Bio::Seq::add_SeqFeature as currently implemented, this behavior has been around for a long, long time. 'git blame' has Ewan adding this back in 2000, so I expect this to potentially break something even if it doesn't show up in tests. Doesn't mean we shouldn't fix it, just that we need to indicate the problem with it (maybe as easy as pointing to this thread), and suggest a possible alternative such as the way mentioned above.
chris
> On 05/03/12 06:30, Fields, Christopher J wrote:
>> What do other FeatureHolderI do?
>
> Bio::SeqFeature::Generic does it as defined in Bio::FeatureHolderI, i.e.:
>> Usage : $feat->add_SeqFeature($subfeat);
>> $feat->add_SeqFeature($subfeat,'EXPAND');
>
> Note that it does not simply accept 'EXPAND', but it actually, acts on it.
>
> Florent
Yes, that matches up. My guess is the interface was developed originally with this in mind, but was (ab)used for Bio::Seq w/o changing the method.
chris
Ok, I figured this was something like that. I think it is fine to leave
this behavior in, as long as it is documented, which I have just done:
https://github.com/bioperl/bioperl-live/commit/5f115e23c09c0c72ec3af2436193c0a6d60aeb54
Florent
That should take care of the issue at hand. See this commit:
https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8
Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store,
Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite,
Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an
add_SeqFeature() method that accepts an array of features but they are
not Bio::FeatureHolderI, so that's ok. Maybe they should be
Bio::FeatureHolderI but that's another story.
However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and
is not compliant. I fixed it here:
https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380
Florent
Dave
On Wed, Mar 7, 2012 at 06:00, Florent Angly <floren...@gmail.com> wrote:
> On 05/03/12 12:13, Fields, Christopher J wrote:
>
>> Actually, I misread Florent's original post, I was thinking that
>> FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think
>> Bio::Seq is abusing the FeatureHolderI interface, it should just be for a
>> single feature (it can safely ignore the 'EXPAND' option). However, use of
>> 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a
>> start and end to expand), something that is not mentioned in the interface
>> as a requirement and is not guaranteed, for instance Bio::Seq is not
>> Bio::RangeI.
>>
> Ok, I have:
> 1/ clarified in Bio::FeatureHolderI that there is no guarantee that
> 'EXPAND' will be honored
> 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND'
> keyword (but do nothing about it)
> 3/ deprecated the use of passing multiple features to add_SeqFeature() in
> Bio::Seq
> 4/ updated documentation and code that relied on passing multiple features
>
> That should take care of the issue at hand. See this commit:
> https://github.com/bioperl/**bioperl-live/commit/**
> a5bebe00c505fbf5279f5d717790ed**36eefcc2b8<https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8>
>
> Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store,
> Bio::DB::SeqFeature::**NormalizedFeature, Bio::SeqFeature::Lite,
> Bio::DB::SeqFeature, Bio::Search::Tiling::**MapTileUtils) that have an
> add_SeqFeature() method that accepts an array of features but they are not
> Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI
> but that's another story.
>
> However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and
> is not compliant. I fixed it here: https://github.com/bioperl/**
> bioperl-live/commit/**29e0449d05f37c9c748aaaff8cfe59**6ca7c3d380<https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380>
>
>
> Florent
> ______________________________**_________________
> Bioperl-l mailing list
> Biop...@lists.open-bio.org
> http://lists.open-bio.org/**mailman/listinfo/bioperl-l<http://lists.open-bio.org/mailman/listinfo/bioperl-l>
Might be interesting to see what breaks (if anything…)
chris
On Mar 6, 2012, at 4:17 AM, Dave Messina wrote:
> Very nice! Thanks, Florent!
>
> Dave
>
>
>
> On Wed, Mar 7, 2012 at 06:00, Florent Angly <floren...@gmail.com> wrote:
> On 05/03/12 12:13, Fields, Christopher J wrote:
> Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI.
> Ok, I have:
> 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored
> 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it)
> 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq
> 4/ updated documentation and code that relied on passing multiple features
>
> That should take care of the issue at hand. See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8
>
> Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story.
>
> However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380
>
>
> Florent
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================
_______________________________________________
chris
That might work - but you may remove virus reads mapping onto integrated
prophage inside the bacterial etc references you use. Be careful.
Peter
If I understand you correctly, you want to assemble viral genomes from
metagenomic reads. While assembly of metagenomic data can be
straightforward in some situations (e.g., low complexity communities), it
is generally difficult and can often result in chimeras. By mapping
sequences to reference genomes (i.e., fragment recruitment), you can
effectively reduce the complexity of the community in silico and
subsequently reduce the possibility of chimeric errors.
That said, reference genomes frequently represent a relatively small subset
of the total diversity in a community, so you might have to adopt liberal
mapping parameters if you want to minimized the amount of bacterial,
archaeal and eukaryotic DNA in your metagenome. This could, of course,
result in the spurious filtering of viral reads that happen to share some
similiarity with a reference genome. Personally, I would prefer to lose
some of the viral reads and produce incomplete assemblies if I was
confident that it would decrease the chance of chimeric assemblies. Then
again, I personally try to avoid assembly from metagenomic data when
possible, so I may be lending biased advice.
The DeRisi lab has done some great work on the subject of viral genome
assembly from metagenomic data. I recommend you take a look at PRICE, which
you can download from the link below:
I'm not sure I specifically answered your question. I do hope this helps
and would be happy to talk more, if you like. But I'm certainly no
metagenomics assembly expert. And we might move this conversation off the
list given that it isn't quite on topic.
Good luck,
Tom
I should have said in my reply the forum on SEQanswers.com would
be a good place to ask for advice.
I also realize that deprecation should be used with caution. Feel free
to revert and let add_SeqFeature() accept an array of features if you
think that the downsides (deprecating arrays) outweights the advantages.
Florent
On 07/03/12 14:51, Hilmar Lapp wrote:
> Accepting an array is not in violation so long as the method also accepts a single feature (i.e., a single object ref), and so long as passing an array isn't going to choke if it is followed by 'EXPAND'. I wouldn't deprecate this otherwise.
>
> -hilmar
>
> On Mar 7, 2012, at 12:00 AM, Florent Angly wrote:
>
>> On 05/03/12 12:13, Fields, Christopher J wrote:
>>> Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI.
>> Ok, I have:
>> 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored
>> 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it)
>> 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq
>> 4/ updated documentation and code that relied on passing multiple features
>>
>> That should take care of the issue at hand. See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8
>>
>> Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story.
>>
>> However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380
>>
>> Florent
_______________________________________________