[Bioperl-l] using Bio::SeqIO to convert from table to genbank format ..... attribute_map example

Cook, Malcolm

Sep 12, 2015, 1:26:54 AM
to biop...@mailman.open-bio.org, hl...@gmx.net
Fellow long-time BioPerlers,

I am successfully using Bio::SeqIO to convert between table format (cf. http://search.cpan.org/~cjfields/BioPerl/Bio/SeqIO/table.pm) and genbank flatfile format.

I have Bio::SeqIO sequence format conversion wrapped in a command-line script. The script exposes to the command line the parameters to ->new for both input and output objects through judicious use of GetOptions. I have used this script in many conversion tasks between many different formats.

... except now ...

I am having trouble with reading the flatfile format.

Happily, at first, I see that -display_id and -accession_number are both parameters to Bio::SeqIO::table->new. So they are naturally exposed to the command line as "in format=table header=1 display_id=1 seq=3".

Alas however -description is not a parameter to ->new.

The only way I can see to configure table.pm to take the sequence description (aka desc) from the 2nd column of my .tab file is as follows:

$in->attribute_map({-description => 2});

... however, my trace shows that although this does set the desc attribute of the wrapped Bio::PrimarySeq to the value from column 2, calling attribute_map also discards the individual values passed in for -display_id and -accession_number.
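
One possible workaround, sketched here as an assumption rather than a confirmed fix: since a call to attribute_map replaces the whole map, the existing entries can be read back and merged with the new column before the map is set again. The sketch assumes that attribute_map called with no arguments returns the current map as a hash reference; the table contents (plasmid1 and so on) and the temporary file are hypothetical.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use File::Temp qw(tempfile);

# Hypothetical three-column table: name, description, sequence.
my ($fh, $filename) = tempfile(SUFFIX => '.tab');
print {$fh} join("\t", qw(n d s)), "\n";
print {$fh} join("\t", 'plasmid1', 'example plasmid', 'ACGTACGT'), "\n";
close $fh;

my $in = Bio::SeqIO->new(
    -file             => $filename,
    -format           => 'table',
    -header           => 1,    # first line holds column names
    -display_id       => 1,    # one-based column indexes
    -accession_number => 1,
    -seq              => 3,
);

# Assumption: the getter returns the map built by new() as a hash reference.
# Copy it and add the description column instead of replacing the whole map.
my %map = (%{ $in->attribute_map }, -description => 2);
$in->attribute_map(\%map);

# Write each record to STDOUT in genbank format.
my $out = Bio::SeqIO->new(-format => 'genbank', -fh => \*STDOUT);
while (my $seq = $in->next_seq) {
    $out->write_seq($seq);
}
```

This keeps the -display_id, -accession_number and -seq columns given to new() while adding the description column, but it still cannot be expressed purely through command-line parameters.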

Ideally (I think) Bio::SeqIO::table->new would take a -description=2 instead of having to call attribute_map.

Or, Bio::SeqIO::table->new would take -attribute_map and even accept it as a string which gets evaluated to a hash reference, just as I see -colnames can be passed as a string evaling to an array (which I see in the unit test: http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.924/t/SeqIO/table.t). This would allow the hash to be supplied at the command line.

Or, am I missing something?

FWIW: I am trying to help a lab convert a few years of plasmids from DNAPlasmid to Genbank (for loading into Vector NTI), and I am passing through Bio::SeqIO::table in so doing.

Cheers, and Thanks for help and suggestions....

Malcolm Cook
Stowers Institute for Medical Research
1000 E 50th Street
Kansas City, MO 64110
(816) 926-4449
m...@stowers.org


_______________________________________________
Bioperl-l mailing list
Biop...@mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Brian Osborne

Sep 12, 2015, 12:46:46 PM
to Cook, Malcolm, hl...@gmx.net, biop...@mailman.open-bio.org
Malcolm,

Can you attach a file with “description” that I can use to test a fix?

Thanks again,

Brian O

Fields, Christopher J

Sep 13, 2015, 12:26:39 AM
to Cook, Malcolm, Hilmar Lapp, biop...@mailman.open-bio.org
Hi Malcolm,

Best thing would be to have a dummy example for expected input and output so it can be tested against, just to make sure things work as expected. Could you supply that? Certainly seems like it should be feasible.

chris

Cook, Malcolm

Sep 14, 2015, 11:00:55 AM
to Fields, Christopher J, Brian Osborne, Hilmar Lapp, biop...@mailman.open-bio.org
Hi Chris, Brian, Hilmar, et al.,

Thanks for offering to consider this change.

Attached are test.tab and the converted test.tab.gb.

test.tab has three columns: n (display_id), d (definition/description), and s (sequence).

test.tab.gb has what I would hope would result from writing in genbank format after reading using:

Bio::SeqIO->new(-file => $filename, -format => 'table', -header => 1, -display_id => 1, -accession_number => 1, -seq => 3, -desc => 2)


You may be additionally interested in the following:
After preparing this data, I tried to round-trip it, and found the following error when trying to convert test.tab.gb back to table format:

perl -M'Bio::SeqIO' -e '$out = Bio::SeqIO->new(-format => qq{table}); $in = Bio::SeqIO->new(-format => qq{genbank},-file=>"test.tab.gb"); while ( my $seq = $in->next_seq() ) {$out->write_seq($seq) }' > test.tab.gb.tab

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Sorry, you cannot write to a generic Bio::SeqIO object.
STACK: Error::throw
STACK: Bio::Root::Root::throw /n/local/stage/perlbrew/perlbrew-0.43/perls/perl-5.16.1t/lib/site_perl/5.16.1/Bio/Root/Root.pm:486
STACK: Bio::SeqIO::write_seq /n/local/stage/perlbrew/perlbrew-0.43/perls/perl-5.16.1t/lib/site_perl/5.16.1/Bio/SeqIO.pm:540
STACK: -e:1

Any help much appreciated. I do have a workaround for now, but it is a kludge....

Cheers,

Malcolm
test.tab.gb
test.tab

Brian Osborne

Sep 21, 2015, 9:22:03 AM
to Cook, Malcolm, Fields, Christopher J, Hilmar Lapp, biop...@mailman.open-bio.org
Working on this ….

Brian Osborne

Sep 21, 2015, 10:36:32 AM
to Cook, Malcolm, Fields, Christopher J, Hilmar Lapp, biop...@mailman.open-bio.org
Malcolm,

Done, in master. That “cannot write to a generic …” error occurred because write_seq() is not implemented for SeqIO::table, and nobody had put an “empty” write_seq() method into the module to catch attempts to call it. Fixed.
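
Since write_seq() is not implemented for the table format, the tab-delimited side of a round trip has to be written by hand. A minimal sketch, assuming the same three-column layout as test.tab (name, description, sequence); the seq_to_row helper and the in-memory example record are hypothetical, and in practice the sequence objects would come from a genbank reader's next_seq().

```perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical helper: turn one sequence object into a tab-delimited row
# with the columns name, description, sequence.
sub seq_to_row {
    my ($seq) = @_;
    # desc may be undefined; fall back to an empty string.
    my $desc = defined $seq->desc ? $seq->desc : '';
    return join("\t", $seq->display_id, $desc, $seq->seq);
}

# Hypothetical in-memory record standing in for one parsed genbank entry.
my $seq = Bio::Seq->new(
    -display_id => 'plasmid1',
    -desc       => 'example plasmid',
    -seq        => 'ACGTACGT',
);

print join("\t", qw(n d s)), "\n";    # header line
print seq_to_row($seq), "\n";
```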

Brian O.

Fields, Christopher J

Sep 21, 2015, 11:05:27 AM
to Brian Osborne, Hilmar Lapp, Cook, Malcolm, biop...@mailman.open-bio.org
Awesome, thanks Brian!

chris

Mark A Jensen

Sep 21, 2015, 11:36:03 AM
to Fields, Christopher J, Cook, Malcolm, Hilmar Lapp, Brian Osborne, biop...@mailman.open-bio.org

+1!
