
[Bio-srs] Uniprot and EMBL question


Iain Wallace

Oct 10, 2005, 8:57:15 AM
to bio...@magpie.bio.indiana.edu
Hi,

I am trying to retrieve the EMBL entries for a list of UniProt entries.
I use the following command:
getz '(@testing > embl)'
where the file testing contains:
uniprot:CYGB_MOUSE
uniprot:GLB1_SCAIN

The output is:
EMBL:AK019410
EMBL:MMU315163
EMBL:BC055040

Is there any way of viewing the UniProt IDs as well as the EMBL IDs?
My ideal output would be:
EMBL:AK019410 UNIPROT:CYGB_MOUSE
EMBL:MMU315163 UNIPROT:CYGB_MOUSE
EMBL:BC055040 UNIPROT:CYGB_MOUSE

I have tried getz '(@testing > embl) > uniprot'
but this only returns one entry, rather than three.

I want to parse out the results into individual files according to the
uniprot id.

I believe it is possible using views and wgetz, but I would prefer not
to use wgetz.

Any help would be greatly appreciated.

Iain

Hamish McWilliam

Oct 10, 2005, 10:38:59 AM
to bionet-so...@moderators.isc.org
Hi Iain,

A simple solution is to use a shell script to do the relevant
processing. For example:

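#!/bin/sh
# For each UniProt ID in the file "testing", list the linked EMBL entries
# and append the UniProt ID to every output line, separated by a tab.
tab=$(printf '\t')   # portable; echo "\t" yields a literal \t in some shells
for ln in $(cat testing); do
    getz "[$ln]>embl" | sed "s#\$#$tab$ln#"
done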

This produces your desired result, but is inefficient for large lists of
ids, since each id is processed using an individual getz call.
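
Since you want the results split into individual files by UniProt id, the
output of the loop could be piped through something like the function below.
This is only a rough sketch: the function name split_by_uniprot and the
".embl" file suffix are made up for illustration, and it assumes input lines
of the form "EMBL id, tab, UniProt id" as produced by the script above.

```shell
#!/bin/sh
# Hypothetical helper: read "EMBL_ID<TAB>UNIPROT_ID" lines on stdin and
# append each EMBL ID to a file named after its UniProt ID.
# Usage: split_by_uniprot [output_directory]   (default: current directory)
split_by_uniprot() {
    outdir=${1:-.}
    mkdir -p "$outdir" || return 1
    awk -F '\t' -v dir="$outdir" '
        NF >= 2 {
            name = $2
            sub(/^.*:/, "", name)                # drop the "uniprot:" prefix
            gsub(/[^A-Za-z0-9_.-]/, "_", name)   # keep the file name safe
            file = dir "/" name ".embl"
            print $1 >> file                     # append, one EMBL ID per line
            close(file)                          # avoid holding many files open
        }'
}
```

For example, if the loop is saved as a script, its output can be piped into
split_by_uniprot with a directory name as the argument. Note that the files
are appended to, so clear the directory before re-running.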

If your set of ids is the product of a query you could use an Icarus
script to do the processing instead, and avoid some of the overhead
involved in the getz calls.

Hamish
--
============================================================
Mr Hamish McWilliam
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD, UK

URL: http://www.ebi.ac.uk/
============================================================
