I am trying to retrieve the EMBL entries for a list of UniProt entries.
I use the following command:
getz '(@testing > embl)'
where the file testing contains:
uniprot:CYGB_MOUSE
uniprot:GLB1_SCAIN
The output is:
EMBL:AK019410
EMBL:MMU315163
EMBL:BC055040
Is there any way of viewing the UniProt IDs as well as the EMBL IDs?
My ideal output would be:
EMBL:AK019410 UNIPROT:CYGB_MOUSE
EMBL:MMU315163 UNIPROT:CYGB_MOUSE
EMBL:BC055040 UNIPROT:CYGB_MOUSE
I have tried getz '(@testing > embl) > uniprot'
but this only returns one entry, rather than three.
I want to parse out the results into individual files according to the
uniprot id.
I believe it is possible using views and wgetz, but I would prefer not
to use wgetz.
Any help would be greatly appreciated.
Iain
A simple solution is to use a shell script to do the relevant
processing. For example:
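#!/bin/sh
# For each ID in the file "testing", list the linked EMBL entries and
# append a tab plus the source ID to every line of output.
# printf is used for the tab because echo "\t" is not portable.
tab=$(printf '\t')
for ln in $(cat testing); do
    getz "[$ln]>embl" | sed "s#\$#$tab$ln#"
done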
This produces your desired result (tab-separated), but is inefficient
for large lists of IDs, since each ID is processed by a separate getz
call.
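For the per-file split mentioned in the question, the tab-separated
output of a loop like the one above could be post-processed along these
lines. This is only a sketch: the function name split_by_uniprot, the
output directory and the .txt file names are illustrative assumptions,
not anything provided by SRS or getz.

```shell
#!/bin/sh
# Sketch: read "EMBL_ID<TAB>UNIPROT_ID" lines on stdin and append each
# EMBL ID to a file named after its UniProt ID.
# split_by_uniprot and the file layout are hypothetical, for illustration.
split_by_uniprot() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    tab=$(printf '\t')
    while IFS="$tab" read -r embl uniprot; do
        # Skip blank or malformed lines that lack a second column.
        [ -n "$uniprot" ] || continue
        # Drop the databank prefix: "uniprot:CYGB_MOUSE" -> "CYGB_MOUSE".
        name=${uniprot#*:}
        printf '%s\n' "$embl" >> "$outdir/$name.txt"
    done
}
```

With the function defined, piping the loop's output into
split_by_uniprot results
would leave one file per UniProt ID under results/. Because the files
are appended to, an existing output directory should be emptied before a
re-run.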
If your set of IDs is itself the product of a query, you could use an
Icarus script to do the processing instead, and avoid some of the
overhead involved in the repeated getz calls.
Hamish
--
============================================================
Mr Hamish McWilliam
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD, UK
URL: http://www.ebi.ac.uk/
============================================================