Request to fix genomic coordinates error in UniProt Proteins API
Wang Juexin
Feb 14, 2019, 9:50:35 AM
to ebi-proteins-api
Hi API developers,
I found an interesting inconsistency between the Ensembl API and the UniProt API, and forwarded it to the Ensembl group.

When I query the Ensembl API using GRCh38:
    https://rest.ensembl.org/map/translation/ENSP00000493543/100..100?content-type=application/json
I got:
   {"mappings":[{"gap":0,"seq_region_name":"7","rank":0,"start":140834813,"end":140834815,"coord_system":"chromosome","assembly_name":"GRCh38","strand":-1}]}
   {"mappings":[{"coord_system":"chromosome","strand":-1,"rank":0,"end":140534615,"gap":0,"start":140534613,"seq_region_name":"7","assembly_name":"GRCh37"}]}
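For anyone who wants to repeat the Ensembl query above, here is a minimal Python sketch. The helper names (`translation_map_url`, `parse_mappings`, `fetch_mappings`) are my own and not part of any Ensembl client library. The only thing taken from this thread is the URL pattern and the shape of the JSON response.

```python
import json
from urllib.request import urlopen

# GRCh38 REST server, as used in the query quoted in this thread.
ENSEMBL_REST = "https://rest.ensembl.org"


def translation_map_url(protein_id, start, end, server=ENSEMBL_REST):
    """Build the map/translation URL for a protein position range."""
    return (
        f"{server}/map/translation/{protein_id}/{start}..{end}"
        "?content-type=application/json"
    )


def parse_mappings(payload):
    """Extract (assembly, chromosome, start, end, strand) from a response body."""
    return [
        (m["assembly_name"], m["seq_region_name"], m["start"], m["end"], m["strand"])
        for m in json.loads(payload)["mappings"]
    ]


def fetch_mappings(protein_id, start, end, server=ENSEMBL_REST):
    """Query the service (needs network access) and return the parsed mappings."""
    with urlopen(translation_map_url(protein_id, start, end, server), timeout=30) as resp:
        return parse_mappings(resp.read().decode("utf-8"))
```

Assuming the service is reachable and still returns the same data, `fetch_mappings("ENSP00000493543", 100, 100)` would give the GRCh38 mapping quoted above.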
However, when I query the UniProt Proteins API:
    https://www.ebi.ac.uk/proteins/api/coordinates/location/P15056:100
I got:
    {
      "locations": [
        {
          "accession": "P15056",
          "taxid": 9606,
          "chromosome": "7",
          "ensemblTranslationId": "ENSP00000493543",
          "proteinStart": 100,
          "geneStart": 140834550,
          "proteinEnd": 100,
          "geneEnd": 140834548
        }
      ]
    }
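To make the mismatch concrete, the two responses can be compared directly. This is only a sketch: the JSON strings are copied from the GRCh38 Ensembl response and the UniProt response in this message (trimmed to the relevant fields), and the function names are hypothetical. Both spans are normalised to (low, high) because UniProt reports `geneStart` greater than `geneEnd` here.

```python
import json

# Responses copied from this thread (GRCh38 Ensembl mapping and UniProt location).
ENSEMBL_RESPONSE = (
    '{"mappings":[{"gap":0,"seq_region_name":"7","rank":0,'
    '"start":140834813,"end":140834815,"coord_system":"chromosome",'
    '"assembly_name":"GRCh38","strand":-1}]}'
)
UNIPROT_RESPONSE = (
    '{"locations":[{"accession":"P15056","taxid":9606,"chromosome":"7",'
    '"ensemblTranslationId":"ENSP00000493543","proteinStart":100,'
    '"geneStart":140834550,"proteinEnd":100,"geneEnd":140834548}]}'
)


def ensembl_span(payload):
    """Return the first Ensembl mapping as a (low, high) genomic span."""
    m = json.loads(payload)["mappings"][0]
    return (min(m["start"], m["end"]), max(m["start"], m["end"]))


def uniprot_span(payload):
    """Return the first UniProt location as a (low, high) genomic span."""
    loc = json.loads(payload)["locations"][0]
    return (min(loc["geneStart"], loc["geneEnd"]), max(loc["geneStart"], loc["geneEnd"]))


ensembl = ensembl_span(ENSEMBL_RESPONSE)
uniprot = uniprot_span(UNIPROT_RESPONSE)
print("Ensembl:", ensembl)
print("UniProt:", uniprot)
print("Spans agree:", ensembl == uniprot)
print("Offset in bases:", ensembl[0] - uniprot[0])
```

For the values quoted here, the two spans differ by 265 bases on chromosome 7.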
Reply from the Ensembl helpdesk:

We believe that UniProt may be mishandling the strand in this case. The transcript you're looking at is on the reverse strand of the genome. Position 100 of this protein maps to exon 3 of this transcript, at genomic coordinates 140,834,872-140,834,609:
http://www.ensembl.org/Homo_sapiens/Transcript/Exons?g=ENSG00000157764;r=7:140730665-140924928;t=ENST00000646891
I've highlighted the relevant codon in red. Each line is 60 bases long, so that codon is bases 58-60 of that exon. Since the exon is on the reverse strand, its genomic coordinates are the start of the exon (872) minus 58 or 60, plus 1, i.e. 813-815.
UniProt seems to have counted from the other end of the exon (140,834,609) instead: subtracting 58 or 60 and a further 1 gives 548-550. These coordinates lie in intron 3-4 and so cannot be the coordinates of the amino acid.
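The arithmetic in the helpdesk reply can be checked with a few lines of Python. The exon boundaries and the 58-60 offset come from the reply. `suspected_wrong_span` is only a guess that happens to reproduce UniProt's reported numbers (140,834,609 minus 59 and 61); it is not UniProt's actual code.

```python
# Exon 3 boundaries from the Ensembl reply (reverse strand, so the
# exon "starts" at the higher genomic coordinate).
EXON_HIGH = 140_834_872
EXON_LOW = 140_834_609


def reverse_strand_span(exon_high, first_base, last_base):
    """Map 1-based exon offsets to a (low, high) genomic span on the reverse strand."""
    return (exon_high - last_base + 1, exon_high - first_base + 1)


def suspected_wrong_span(exon_low, first_base, last_base):
    """Hypothetical reconstruction: count back from the low end of the exon."""
    return (exon_low - last_base - 1, exon_low - first_base - 1)


def inside_exon(span, exon_low, exon_high):
    """True if the whole span lies within the exon boundaries."""
    return exon_low <= span[0] and span[1] <= exon_high


correct = reverse_strand_span(EXON_HIGH, 58, 60)
wrong = suspected_wrong_span(EXON_LOW, 58, 60)
print(correct, inside_exon(correct, EXON_LOW, EXON_HIGH))
print(wrong, inside_exon(wrong, EXON_LOW, EXON_HIGH))
```

The first span matches the Ensembl mapping and lies inside exon 3. The second matches the UniProt values and falls below the exon's lower boundary, which is consistent with the intron placement described in the reply.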
I suggest contacting UniProt, as it seems likely that this error is common to all reverse strand mapped proteins.
All the best
Emily
Ensembl helpdesk
I think Ensembl is correct. Could you please check and fix this? I would appreciate your support.