Request to fix genomic coordinates error in UniProt Proteins API

Wang Juexin

Feb 14, 2019, 9:50:35 AM
to ebi-proteins-api
Hi API developers,

  I found an inconsistency between the Ensembl REST API and the UniProt Proteins API, and I first reported it to the Ensembl helpdesk.
     
    When I query the Ensembl API using GRCh38:
    https://rest.ensembl.org/map/translation/ENSP00000493543/100..100?content-type=application/json
    I got:

    {"mappings":[{"gap":0,"seq_region_name":"7","rank":0,"start":140834813,"end":140834815,"coord_system":"chromosome","assembly_name":"GRCh38","strand":-1}]}

     
    Then I tried GRCh37, and the result is consistent:
    https://grch37.rest.ensembl.org/map/translation/ENSP00000288602/100..100?content-type=application/json
    I got:

    {"mappings":[{"coord_system":"chromosome","strand":-1,"rank":0,"end":140534615,"gap":0,"start":140534613,"seq_region_name":"7","assembly_name":"GRCh37"}]}

    However, when I query the UniProt Proteins API:
    https://www.ebi.ac.uk/proteins/api/coordinates/location/P15056:100
    I got:
     
    {
      "locations": [
        {
          "accession": "P15056",
          "taxid": 9606,
          "chromosome": "7",
          "ensemblTranslationId": "ENSP00000493543",
          "proteinStart": 100,
          "geneStart": 140834550,
          "proteinEnd": 100,
          "geneEnd": 140834548
        }
      ]
    }
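
To make the discrepancy concrete, here is a minimal Python sketch that compares the two GRCh38 responses above. The JSON literals are copied from those responses and trimmed to the fields being compared; the helper name `as_interval` is hypothetical and only for illustration.

```python
import json

# Responses copied from the queries above (trimmed to the fields compared here).
ensembl_grch38 = json.loads(
    '{"mappings":[{"seq_region_name":"7","start":140834813,'
    '"end":140834815,"assembly_name":"GRCh38","strand":-1}]}'
)
uniprot = json.loads(
    '{"locations":[{"accession":"P15056","chromosome":"7",'
    '"ensemblTranslationId":"ENSP00000493543","proteinStart":100,'
    '"geneStart":140834550,"proteinEnd":100,"geneEnd":140834548}]}'
)


def as_interval(a, b):
    """Return (low, high) regardless of the order the two ends are reported in."""
    return (min(a, b), max(a, b))


ens = ensembl_grch38["mappings"][0]
uni = uniprot["locations"][0]

ensembl_interval = as_interval(ens["start"], ens["end"])
uniprot_interval = as_interval(uni["geneStart"], uni["geneEnd"])

# Difference between the two reported positions of the same codon.
offset = ensembl_interval[0] - uniprot_interval[0]
print(ensembl_interval, uniprot_interval, offset)
```

Both services report the same chromosome and translation ID, but the genomic intervals for amino acid 100 differ by 265 bases.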

Here is the reply from the Ensembl helpdesk:
<URL: https://helpdesk.ebi.ac.uk/Ticket/Display.html?id=324959 >

Hello

We believe that Uniprot may be mishandling strand in this case. The transcript
you're looking at is on the reverse strand of the genome. Position 100 of this
protein is mapped to Exon 3 of this transcript:
http://www.ensembl.org/Homo_sapiens/Transcript/Exons?g=ENSG00000157764;r=7:140730665-140924928;t=ENST00000646891
genomic coordinates 140,834,872-140,834,609

GCCTATGAAGAATACACCAGCAAGCTAGATGCACTCCAACAAAGAGAACAACAGTTATTG
GAATCTCTGGGGAACGGAACTGATTTTTCTGTTTCTAGCTCTGCATCAATGGATACCGTT
ACATCTTCTTCCTCTTCTAGCCTTTCAGTGCTACCTTCATCTCTTTCAGTTTTTCAAAAT
CCCACAGATGTGGCACGGAGCAACCCCAAGTCACCACAAAAACCTATCGTTAGAGTCTTC
CTGCCCAACAAACAGAGGACAGTG

I've highlighted the relevant codon in red. Each line is 60 bases long, so
that codon is bases 58-60 of that exon. Since the exon is reverse stranded,
this means its genomic coordinates are the start of the exon (872) minus 58
or 60 + 1, i.e. 813-815.

UniProt seem to have counted from the other end (140,834,609), minus 58 or 60
+ 1, giving 548-550. These coordinates are found in intron 3-4 and cannot be
the amino acid coordinates.

I suggest contacting UniProt, as it seems likely that this error is common to
all reverse strand mapped proteins.

All the best

Emily
Ensembl helpdesk
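
The helpdesk's arithmetic for the correct coordinates can be checked with a short sketch. This is only an illustration under the assumptions stated in the reply (exon 3 spans 140,834,872-140,834,609 on the reverse strand, and the codon is bases 58-60 of that exon); the function names are hypothetical and this is not how either service computes its mappings.

```python
# Exon 3 of ENST00000646891 on the reverse strand, as quoted by the helpdesk.
EXON_HIGH = 140_834_872  # first transcribed base (reverse strand reads downward)
EXON_LOW = 140_834_609   # last transcribed base


def reverse_strand_span(exon_high, first_base, last_base):
    """Map 1-based exon offsets to a genomic (low, high) span on the reverse strand."""
    return (exon_high - last_base + 1, exon_high - first_base + 1)


def in_exon(span, exon_low=EXON_LOW, exon_high=EXON_HIGH):
    """True if the whole span lies inside the exon."""
    return exon_low <= span[0] and span[1] <= exon_high


codon = reverse_strand_span(EXON_HIGH, 58, 60)
uniprot_span = (140_834_548, 140_834_550)  # geneEnd, geneStart from the UniProt response

print(codon, in_exon(codon))                # matches the Ensembl mapping, inside exon 3
print(uniprot_span, in_exon(uniprot_span))  # falls below the exon's lower boundary
```

Counting down from the exon's high coordinate reproduces the Ensembl result (140834813-140834815), while the UniProt coordinates lie outside the exon, which agrees with the helpdesk's observation that they fall in the intron.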

I think Ensembl is correct. Could you please check and fix this? I would appreciate your support.

Thanks,
Juexin