Proposal to rename UniProt-Trembl to UniProtKB

Egon Willighagen

May 8, 2019, 6:06:17 AM
to wikipathw...@googlegroups.com, bridgedb...@googlegroups.com

Hi all,

I was at the Swiss Institute of Bioinformatics, where a good part of the UniProt, NextProt, Rhea, SwissLipids, etc, work is happening.

I was asked about our use of UniProt-TrEMBL as the data source name for our UniProt identifiers. They have been getting questions about it, asking which resource it refers to: the curated or the uncurated one.

I explained what I remember of the discussions around this some (already) five-ish years ago, when we had a round of cleanup. I remember things like: we wanted to also allow the use of identifiers of proteins that are only translated, and using two DataSources was causing trouble with ID mapping.

The proposal they made was simple: use UniProtKB as data source label.

Are there any objections to that change? I think, if I am not mistaken, we can keep the SysCode identical (and link outs, etc.), so that basically only the user-visible part changes.

Comments, questions, thoughts, please. If I don't hear anything in the next week or so, I will update BridgeDb accordingly.

Egon

--
Hi, do you like citation networks? Already 51% of all citations are available for innovative new uses. Join me in asking the American Chemical Society to join the Initiative for Open Citations too. SpringerNature, the RSC and many others already did.

-----
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: https://www.zotero.org/egonw
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/u/egonwillighagen

Evelo, Chris (BIGCAT)

May 8, 2019, 6:56:50 AM
to Egon Willighagen, wikipathw...@googlegroups.com, bridgedb...@googlegroups.com

Hi Egon,

I am not sure I understand. trEMBL is a very clear indication (uncurated, translated EMBL); its counterparts are SwissProt and PIR, both curated resources (EU and US, basically).

Together these three made up UniProt.

I am not sure what UniProtKB stands for. Would it just replace trEMBL, or would it be all of UniProt? At least in Open PHACTS we had some people (SwissProt employees mostly) who advocated using SwissProt only (I think they would have been fine with SwissProt and PIR, but the content of PIR not in SwissProt is limited). So I do think we want a mechanism to discriminate on origin. But maybe that is still possible with what you suggest?

Best, Chris

Egon Willighagen

May 8, 2019, 6:59:53 AM
to Evelo, Chris (BIGCAT), wikipathw...@googlegroups.com, bridgedb...@googlegroups.com
On Wed, May 8, 2019 at 12:56 PM Evelo, Chris (BIGCAT) <chris...@maastrichtuniversity.nl> wrote:

> I am not sure what UniProtKB stands for

> would that just replace trEMBL or would that be all of UniProt. At least in Open PHACTS we had some people (SwissProt employees mostly) that advocated using SwissProt only (I think they would have been fine with SwissProt and PIR, but the content of PIR not in SwissProt is limited). So I do think we want a mechanism to discriminate on origin. But maybe that is still possible with what you suggest?


I do not remember the exact details of why we went for Uniprot-TrEMBL at the time, instead of UniProt-SwissProt, but I do remember that the choice was made on the basis of arguments.

So, UniProtKB is the collection of related projects, and I think it covers what we want to say well.

Egon

Evelo, Chris (BIGCAT)

May 8, 2019, 7:46:07 AM
to Egon Willighagen, wikipathw...@googlegroups.com, bridgedb...@googlegroups.com

That is the thing, right? UniProtKB stands for both, so I can't be sure what it stands for.

But maybe I do not understand what you suggest. Is that to use UniProtKB/Swiss-Prot and UniProtKB/TrEMBL? In that case, yes, sure (except that the slash might cause problems).

 

    Best, Chris

Egon Willighagen

May 8, 2019, 7:52:36 AM
to Evelo, Chris (BIGCAT), wikipathw...@googlegroups.com, bridgedb...@googlegroups.com

What they suggested is to use just "UniProtKB".

Egon

Evelo, Chris (BIGCAT)

May 8, 2019, 8:42:27 AM
to Egon Willighagen, wikipathw...@googlegroups.com, bridgedb...@googlegroups.com

I do not think that is a good idea, since we really want to be able to select based on origin. But maybe I am missing what the advantage is?

Egon Willighagen

May 8, 2019, 1:25:41 PM
to Evelo, Chris (BIGCAT), wikipathw...@googlegroups.com, bridgedb...@googlegroups.com

Wait, how is UniProtKB not the source?

Alex Pico

May 8, 2019, 2:01:01 PM
to wikipathw...@googlegroups.com, Evelo Chris (BIGCAT), bridgedb-discuss
I'm doubtful this is a simple change... The name (not just the system code) is used in downstream tools, right? The name is in GPMLs.

What advantage comes from renaming?

 - Alex



-- 
You received this message because you are subscribed to the Google Groups "wikipathways-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wikipathways-de...@googlegroups.com.
To post to this group, send email to wikipathw...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/wikipathways-devel/CAMPqvY_f_cbv6fLveNqUvP%3DAK98%2BqpeCqimZEoXyf%2B52CxtWSw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

De Sl

May 8, 2019, 2:05:00 PM
to WikiPathways Developer List List, Evelo Chris (BIGCAT), bridgedb...@googlegroups.com
Okay, I wasn't there when all of this was assigned back in the day. But from recent meetings with UniProt people (at Biocuration 2019, for example), and personal experience:

1. All of UniProt SwissProt and TrEMBL is now part of UniProtKB (they have a distinction mechanism, reviewed versus unreviewed, and this information can be queried through their API).
2. UniProt SwissProt doesn't provide link-outs to other databases in BridgeDb. Therefore, I've been using TrEMBL to annotate proteins (even though SwissProt used to contain the higher-quality protein annotations).
3. Users mix these databases up when annotating proteins, and we apparently also do not know exactly which one is better.

So, I believe having one UniProt reference would solve 2 and 3. Point 1 could be handled with a lens approach, where we could use the UniProt API to make the correct distinctions (where Ensembl maps to all UniProt IDs).

Would my suggestion solve the problem mentioned by Chris?

Kind regards,

Denise



Egon Willighagen

May 8, 2019, 5:29:23 PM
to bridgedb...@googlegroups.com, wikipathw...@googlegroups.com, Evelo Chris (BIGCAT)

So, what I understood is that they were getting questions from users why we (WikiPathways) were not using SwissProt... and only TrEMBL.

Egon


Alex Pico

May 8, 2019, 6:32:40 PM
to bridgedb-discuss, WikiPathways Developer List, Evelo Chris (BIGCAT)
Ok. And as to the feasibility of making such a change?  What happens to existing GPML? What happens in PV and other tools that use BridgeDb?  I could imagine adding a third entry (technically, though not a real proposal), but I’m not sure we can easily rename current ones. If someone else is sure, then it’s fine with me, but I’d want more details on how it would technically work.
 
 - Alex



Egon Willighagen

May 9, 2019, 1:24:14 AM
to wikipathw...@googlegroups.com, bridgedb...@googlegroups.com, Chris Evelo
I'll work out a more detailed plan.

Good point about the existing files, but we can have a bot for that. I think last time, we did it manually.

Egon

Evelo, Chris (BIGCAT)

May 9, 2019, 4:25:21 AM
to Egon Willighagen, bridgedb...@googlegroups.com, wikipathw...@googlegroups.com

As far as I know we are using both. If not, then we somehow broke that at some stage. I am absolutely sure I have added SwissProt-based IDs in the past. Also, Christine advocated that we use SwissProt everywhere in Open PHACTS, and she did look at SwissProt-WikiPathways connections, so I find this surprising.

I think we indeed should use both, and it is important to be able to tell which one comes from where. I always thought that was the case and that you were now suggesting to merge them.

What might be the case is that what we mean to be SwissProt is actually called UniProt, while trEMBL is named explicitly. I agree that would not be the most consistent, and it might lead to people accidentally calling trEMBL IDs UniProt (which is technically correct) while we wanted to use that only for the curated part.

Egon Willighagen

May 9, 2019, 5:59:33 AM
to Evelo, Chris (BIGCAT), bridgedb...@googlegroups.com, wikipathw...@googlegroups.com
On Thu, May 9, 2019 at 10:25 AM Evelo, Chris (BIGCAT) <chris...@maastrichtuniversity.nl> wrote:

> As far as I know we are using both.

Ah, now I get the nature of this discussion. No, we're only using UniProt-TrEMBL.

That was the outcome of the discussion I mentioned in my first email, from probably something like 5 years ago. There were issues with data mapping not working, and it was then decided to move to only one of them and to offer more freedom to the user (as far as I remember). We settled on TrEMBL instead of SwissProt, to allow people to refer to yet unproven proteins in their workflows, with WikiPathways as a collaboration platform.
 

> If not then we somehow broke that at some stage.

So, yes, but I would not call it broken: it was discussed and there was a decision. So, not a bug but a feature, to use the catchphrase :)
 

> I am absolutely sure I have added Swissprot based IDs in the past.

We did. And somewhere in the first or second year after I joined that group, it was harmonized.
 

> Also Christine advocated we use SwissProt everywhere in Open PHACTS and she did look at SwissProt-WikiPathways connections so I find this surprising.

This was *after* that discussion and harmonization.
 

> I think we indeed should use both, and it is important to be able to tell which one comes from where. I always thought that was the case and that you were now suggesting to merge them.

We are currently not using both. Discussing whether we should start using both again is reopening that discussion (which I'm fine with, btw).

But I am not sure we need this, because we can simply use UniProtKB to see whether a protein is part of SwissProt or TrEMBL. That should be doable with a SPARQL query, I expect.
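
A minimal sketch of what such a lookup might look like, under stated assumptions: that the public UniProt SPARQL endpoint is at https://sparql.uniprot.org/sparql and that each entry carries the `up:reviewed` flag of the UniProt core vocabulary (true for SwissProt, false for TrEMBL). Neither is taken from this thread, and the helper names `build_query` and `classify` are hypothetical. The code only builds the query and interprets a standard SPARQL JSON result; sending the HTTP request is left out.

```python
"""Sketch: tell SwissProt from TrEMBL entries with one SPARQL query."""

# Assumption: public UniProt SPARQL endpoint (not mentioned in this thread).
ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_query(accessions):
    """Build a SPARQL query asking for the reviewed flag of each accession."""
    values = " ".join(f"uniprotkb:{acc}" for acc in accessions)
    return (
        "PREFIX up: <http://purl.uniprot.org/core/>\n"
        "PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>\n"
        "SELECT ?protein ?reviewed WHERE {\n"
        f"  VALUES ?protein {{ {values} }}\n"
        "  ?protein up:reviewed ?reviewed .\n"
        "}"
    )


def classify(sparql_json):
    """Map accession -> 'SwissProt' or 'TrEMBL' from a SPARQL JSON result."""
    origin = {}
    for row in sparql_json["results"]["bindings"]:
        # The protein IRI ends with the accession, e.g. .../uniprot/P12345.
        accession = row["protein"]["value"].rsplit("/", 1)[-1]
        # xsd:boolean literals may be serialized as "true" or "1".
        reviewed = row["reviewed"]["value"].lower() in ("true", "1")
        origin[accession] = "SwissProt" if reviewed else "TrEMBL"
    return origin
```

With something like this, a single UniProtKB data source label would still allow selecting on origin after the fact, which is the concern raised earlier in the thread.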

So, to summarize the current situation: we call everything TrEMBL (we actually have a unit test for that: [0]), to ensure the data analysis works (if I remember that past discussion correctly).

Egon


Egon Willighagen

Jun 2, 2019, 6:27:46 AM
to wikipathw...@googlegroups.com, bridgedb...@googlegroups.com
Hi all,

as promised. 

On Thu, May 9, 2019 at 7:24 AM Egon Willighagen <egon.wil...@gmail.com> wrote:
> I'll work out a more detailed plan.

Plan for transition to UniProtKB

1. update the datasources.txt (add entry, not replace)

 - Uniprot-TrEMBL, system code Spt (not used)
 - UniProtKB, system code S (previously Uniprot-TrEMBL)
 
 Except for the label and system code, all properties are identical.
 
 (Also update the other two files)

2. update and release BridgeDb Java 2.3.3
 
3. update PathVisio with BridgeDb Java 2.3.3
 
4. update BridgeDb webservices with BridgeDb Java 2.3.3

 - Everything with "Uniprot-TrEMBL" will have the same links out
 - Everything with "S" will have the same links out

5. update entries in GPML on wp.org
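
Step 5 could probably be scripted rather than done by hand. The following is a minimal sketch, assuming GPML stores the data source label in the `Database` attribute of `Xref` elements; the function name `relabel_xrefs` is hypothetical, and a real migration would also need to go through the WikiPathways update mechanism rather than edit files directly. Because the plan keeps system code S unchanged, only the label is rewritten.

```python
import xml.etree.ElementTree as ET

OLD_LABEL = "Uniprot-TrEMBL"
NEW_LABEL = "UniProtKB"


def relabel_xrefs(gpml_text, old=OLD_LABEL, new=NEW_LABEL):
    """Return (updated GPML text, number of Xref elements relabelled)."""
    root = ET.fromstring(gpml_text)
    if root.tag.startswith("{"):
        # Keep the document's default namespace instead of an "ns0:" prefix.
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    changed = 0
    for element in root.iter():
        # Compare local names so the sketch works with any GPML namespace.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Xref" and element.get("Database") == old:
            element.set("Database", new)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

Returning the number of changed elements makes it easy to log which pathways were touched and to skip saving files where nothing changed.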

Looking forward to your feedback,

Egon
