"Unknown" Datanode @Type annotations


Egon Willighagen

May 16, 2020, 7:36:41 AM
to wikipathways-discuss

Hi curators, and other interested people,

in the last WikiPathways telcon we discussed a recent observation that some <DataNode> elements in the GPML source files do not have an @Type attribute. Martina checked the GPML specification and shortly afterwards reported that "Unknown" is the default.
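To make that default concrete, here is a minimal Python sketch that reads DataNode types from GPML and falls back to "Unknown" when @Type is absent. The GPML fragment, its namespace (assumed here to be 2013a), and the helper names are hypothetical and for illustration only. The code strips the namespace prefix so it does not depend on a particular GPML version.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical GPML fragment; real files carry many more attributes.
SAMPLE_GPML = """<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Example">
  <DataNode TextLabel="glucose" GraphId="a1" Type="Metabolite">
    <Xref Database="ChEBI" ID="CHEBI:17234"/>
  </DataNode>
  <DataNode TextLabel="TP53" GraphId="b2">
    <Xref Database="Ensembl" ID="ENSG00000141510"/>
  </DataNode>
</Pathway>"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so the check works for any GPML version."""
    return tag.rsplit("}", 1)[-1]


def datanode_types(gpml_text: str) -> Dict[str, str]:
    """Map each DataNode's GraphId to its effective type.

    A missing Type attribute falls back to "Unknown", the default
    reported from the GPML specification.
    """
    root = ET.fromstring(gpml_text)
    types = {}
    for elem in root.iter():
        if local_name(elem.tag) == "DataNode":
            types[elem.get("GraphId")] = elem.get("Type", "Unknown")
    return types


print(datanode_types(SAMPLE_GPML))
# {'a1': 'Metabolite', 'b2': 'Unknown'}
```

Counting the "Unknown" values over all pathway files would give the kind of total discussed below, although the curation reports themselves use SPARQL rather than a script like this.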

A curation event around this theme is planned for the end of May.

That said, over 5000 DataNodes still have the type "Unknown", and these are something we can curate. That number came from a SPARQL query in our nightly curation reports, and I have just split that query into three curation variants:

1. all "Unknown"-typed data nodes in every collection except Reactome
2. all "Unknown"-typed data nodes in the Reactome collection only
3. all "Unknown"-typed data nodes with a specific data source, in every collection except Reactome
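The three variants differ only in their filters. As a rough sketch, assuming the query results have been flattened into simple records, the filters could be expressed in Python as below. The NodeRecord type, its fields, and the collection labels in the example rows are hypothetical and do not reproduce the actual SPARQL queries.

```python
from dataclasses import dataclass

# Data sources singled out for the third variant, as listed in this message.
SELECTED_SOURCES = {"Wikidata", "ChEBI", "Uniprot-TrEMBL", "Ensembl"}


@dataclass(frozen=True)
class NodeRecord:
    """Hypothetical flattened row, e.g. one result line of a curation query."""
    pathway: str
    collection: str  # e.g. "Reactome" or some other collection label
    node_type: str   # "Unknown" when @Type is absent
    source: str      # Xref data source, "" if none


def variant_1(rows):
    """Unknown-typed nodes in every collection except Reactome."""
    return [r for r in rows if r.node_type == "Unknown" and r.collection != "Reactome"]


def variant_2(rows):
    """Unknown-typed nodes in the Reactome collection only."""
    return [r for r in rows if r.node_type == "Unknown" and r.collection == "Reactome"]


def variant_3(rows):
    """Variant 1, restricted to the selected data sources."""
    return [r for r in variant_1(rows) if r.source in SELECTED_SOURCES]


rows = [
    NodeRecord("WP1", "Other", "Unknown", "Ensembl"),
    NodeRecord("WP2", "Reactome", "Unknown", "ChEBI"),
    NodeRecord("WP3", "Other", "Unknown", "HMDB"),
    NodeRecord("WP4", "Other", "Protein", "Uniprot-TrEMBL"),
]
print(len(variant_1(rows)), len(variant_2(rows)), len(variant_3(rows)))  # 2 1 1
```

Defining the third variant on top of the first keeps the Reactome exclusion in a single place.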

The second set is related to the Reactome Convertor and how it converts some things into GPML. However, I also noticed unknown-typed data nodes with ChEBI identifiers, which may be worth checking out.

The first and third sets are starting points for curation. In the third set, I limit the output to these sources: Wikidata, ChEBI, Uniprot-TrEMBL, and Ensembl. It has been suggested that some of this annotation could be done automatically, which may be feasible for nodes with the latter two data sources. For Wikidata and ChEBI it is less straightforward, and I would recommend manual curation for these.
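Here is a minimal sketch of what such automated annotation might look like, assuming a simple lookup from data source to proposed type. The mapping (Ensembl to GeneProduct, Uniprot-TrEMBL to Protein) is an assumption for illustration, not an agreed curation rule. Wikidata and ChEBI nodes, and any other source, are left for manual review, in line with the recommendation above.

```python
from typing import Optional

# Assumed mapping, for illustration only: the type to propose for an
# "Unknown" node, given its Xref data source. Not an agreed curation rule.
PROPOSED_TYPE = {
    "Ensembl": "GeneProduct",
    "Uniprot-TrEMBL": "Protein",
}

# Sources for which manual curation is recommended instead.
MANUAL_SOURCES = {"Wikidata", "ChEBI"}


def propose_type(source: str) -> Optional[str]:
    """Return a suggested type, or None when a curator should decide."""
    if source in MANUAL_SOURCES:
        return None
    # Sources not covered by the mapping are also left to a curator.
    return PROPOSED_TYPE.get(source)


for src in ["Ensembl", "Uniprot-TrEMBL", "ChEBI", "Wikidata", "HMDB"]:
    print(src, "->", propose_type(src) or "manual review")
```

Returning None instead of guessing keeps the automated step conservative. Every node it does not confidently type goes back to the curation list.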

I would also suggest that people record DataNodes of a type that we currently do not have. Are there types used relatively frequently for which we currently have no type (current types: Metabolite, Protein, GeneProduct, Rna, Complex)? We already identified "Dna" as missing, but there may be others.


Grtz,

Egon

--
Hi, do you like citation networks? Already 51% of all citations are available for innovative new uses. Join me in asking the American Chemical Society to join the Initiative for Open Citations too. SpringerNature, the RSC and many others already did.

-----
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: https://www.zotero.org/egonw
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/u/egonwillighagen