Over 90% of NIST TECR recuration done! Outlook to next phases

Robert Giessmann

Feb 16, 2024, 11:29:27 AM
to openTECR: Open Database of Thermodynamics of Enzyme-Catalyzed Reactions, Robert Giessmann, Elad Noor, nathan fargier, Frank Russo, Maren Philipps, Nguyen, Scott, maxime...@inria.fr, Melanie Stefan, Andrew Freiburger, Jesintha Maniraja, jordan...@gmail.com, Teddy Groves, pozr...@gmail.com
Dear all,

We hit the 90% mark of curated pages! Congratulations to everyone involved -- you did a tremendous job! :)

So far, the team of 13 curators has looked at 251 PDF pages (only 19 left to go!), filled with >1500 tables, and curated >4900 individual data points, each consisting of ~3 numbers. Wow! We have already invested close to 60 hours in this phase. Was it worth the effort? I think: yes!

We identified >40 tables which are either not in NIST's online database at all, or hardly accessible, or "different". It's really good that we are doing this comparison!


Will we stop here? Well, we could. The data will already be in pretty good shape by then. But I think it would be great if we could take it further. So, what would that be?

One chunk of work, currently estimated at 8 hours in total (and again, a single page takes very little time), is to copy the comments below the tables into the corresponding sheet. Helping hands wanted!!

There will also be a step to compare the table "metadata" (evaluation, buffer, pH, reaction description, ...) in a structured form against the PDF -- but I still have to set this up fully.


I hope you will stay on board! Please always let me know if you have any feedback, ideas, or comments. You can write a public email or a private one, or whatever you like. :)

Thanks again for all the work done, and for the work to come. You are the best!

Cheers,
   Robert

Ronan M.T. Fleming

Feb 19, 2024, 9:44:02 AM
to Robert Giessmann, openTECR: Open Database of Thermodynamics of Enzyme-Catalyzed Reactions, Elad Noor, nathan fargier, Frank Russo, Maren Philipps, Nguyen, Scott, maxime...@inria.fr, Melanie Stefan, Andrew Freiburger, Jesintha Maniraja, jordan...@gmail.com, Teddy Groves, pozr...@gmail.com
Hi Robert et al,

thanks for continuing to collect this information in an organized fashion. I am sorry if I have missed it, but in this effort, are the MDL mol files collected for each of the metabolites in the reaction database?

We've been collecting such information for metabolites in https://www.vmh.life/ (see https://github.com/opencobra/ctf) and are currently working with Marco Pagni in https://www.recon4imd.org/ to compare and refine that information with mol files collected from other resources. Would a subset of that effort be a contribution to openTECR?

Regards,

Ronan

--
Mr. Ronan MT Fleming B.V.M.S. Dip. Math. Ph.D.
----------------------------------------------------------------------------
Associate Professor,
School of Medicine,
University of Galway,
Ireland.
Coordinator of the Horizon Europe project "Reconstruction and Computational Modelling for Inherited Metabolic Diseases" (Recon4IMD)
----------------------------------------------------------------------------
Peer-reviewed publications: https://goo.gl/FZPG23
Mobile:  +353 852 109 806
Github: @rmtfleming
----------------------------------------------------------------------------

Robert Giessmann

Feb 19, 2024, 9:54:19 AM
to Ronan M.T. Fleming, openTECR: Open Database of Thermodynamics of Enzyme-Catalyzed Reactions
Hi Ronan,

Thanks for posting on this! I personally thought we would map to other databases, and not collect the structure representations ourselves, but the topic is definitely related.

Do you have plans to link e.g. to Rhea? Being a database about reactions, I felt that would be the most worthwhile resource for us to go to...

What do you think?

Best,
   Robert

Elad Noor

Feb 19, 2024, 10:31:49 AM
to openTECR: Open Database of Thermodynamics of Enzyme-Catalyzed Reactions
Hi,

I just wanted to add that there is a list of CAS Registry numbers for all the names that appear in Goldberg's original papers, and I hope that we'll collect them and associate them with the Keq values as well.
I'm not sure if there is an advantage for us to also store the MDL mol files, but it would be great if we could cross-link to your database (and to MetaNetX) using the CAS numbers.

Best,
Elad

Ronan M.T. Fleming

Feb 19, 2024, 10:38:27 AM
to Elad Noor, openTECR: Open Database of Thermodynamics of Enzyme-Catalyzed Reactions
Hi Elad, Robert, et al.

Rhea & SwissLipids are also partners in the project https://www.recon4imd.org/reconstruction-of-human-metabolic-networks so once our current reconciliation effort is complete for openTECR-relevant metabolites, the consortium should be able to provide cross-references to vmh, metanetx, rhea and swisslipids.

Where does one find the list of openTECR-relevant metabolites with existing annotations?

I suggest including an InChI for each metabolite, rather than an MDL mol file, to make it unambiguous which molecule is meant. If the metabolite is actually a pseudoisomeric group, then perhaps the neutral InChI might also be useful.

Elad,
I gather that Sebastian, working with Marco, has been in contact with you with a view to migrating to an open source solution for estimating the predominant charged form. Any news on that?

Regards,

Ronan

Elad Noor

Feb 19, 2024, 10:51:13 AM
to openTECR: Open Database of Thermodynamics of Enzyme-Catalyzed Reactions
Robert collected the 7 papers as PDF files on Zenodo. The tables of names and CAS numbers are usually at the end of each PDF, before the reference list.
However, these tables are not digitized, so I hope standard OCR can read them well enough to get all the numbers easily.
As a spot check, I copied a few numbers and searched for them on PubChem, and I got the correct compounds.
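
A cheap sanity check on OCR output could be the checksum built into CAS Registry numbers: the last digit is a check digit, computed from the other digits weighted by their position counted from the right. The snippet below is only a minimal sketch of that check, not part of any existing openTECR tooling; the function name `is_valid_cas` is made up for illustration. A number that passes is not guaranteed to be the right compound, but one that fails was almost certainly misread.

```python
import re

# CAS Registry number layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^([0-9]{2,7})-([0-9]{2})-([0-9])$")


def is_valid_cas(candidate: str) -> bool:
    """Return True if `candidate` is well-formed and its check digit matches.

    Hypothetical helper for screening OCR output; the name is illustrative.
    """
    match = CAS_PATTERN.match(candidate.strip())
    if match is None:
        # Wrong layout, e.g. a letter "l" read in place of the digit "1".
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # The rightmost body digit has weight 1, the next one weight 2, and so on.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


if __name__ == "__main__":
    for text in ["7732-18-5", "7732-18-6", "7732-l8-5"]:
        print(text, is_valid_cas(text))
```

For water (7732-18-5) the weighted sum is 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, and 105 mod 10 = 5 matches the final digit. The other two strings in the demo loop are deliberately corrupted variants and are rejected.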

I agree, we should all just use InChIs to index the data, and to cross-reference. We already do that in eQuilibrator actually.
I think pseudoisomeric groups and charge are not an issue for this database, because all equilibrium constants are apparent (i.e. measured for the ensemble and not for a specific protonation state).

We did meet Sebastian recently to talk about the open source tools. Unfortunately, I think we don't have a really good solution yet, but we said we would check out pkasolver. I didn't follow up on that though.

Best,
Elad