(Large) Language models for accelerated chemical discovery and synthesis | 9am PT Tues Oct 22, 2024

Grigory Bronevetsky

Oct 18, 2024, 10:40:23 PM
to ta...@modelingtalks.org

Modeling Talks

(Large) Language models for accelerated chemical discovery and synthesis

Tues, Oct 22 | 9am PT

Meet | Youtube Stream


Hi all,


The presentation will be via Meet and all questions will be addressed there. If you cannot attend live, the event will be recorded and can be found afterward at

https://sites.google.com/modelingtalks.org/entry/large-language-models-for-accelerated-chemical-discovery-and-synthesis


More information on previous and future talks: https://sites.google.com/modelingtalks.org/entry/home


Abstract:
AI-accelerated synthesis is an emerging field that uses machine learning algorithms to improve the efficiency and productivity of chemical and materials synthesis. Modern machine learning models, such as (large) language models, can capture the knowledge hidden in large chemical databases to rapidly design and discover new compounds, predict the outcome of reactions, and help optimize chemical reactions. One of the key advantages of AI-accelerated synthesis is its ability to make vast chemical data accessible and predict promising candidate synthesis paths, potentially leading to breakthrough discoveries. Overall, AI is poised to revolutionize the field of organic synthesis, enabling faster and more efficient drug development, catalysis, and other applications.


Bio:
Philippe Schwaller joined EPFL as a tenure-track assistant professor in the Institute of Chemical Sciences and Engineering in February 2022. He leads the Laboratory of Artificial Chemical Intelligence, which works on AI-accelerated discovery and synthesis of molecules. Philippe is a core PI of the NCCR Catalysis, a Swiss centre for sustainable chemistry research, education, and innovation, and a co-lead of the foundation models for sciences pillar in the Swiss AI initiative. He belongs to a new generation of scientists with a broad set of skills – in his case, a combination of chemistry, materials science, computer science, and experimental research.

Before EPFL, Philippe worked for five years at IBM Research. He simultaneously completed an MPhil in Physics (University of Cambridge) and a PhD in Chemistry and Molecular Sciences (University of Bern). He also holds a BSc and MSc degree in Materials Science and Engineering (EPFL).

Grigory Bronevetsky

Jul 10, 2025, 11:26:24 PM
to ta...@modelingtalks.org
Video Recording: https://www.youtube.com/live/WB0QcsIGC6g

Summary:
  • Focus: accelerating the molecule/materials design cycle

    • Design: what molecule to make?

    • How to make it

    • Test

  • Chemical data sources

    • Need: chemical reaction space (how to make molecules)

    • Published literature: extensive but not accessible

    • Digital lab notebooks from experiments

    • Simulations (highly usable but limited to the types of reactions each model can support)

    • Patents (valuable but contain errors)

      • Daniel Lowe and Roger Sayle have text-mined reactions from patents

  • SMILES: linear representation of molecular graphs (tree hierarchy with back-edges to form cycles)
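
To make the "tree with back-edges" description concrete, here is a minimal sketch of a parser for a small SMILES subset. It is not a full SMILES implementation and was not shown in the talk: the function name `smiles_to_graph` is hypothetical, and it assumes only unbracketed atoms, parenthesized branches, and single-digit ring closures. Chain and branch bonds form the tree; each ring-closure digit pair adds one back-edge.

```python
def smiles_to_graph(smiles):
    """Parse a small SMILES subset into (atoms, bonds).

    Assumed subset: unbracketed atoms (B, C, N, O, P, S, F, I, Cl, Br and
    aromatic b, c, n, o, p, s), branches in parentheses, single-digit ring
    closures, and the bond symbols - = # (bond order is ignored here).
    """
    atoms = []         # element symbol for each atom index
    bonds = []         # (i, j) pairs of atom indices
    branch_stack = []  # atom to return to when a branch closes
    open_rings = {}    # ring-closure digit -> atom index that opened it
    prev = None        # atom that the next atom will bond to
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        symbol = None
        if smiles[i:i + 2] in ("Cl", "Br"):
            symbol = smiles[i:i + 2]
            i += 2
        elif ch in "BCNOPSFIbcnops":
            symbol = ch
            i += 1
        else:
            if ch == "(":
                branch_stack.append(prev)       # remember the branch point
            elif ch == ")":
                prev = branch_stack.pop()       # resume from the branch point
            elif ch.isdigit():
                if ch in open_rings:
                    # Second occurrence of the digit: the back-edge that closes a cycle.
                    bonds.append((open_rings.pop(ch), prev))
                else:
                    open_rings[ch] = prev
            elif ch not in "-=#":
                raise ValueError(f"unsupported SMILES character: {ch!r}")
            i += 1
        if symbol is not None:
            atoms.append(symbol)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx))       # tree edge along the chain or branch
            prev = idx
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds


atoms, bonds = smiles_to_graph("c1ccccc1")  # benzene
print(len(atoms), len(bonds))
```

For a connected molecule, `len(bonds) - (len(atoms) - 1)` counts the back-edges, which equals the number of ring closures in the string.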

  • ML for reaction predictions:

  • Retrosynthesis:

    • Target molecule

    • Known/available building blocks

    • Design sequence of reactions to produce target molecule

    • Typically done by specifying reaction rules and searching over the space to reach the target

    • ML: RoboRXN

      • Multi-step synthesis planning

      • Molecular transformer for Retro and Forward steps

      • Transformer predicts entire recipe with all the actions (stir, filter, etc.) that one can give to a robotic platform
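
The rule-based search described above (target, available building blocks, reaction rules) can be sketched as a depth-limited recursive search. This is a toy illustration under stated assumptions, not RoboRXN or any tool from the talk: molecules are opaque placeholder strings, `rules` is a hypothetical hand-written table mapping a product to the reactant sets that can produce it, and `plan_route` is a name invented for this example.

```python
def plan_route(target, building_blocks, rules, max_depth=5):
    """Return reaction steps (product, reactants) in execution order, or None.

    `rules` maps a product to a list of alternative reactant tuples.
    The search stops when every leaf is an available building block.
    """
    if target in building_blocks:
        return []                      # nothing to make, it can be purchased
    if max_depth == 0:
        return None                    # give up on overly long routes
    for reactants in rules.get(target, []):
        steps = []
        for reactant in reactants:
            sub_route = plan_route(reactant, building_blocks, rules, max_depth - 1)
            if sub_route is None:
                break                  # this disconnection fails, try the next rule
            steps.extend(sub_route)
        else:
            # All reactants are reachable: make them first, then the target.
            return steps + [(target, reactants)]
    return None


# Placeholder molecule names, purely illustrative.
rules = {
    "T": [("X", "Y"), ("A", "B")],
    "A": [("C", "D")],
}
building_blocks = {"B", "C", "D"}
print(plan_route("T", building_blocks, rules))
# [('A', ('C', 'D')), ('T', ('A', 'B'))]
```

The sketch returns the first route it finds and may repeat a step if the same intermediate is needed twice; practical planners score and rank routes instead.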

  • Use of general-purpose LLMs for chemical tasks (the work above used specialized models)

    • Moving from encoder-decoder to decoder-only GPT models

    • Many computational chemistry tools are available on GitHub, but they are hard to set up and use

    • Aim: bridge the gap between computational and experimental chemistry

    • Generic LLMs are bad at chemistry; ChemCrow extends them using chemical tools

      • https://github.com/ur-whitelab/chemcrow-public

      • LLM uses existing specialized tools to solve chemical problems 

      • Example: automated synthesis

        • Plan and execute synthesis of an insect repellent

        • Find the chemical to synthesize

        • Generic name => SMILES => molecular graphs

        • Run reaction planner to get recipe

        • Execute recipe on robot

      • Example: molecular discovery

        • Given experimental data that describes molecule’s properties

        • Use ChemCrow to discover the molecule consistent with the data

      • Example: Safety tools

        • Interact with the tool to ask about the dangers of using various chemicals and the likely outcomes of usage scenarios

  • Automated synthesis is not yet a solved problem

    • Supply chain/robotics challenges

    • Weak synthesis planning models 

    • Real organic molecules are much more complex than current planning tools can handle

  • Bayesian optimization for reactions

    • Working to figure out the right granularity for describing molecules (e.g., one-hot, DFT)

    • BoChemian: LLM embeddings of the text that describes reaction procedures

    • https://neurips.cc/virtual/2023/78776

  • Generative De Novo Molecule design

    • Distribution learning (Transfer learning)

    • Goal-directed learning (Reinforcement Learning)

    • Generation using a high-fidelity oracle

      • Oracle: high fidelity/cost simulation 

      • Protein design algorithm can call oracle a limited number of times

      • Sample efficiency is critical: learn from few observations

      • Approaches:

        • Augmented memory: combines data augmentation with experience replay

        • Saturn: sample-efficient de novo design
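
The limited-oracle setting above can be sketched as a wrapper that enforces a call budget. This is a minimal illustration, not the Augmented Memory or Saturn implementation: `BudgetedOracle` and the stand-in scoring function are hypothetical, and caching repeated queries so they do not consume budget is a design choice assumed here.

```python
class BudgetedOracle:
    """Wrap an expensive scoring function and enforce a limit on distinct calls."""

    def __init__(self, score_fn, budget):
        self.score_fn = score_fn   # stand-in for a high-fidelity, high-cost simulation
        self.budget = budget       # maximum number of distinct molecules to score
        self.cache = {}            # molecule -> score, so repeats cost nothing

    @property
    def calls_used(self):
        return len(self.cache)

    def __call__(self, molecule):
        if molecule in self.cache:
            return self.cache[molecule]
        if len(self.cache) >= self.budget:
            raise RuntimeError("oracle budget exhausted")
        score = self.score_fn(molecule)
        self.cache[molecule] = score
        return score


# Stand-in oracle: string length instead of a real simulation.
oracle = BudgetedOracle(score_fn=len, budget=2)
print(oracle("CCO"), oracle("CCO"), oracle.calls_used)
# 3 3 1
```

A generative model evaluated this way is compared on the quality of molecules found within the fixed budget, which is why learning from few observations matters.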

  • Synthesizability constrained generation

    • TANGO: enforcing building blocks in synthesis routes

      • New reward function

        • Tanimoto similarity

        • Substructure match

    • Accelerates search for high-value molecules
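
For reference, Tanimoto similarity between two molecular fingerprints is the size of their intersection divided by the size of their union. The sketch below shows only that standard coefficient on sets of "on" bit indices; how TANGO combines it with substructure matching into a reward is not specified in these notes and is not reproduced here. The function name and the example bit sets are illustrative assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bit indices."""
    fp_a, fp_b = set(fp_a), set(fp_b)
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


# Made-up bit sets standing in for real fingerprints.
candidate = {1, 2, 3, 4}
building_block = {3, 4, 5, 6}
print(round(tanimoto(candidate, building_block), 3))
# 0.333
```

A value of 1.0 means identical fingerprints and 0.0 means no shared bits, so the score gives a smooth signal of how close a generated molecule is to a desired building block.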

  • FSscore: Chemist’s personalized feasibility score

    • Different chemists find it easier to synthesize different molecules

    • Can fine-tune model to align with chemist preferences

    • Can replace human expert with a make-on-demand molecule library
