Artificial intelligence for synthetic organic and analytical chemistry | 9am PT Tues, Apr 30, 2024

Grigory Bronevetsky
Apr 25, 2024, 2:17:05 PM
to ta...@modelingtalks.org

Modeling Talks

Artificial intelligence for synthetic organic and analytical chemistry

Tues, Apr 30 | 9:00 am PT

Meet | YouTube Stream


Hi all,


The presentation will be via Meet and all questions will be addressed there. If you cannot attend live, the event will be recorded and can be found afterward at
https://sites.google.com/modelingtalks.org/entry/artificial-intelligence-for-synthetic-organic-and-analytical-chemistry


Abstract:
Artificial intelligence and machine learning have become important components of the computational toolbox that can be used to advance chemical research and discovery. In this talk, I will discuss our group’s work advancing AI/ML as it applies to the broad subfields of synthetic organic chemistry and analytical chemistry. I will describe several approaches to facilitate decision-making during synthesis planning and reaction development, including the long-standing task of computer-aided retrosynthetic analysis. Though most research in “predictive chemistry” focuses on applying known reactivity to new substrates, ongoing work has also started to show promise for reaction discovery. I will also describe our recent work in analytical chemistry, specifically using tandem mass spectrometry data for structure elucidation of unknown small molecule metabolites. A pervasive theme of our research is the use of domain expertise to inform modeling, from formulating chemistry challenges as statistical learning problems to designing new neural network architectures uniquely suited to chemistry data.

 

Bio:
Connor W. Coley is the Class of 1957 Career Development Professor and an Assistant Professor at MIT in the Department of Chemical Engineering and the Department of Electrical Engineering and Computer Science. He received his B.S. and Ph.D. in Chemical Engineering from Caltech and MIT, respectively, and did his postdoctoral training at the Broad Institute. His research group at MIT works at the interface of chemistry and data science to develop models that understand how molecules behave, interact, and react and use that knowledge to engineer new ones, with an emphasis on therapeutic discovery. Connor is a recipient of C&EN’s “Talented Twelve” award, Forbes Magazine’s “30 Under 30” for Healthcare, Technology Review’s 35 Innovators Under 35, the NSF CAREER award, the ACS COMP OpenEye Outstanding Junior Faculty Award, the Bayer Early Excellence in Science Award, the 3M NTFA, and was named a Schmidt AI2050 Early Career Fellow and a 2023 Samsung AI Researcher of the Year.


More information on previous and future talks: https://sites.google.com/modelingtalks.org/entry/home

Grigory Bronevetsky
May 3, 2024, 1:53:32 PM
to Talks, Grigory Bronevetsky
Video Recording: https://youtu.be/m4ReQTke8NA

Summary

  • Focus: Small organic molecules (useful and versatile)

    • Challenge: Chemical space is vast

    • Molecular discovery: complex multi-objective optimization

      • Typically driven by human intuition

      • Tasks: 

        • Predicting chemical properties, including reactivity

        • Ideating new molecular structures

        • Balancing objectives

  • Research threads:

    • AI for synthetic organic chemistry, medicinal chemistry, analytical chemistry

    • Foundational capabilities: chemistry-tailored neural nets, data sharing, autonomous chemistry labs

  • History of key chemical tasks

    • Computer-aided retrosynthesis: computer programs that explore synthetic recipes using expert-encoded rules, heuristics, and constraints

    • Explaining reactivity trends: data-driven analysis of the relationship between physical conditions and experimental outcomes

    • Predicting spectra (how molecules look to sensors): rule-based analyses

  • Synthesis planning: how we access (new) molecules

    • Input: product to synthesize

    • Output: reactants, intermediates, conditions

    • Typical approach: start with the product and work backward (retrosynthetically) until we reach chemicals that we can purchase

    • Use libraries of valid chemical transformation rules

      • Produced by chemical vendors

      • Expert encoded rules in software (https://www.synthiaonline.com/)

      • Generative models that hypothesize possible transformations

        • Mine databases of historical reactions

        • Critical to create a canonical representation of molecules and reactions

          • Strings (e.g. SMILES)

          • Structural “fingerprints”

          • Descriptions of constituent molecules

          • Graphs & graph edits (requires atom mapping)

          • Condensed graph of reaction (requires atom mapping)

    • Synthesis constrains the space of chemicals and transformations we can easily access, and everything that depends on them

    • Every transformation requires environmental conditions (solvents, additives, concentrations, temperature, reaction time, etc.)

      • Different approaches focus on different levels of detail

    • Approach:

      • Learning transformation rules for reactions

        • From databases of known reactions


      • Representation: graphs of atoms + covalent bonds

      • Graph neural networks: learn about the behavior of each atom based on its connective structure

        • Limitations

          • No 3D structure (not that important for small molecules)

          • Ignores molecular chirality

          • Some covalent bond details are not represented

          • Ignores interactions beyond covalent bonds

          • Ignores atropisomerism

      • Database learning process

        • Extract the core transformation from each reaction in the database

        • Add related neighboring reactions

        • Use the resulting set of reactions to drive a retrosynthesis search

      • Neural network process

        • Train a model to convert products into reactants

          • SMILES->SMILES

          • Graph->Graph

          • Graph->SMILES

        • Apply repeatedly within a retrosynthesis search loop

          • Many algorithms to search very large space

          • Monte Carlo tree search, best-first, etc.

          • Reinforcement learning (RL) is usable but challenging because the set of possible moves changes from state to state

          • Search is vulnerable to hallucination: wrongly predicted transformations can lead it down wrong paths

          • Simulations could be used to check predicted transformations, but they are not yet reliable enough for routine use (hard to set up, computationally expensive, inaccurate)

      • Reaction condition recommendation as fill-in-the-blank

        • Embedding model: learns vector embeddings of reagents based on their function; these embeddings group reagents in ways that mirror their structural relationships

  • ASKCOS: https://askcos.mit.edu/

    • Suite of synthesis planning modules

    • Chemoinformatics & ML

    • Tasks: Retrosynthesis, condition recommendation, reaction product prediction, reaction classification, atom mapping, selectivity prediction, solvation prediction

    • 35k users, 15 companies

  • Challenges:

    • Complexity of synthetic targets is changing: more complex molecules and synthesis pathways are needed for modern use-cases

    • Data-driven search programs generate many pathway ideas but what do we do experimentally?

      • Need to score options to identify the most promising ones

      • E.g., feasibility, impurities, greenness, yield, flow compatibility, scalability, cost

      • The data source from which reactions are drawn affects our ability to evaluate reactivity

        • Diverse in substrates/reactions but not conditions: papers, patents

        • Diverse in conditions but not in substrates/reactions: high-throughput experimentation

        • Need techniques that cover both

  • Summary:

    • AI/ML has broad relevance for chemistry

    • Old goals, ongoing new approaches

    • ASKCOS suite of tools

    • Overall, great opportunity for supervised learning
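
The graph-edit representation of a reaction (which requires atom mapping) and the "extract the core transformation" step of the database learning process can be illustrated with a minimal Python sketch. It assumes a toy encoding that is not from the talk: reactant and product bonds are stored as dicts keyed by pairs of atom-map numbers, with bond order as the value, and no cheminformatics library is involved. The graph edits are the bonds whose order changes, and the atoms touching those edits form the reaction center. The helper names (`bond`, `graph_edits`, `reaction_center`) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Toy encoding (an assumption for illustration): bond = unordered pair of atom-map numbers.
Bond = FrozenSet[int]
BondTable = Dict[Bond, float]  # bond -> bond order; an absent key means "no bond"
Edit = Tuple[Tuple[int, int], float, float]  # ((i, j), old order, new order)


def bond(a: int, b: int) -> Bond:
    """Build an order-independent bond key from two atom-map numbers."""
    return frozenset((a, b))


def graph_edits(reactants: BondTable, product: BondTable) -> List[Edit]:
    """Return every bond whose order differs between reactants and product."""
    edits: List[Edit] = []
    for b in set(reactants) | set(product):
        old = float(reactants.get(b, 0))
        new = float(product.get(b, 0))
        if old != new:
            i, j = sorted(b)
            edits.append(((i, j), old, new))
    return sorted(edits)


def reaction_center(edits: List[Edit]) -> Set[int]:
    """Atom-map numbers of all atoms involved in at least one bond edit."""
    return {atom for pair, _, _ in edits for atom in pair}


# Toy amide coupling, hydrogens implicit: acid carbon 1 (=O 2, -O 3, -R 5) + amine N 4 (-R 6).
# The C1-O3 bond breaks (O3 leaves as water) and a new C1-N4 bond forms.
REACTANTS = {bond(1, 2): 2, bond(1, 3): 1, bond(1, 5): 1, bond(4, 6): 1}
PRODUCT = {bond(1, 2): 2, bond(1, 4): 1, bond(1, 5): 1, bond(4, 6): 1}

edits = graph_edits(REACTANTS, PRODUCT)
print(edits)                           # [((1, 3), 1.0, 0.0), ((1, 4), 0.0, 1.0)]
print(sorted(reaction_center(edits)))  # [1, 3, 4]
```

Real tools would parse atom-mapped SMILES with a cheminformatics toolkit rather than rely on hand-built bond tables; the diff-of-bonds idea is the part this sketch is meant to show.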

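The retrosynthesis search loop described under the neural network process (apply a one-step model repeatedly until every leaf is purchasable) can be sketched as a simple best-first search that always expands the route with the lowest cumulative cost. In this minimal Python sketch the one-step model is a hypothetical lookup table (`TOY_MODEL`) standing in for a trained product-to-reactants model, molecule names are placeholders, and each proposed step carries an assumed nonnegative cost. A search state is the set of molecules that still need to be made, so a disconnection into several reactants is only solved once all of them are.

```python
import heapq
import itertools
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical one-step model: product -> list of (reactants, step cost).
# A real system would query a trained model or a rule library instead of a dict.
OneStepModel = Dict[str, List[Tuple[Tuple[str, ...], float]]]
Route = List[Tuple[str, Tuple[str, ...]]]


def best_first_retro(target: str, model: OneStepModel, purchasable: Set[str],
                     max_expansions: int = 1000) -> Optional[Tuple[float, Route]]:
    """Return (total cost, steps) of the cheapest route found whose leaves are all purchasable."""
    tie = itertools.count()  # tie-breaker so heapq never compares frozensets or routes
    start = frozenset() if target in purchasable else frozenset([target])
    frontier: List[Tuple[float, int, FrozenSet[str], Route]] = [(0.0, next(tie), start, [])]
    seen: Set[FrozenSet[str]] = set()
    for _ in range(max_expansions):
        if not frontier:
            return None  # search space exhausted without reaching purchasable leaves
        cost, _, open_mols, route = heapq.heappop(frontier)
        if not open_mols:
            return cost, route  # nothing left to make: route solved
        if open_mols in seen:
            continue  # this state was already expanded at equal or lower cost
        seen.add(open_mols)
        mol = min(open_mols)  # deterministically pick one unsolved molecule to expand
        for reactants, step_cost in model.get(mol, []):
            needed = {r for r in reactants if r not in purchasable}
            remaining = frozenset((open_mols - {mol}) | needed)
            heapq.heappush(frontier, (cost + step_cost, next(tie), remaining,
                                      route + [(mol, reactants)]))
    return None  # expansion budget used up


TOY_MODEL: OneStepModel = {
    "target": [(("A", "B"), 1.0), (("C",), 3.0)],
    "A": [(("a1", "a2"), 1.0)],
    "B": [(("b1",), 5.0)],  # leads to b1, which is neither purchasable nor makeable
    "C": [(("c1",), 0.5)],
}
TOY_PURCHASABLE = {"a1", "a2", "c1"}

print(best_first_retro("target", TOY_MODEL, TOY_PURCHASABLE))
# (3.5, [('target', ('C',)), ('C', ('c1',))])
```

Because the search trusts whatever the one-step model proposes, a wrong entry is explored like any other. In the toy table, the cheaper-looking `target -> A + B` disconnection is tried first and only abandoned once `B -> b1` turns out to be a dead end, which mirrors on a tiny scale how wrong predicted transformations can lead a search astray.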

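Scoring the many pathway ideas a search generates against several criteria at once (feasibility, yield, cost, and so on) is a multi-objective problem. One standard way to shortlist candidates, shown here as an assumption rather than as the approach presented in the talk, is to keep only the Pareto-optimal routes: those that no other route beats on every criterion. In this minimal Python sketch the route names and objective values are made up, and every objective is assumed to be "lower is better".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A proposed pathway; every objective is assumed to be 'lower is better'."""
    name: str
    objectives: Tuple[float, ...]  # e.g. (cost, number of steps, 1 - predicted yield)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    pairs = list(zip(a.objectives, b.objectives))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates (O(n^2), fine for short lists)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]


# Made-up example routes: (cost, steps, 1 - predicted yield).
ROUTES = [
    Candidate("route_1", (120.0, 4, 0.30)),
    Candidate("route_2", (80.0, 6, 0.45)),
    Candidate("route_3", (150.0, 5, 0.35)),  # worse than route_1 on every objective
    Candidate("route_4", (60.0, 7, 0.60)),
]

print([c.name for c in pareto_front(ROUTES)])  # ['route_1', 'route_2', 'route_4']
```

The Pareto front only removes clearly inferior routes; choosing among the remaining ones still requires weighting the objectives or expert judgment.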