Summary
Focus: Small organic molecules (useful and versatile)
Challenge: Chemical space is vast
Molecular discovery: complex multi-objective optimization
Typically driven by human intuition
Tasks:
Predicting chemical properties, including reactivity
Ideating new molecular structures
Balancing objectives
Research threads:
AI for synthetic organic chemistry, medicinal chemistry, analytical chemistry
Foundational capabilities: chemistry-tailored neural nets, data sharing, autonomous chemistry labs
History of key chemical tasks
Computer-aided retrosynthesis: computer programs that explore recipes using expert-encoded rules, heuristics, and constraints
Explain reactivity trends: data-driven analysis of relation between physical conditions and experimental outcomes
Predicting spectra (how molecules look to sensors): rule-based analyses
Synthesis planning: how we access (new) molecules
Input: product to synthesize
Output: reactants, intermediates, conditions
Typical approach: start from the product and work backward until we reach chemicals that we can purchase
Use libraries of valid chemical transformation rules
Produced by chemical vendors
Expert-encoded rules in software (https://www.synthiaonline.com/)
Generative models that hypothesize possible transformations
Mine databases of historical reactions
Critical to create a canonical representation of molecules and reactions
Strings (e.g. SMILES)
Structural “fingerprints”
Descriptions of constituent molecules
Graphs & graph edits (requires atom mapping)
Condensed graph of reaction (requires atom mapping)
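A minimal sketch of the string representation, assuming the common reaction SMILES layout "reactants>agents>products" with "." separating molecules. The names Reaction and parse_reaction_smiles are hypothetical, introduced only for illustration. Sorting the molecules makes the record independent of the order in which they were written; it does not canonicalize the individual SMILES strings, which would need a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical container for one parsed reaction record."""
    reactants: tuple
    agents: tuple
    products: tuple


def parse_reaction_smiles(rxn: str) -> Reaction:
    """Split a 'reactants>agents>products' string into sorted molecule tuples."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")

    def split_molecules(field: str) -> tuple:
        # '.' separates molecules; sorting removes dependence on input order.
        return tuple(sorted(m for m in field.split(".") if m))

    return Reaction(*(split_molecules(p) for p in parts))


# Ethanol + acetic acid -> ethyl acetate (water omitted), written in two orders.
first = parse_reaction_smiles("CCO.CC(=O)O>>CC(=O)OCC")
second = parse_reaction_smiles("CC(=O)O.CCO>>CC(=O)OCC")
```

Both inputs parse to the same record, which is the property a database of mined reactions needs before duplicates can be detected.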
Synthesis constrains the space of chemicals, transformations and all influences we can access easily
Every transformation requires environmental conditions (solvents, additives, concentrations, temperature, reaction time, etc.)
Different approaches focus on different levels of detail
Approach:
Learning transformation rules for reactions
From databases of known reactions
Representation: graphs of atoms + covalent bonds
Graph neural networks: learn about the behavior of each atom based on its connective structure
Limitations
No 3D structure (not that important for small molecules)
Ignore chirality of molecules
Some covalent bond details are not represented
Ignore interactions beyond covalent bonds
Ignore atropisomerism
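A minimal sketch of the graph representation described above: atoms as nodes, covalent bonds as edges, and one round of neighbor aggregation. This is an illustration under simplifying assumptions, not a trained graph neural network: there are no learned weights, hydrogens are implicit, and the function name message_passing_round is hypothetical. As in the limitations listed above, bond order, chirality, and 3D structure are not encoded.

```python
ELEMENTS = ["C", "O"]  # toy vocabulary, enough for the example below


def one_hot(symbol):
    """Initial atom feature: one-hot encoding of the element."""
    return [1 if symbol == e else 0 for e in ELEMENTS]


def message_passing_round(features, bonds):
    """Update each atom's feature as its own vector plus the sum of its neighbors'."""
    neighbors = {i: [] for i in range(len(features))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = []
    for i, own in enumerate(features):
        aggregated = list(own)
        for j in neighbors[i]:
            aggregated = [x + y for x, y in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated


# Ethanol heavy atoms: C(0)-C(1)-O(2)
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

h0 = [one_hot(a) for a in atoms]
h1 = message_passing_round(h0, bonds)
# h1 == [[2, 0], [2, 1], [1, 1]]
```

Before the update the two carbons have identical features; after one round they differ because only one of them is bonded to oxygen. That is the sense in which the representation of each atom comes to reflect its connective structure.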
Database learning process
Find core transformation from database
Add related neighbor reactions
Use set of reactions to drive a retrosynthesis search
Neural net process
Train model to convert from products to reactants
SMILES->SMILES
Graph->Graph
Graph->SMILES
Apply repeatedly within a retrosynthesis search loop
Many algorithms to search very large space
Monte Carlo tree search, best-first, etc.
RL is usable but challenging because the space of moves is dynamic
Search is vulnerable to hallucination: incorrectly predicted transformations lead it down invalid paths
Simulations can be used to check these rules but they are not yet ready to be used reliably (hard to set up, computationally expensive, inaccurate)
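The search loop above can be sketched as a best-first search. Everything here is a toy under stated assumptions: molecules are abstract labels rather than real structures, TOY_RULES stands in for a one-step model or rule library, and the step costs are invented (in practice they might come from something like a model's confidence). The function name best_first_retrosynthesis is hypothetical.

```python
import heapq
from itertools import count


def best_first_retrosynthesis(target, propose_disconnections, purchasable,
                              max_expansions=1000):
    """Expand the cheapest partial route until every open molecule is purchasable."""
    tie = count()  # unique tiebreaker so the heap never compares routes
    frontier = [(0.0, next(tie), (target,), ())]
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, open_mols, route = heapq.heappop(frontier)
        open_mols = tuple(m for m in open_mols if m not in purchasable)
        if not open_mols:
            return list(route)  # all leaves can be bought
        expansions += 1
        mol, rest = open_mols[0], open_mols[1:]
        for precursors, step_cost in propose_disconnections(mol):
            step = (mol, tuple(precursors))
            heapq.heappush(frontier, (cost + step_cost, next(tie),
                                      rest + tuple(precursors), route + (step,)))
    return None  # no route found within the budget


# Hypothetical one-step proposals: product -> [(precursors, cost), ...]
TOY_RULES = {
    "T": [(("A", "B"), 1.0), (("C",), 0.5)],
    "A": [(("D",), 1.0)],
    "C": [(("E",), 5.0)],
}
PURCHASABLE = {"B", "D", "E"}


def propose(mol):
    return TOY_RULES.get(mol, [])


route = best_first_retrosynthesis("T", propose, PURCHASABLE)
# route == [("T", ("A", "B")), ("A", ("D",))]
```

The branch through "C" looks cheapest after one step but becomes expensive at the next, so the search returns the route through "A" and "B". A hallucinated proposal would appear here as a rule in TOY_RULES that is chemically invalid, and the search has no way to detect it on its own.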
Reaction condition recommendation as fill-in-the-blank
Embedding model: learns vector embedding of reagents based on their function. Groups them in ways that mimic structural relationships.
ASKCOS: https://askcos.mit.edu/
Suite of synthesis planning modules
Chemoinformatics & ML
Tasks: Retrosynthesis, condition recommendation, reaction product prediction, reaction classification, atom mapping, selectivity prediction, solvation prediction
35k users, 15 companies
Challenges:
Complexity of synthetic targets is changing: more complex molecules and synthesis pathways are needed for modern use-cases
Data-driven search programs generate many pathway ideas, but which ones do we pursue experimentally?
Need to score options to identify the more promising ones
E.g. feasibility, impurity, greenness, yield, flow compatibility, scalability, cost
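One simple way to combine such criteria is a weighted sum, sketched below. This is an assumption for illustration, not a method from the talk: the criterion names, the weights, and the per-route scores are all invented, and each score is taken to lie in [0, 1] with higher meaning better. The function name rank_routes is hypothetical.

```python
def rank_routes(routes, weights):
    """Sort candidate routes by a weighted sum of their per-criterion scores."""
    def total(route):
        return sum(w * route["scores"][name] for name, w in weights.items())
    return sorted(routes, key=total, reverse=True)


# Invented scores for two candidate routes (higher is better on every axis).
candidates = [
    {"name": "route_A", "scores": {"feasibility": 0.9, "yield": 0.5, "low_cost": 0.4}},
    {"name": "route_B", "scores": {"feasibility": 0.6, "yield": 0.9, "low_cost": 0.8}},
]
weights = {"feasibility": 0.5, "yield": 0.3, "low_cost": 0.2}

ranked = rank_routes(candidates, weights)
```

With these weights route_B ranks first (0.73 against 0.68). Changing the weights changes the ranking, which is why the choice of criteria and their relative importance matters as much as the search itself.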
The source of the reaction data affects the ability to evaluate reactivity
Diverse in substrates/reactions but not conditions: papers, patents
Diverse in conditions but not in substrates/reactions: high-throughput experimentation
Need techniques that cover both
Open Reaction Database: encourages data sharing across teams
https://open-reaction-database.org/
Database
Data structure
Different organization approach for community
Summary:
AI/ML has broad relevance for chemistry
Old goals, ongoing new approaches
ASKCOS suite of tools
Overall, great opportunity for supervised learning