Integrating such tools tends to occur within isolated environments, such as RXN for Chemistry18,24,46,47,48 and AIZynthFinder25,49,50, facilitated by corporate directives that promote integrability. Although most tools are developed by the open-source community or made accessible through application programming interfaces (APIs), their integration and interoperability pose considerable challenges for experimental chemists, mainly due to their lack of computational skill sets and the diversity of tools with steep learning curves, thereby preventing the full exploitation of their potential.
a, An overview of the task-solving process. Using a variety of chemistry-related packages and software, a set of tools is created. These tools and a user input are then given to an LLM. The LLM proceeds through an automatic, iterative chain-of-thought process, deciding on its path, choice of tools and inputs before coming to a final answer. The example shows the synthesis of DEET, a common insect repellent. b, Toolsets implemented in ChemCrow: reaction, molecule, safety, search and standard tools. Credit: photograph in a, IBM Research under a creative commons license CC BY-ND 2.0.
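The iterative loop in panel a can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ChemCrow's implementation: the tool functions, the tool names `Name2SMILES` and `SynthesisPlanner`, and the `scripted_llm` stand-in (which replaces a real LLM call with fixed decisions) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

OBS_PREFIX = "Observation: "

# Hypothetical tools: each maps a string input to a string observation.
def name_to_smiles(name: str) -> str:
    table = {"DEET": "CCN(CC)C(=O)c1cccc(C)c1"}
    return table.get(name, "unknown")

def plan_synthesis(smiles: str) -> str:
    return f"stub route for {smiles}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "Name2SMILES": name_to_smiles,
    "SynthesisPlanner": plan_synthesis,
}

def scripted_llm(history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: picks the next (action, input) from the history."""
    observations = [h[len(OBS_PREFIX):] for h in history if h.startswith(OBS_PREFIX)]
    if len(observations) == 0:
        return "Name2SMILES", "DEET"
    if len(observations) == 1:
        return "SynthesisPlanner", observations[0]
    return "Final Answer", observations[-1]

def run_agent(task: str,
              llm: Callable[[List[str]], Tuple[str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the tool, feed back the observation."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action, action_input = llm(history)
        if action == "Final Answer":
            return action_input
        observation = tools[action](action_input)
        history.append(f"Action: {action}[{action_input}]")
        history.append(OBS_PREFIX + observation)
    return "stopped: step limit reached"

answer = run_agent("Plan a synthesis of DEET", scripted_llm, TOOLS)
```

The step limit is a design choice of this sketch: it keeps the loop from running indefinitely if the model never reaches a final answer.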
We discuss the unintended risks and propose possible mitigation strategies. These strategies rely on foresight and safeguards, while still promoting open and transparent science to enable broad oversight and feedback from the research community.
Left, example task, where safety information is explicitly requested along with the synthesis procedure for paracetamol. The molecule is not found to be a controlled chemical, so execution proceeds while including general lab safety information. Right, in cases where the input molecule is found to be a controlled chemical, execution stops, with a warning indicating that it is illegal and unethical to propose compounds with properties similar to a controlled chemical.
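The stop-or-proceed behaviour described in this caption can be illustrated with a small guard function. This is a minimal sketch under assumptions: the `CONTROLLED` set holds a placeholder identifier rather than any real list, and the function name, result type and messages are hypothetical, not taken from ChemCrow.

```python
from dataclasses import dataclass
from typing import Iterable

# Placeholder identifiers only; a real check would query a curated list.
CONTROLLED = {"PLACEHOLDER-CONTROLLED-1"}

@dataclass
class GuardResult:
    allowed: bool
    message: str

def guarded_plan(molecule: str, controlled: Iterable[str] = CONTROLLED) -> GuardResult:
    """Stop before any planning if the molecule is on the controlled list."""
    blocked = {c.strip().upper() for c in controlled}
    if molecule.strip().upper() in blocked:
        return GuardResult(False, "Execution stopped: input matches a controlled chemical.")
    return GuardResult(
        True,
        f"Proceeding with synthesis planning for {molecule}; "
        "general lab safety information attached.",
    )
```

Running the check before any other tool is called means a blocked request never reaches the synthesis planner.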
Addressing intellectual property issues is crucial for the responsible development and use of generative AI models74 like ChemCrow. Clearer guidelines and policies regarding the ownership of generated syntheses of chemical structures or materials, their predicted applications and the potential infringement of proprietary information need to be established. Collaboration with legal experts, as well as industry stakeholders, can help in navigating these complex issues and implementing appropriate measures to protect intellectual property.
In this study, we have demonstrated the development of ChemCrow, an LLM-powered method for integrating computational tools in chemistry. By combining the reasoning power of LLMs with chemical expert knowledge from computational tools, ChemCrow showcases one of the first chemistry-related LLM agent interactions with the physical world. ChemCrow has successfully planned and synthesized an insect repellent and three organocatalysts and guided the screening and synthesis of a chromophore with target properties. Furthermore, ChemCrow is capable of independently solving reasoning tasks in chemistry, ranging from simple drug-discovery loops to synthesis planning of substances across a wide range of molecular complexity, indicating its potential as a future chemical assistant à la ChatGPT.
Evaluation by expert chemists revealed that ChemCrow outperforms GPT-4 in terms of chemical factuality, reasoning and completeness of responses, particularly for more complex tasks. Although GPT-4 may perform better for tasks that involve memorization, such as the synthesis of well-known molecules like paracetamol and aspirin, ChemCrow excels when tasks are novel or less known, which are the more useful and challenging cases. In contrast, LLM-powered evaluation tends to favour GPT-4, primarily due to the more fluent and complete-looking nature of its responses. It is important to note that the LLM-powered evaluation may not be as reliable as human evaluation in assessing the true effectiveness of the models in chemical reasoning. This discrepancy highlights the need for further refining evaluation methods to better capture the unique capabilities of systems like ChemCrow in solving complex, real-world chemistry problems.
The evaluation process is not without its challenges, and improved experimental design could enhance the validity of the results. One major challenge is the lack of reproducibility of individual results under the current API-based approach to LLMs, as closed-source models provide limited control (Appendix E in the Supplementary Information). Recent open-source models77,78,79 offer a potential solution to this issue, albeit with a possible trade-off in reasoning power. Additionally, implicit bias in task selection and the inherent limitations of testing chemical logic behind task solutions on a large scale present difficulties for evaluating ML systems. Despite these challenges, our results demonstrate the promising capabilities and potential of systems like ChemCrow to serve as valuable assistants in chemical laboratories and to address chemical tasks across diverse domains.
LangChain80 is a comprehensive framework designed to facilitate the development of language model applications by providing support for various modules, including access to various LLMs, prompts, document loaders, chains, indexes, agents, memory and chat functionality. With these modules, LangChain enables users to create various applications such as chatbots, question-answering systems, summarization tools and data-augmented generation systems. LangChain not only offers standard interfaces for these modules but also assists in integrating with external tools, experimenting with different prompts and models and evaluating the performance of generative models. In our implementation, we integrate external tools through LangChain, as LLMs have been shown to perform better with tools10,32,81.
The web search tool is designed to provide the language model with the ability to access relevant information from the web. Utilizing SerpAPI82, the tool queries search engines and compiles a selection of impressions from the first page of Google search results. This allows the model to collect current and relevant information across a broad range of scientific topics. A distinct characteristic of this instrument is its capacity to act as a launching pad when the model encounters a query it cannot tackle or is unsure of the suitable tool to apply. Integrating this tool enables the language model to efficiently expand its knowledge base, streamline the process of addressing common scientific challenges and verify the precision and dependability of the information it offers. By default, LitSearch is preferred by the agent over the WebSearch tool.
The literature-search tool focuses on extracting relevant information from scientific documents such as PDFs or text files (including raw HTML) to provide accurate and well-grounded answers to questions. This tool utilizes the paper-qa Python package ( -qa). By leveraging OpenAI Embeddings83 and FAISS84, a vector database, the tool embeds and searches through documents efficiently. A language model then aids in generating answers based on these embedded vectors.
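The embed-then-search step can be illustrated without external services. The sketch below deliberately swaps in simpler techniques: a word-count vector stands in for OpenAI Embeddings, and a brute-force cosine-similarity scan stands in for a FAISS index. The function names are hypothetical and do not reflect the paper-qa API.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, chunks: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query (brute-force scan)."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

chunks = [
    "DEET is a common insect repellent.",
    "Paracetamol is an analgesic.",
    "Vector databases index dense embeddings.",
]
top = search("insect repellent", chunks, k=1)
```

In the described tool, the retrieved chunks would then be passed to a language model to compose the answer; that step is omitted here.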
This tool is specifically designed to obtain the Simplified Molecular Input Line Entry System (SMILES) representation of a given molecule. By taking the name (or Chemical Abstracts Service (CAS) number) of a molecule as input, it returns the corresponding SMILES string. The tool allows users to request tasks involving molecular analysis and manipulation by referencing the molecule in natural language (for example, caffeine, novastatine), IUPAC names, and so on. Our implementation queries chem-space85 as a primary source and upon failure queries PubChem86 and the IUPAC to SMILES converter OPSIN15 as a last option.
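The primary-source-then-fallback order described above can be sketched as a chain of resolvers tried in sequence. The lookup tables below are hypothetical stand-ins for the chem-space, PubChem and OPSIN queries; no real API calls are made, and the function names are illustrative.

```python
from typing import Callable, Dict, List, Optional

Resolver = Callable[[str], Optional[str]]

def make_table_resolver(table: Dict[str, str]) -> Resolver:
    """Build a resolver backed by an in-memory table (stand-in for a remote service)."""
    def resolve(name: str) -> Optional[str]:
        return table.get(name.strip().lower())
    return resolve

def resolve_smiles(name: str, resolvers: List[Resolver]) -> str:
    """Try each resolver in order; the first non-empty result wins."""
    for resolve in resolvers:
        try:
            smiles = resolve(name)
        except Exception:
            smiles = None  # treat a failing source like a miss and move on
        if smiles:
            return smiles
    raise LookupError(f"no SMILES found for {name!r}")

# Hypothetical data for each source, listed in priority order.
primary = make_table_resolver({"caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"})
secondary = make_table_resolver({"aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O"})
last_resort = make_table_resolver({"ethanol": "CCO"})
chain = [primary, secondary, last_resort]
```

Calling `resolve_smiles("Aspirin", chain)` misses the primary table and is answered by the second one, which mirrors the fallback behaviour described in the text.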
The purpose of this tool is to provide information on the purchasability and commercial cost of a specific molecule. By taking a molecule as input, it first utilizes molbloom87 to check whether the molecule is available for purchase (in ZINC20 (ref. 88)). Then, using the chem-space API85, it returns the cheapest price available on the market, enabling the LLM to make informed decisions about the affordability and availability of the queried molecule towards the resolution of a given task.
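The two-stage lookup (a fast availability check, then a price query) can be sketched as follows. This assumes a Bloom-filter style membership test for the first stage, written from scratch here rather than using the molbloom API; the `offers` dictionary is a hypothetical stand-in for a price service, and all names and prices are illustrative.

```python
import hashlib
from typing import Dict, Iterator, List, Optional

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

def cheapest_offer(smiles: str,
                   catalog: BloomFilter,
                   offers: Dict[str, List[float]]) -> Optional[float]:
    """Return the lowest listed price, or None if the molecule is not purchasable."""
    if smiles not in catalog:
        return None  # definitely not in the catalogue
    prices = offers.get(smiles, [])
    return min(prices) if prices else None

catalog = BloomFilter()
catalog.add("CCO")
offers = {"CCO": [12.5, 9.0]}  # hypothetical prices
```

Because a Bloom filter can report false positives, the sketch still returns `None` when the price lookup finds no offers for a molecule that passed the first check.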
The tool is designed to determine the CAS number of a given molecule using various types of input references such as common names, IUPAC names or SMILES strings by querying the PubChem86 database. The CAS number serves as a precise and universally recognized chemical identifier, enabling researchers to access relevant data and resources with ease and ensuring that they obtain accurate and consistent information about the target molecule89.
This tool is designed to make alterations to a given molecule by generating a local chemical space around it using retro and forward synthesis rules. It employs the SynSpace package92, originally applied in counterfactual explanations for molecular machine learning93. The modification process utilizes 50 robust medicinal chemistry reactions94, and the retrosynthesis is performed either via PostEra Manifold18,95 (upon availability of an API key) or by reversing the 50 robust reactions. The purchasable building blocks come from the Purchasable Mcule supplier building block catalogues96, although customization options are available. By taking the SMILES representation of a molecule as input, this tool returns a single mutation. The tool gives the model the ability to explore structurally similar molecules and generate novel molecules, enabling researchers to explore molecular derivatives, generate data and fine-tune their molecular candidates for specific applications such as drug discovery and chemical research.