AI-Assisted Novel Catalyst Discovery
Traditional heterogeneous catalyst discovery has taken decades because of the complexity of solid surfaces and of the gas-solid interactions under reaction conditions that govern catalyst activity, selectivity, and stability. Artificial intelligence (AI), and machine learning (ML) in particular, has the potential to model these complex systems, guide researchers in developing novel catalysts for industrial applications, and yield scientific insights. The data used to develop ML models can be either computational or experimental. Density functional theory (DFT) calculations have generated computational datasets of reasonable accuracy for single-component or single-atom catalysts, which are comparatively easy to compute. Single-component catalysts contain only one type of material, e.g., an oxide, a perovskite, or a nitride. Real-world catalysts, however, are multi-atom and multi-component, and the computational cost of generating accurate datasets for them is currently prohibitive.
In this context, experimental data generated in the laboratory or mined from the published literature can serve as ML datasets that capture the inherent complexity of real-world catalysts. SAGE Center has developed high-throughput parallel reactor systems that can test up to 16 catalysts simultaneously to generate experimental datasets. We also use literature data to train ML models. Because literature datasets are heterogeneous, models trained on them alone may not predict novel catalysts accurately. We therefore developed an active learning framework in which an ML model trained on literature data suggests potential catalysts to synthesize and test. These candidates span a range of activities, which informs both us and the model about what increases or decreases reaction efficiency. We then feed the test results back to retrain the ML model, and the retrained model suggests further candidates to synthesize and test. We repeat this cycle until the catalyst performance goals are met. With the aid of high-throughput datasets and literature data, SAGE Center has developed novel heterogeneous catalysts for ammonia synthesis and decomposition that outperform the vast majority of state-of-the-art catalysts.
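The train, suggest, test, retrain cycle described above can be sketched in a few lines of Python. This is only an illustrative toy, not the framework used at SAGE Center: the k-nearest-neighbour surrogate `predict`, the simulated reactor test `run_experiment`, the two-variable composition grid, and all numeric values are assumptions made up for the example.

```python
def predict(train, x, k=3):
    """Toy surrogate model: mean activity of the k nearest tested catalysts."""
    nearest = sorted(
        train, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], x))
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def run_experiment(x):
    """Hypothetical stand-in for synthesizing and testing one catalyst.

    The toy "true" activity peaks at composition (0.6, 0.3).
    """
    return 1.0 - ((x[0] - 0.6) ** 2 + (x[1] - 0.3) ** 2)


def active_learning(literature, pool, goal, batch=4, max_rounds=100):
    """Repeat suggest -> test -> retrain until the goal is met or the pool is empty."""
    train, pool = list(literature), list(pool)
    best = max(y for _, y in train)
    rounds = 0
    while best < goal and pool and rounds < max_rounds:
        rounds += 1
        # Rank untested candidates by predicted activity (highest first).
        ranked = sorted(pool, key=lambda x: predict(train, x), reverse=True)
        # Mostly promising candidates, plus the lowest-ranked one so the
        # tested set spans a range of activities.
        n_top = max(1, batch - 1)
        picks = ranked[:n_top]
        if batch > 1 and len(ranked) > n_top:
            picks.append(ranked[-1])
        for x in picks:
            y = run_experiment(x)   # "synthesize and test"
            train.append((x, y))    # feed the result back; the surrogate
            pool.remove(x)          # is refit from `train` on the next call
            best = max(best, y)
    return best, train, rounds


# Made-up "literature" records: (composition, reported activity).
literature = [
    ((0.0, 0.0), 0.50), ((1.0, 1.0), 0.40), ((0.0, 1.0), 0.20),
    ((1.0, 0.0), 0.70), ((0.5, 0.5), 0.90),
]
known = {x for x, _ in literature}
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
pool = [x for x in grid if x not in known]

best, train, rounds = active_learning(literature, pool, goal=0.98)
print(f"best activity {best:.3f} after {rounds} round(s), "
      f"{len(train) - len(literature)} catalysts tested")
```

Because the surrogate is recomputed from `train` on every call, "retraining" here amounts to appending new results. A real workflow would refit a proper regression model at that step and would replace `run_experiment` with actual reactor measurements.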
Selected Publications
- Experimental Discovery of Novel Ammonia Synthesis Catalysts via Active Learning
- Enabling Catalyst Discovery through Machine Learning and High-Throughput Experimentation
Interpretable ML Models for Biomass Torrefaction
Torrefaction is a treatment process that upgrades raw biomass into high-quality solid fuel. However, the process involves high-dimensional, non-linear relationships, and investigating and interpreting them in large datasets remains limited and complicated. Our group combines machine learning (ML) with cooperative game theory (Shapley additive explanations, SHAP) to develop an interpretable model that predicts the solid yield and higher heating value of torrefied biomass from independent input features (descriptors) drawn from reaction conditions, feedstock characteristics, and reactor properties. Using SHAP, we propose a new framework that explains the behavior of the ML model and highlights the most influential features of the biomass torrefaction system from both local and global points of view. Interaction effects between any pair of features in the ML model can also be quantified. This combination of ML and SHAP is a useful tool for researchers working on biomass conversion.
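To make the idea of a local explanation concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every coalition of features, with features outside a coalition set to a baseline value. It is a minimal illustration under stated assumptions: `toy_solid_yield`, its coefficients, the feature names, and the baseline are hypothetical and not fitted to any torrefaction data. SHAP libraries estimate the same quantity efficiently for real models and average over background data rather than using a single baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        mixed = {m: (x[m] if m in coalition else baseline[m]) for m in names}
        return model(**mixed)

    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_solid_yield(temperature_c, time_min, moisture_pct):
    """Hypothetical solid-yield model (percent); coefficients are made up."""
    dt = temperature_c - 200.0
    return 100.0 - 0.15 * dt - 0.10 * time_min - 0.001 * dt * time_min + 0.05 * moisture_pct


sample = {"temperature_c": 280.0, "time_min": 40.0, "moisture_pct": 10.0}
baseline = {"temperature_c": 200.0, "time_min": 0.0, "moisture_pct": 0.0}

phi = shapley_values(toy_solid_yield, sample, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

The contributions sum to the difference between the prediction for `sample` and the prediction for `baseline`, which is the property that makes the attribution additive. A global view can be obtained by averaging the absolute contributions over many samples. Exact enumeration scales exponentially with the number of features, so it is practical only for small examples like this one.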
Selected Publication