Next generation pure component property estimation models: With and without machine learning techniques
Correction(s) for this article
- Corrections to “Next generation pure component property estimation models: With and without machine learning techniques”
- Volume 69, Issue 6, AIChE Journal
- First published online: March 27, 2023
Abdulelah S. Alshehri
Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York, USA
Department of Chemical Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia
Contribution: Data curation (lead), Formal analysis (lead), Investigation (lead), Methodology (lead), Software (lead), Validation (lead), Visualization (equal), Writing - original draft (lead), Writing - review & editing (equal)
Anjan K. Tula
College of Control Science and Engineering, Zhejiang University, Hangzhou, China
Contribution: Data curation (equal), Formal analysis (equal), Investigation (equal)
Corresponding Author
Fengqi You
Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York, USA
Correspondence
Fengqi You, Robert Frederick Smith School of Chemical and Biomolecular Engineering, Ithaca, NY 14853, USA.
Email: [email protected]
Rafiqul Gani, PSE for SPEED Company, Skyttemosen 6, DK_3450 Allerod, Denmark.
Email: [email protected]
Contribution: Conceptualization (equal), Funding acquisition (equal), Investigation (equal), Methodology (equal), Project administration (equal), Resources (equal), Supervision (equal), Visualization (equal), Writing - review & editing (equal)
Corresponding Author
Rafiqul Gani
Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
PSE for SPEED Company, Skyttemosen 6, DK_3450 Allerod, Denmark
Contribution: Conceptualization (equal), Data curation (equal), Formal analysis (equal), Investigation (equal), Project administration (equal), Software (equal), Supervision (equal), Validation (equal), Writing - review & editing (equal)
Abstract
Physicochemical properties of pure components serve as the basis for the design and simulation of chemical products and processes. This work presents models, based on the molecular structural information of chemicals, for the following 25 pure component properties: (critical-) temperature, pressure, volume, acentric factor; (normal-) boiling point, melting point, auto-ignition temperature; flash point; (standard-) enthalpy of formation, Gibbs energy of formation, enthalpy of fusion, enthalpy of vaporization, liquid molar volume; (environmental-) (lethal dose-) LC50 and LD50, photo-chemical oxidation potential, bioconcentration factor, permissible exposure limit; (physicochemical-) acid dissociation constant, water-solubility, octanol–water partition coefficient, Hildebrand solubility parameter, Hansen solubility parameters. Using functional groups for molecular representation, two parallel sets of property estimation models are presented: in one, the group contributions for each property are regressed through traditional regression techniques; in the other, through machine learning techniques. Both apply an a priori data analysis before the model parameters are regressed. A dataset of more than 24,000 chemicals covering the 25 pure component properties was used to develop the two sets of property models. The efficacy and use of the developed models are highlighted, together with a discussion of their overall performance, application range, and predictive capabilities, and of the implications for product and/or process engineering problem solutions.
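As a rough illustration of the group-contribution idea described in the abstract, the following Python sketch fits per-group contributions by ordinary least squares. It is not the authors' implementation: the group names, the purely additive model form (property = sum of group count times group contribution), and every numeric value are assumptions made for illustration. The property values are synthetic and are not taken from the article or its dataset.

```python
import numpy as np

# Hypothetical first-order functional groups (illustrative only).
GROUPS = ["CH3", "CH2", "OH"]


def fit_group_contributions(counts, values):
    """Least-squares fit of an assumed additive model: value = sum_i n_i * C_i."""
    counts = np.asarray(counts, dtype=float)
    values = np.asarray(values, dtype=float)
    contributions, *_ = np.linalg.lstsq(counts, values, rcond=None)
    return contributions


def predict(counts, contributions):
    """Estimate the property of one molecule from its group counts."""
    return float(np.asarray(counts, dtype=float) @ contributions)


# Group counts per molecule, columns ordered as in GROUPS.
train_counts = np.array([
    [2, 0, 0],  # ethane
    [2, 1, 0],  # propane
    [2, 2, 0],  # butane
    [1, 0, 1],  # methanol
    [1, 1, 1],  # ethanol
    [1, 2, 1],  # 1-propanol
])

# Synthetic "property" values generated from made-up contributions,
# so the fit can be checked against a known answer.
assumed_contributions = np.array([2.0, 1.5, 8.0])
train_values = train_counts @ assumed_contributions

fitted = fit_group_contributions(train_counts, train_values)

# Estimate for a molecule outside the training set (pentane: 2 CH3, 3 CH2).
pentane_estimate = predict([2, 3, 0], fitted)

for name, value in zip(GROUPS, fitted):
    print(f"{name}: {value:.3f}")
print(f"pentane estimate: {pentane_estimate:.3f}")
```

A machine learning variant in the same spirit would keep the group-count features and swap the least-squares step for another regressor. The sketch omits a constant term because, for simple acyclic molecules described by these three groups, a constant column would be collinear with the terminal-group counts.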
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available on GitHub at https://github.com/PEESEgroup/Pure-Component-Property-Estimation
Supporting Information
Filename | Description |
---|---|
aic17469-sup-0001-Supinfo.pdf (PDF document, 3.6 MB) | Appendix S1. Supporting Information. |