Volume 68, Issue 6 e17469
RESEARCH ARTICLE

Next generation pure component property estimation models: With and without machine learning techniques

Abdulelah S. Alshehri

Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York, USA

Department of Chemical Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia

Contribution: Data curation (lead), Formal analysis (lead), Investigation (lead), Methodology (lead), Software (lead), Validation (lead), Visualization (equal), Writing - original draft (lead), Writing - review & editing (equal)

Anjan K. Tula

College of Control Science and Engineering, Zhejiang University, Hangzhou, China

Contribution: Data curation (equal), Formal analysis (equal), Investigation (equal)

Fengqi You

Corresponding Author

Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York, USA

Correspondence

Fengqi You, Robert Frederick Smith School of Chemical and Biomolecular Engineering, Ithaca, NY 14853, USA.

Email: [email protected]

Rafiqul Gani, PSE for SPEED Company, Skyttemosen 6, DK_3450 Allerod, Denmark.

Email: [email protected]

Contribution: Conceptualization (equal), Funding acquisition (equal), Investigation (equal), Methodology (equal), Project administration (equal), Resources (equal), Supervision (equal), Visualization (equal), Writing - review & editing (equal)

Rafiqul Gani

Corresponding Author

Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea

PSE for SPEED Company, Skyttemosen 6, DK_3450 Allerod, Denmark

Contribution: Conceptualization (equal), Data curation (equal), Formal analysis (equal), Investigation (equal), Project administration (equal), Software (equal), Supervision (equal), Validation (equal), Writing - review & editing (equal)

First published: 26 September 2021
Citations: 17

Abstract

Physicochemical properties of pure components serve as the basis for the design and simulation of chemical products and processes. This work presents models, based on the molecular structural information of chemicals, for the following 25 pure component properties: (critical-) temperature, pressure, volume, acentric factor; (normal-) boiling point, melting point, auto-ignition temperature; flash point; (standard-) enthalpy of formation, Gibbs energy of formation, enthalpy of fusion, enthalpy of vaporization, liquid molar volume; (environmental-) (lethal dose-) LC50 and LD50, photo-chemical oxidation potential, bioconcentration factor, permissible exposure limit; (physicochemical-) acid dissociation constant, water-solubility, octanol–water partition coefficient, Hildebrand solubility parameter, Hansen solubility parameters. With functional groups used for molecular representation, two parallel sets of property estimation models are presented: in one, the group contributions for each property are regressed through traditional regression techniques; in the other, through machine learning techniques. Both techniques apply an a priori data analysis before the model parameters are regressed. A dataset with more than 24,000 chemicals covering the 25 pure component properties has been used to develop the two sets of property models. The efficacy of the developed models and their use are highlighted, together with a discussion of their overall performance, application range, and predictive capabilities, with implications for product and/or process engineering problem solutions.
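The group-contribution idea summarized in the abstract can be illustrated with a minimal sketch. It assumes the simplest additive form, in which a property is the sum of group occurrences multiplied by per-group contributions, and it fits those contributions by ordinary least squares as a stand-in for "traditional regression". The group set, the molecules, the target values, and all function names are hypothetical and chosen for illustration only; they are not the groups, data, or model form used in this work, and the machine learning route is not covered.

```python
# Illustrative group-contribution sketch. All groups, molecules, and target
# values below are synthetic assumptions, not data or parameters from the paper.

def count_matrix(molecules, groups):
    """One row per molecule, one column per group occurrence count."""
    return [[mol.get(g, 0) for g in groups] for mol in molecules]

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_contributions(counts, targets):
    """Ordinary least squares via the normal equations (N^T N) c = N^T y."""
    k = len(counts[0])
    ntn = [[sum(row[i] * row[j] for row in counts) for j in range(k)]
           for i in range(k)]
    nty = [sum(row[i] * y for row, y in zip(counts, targets))
           for i in range(k)]
    return solve_linear(ntn, nty)

def estimate(molecule, groups, contributions):
    """Additive estimate: sum of (group count x group contribution)."""
    return sum(molecule.get(g, 0) * c for g, c in zip(groups, contributions))

# Hypothetical first-order groups and group counts for five molecules.
GROUPS = ["CH3", "CH2", "OH"]
training_molecules = [
    {"CH3": 2},                       # ethane-like
    {"CH3": 2, "CH2": 1},             # propane-like
    {"CH3": 1, "CH2": 1, "OH": 1},    # ethanol-like
    {"CH3": 2, "CH2": 2},             # butane-like
    {"CH3": 1, "CH2": 2, "OH": 1},    # 1-propanol-like
]
# Synthetic property values in arbitrary units (not measured data).
training_targets = [20.0, 25.0, 45.0, 30.0, 50.0]

counts = count_matrix(training_molecules, GROUPS)
contributions = fit_contributions(counts, training_targets)

# Estimate the property of a molecule that was not in the training set.
new_molecule = {"CH3": 1, "CH2": 3, "OH": 1}   # 1-butanol-like
print(dict(zip(GROUPS, contributions)))
print(estimate(new_molecule, GROUPS, contributions))
```

The sketch solves the normal equations directly to stay dependency-free; for larger or ill-conditioned group sets, a library least-squares solver would be the more robust choice.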

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available in GitHub at https://github.com/PEESEgroup/Pure-Component-Property-Estimation