PCA in Chemistry
The systematic and quantitative investigation of the way in which different chemical components affect a given reaction, for example solvent effects, is not trivial. Variation in the levels of discrete factors, such as the type of solvent, often involves significant changes in properties at the molecular level, e.g. going from hexane to toluene to DMSO. As you move from one solvent to another, a number of molecular properties change, not just one isolated chemical or physical property.
In order to adequately portray the intrinsic molecular properties of a series of compounds, the chemical properties (descriptors) of the compounds first need to be gathered. It is important that the chosen properties represent the compound and are relevant to the application. Datasets of chemical descriptors usually contain correlated variables, which follows from these assumptions:
- Within a class of related compounds it is reasonable to assume that a gradual variation in a measured chemical property will correspond to an analogous gradual variation of the intrinsic molecular property for the chemical series.
- It is reasonable to expect that properties which are measurable and depend on the same underlying intrinsic molecular property will be more or less correlated to each other.
The properties can either be obtained from the experimental measurement of chemical and physical properties or from computationally calculated properties. It is typical to use a hybrid of computational and experimental properties.
The principal components of the intrinsic molecular properties of compounds can be determined by undertaking a PCA on the set of compounds and their properties. As these principal components summarise the intrinsic molecular properties, they are typically referred to as principal properties. Each of these principal properties becomes a continuum and can be considered similar to a traditional continuous variable in DoE. The combination of two or more principal properties provides a semi-continuous map. The map is semi-continuous because a material is unlikely to exist at every point in two or more dimensions. The map then allows the selection of a representative subset of compounds, either diverse or focused on key properties, depending on the needs of the investigation (Carlson, R.; Carlson, J. E. Org. Process Res. Dev. 2005, 9, 680).
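As a minimal sketch of this workflow, the Python below autoscales a descriptor table, computes PCA scores (the principal properties) by singular value decomposition, and then picks a diverse subset from the two-dimensional score map. The descriptor values are synthetic placeholders, not measured data, and the function names (`autoscale`, `pca_scores`, `select_diverse`) and the greedy max-min selection rule are assumptions made for illustration; they are one possible way to pick a spread of compounds, not the method used in the cited work.

```python
import numpy as np


def autoscale(X):
    """Centre each descriptor column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(X, n_components=2):
    """Return PCA scores (one row per compound) from an SVD of the autoscaled table."""
    Z = autoscale(X)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def select_diverse(scores, k):
    """Greedy max-min selection (illustrative assumption).

    Start from the compound furthest from the centre of the map, then
    repeatedly add the compound furthest from everything already chosen.
    """
    chosen = [int(np.argmax(np.linalg.norm(scores, axis=1)))]
    while len(chosen) < k:
        # Distance from every compound to its nearest already-chosen compound.
        diffs = scores[:, None, :] - scores[chosen][None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        chosen.append(int(np.argmax(nearest)))
    return chosen


# Synthetic placeholder data: 30 hypothetical compounds, 6 correlated
# descriptors driven by 2 hidden "intrinsic" properties plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 6))
descriptors = latent @ mixing + 0.1 * rng.normal(size=(30, 6))

scores = pca_scores(descriptors, n_components=2)  # coordinates on the map
subset = select_diverse(scores, k=5)              # row indices of chosen compounds
print(subset)
```

A focused rather than diverse selection could instead take the compounds closest to a point of interest on the map; the scores are the same, only the selection rule changes.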
The use of PCA allows the conversion of discrete variables into a semi-continuous model through the transformation of the chemical properties to molecular properties. This provides a significant reduction in dimensionality (typically three or four dimensions will explain more than 70% of the variance in the data) and produces ‘maps’ of the chemical space in which compounds that behave similarly lie closer to each other. For example, in the initial application of PCA to solvents, Carlson used nine experimentally measured properties for 82 solvents (Carlson, R.; Lundstedt, T.; Albano, C. Acta Chem. Scand. 1985, B39, 79-91): melting point, boiling point, dielectric constant, dipole moment, refractive index, density, log P, water solubility and the Dimroth–Reichardt ET(30) parameter, which is a measure of the ionising power of the solvent. The analysis of the nine chemical properties identified the two principal properties of water solubility and polarisability of the solvents, which explained approximately 70% of the variance in the data.
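The "percentage explained" quoted for a PCA model is the fraction of the total variance captured by the retained components. The sketch below shows how that figure can be computed and how many components are needed to pass a chosen threshold. The table is synthetic (82 rows and nine columns, only to mirror the shape of the solvent example; the numbers are not Carlson's data), and the function names and the 0.70 default threshold are illustrative assumptions.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale descriptors
    s = np.linalg.svd(Z, compute_uv=False)            # singular values, descending
    variances = s ** 2
    return variances / variances.sum()


def components_needed(ratios, threshold=0.70):
    """Smallest number of components whose cumulative ratio reaches the threshold."""
    cumulative = np.cumsum(ratios)
    return int(np.searchsorted(cumulative, threshold) + 1)


# Synthetic placeholder table: nine correlated descriptors generated from
# two hidden properties plus a small amount of noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(82, 2))
mixing = rng.normal(size=(2, 9))
table = latent @ mixing + 0.1 * rng.normal(size=(82, 9))

ratios = explained_variance_ratio(table)
n_keep = components_needed(ratios, threshold=0.70)
print(np.round(np.cumsum(ratios), 3), n_keep)
```

Because the nine synthetic descriptors are driven by only two hidden properties, very few components are needed here; real descriptor sets are noisier and usually need more.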
In our experience, the addition of supplementary experimental and computational properties that describe the ability of the solvent to dissolve and interact with substrates and intermediates is beneficial to understanding solvent effects. Including these additional properties enables the identification of principal properties for solvent polarity and hydrogen bonding, in addition to water solubility and polarisability.
PCA has also been used to generate principal properties that better represent large datasets for carbene ligands, phosphine ligands, Lewis acids, amines, carbonyl compounds, substituents on aromatic rings, eluents for chromatography and different supports for chromatography, to name a few.
For more information on PCA see the link PCA.
Paul Murray Catalysis Consulting will provide our clients with:
- The selection of suitable properties for chemical datasets to generate appropriate PCA maps.
- The selection of materials from PCA maps to enable the efficient understanding of the chemical space and factors to produce a commercially viable chemical reaction.
- Partial Least Squares (PLS) modelling to understand the properties of materials that play a significant role in optimum reaction development and prediction of suitability of alternative materials for the chosen reaction.