.. _cpidalkanolaminesAppNote: Prediction of ideal heat capacity, C\ :sub:`p,id`\ (T), of alkanolamines and amides: a combined QM – QSPR approach ------------------------------------------------------------------------------------------------------------------ .. originaauthor Philippe Ungerer .. sectionauthor:: Marianna .. reviewer Xavier .. status publishable .. expert Marianna, Xavier .. keywords heat capacity, alkanolamines, QM, QSPR, MOPAC, compressibility, speed of sound .. modules GIBBS, MOPAC *Keywords: heat capacity, alkanolamines, QM, QSPR, MOPAC* Introduction ^^^^^^^^^^^^^ In process engineering, heat capacity is an important property that determines the amount of energy to provide to a system to raise its temperature. However, there is not yet a group contribution approach that allows prediction of heat capacity with temperature. The variations of heat capacity with temperature are influenced by the quantification of the vibrational energy levels [#McQuarrie1976]_ [#Benson1976]_. This is probably why empirical approaches are subject to difficulties. In the ideal gas state, these variations can be reliably predicted by quantum chemistry methods [#Rozanska2014]_. In the liquid or dense gas state, the prediction is a combination of an ideal contribution and a residual contribution [#Yiannourakou2014]_: :math:`C_{p}(T,P) = C_{p,id}(T) + C_{p,res}(T,P)`, where the ideal gas heat capacity :math:`C_{p,id}(T,P)` is the main term containing vibrational, transitional, and rotational contributions, and the residual term :math:`C_{p,res}(T,P)` is predicted using classical statistical mechanics or equations of state [#Lagache2001]_. The present application note illustrates the high throughput capabilities of |medea| [#TM]_ to predict ideal heat capacity of a substantial list of amides and alkanol amines in a large temperature range (200-1,000 K). The results are then used to parametrize a new functional form for :math:`C_{p,id}(T,P)` allowing fast prediction and improved physical relevance for engineering applications. For instance, the new functional form is consistent with the continuously increasing trend of :math:`C_{p,id}(T,P)` and has the expected asymptotic limit. An original QSPR approach is developed to parametrize the new expression. Method and protocol ^^^^^^^^^^^^^^^^^^^^ The list of compounds is assembled by retrieving SMILES strings from internet sources, namely, `Wikipedia`, `NIH`, and `NIST`. A structure list is then created in |medea| by reading a text file of names and `SMILES` strings. In order to sample the possible shapes, the structure list contains up to ten stable conformers per compound, which are built by exploring different sequences of dihedral torsion angles. Each conformer of the list is structurally optimized using |medea| |mopac| with the semi-empirical PM7 method and a vibrational analysis is performed. The heat capacity is computed and printed for a list of regularly spaced temperatures gathered in a property table in the `Job.out` file of the task, together with the enthalpy of formation and entropy. The structure list comprises 36 species (:numref:`Figure %s `). For each molecule, the lower energy conformer is identified and its vibrational analysis is checked for the absence of imaginary vibration frequencies. A corresponding property table is printed out for each species and used to develop new QM-based correlations or compare with existing correlations (for instance those from `DIPPR`). .. _Cpid-Picture1: .. figure:: /AppNotes/images/AppNote12/Cpid-Picture1.png :width: 3.1in Names and formulas for the training set of 36 compounds. In the present work, we use the following functional form for :math:`C_{p,id}(T,P)`: .. math:: :label: Cpideal C_{p,id}(T) = A + B \left[ \dfrac{\dfrac{C}{T}}{sinh(\dfrac{C}{T})} \right] ^{2} + D \left[ \dfrac{\dfrac{E}{T}}{sinh(\dfrac{E}{T})} \right] ^{2} where *A*, *B*, *C*, *D*, and *E* are theory-derived or empirical parameters. If we set :math:`\theta_{\nu 1} = 2C` and :math:`\theta_{\nu 2} = 2E`, the above equation is exactly equivalent to the heat capacity of an ideal gas of polyatomic molecules with two characteristic vibrational temperatures :math:`\theta_{\nu 1}` and :math:`\theta_{\nu 2}` [#McQuarrie1976]_. The parameter *B* or *D* is the degeneracy (*i.e.* the weight) of each vibrational frequency. The parameter *A* is the ideal heat capacity in the limit of low temperature (translational + rotational contributions): .. math:: :label: CpideallimlowT A = R \left( \dfrac{5}{2} + \dfrac{p}{2} \right) The number of rotational degrees of freedom is *p* = 2 for a linear molecule and *p* = 3 for a non-linear molecule. Another constraint is the asymptotic value in the limit of high temperatures, which corresponds to a contribution of *R*/2 per vibrational degree of freedom (in application of the equipartition of energy): .. math:: :label: BDcontrib B + D = R(3n-3-p) where *n* is the total number of atoms. The model outlined in Equations :eq:`Cpideal` - :eq:`BDcontrib` has certainly important limitations, *e.g.*, it does not account for possible conformational changes, although it maintains a low number of independent parameters (three), thus making their regression more reliable. It is also ensuring a safe behavior in the low and high temperature limits. It may be noted that Equation :eq:`Cpideal` is close from the equation used in `DIPPR` to correlate :math:`C_{p,id}(T,P)`, the only difference being that DIPPR contains a term :math:`cosh(E/T)` instead of :math:`sinh(E/T)`. This may lead to some artifacts at high temperature (see :ref:`Appendix ` for more details and references about this equation). .. In order to identify the parameters *B*, *C*, *D*, and *E* for a given compound, we minimize the following dimensionless error criterion: .. .. math:: .. :label: errorcriterion .. F_{i} = \dfrac{1}{m} \sum _{k=1} ^{m} \dfrac{(C_{p,id,eq1}(T_{k})-C_{p,id,ref}(T_{k}))^{2}}{\sigma ^{2}} .. where the summation runs over all temperatures for which the PM7 results :math:`C_{p,id,ref}` .. are available, and :math:`\sigma` is the estimated uncertainty on :math:`C_{p,id,ref}`. .. We selected :math:`\sigma = 0.02 \cdot C_{p,id,ref}` because the average absolute deviation .. of PM7 on ideal heat capacity is around 2% for organic compounds up to `C10` .. [#Rozanska2014]_. Minimization of *F* is done using the GRG method in the Excel Data Analysis .. module, imposing two linear constraints (Equations :eq:`CpideallimlowT` and :eq:`BDcontrib`) .. and four conditions (*B*>0, *C*>0, *D*>0, and *E*>0). The final values of *F*\ :sub:`i`\ .. are indicative of underfitting (*F*\ :sub:`i`\ >>1) or overfitting (*F*\ :sub:`i`\ <<1). Results ^^^^^^^^ A standard flowchart of |medea| |mmopac| for thermodynamic property prediction is applied to the structure list of 39 amines and amides of :numref:`Figure %s `, which contain 0 to 20 carbon atoms. The list includes primary, secondary, tertiary, and quaternary cyclic or branched amines and amides. .. The number of hydroxyl groups ranges from zero to three. The list comprises three .. hydrochlorinated amines in which the protonated amine and the chlorine anion are forming .. ion pairs. |medea| |mopac| helped finding ground state structure for each molecule, with the exception of one hydrochlorinated amine which would require an improved initialization. The training set contains 38 compounds, including two hydrochlorinated amines. Thermodynamic properties are printed for temperatures between 308 and 808 K with a value each 10 K. .. In a first step, the optimization .. of criterion :math:`F_{i}` :eq:`errorcriterion` with .. respect to parameters *B*, *C*, *D*, and *E* is done separately for each compound, using .. constraints from theory (Equations :eq:`CpideallimlowT` and :eq:`BDcontrib`). In a first step, the optimization for the parameters *B*, *C*, *D*, and *E* is done separately for each compound. .. The :math:`F_{i}` values recorded for the 38 compounds range from 0.07 to 0.64, indicating .. that Equation :eq:`Cpideal` is appropriate for parameter identification with average errors .. lower than 2%, without underfitting nor overfitting. The parameter *A* = 33.26 J mol\ :sup:`-1` K\ :sup:`-1` is identical for all compounds, as the set contains non-linear molecules exclusively. *B* and *D* parameters increase with molecular size, as expected. The parameter *C* ranges between 300 and 500 K, while most compounds exhibit values between 300 and 400 K. The parameter *E* ranges from 1150 to 1450 K. These values correspond to :math:`\theta _{\nu 1} = 2C` between 600 and 1000 K, and :math:`\theta _{\nu 2} = 2E` between 2300 and 2900 K, *i.e.*, within the expected range of characteristic vibrational temperatures. In a second step, we correlate parameters *A*, *B*, *C*, *D*, and *E* with selected descriptors. In :numref:`Figure %s `, the parameter *B* appears to vary linearly with the total number of atoms with a regression coefficient of 0.97. .. _Cpid-Picture2: .. figure:: /AppNotes/images/AppNote12/Cpid-Picture2.png :width: 3.1in :align: center Variation of parameter *B* of Equation :eq:`Cpideal` with the total number of atoms for the 36 amines of :numref:`Figure %s `. The parameter *D* correlates best with the number of hydrogen atoms *n*\ :sub:`H` (:numref:`Figure %s `). The regression coefficient is 0.99. .. _Cpid-Picture3: .. figure:: /AppNotes/images/AppNote12/Cpid-Picture3.png :width: 3.1in :align: center Variation of parameter *D* of Equation :eq:`Cpideal` with the number of hydrogen atoms for the 36 amines of :numref:`Figure %s `. The parameters *C* and *E* also vary regularly, yet they change faster for small carbon numbers. They correlate linearly with :math:`\dfrac{1}{n_{H}}` (:numref:`Figure %s `). .. _Cpid-Picture4: .. figure:: /AppNotes/images/AppNote12/Cpid-Picture4.png :width: 3.1in :align: center Variation of parameters *C* and *E* of Equation :eq:`Cpideal` with the inverse number of hydrogen atoms. The functional forms suggested by Figures :numref:`%s ` and :numref:`%s ` are: .. math:: :label: funcform1 B = B_{0} + B_{1} \cdot n .. math:: :label: funcform2 C = C_{0} + \dfrac{C_{1}}{n_{H}} .. math:: :label: funcform3 D = D_{0} + D_{1} \cdot n_{H} .. math:: :label: funcform4 E = E_{0} + \dfrac{E_{1}}{n_{H}} .. These relations are used to further minimize the .. global error criterion. For this purpose, the summed :math:`F_{i}` .. criteria for all 38 compounds is minimized with respect to parameters .. *B*\ :sub:`0`, *B*\ :sub:`1`, *C*\ :sub:`0`, *C*\ :sub:`1`, *D*\ :sub:`0`, .. *D*\ :sub:`1`, *E*\ :sub:`0`, and *E*\ :sub:`1`. This yields an average error criterion .. :math:`\overset{\overline{}}{F} = 1/38\sum_{i}^{}F_{i}`\ *=* 2.44, .. indicating average deviations in the order of 3% with respect to .. QM-generated results in the 308-808 K interval. An outline of results is indicated for a few selected compounds in (:numref:`Figure %s `). .. _Cpid-Picture5: .. figure:: /AppNotes/images/AppNote12/Cpid-Picture5.png :align: center Selected results of ideal gas heat capacity prediction for alkanol amines and amides belonging to the training set of 36 molecules. Conclusions ^^^^^^^^^^^ The proposed functional form involving two vibrational frequencies (Equation :eq:`Cpideal`) allows parametrization from QM results covering the relevant range for most engineering applications (308-808 K). The main advantages are: (i) each parameter has a physical meaning given by quantum mechanics, ensuring consistent extrapolation to high and low temperatures, (ii) the stable optimization protocol involves only three parameters per pure compound, hence not resulting in additional computation cost for the end-user compared with DIPPR equations, (iii) the regressed parameters vary regularly in homogeneous series, allowing to build predictive correlations for :math:`C_{p,id}(T,n_{C})`, and (iv) the approach can be extended to ideal gas energy and entropy with theoretical expressions from [#McQuarrie1976]_. The proposed methodology presents a substantial improvement compared with previous methods (*e.g.* Benson’s method), which are not based on explicit QM calculations with vibrational analysis. It is also a substantial progress compared to experiment-based correlation work (*e.g.* `DIPPR` project, Poling’s textbook [#Poling2001]_), which predict sometimes unrealistic :math:`C_{p,id}(T)` at high temperature. Further improvement is still possible by increasing the number of parameters, for instance, introducing a third vibration frequency in Equation :eq:`Cpideal` or additional descriptors, or adding the effect of possible chemical transformations or conformational changes, and testing correlations with an independent validation dataset. .. _CpidAppendix: .. admonition:: Appendix The equation provided by Mc Quarrie (1976) for a polyatomic gas is: .. math:: :label: CvidealMcQuarrie \frac{C_{v,id}(T)}{Nk} = \frac{3}{2} + p/2 + \sum_{j}^{}{\left\lbrack \frac{\vartheta_{\text{vj}}}{T} \right\rbrack^{2}\ }\frac{exp(\vartheta_{\text{vj}}/T)} {\left( \exp{\left( \vartheta_{\text{vj}}/T \right) - 1} \right)^{2}} where *k* is the Boltzmann constant, *N* is the Avogadro number. The number of rotational degrees of freedom is *p* = 2 for a linear molecule and *p* = 3 for a non-linear molecule. The index *j* runs over the vibrational degrees of freedom, and :math:`\vartheta_{\text{vj}}` is the characteristic vibrational temperature associated with the *j*\ :sup:`th` vibrational degree of freedom. If we introduce :math:`C_{j} = \ \frac{\vartheta_{\text{vj}}}{2}`, considering that :math:`C_{p,id}-C_{v,id} = R = N*k` for an ideal gas, Equation :eq:`CvidealMcQuarrie` rearranges into: .. math:: :label: equivto \frac{C_{p,id}(T)}{R} = \frac{5}{2} + p/2 + \sum_{j}^{}{\left\lbrack \frac{C_{j}/T}{sinh(\frac{C_{j}}{T})} \right\rbrack^{2}\ } If the summation is simplified assuming two frequencies, we obtain Equation :eq:`Cpideal`. We can compare it with the Equation that correlates with the ideal heat capacity, :math:`C_{p,id}(T)` in DIPPR: .. math:: :label: CpidealDIPPR C_{p,id} = A + B \bullet \left\lbrack \frac{\frac{C}{T}}{\sinh\left( \frac{C}{T} \right)} \right\rbrack^{2} + D \bullet \left\lbrack \frac{\frac{E}{T}}{\cosh\left( \frac{E}{T} \right)} \right\rbrack^{2} In Equation :eq:`CpidealDIPPR`, the term: .. math:: :label: secondtermpCpidealDIPPR \left\lbrack \frac{\frac{C}{T}}{\sinh\left( \frac{C}{T} \right)} \right\rbrack^{2} is consistent with theory (Equation :eq:`Cpideal`), and the last term .. math:: :label: lasttermpCpidealDIPPR \left\lbrack \frac{\frac{E}{T}}{\cosh\left( \frac{E}{T} \right)} \right\rbrack^{2} is not. A more detailed analysis shows a decrease of the vibrational contribution when temperature increases up to several thousand K. .. _Cpid-Picture6: .. figure:: /AppNotes/images/AppNote12/Cpid-Picture6.png :width: 3.1in Comparison of the two terms :math:`\left\lbrack \frac{\frac{C}{T}}{\sinh\left( \frac{C}{T} \right)} \right\rbrack^{2}` and :math:`\left\lbrack \frac{\frac{E}{T}}{\cosh\left( \frac{E}{T} \right)} \right\rbrack^{2}` with temperature in Equation :eq:`Cpideal`. The constants *C* and *E* are set to 1000 K in this example. .. admonition:: Required |medea| modules * |medeagui| * |medea| |ht| module (structure lists, conformer analysis) * |medea| |mopac| (semi-empirical molecular QM, structural optimization, vibrational analysis, thermodynamic property determination) .. [#TM] |regTMinfo| .. [#McQuarrie1976] D. A. McQuarrie, (1976), "Statistical Mechanics", New York, Harper and Collins, New revised edition, (2000). .. [#Benson1976] S. W. Benson, "Thermochemical kinetics", New York, Wiley, (1976). .. [#Rozanska2014] X. Rozanska, J. J. P. Stewart, P. Ungerer, B. Leblanc, C. Freeman, P. Saxe, and E. Wimmer, "High-throughput calculations of molecular properties in the MedeA environment: Accuracy of PM7 in predicting vibrational frequencies, ideal gas entropies, heat capacities, and Gibbs free energies of organic molecules", J. Chem. Eng. Data **59**, 3136 (2014) (`DOI `__). .. [#Yiannourakou2014] M. Yiannourakou, P. Ungerer, B. Leblanc, N. Ferrando, and J.-M. Teuler, "Overview of MedeA\ :sup:`®`-GIBBS capabilities for thermodynamic property calculation and VLE behaviour description of pure compounds and mixtures: application to polar compounds generated from ligno-cellulosic biomass", Mol. Simul. **39**, 1165 (2013) (`DOI `__). .. [#Lagache2001] M. Lagache, P. Ungerer, A. Boutin, and A. H. Fuchs, "Prediction of thermodynamic derivative properties of fluids by Monte Carlo simulation", Phys. Chem. Chem. Phys. **3**, 4333 (2001) (`DOI `__). .. [#Poling2001] B. E. Poling, J. M. Prausnitz, and J. P. O'Connell, "The properties of gases and liquids, 5th edition", New York, McGraw Hill, (2001). .. only:: html :download: :download:`pdf `