Computational chemistry is a branch of chemistry that uses software to simulate and solve complex chemical problems. It combines theoretical chemistry and simulation to help determine the structures and properties of molecules. Computational chemists integrate mathematical algorithms, statistics, databases, and experimental observations to develop chemical simulations and modeling computations. Applications of computational chemistry include identifying the binding sites of drugs; determining how kinetics and thermodynamics affect chemical synthesis reactions; and exploring the physical processes (superconductivity, energy storage, corrosion, or phase changes) of molecules during chemical reactions. Computational chemistry approaches are also used to develop catalysts for sustainable fuel and chemical production.
Computational chemistry dates back to 1928 and early attempts to solve the Schrödinger equation using hand-cranked calculating machines. Calculations verifying solutions to the Schrödinger equation quantitatively reproduced experimental observations of simple systems, such as the helium atom and the hydrogen molecule. With the development of electronic computers, the new discipline began emerging in the 1950s. Chemists began obtaining quantitative information about the behavior of molecules using digital computers to make numerical approximations of the Schrödinger equation.
During the 1960s, major developments in algorithms and methodology increased the use of quantum chemistry, including the following:
- The development of computationally feasible, accurate basis sets
- The demonstration of reasonably accurate approximate solutions to the electron correlation problem
- The derivation of formulas for analytic derivatives of the energy with respect to nuclear position
These developments were incorporated into various software packages that were widely available in the early 1970s. With the ability to predict the structure and reactivity of molecules and complement information obtained from spectral measurements, new software packages led to a growth in publications related to computations for chemical problems.
Different areas of computational chemistry are concerned with the exploration of chemical space, which is defined as the set of all possible organic compounds. The virtual chemical space has 10^63 possible organic compounds of 30 atoms in size. Mapping and visualization of chemical space are areas of research that aim to provide meaningful representations of chemical space.
- Quantitative structure activity relationship (QSAR) where the output to be predicted is usually the biological activity of the compound
- Deep neural network (DNN)-based QSAR models
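The idea behind QSAR, predicting an activity value from molecular descriptors, can be sketched with the simplest possible model: a one-descriptor least-squares line. This is an illustration only. The descriptor values and activities below are synthetic (generated from a made-up linear rule, not measured data), and the function name `fit_line` is hypothetical. Real QSAR models use many descriptors and, as noted above, often deep neural networks.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic training set (assumption, no real data): one descriptor value per
# compound, with activities generated from activity = 2.0 * descriptor + 1.0.
descriptors = [0.5, 1.0, 1.5, 2.0, 3.0]
activities = [2.0 * d + 1.0 for d in descriptors]

slope, intercept = fit_line(descriptors, activities)

# Predict the activity of a new, unseen compound from its descriptor value.
predicted = slope * 2.5 + intercept
```

Because the toy data are noise-free, the fit recovers the generating rule; with real assay data the quality of the fit would have to be checked on held-out compounds.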
Protein contact prediction is the prediction of spatial proximity of any two residues of a protein sequence when it is folded as its 3D structure.
- Long timescale molecular dynamics (physics-based simulations)
- Knowledge-based physical approaches
- Machine learning (ANNs, SVMs, and hidden Markov models)
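What a contact predictor is trying to reproduce can be sketched as a contact map computed from known 3D coordinates. The sketch below assumes one representative point per residue and a distance cutoff of 8 angstroms, a commonly used convention that is not stated in the text above. The coordinates and the function name `contact_map` are hypothetical.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Return a symmetric boolean matrix; entry [i][j] is True when
    residues i and j lie within `cutoff` angstroms of each other."""
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = True
    return contacts

# Hypothetical coordinates (angstroms) for four residues on a straight line,
# spaced 3.8 angstroms apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
```

In this toy chain, residues 0 and 2 are 7.6 angstroms apart and count as a contact, while residues 0 and 3 are 11.4 angstroms apart and do not. A prediction method would have to estimate the same matrix from the sequence alone.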
Quantum chemistry is the application of quantum mechanics to the theoretical study of chemical systems. Machine learning is applied to quantum chemistry to supplement or replace traditional quantum mechanical calculations.
- Amazon Braket is a service from Amazon Web Services (a subsidiary of Amazon) for developers and researchers. Customers can explore, design, test, and troubleshoot quantum algorithms on simulated quantum computers. Customers can then use Amazon Braket to run their quantum algorithms on quantum processors from providers including D-Wave, IonQ, and Rigetti.
Potential electrocatalysts for sustainable fuel and chemical production can be tested for sustainability and efficiency using computational quantum chemistry. Quantum chemistry computing simulations were used to categorize hypothetical electrocatalysts that are too slow or too expensive.
Computational chemistry methods can be divided into those based on quantum chemical phenomena and those based on molecular mechanics. Quantum-based methods explicitly account for electrons, while molecular mechanics approaches do not. Quantum chemical (QC) methods may also be called electronic structure, first principles, or ab initio methods. They calculate how electrons and nuclei interact by solving the time-independent electronic Schrödinger equation in the Born–Oppenheimer approximation. The two main types of QC methods are wave function methods and density functional theory (DFT).
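For reference, the time-independent electronic Schrödinger equation in the Born–Oppenheimer approximation is usually written as follows. This is the standard textbook form in atomic units; the notation is an assumption of this sketch, with $\mathbf{r}_i$ the electron positions, $\mathbf{R}_A$ and $Z_A$ the fixed nuclear positions and charges, and the constant nuclear repulsion term omitted.

```latex
\hat{H}_{\mathrm{el}}\,\Psi_{\mathrm{el}}(\mathbf{r};\mathbf{R})
  = E_{\mathrm{el}}(\mathbf{R})\,\Psi_{\mathrm{el}}(\mathbf{r};\mathbf{R}),
\qquad
\hat{H}_{\mathrm{el}}
  = -\frac{1}{2}\sum_{i}\nabla_i^{2}
    - \sum_{i}\sum_{A}\frac{Z_A}{\lvert\mathbf{r}_i-\mathbf{R}_A\rvert}
    + \sum_{i<j}\frac{1}{\lvert\mathbf{r}_i-\mathbf{r}_j\rvert}
```

The nuclear coordinates $\mathbf{R}$ enter only as parameters, which is what allows the electronic energy to be computed separately for each fixed nuclear geometry.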
Molecular mechanics (MM) refers to methods that compute certain molecular properties, particularly molecular structure and relative energy. These methods use fairly simple potential energy functions derived from classical mechanics and rely on parameters derived from experimental data or quantum mechanics-based calculations. A collection of potential energy functions and associated parameters used for molecular mechanics calculations is frequently referred to as a "force field." Therefore, calculations that utilize molecular mechanics methods are often referred to as empirical force field calculations.
Wave function-based methods solve the electronic Schrödinger equation by computing the electronic wave function explicitly. Hartree–Fock (HF) is the simplest wave function-based method, in which the multielectron wave function is approximated by a single Slater determinant (an antisymmetrized product of one-electron wave functions). This approximation neglects instantaneous electron correlation: it assumes that each electron moves in an averaged field created by the remaining electrons. In reality, the motion of electrons is correlated, meaning the motion of one depends on the instantaneous mutual interaction with the other electrons. This produces errors in the HF wave function and energy, affecting the prediction of kinetic barriers and the description of London dispersion forces.
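The single-determinant approximation can be written out explicitly. The notation below is the standard textbook form and is an assumption of this sketch: $\chi_k$ are one-electron spin orbitals and $\mathbf{x}_i$ collects the spatial and spin coordinates of electron $i$. The energy missed by this approximation is conventionally called the correlation energy.

```latex
\Psi_{\mathrm{HF}}(\mathbf{x}_1,\dots,\mathbf{x}_N)
  = \frac{1}{\sqrt{N!}}
    \begin{vmatrix}
      \chi_1(\mathbf{x}_1) & \chi_2(\mathbf{x}_1) & \cdots & \chi_N(\mathbf{x}_1)\\
      \chi_1(\mathbf{x}_2) & \chi_2(\mathbf{x}_2) & \cdots & \chi_N(\mathbf{x}_2)\\
      \vdots & \vdots & \ddots & \vdots\\
      \chi_1(\mathbf{x}_N) & \chi_2(\mathbf{x}_N) & \cdots & \chi_N(\mathbf{x}_N)
    \end{vmatrix},
\qquad
E_{\mathrm{corr}} = E_{\mathrm{exact}} - E_{\mathrm{HF}}
```

Exchanging two rows of the determinant changes its sign, which builds the antisymmetry required for electrons directly into the wave function.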
HF can be improved using multiple approaches, such as incorporating many-body perturbation theory, culminating in the Møller–Plesset (MPn) methods, or expressing the wave function as a linear combination of Slater determinants. These post-HF methods improve the quality of computational chemistry results. However, they are computationally expensive, which limits the size of the systems they can treat. Calculations for more than 25–30 atoms are infeasible in practice, and wave function-based methods are often used in surface modeling to calibrate more approximate but computationally cheaper methods.
DFT utilizes mathematical formulations first developed by Kohn and Sham in their 1965 paper "Self-consistent equations including exchange and correlation effects." The theory states that the ground state energy of a non-degenerate electronic system is defined by the total electron density. This offers advantages, as the electron density for a system of N electrons depends on only 3 spatial coordinates, whereas the wave function depends on 3N spatial and N spin variables. Therefore, DFT methods are computationally cheaper than even HF methods while also including instantaneous electron correlation. The disadvantage of DFT is that although the relationship between the electron density and the energy of the system can be mathematically demonstrated, its exact form is unknown.
A wide variety of DFT methods exist, depending on the functionals used to connect these two quantities, known as exchange-correlation functionals. The simplest approach is the local density approximation (LDA), which assumes that the exchange-correlation energy at each point depends only on the electron density at that point. LDA can be improved using the generalized gradient approximation (GGA), which also depends on the gradient of the density, accounting for spatial variations in electron density across the chemical system.
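The difference between the two approximations is easiest to see in their generic functional forms. These are the standard textbook expressions, and the symbols are an assumption of this sketch: $\rho$ is the electron density, $\varepsilon_{xc}$ is the exchange-correlation energy per electron of a uniform electron gas, and $f$ stands for whichever GGA form is chosen.

```latex
E_{xc}^{\mathrm{LDA}}[\rho]
  = \int \rho(\mathbf{r})\,\varepsilon_{xc}\bigl(\rho(\mathbf{r})\bigr)\,d\mathbf{r},
\qquad
E_{xc}^{\mathrm{GGA}}[\rho]
  = \int f\bigl(\rho(\mathbf{r}),\nabla\rho(\mathbf{r})\bigr)\,d\mathbf{r}
```

In LDA the integrand at a point sees only the density at that point, while in GGA it also sees how quickly the density is changing there.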
The main drawback of DFT methods is that they do not account for long-range non-covalent interactions. This is not particularly important when modeling conventional surfaces, but it can be important when simulating the elementary steps taking place at the surfaces (i.e., adsorption, diffusion, reaction).
Semiempirical methods offer faster approaches to overcome the potential size limitations of DFT. They are derived from pure QC methods utilizing different approximations while also including empirical parameters to mitigate errors due to the approximations and account for electron correlation effects. Semiempirical methods are faster than ab initio counterparts due to simplifications and parametrizations, which permit the simulation of larger chemical systems. However, the accuracy strongly depends on whether the parametrization is suitable for the specific case being studied. Semiempirical methods can produce meaningless results if the simulated system does not correspond with the training set used to parametrize the method.
Basis sets approximate the electronic wave function using a linear combination of basis functions to build molecular orbitals. The accuracy of the calculation and the results it produces strongly depend on the completeness of the chosen basis set. The main types of basis sets used to simulate molecular and periodic systems are Slater-type orbitals (STOs), Gaussian-type orbitals (GTOs), and plane waves (PWs). Introduced by physicist John C. Slater in 1930, STOs are functions used as atomic orbitals in the linear combination of atomic orbitals (LCAO) molecular orbital method. GTOs are localized functions centered on the atoms. They are commonly used for molecular calculations, as they obey the typical radial-angular decomposition and exhibit the spatial and symmetry properties of atomic orbitals. PWs are periodic functions: they are not localized but uniformly diffuse in space. This means their use requires the adoption of periodic boundary conditions.
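The practical difference between STOs and GTOs shows up in their radial shape. The sketch below compares a normalized 1s Slater function, proportional to exp(-zeta * r), with a normalized s-type Gaussian, proportional to exp(-alpha * r^2). The exponents are illustrative values chosen for this example, not taken from the text, and the function names are hypothetical.

```python
import math

def sto_1s(r, zeta=1.0):
    """Normalized 1s Slater-type orbital: decays as exp(-zeta * r)."""
    return math.sqrt(zeta ** 3 / math.pi) * math.exp(-zeta * r)

def gto_1s(r, alpha=0.27095):
    """Normalized s-type Gaussian orbital: decays as exp(-alpha * r**2)."""
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r * r)

def slope_at_origin(orbital, h=1e-4):
    """Forward finite-difference estimate of the radial slope at r = 0."""
    return (orbital(h) - orbital(0.0)) / h

sto_slope = slope_at_origin(sto_1s)  # clearly negative: a cusp at the nucleus
gto_slope = slope_at_origin(gto_1s)  # close to zero: flat at the nucleus

# Far from the nucleus the Gaussian falls off faster than the Slater function.
far_sto, far_gto = sto_1s(5.0), gto_1s(5.0)
```

The Slater function has a cusp at the nucleus and a longer tail, whereas the Gaussian is flat at the origin and decays faster. In practice, several Gaussians are typically combined to mimic one Slater function because integrals over Gaussians are much cheaper to evaluate.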
MM methods, also called force field methods, ignore electrons and their motion. The chemical system is represented as a "ball and spring" model where atoms are simplified to balls of different sizes, and the bonds between atoms are springs of different stiffness. Taking an MM approach means the energy of the system is calculated as a function of only the nuclear positions. Force fields, sets of interatomic potentials defining energy functions and parameters, are used to define the MM energy, which measures the degree of mechanical strain within the system. They include bonded and non-bonded terms to represent the intra- and inter-molecular forces. The interatomic potentials should model:
- The interaction between pairs of bonded atoms (covalent bond stretching)
- The energy required for bending an angle formed by three atoms (angle bending)
- The energy changes associated with bond rotations (torsional or dihedral)
- The non-bonded pairwise interactions
The energy functions contain a set of parameters that define the system depending on the different types of atoms, chemical bonds, angles, torsions, non-bonded interactions, and other terms. Force field parameters were previously derived from experimental data but are now largely obtained from QC calculations.
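The four kinds of terms listed above can be sketched as simple energy functions. The functional forms used here (harmonic bonds and angles, a cosine torsion term, and a Lennard-Jones 12-6 potential for non-bonded pairs) are common choices but are an assumption of this sketch, since the text does not prescribe them. All parameter values and function names are hypothetical and do not belong to any real force field.

```python
import math

def bond_energy(r, r0, k):
    """Harmonic bond stretching around the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic angle bending around the equilibrium angle theta0 (radians)."""
    return 0.5 * k * (theta - theta0) ** 2

def torsion_energy(phi, v_n, n, gamma=0.0):
    """Periodic torsional term with barrier v_n, periodicity n, phase gamma."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

def lennard_jones(r, epsilon, sigma):
    """Non-bonded pair interaction (12-6 form)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_energy(bonds, angles, torsions, nonbonded):
    """Sum all terms; each argument is a list of parameter tuples."""
    return (sum(bond_energy(*b) for b in bonds)
            + sum(angle_energy(*a) for a in angles)
            + sum(torsion_energy(*t) for t in torsions)
            + sum(lennard_jones(*p) for p in nonbonded))

# Hypothetical internal coordinates and parameters for a tiny fragment.
energy = total_energy(
    bonds=[(1.10, 1.09, 340.0)],
    angles=[(math.radians(112.0), math.radians(109.5), 50.0)],
    torsions=[(math.radians(60.0), 0.3, 3)],
    nonbonded=[(4.0, 0.1, 3.4)],
)
```

Note that the energy depends only on nuclear geometry (distances, angles, and dihedrals), which is what makes MM calculations so much cheaper than QC methods.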
MM methods are used to simulate nuclear motion within a molecule through molecular dynamics (MD) simulations. MD methods generate successive configurations of the system by integrating Newton's equations of motion. They produce a trajectory defining the position and velocity of nuclei as a function of time, where the MM energy and the system's nuclear forces are determined at each nuclear position. MD simulations are often used to study the evolution of the atomic positions subject to internal chemical forces, while the temperature of the system is provided by the kinetic energy associated with the nuclear motion.
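How a trajectory is generated step by step can be sketched with the velocity Verlet scheme, one widely used integrator for Newton's equations. The text does not name a specific integrator, so this choice is an assumption. The example system is a single particle on a harmonic spring in one dimension, with made-up units and parameters.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations and return the list of (x, v) states."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

# Hypothetical system: unit mass on a spring with unit force constant.
k, mass = 1.0, 1.0

def spring_force(x):
    return -k * x

def oscillator_energy(x, v):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

traj = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                       mass=mass, dt=0.01, n_steps=1000)
energy_drift = abs(oscillator_energy(*traj[-1]) - oscillator_energy(*traj[0]))
```

For a small enough time step the total energy stays nearly constant along the trajectory, which is a common sanity check on an MD integrator. In a real simulation the force would come from the force field and the state would cover all atoms in three dimensions.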
MD simulations can also be performed by moving the nuclei within the electronic field defined by the corresponding electronic wave function or the electron density, usually computed with DFT. In these cases, electrons are treated quantum-mechanically, and the nuclei, within the Born–Oppenheimer approximation, are treated as classical particles whose dynamics follow from integrating the Newtonian equations of motion. These simulations are referred to as ab initio molecular dynamics (AIMD). Although they are more computationally expensive than pure MD methods, modeling the electronic structure allows for the study of bond breaking and formation as the result of the internal exchange of energy.
Computational chemistry utilizes databases of information related to chemicals, molecules, and their interactions. Examples include databases to parametrize electronic structure theory methods and to assess their capabilities and accuracy for a broad set of chemical problems. Notable examples of databases used for computational chemistry applications include the following:
- BindingDB is a public, web-accessible database of measured binding affinities focused on interactions between potential protein targets and small molecules.
- ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute, of the European Molecular Biology Laboratory, based at the Wellcome Trust Genome Campus, Hinxton, UK.
- Generated database GDB17, the chemical universe database, enumerates 166.4 billion possible molecules up to 17 atoms of C, N, O, S, and halogens following rules for chemical stability, synthetic feasibility, and medicinal chemistry.
- GDBMedChem is a collection of 10 million small molecules from GDB17 inspired by medicinal chemistry and excludes problematic functional groups and complex molecules.
- PubChem is a database of chemical molecules and their activities against biological assays. The system is maintained by the National Center for Biotechnology Information, a component of the National Library of Medicine, which is part of the United States National Institutes of Health.
There is a significant amount of computational chemistry software available including free or open-source packages. Examples of computational chemistry software include the following:
- Gabedit is a freeware graphical user interface with tools for editing, displaying, analyzing, converting, and animating molecular systems.
- ioChem-BD is a tool for managing large volumes of quantum chemistry results from a diverse group of simulation packages.
- MOLCAS is a program system for calculations of electronic and structural properties of molecular systems in gas, liquid, or solid phase developed by scientists at Lund University, Sweden.
A number of journals are dedicated to computational chemistry, and other related journals also regularly publish papers on the subject. These include the following:
- Journal of Chemical Theory and Computation
- Journal of Chemical Information and Modeling
- Journal of Physical Chemistry A
- Journal of Chemical & Engineering Data
- Journal of Chemical Physics
- Physical Chemistry Chemical Physics
- Wiley Interdisciplinary Reviews: Computational Molecular Science
- Journal of Computational Chemistry
- ChemPhysChem
- International Journal of Quantum Chemistry
- Computational and Mathematical Methods
- ChemistrySelect
- Journal of Physical Organic Chemistry
- Advanced Theory and Simulations
- Journal of Molecular Structure
- Journal of Molecular Graphics and Modelling
- Chemical Physics Letters
- Chemical Physics
- Computational and Theoretical Chemistry
- Computational Materials Science
- Molecular Catalysis
- Applied Surface Science
- International Reviews in Physical Chemistry
- Molecular Physics
- Molecular Simulation
- Journal of Computer-Aided Molecular Design
- Structural Chemistry
- Theoretical Chemistry Accounts
- Journal of Molecular Modeling
Competitions make use of gamification and crowd-sourcing for the analysis of data.
- CASP (Critical Assessment of Structure Prediction) aims to advance the state of the art in modeling protein structure from amino acid sequences. Participants are invited to submit models for a set of proteins for which the experimental structures are not yet public. Assessments and results are published in a special issue of the journal PROTEINS.
- The Merck Molecular Activity Challenge (2012) contained a list of compounds of known activity in a given assay, and the challenge was to recapitulate the data through simulation. The competition was won by a team of academics from the Kaggle community based at the University of Toronto and the University of Washington. The team used deep learning.
- The Predicting a Biological Response competition (2012), run by Boehringer Ingelheim through Kaggle, provided training data comprising molecular characteristics and experimental data, with the challenge of creating the best algorithm to build predictive models. The winning team included two research directors at an insurance firm and a neurobiologist from Harvard University.
- Tox21 Data Challenge (2014) was a challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental toxicants and drugs. The Tox21 Program (Toxicology in the 21st Century) is a collaboration between U.S. federal agencies including the NIH to characterize the potential toxicity of chemicals.