Machine learning solves the "who is who" problem

Probabilistic assignment of NMR spectrum

Image: Probabilistic assignment of the 13C NMR spectrum of crystalline strychnine

Credit: @EPFL Manuel Cordova

Solid-state nuclear magnetic resonance (NMR) spectroscopy – a technique that measures the frequencies emitted by the nuclei of some atoms exposed to radio waves in a strong magnetic field – can be used to determine chemical and 3D structures as well as the dynamics of molecules and materials.
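The peaks being assigned are reported as chemical shifts. A chemical shift is a resonance frequency expressed in parts per million (ppm) relative to the frequency of a reference compound, so it does not depend on the spectrometer's field strength. The sketch below shows this standard conversion. The frequencies are illustrative values, not data from the study.

```python
def chemical_shift_ppm(nu_sample_hz: float, nu_ref_hz: float) -> float:
    """Chemical shift in ppm: delta = (nu_sample - nu_ref) / nu_ref * 1e6."""
    if nu_ref_hz <= 0:
        raise ValueError("reference frequency must be positive")
    return (nu_sample_hz - nu_ref_hz) / nu_ref_hz * 1e6


# Illustrative numbers only: a line 3,000 Hz above a 100 MHz reference frequency.
delta = chemical_shift_ppm(100_003_000.0, 100_000_000.0)
print(f"{delta:.1f} ppm")
```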

A necessary first step in the analysis, however, is the so-called chemical shift assignment. This means allocating each peak in the NMR spectrum to a given atom in the molecule or material being studied, which can be a particularly complicated task. Assigning chemical shifts experimentally is challenging and generally requires time-consuming multidimensional correlation experiments. An alternative would be to assign shifts by statistical analysis of experimental chemical shift databases, but no such database exists for molecular solids.

A team of researchers took on this problem. The team included EPFL professors Lyndon Emsley, head of the Laboratory of Magnetic Resonance, and Michele Ceriotti, head of the Laboratory of Computational Science and Modeling, together with PhD student Manuel Cordova. They developed a method to assign the NMR spectra of organic crystals probabilistically, directly from their 2D chemical structures.

They started by creating their own database of chemical shifts for organic solids. To do this, they combined the Cambridge Structural Database (CSD), a collection of more than 200,000 three-dimensional organic structures, with ShiftML, a machine learning algorithm they had previously developed together that predicts chemical shifts directly from the structure of molecular solids.

ShiftML was originally described in a Nature Communications paper in 2018. It is trained on density functional theory (DFT) calculations, but once trained it can make accurate predictions for new structures without any further quantum chemical calculations. While retaining DFT accuracy, the method can calculate chemical shifts for structures with 100100 atoms per second, reducing the computational cost by a factor of as much as 10,000 compared with current DFT chemical shift calculations. The accuracy of the method does not depend on the size of the structure studied, and the prediction time scales linearly with the number of atoms. This makes it possible to calculate chemical shifts in situations where it would previously have been impossible.
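A model that is trained once on expensive reference calculations and then evaluated atom by atom can be illustrated with kernel ridge regression on per-atom descriptors. The code below is a minimal sketch under stated assumptions. The `ToyShiftModel` class, the Gaussian kernel and the random descriptors are hypothetical stand-ins. They do not reproduce ShiftML's actual features, kernel or training data.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian (RBF) kernel between the rows of a (n, d) and the rows of b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class ToyShiftModel:
    """Hypothetical kernel ridge regression from atom descriptors to shifts."""

    def __init__(self, gamma: float = 0.5, reg: float = 1e-6):
        self.gamma = gamma
        self.reg = reg

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ToyShiftModel":
        # Expensive reference shifts (e.g. from DFT) are needed only at this stage.
        self.X_train = X
        K = gaussian_kernel(X, X, self.gamma)
        self.alpha = np.linalg.solve(K + self.reg * np.eye(len(X)), y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Each atom is predicted from its own descriptor only, so the cost
        # grows linearly with the number of atoms in the new structure.
        return gaussian_kernel(X, self.X_train, self.gamma) @ self.alpha


rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 4))            # synthetic per-atom descriptors
y_train = 100 + 20 * np.tanh(X_train[:, 0])   # synthetic "reference" shifts
model = ToyShiftModel().fit(X_train, y_train)
new_structure = rng.normal(size=(12, 4))      # a new structure with 12 atoms
print(model.predict(new_structure).shape)     # (12,)
```

The linear solve runs only once, at training time. Predicting a structure with twice as many atoms costs roughly twice as much.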

In the Science Advances paper, they used ShiftML to predict the shifts of more than 200,000 compounds extracted from the CSD. They then related the predicted shifts to topological representations of the local molecular environments. This involved building a graph of the covalent bonds between the atoms of the molecule, extending a given number of bonds away from the central atom. They then collected all identical occurrences of each graph in the database to obtain a statistical distribution of chemical shifts for each motif. The representation captures only the covalent bonds around an atom and contains no 3D structural features. This allowed them to obtain probabilistic assignments of the NMR spectra of organic crystals directly from their two-dimensional chemical structures, using a marginalization scheme that combines the distributions of all the atoms in the molecule.
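The motif-building step can be sketched as follows. The sketch assumes that a molecule is given as a list of element symbols plus an adjacency list of covalent bonds. The key function is a hypothetical simplification in the style of HOSE codes. It unfolds the bond graph into a tree rooted at the central atom and sorts the branches, so that identical environments produce identical keys. The paper's exact graph representation and matching procedure are not reproduced here. The shifts are made-up values standing in for ShiftML predictions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def environment_key(elements, bonds, center, depth):
    """Canonical string for the covalent environment of `center` up to `depth` bonds.

    The bond graph is unfolded into a tree rooted at `center`. The walk never
    steps straight back to the atom it came from, but it may revisit atoms
    through rings. Sorting the children makes identical environments identical.
    """

    def walk(atom, parent, remaining):
        label = elements[atom]
        if remaining == 0:
            return label
        children = sorted(
            walk(nbr, atom, remaining - 1) for nbr in bonds[atom] if nbr != parent
        )
        return label + "(" + ",".join(children) + ")" if children else label

    return walk(center, None, depth)


def build_motif_database(molecules, depth=2):
    """Group (predicted) shifts by environment key: key -> list of shifts."""
    database = defaultdict(list)
    for elements, bonds, shifts in molecules:
        for atom, shift in shifts.items():
            database[environment_key(elements, bonds, atom, depth)].append(shift)
    return database


def motif_statistics(database):
    """Mean and population standard deviation of the shifts for each motif."""
    return {key: (mean(vals), pstdev(vals)) for key, vals in database.items()}


# Synthetic example: the ethanol heavy-atom skeleton C0-C1-O2 (hydrogens omitted),
# with made-up carbon shifts standing in for ML-predicted ones.
ethanol_elements = ["C", "C", "O"]
ethanol_bonds = {0: [1], 1: [0, 2], 2: [1]}
molecules = [
    (ethanol_elements, ethanol_bonds, {0: 18.0, 1: 58.0}),
    (ethanol_elements, ethanol_bonds, {0: 19.0, 1: 57.0}),
]
stats = motif_statistics(build_motif_database(molecules, depth=2))
print(stats)
```

Increasing `depth` makes each motif more specific, but it also leaves fewer matching occurrences per motif. The resulting distributions are sharper but less well sampled.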

After constructing the chemical shift database, the researchers first tested the assignment approach on a model system. They then applied it to a set of organic molecules whose carbon chemical shift assignments had already been determined experimentally, at least in part: theophylline, thymol, cocaine, strychnine, AZD5718, lisinopril, ritonavir and the potassium salt of penicillin G. In most cases, the assignment probabilities obtained directly from the two-dimensional representations of the molecules matched the experimentally determined assignments.
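The marginalization idea can be sketched as follows. The sketch makes two assumptions: each atom's shift distribution is summarized as a Gaussian (mean, standard deviation) taken from its motif statistics, and there is exactly one observed peak per atom. Every one-to-one assignment is weighted by its likelihood. The probability of pairing atom i with peak j is then the summed weight of all assignments that contain that pair. The Gaussian approximation, brute-force enumeration and example values are illustrative assumptions, not the published scheme. A practical implementation would need to avoid enumerating all n! assignments.

```python
import math
from itertools import permutations


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def assignment_probabilities(distributions, peaks):
    """probs[i][j] = probability that atom i gives rise to peak j.

    distributions: list of (mean, std) per atom; peaks: observed shifts.
    Brute force over all n! one-to-one assignments, so only for small n.
    """
    n = len(distributions)
    if len(peaks) != n:
        raise ValueError("need exactly one peak per atom")
    perms = list(permutations(range(n)))  # perm[i] = peak index for atom i
    log_weights = [
        sum(gaussian_logpdf(peaks[perm[i]], mu, sd) for i, (mu, sd) in enumerate(distributions))
        for perm in perms
    ]
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]  # shifted for numerical stability
    total = sum(weights)
    probs = [[0.0] * n for _ in range(n)]
    for perm, w in zip(perms, weights):
        for i in range(n):
            probs[i][perm[i]] += w / total  # marginalize over all other pairings
    return probs


# Hypothetical motif statistics for three carbons and three observed peaks (ppm).
distributions = [(18.5, 1.0), (57.5, 1.5), (130.0, 5.0)]
peaks = [128.0, 19.0, 58.0]
probs = assignment_probabilities(distributions, peaks)
# Atom 0 -> peak 1, atom 1 -> peak 2 and atom 2 -> peak 0, each with probability close to 1.
```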

Finally, they assessed the performance of the framework on a benchmark set of 100 crystal structures containing between 10 and 20 distinct carbon atoms. For each structure, they treated the ShiftML-predicted shifts as the correct assignment and excluded that structure from the statistical distributions used for the assignment. The correct assignment was among the two most likely assignments in more than 80% of cases.
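One way to score a benchmark like this is sketched below. All one-to-one assignments are ranked by likelihood, and the sketch counts how often the reference assignment falls within the top k (k = 2 in the benchmark described above). The Gaussian per-atom distributions, the brute-force ranking and the two synthetic cases are illustrative assumptions, not the study's data or code.

```python
import math
from itertools import permutations


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def ranked_assignments(distributions, peaks):
    """All one-to-one assignments (perm[i] = peak for atom i), most likely first."""
    scored = []
    for perm in permutations(range(len(peaks))):
        score = sum(
            gaussian_logpdf(peaks[perm[i]], mu, sd)
            for i, (mu, sd) in enumerate(distributions)
        )
        scored.append((score, perm))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [perm for _, perm in scored]


def correct_in_top_k(distributions, peaks, reference, k=2):
    """True if the reference assignment is among the k most likely ones."""
    return tuple(reference) in ranked_assignments(distributions, peaks)[:k]


def top_k_rate(cases, k=2):
    """Fraction of benchmark cases whose reference assignment lands in the top k."""
    hits = sum(correct_in_top_k(d, p, ref, k) for d, p, ref in cases)
    return hits / len(cases)


# Two synthetic cases: (per-atom distributions, observed peaks, reference assignment).
cases = [
    ([(18.5, 1.0), (57.5, 1.5)], [19.0, 58.0], (0, 1)),
    # Two atoms with overlapping distributions: the reference ranks second.
    ([(30.0, 5.0), (31.0, 5.0), (100.0, 2.0)], [35.0, 29.0, 101.0], (0, 1, 2)),
]
print(top_k_rate(cases, k=1), top_k_rate(cases, k=2))
```

The second case shows why the benchmark reports the top two assignments rather than only the single best one. When two atoms have overlapping shift distributions, the correct assignment can narrowly lose to a swapped one.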

“This method could significantly accelerate the study of materials by NMR by streamlining one of the significant first steps in these studies,” Cordova said.


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases sent to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.
