Multiscale molecular modeling with machine learning
Title: Multiscale molecular modeling with machine learning
SNIC Project: Berzelius-2022-60
Project Type: LiU Berzelius
Principal Investigator: Alexander Lyubartsev
Affiliation: Stockholms universitet
Duration: 2022-04-01 – 2022-10-01
Classification: 10407


Molecular modeling and simulation have become a necessary component of research on the molecular and atomic structure of matter, with countless applications in physics, chemistry, biology and materials science. Traditional molecular dynamics (MD) simulations rely on the numerical solution of Newton's equations of motion for all atoms (particles) in the simulated system, with forces computed either from quantum-mechanical calculations of the electronic structure or from empirical expressions called force fields. For many systems of interest these computations are very time-consuming, especially in the case of ab initio simulations. Data-driven approaches, including machine learning (ML) and artificial neural networks (ANN), have recently emerged as a novel way to formulate models for molecular simulations. Within this approach, the local arrangement of atoms is represented by a set of descriptors, which serve as input to an ANN trained to fit the quantum-mechanical potential energy surface. For large biomolecular and materials systems (larger than 20 nm), atomistic simulations, even with empirical or ML force fields, become computationally infeasible. For such systems, coarse-grained (CG) models are used, which unite groups of atoms into single interaction centers and/or omit the solvent. Force fields for a CG model can be derived from atomistic simulations using a multiscale methodology, for example the Inverse Monte Carlo method developed by the PI. The aim of this project is to develop an ML methodology that provides force fields for coarse-grained simulations by training ANNs on the results of atomistic simulations. Instead of the energy surface used to train ANNs on ab initio data (which is not available for CG models derived from atomistic representations), canonical averages of local structural properties will be used as training targets.
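To make the descriptor-plus-ANN idea concrete, here is a minimal sketch of one common family of descriptors: Behler-Parrinello-style radial symmetry functions, which summarize each atom's neighborhood as a fixed-length vector and feed it to a small network that outputs an atomic energy. The choice of descriptor, the cosine cutoff, the network shape and all function names (`radial_descriptors`, `atomic_energies`, `total_energy`) are assumptions for illustration, not the method used in this project. The weights are untrained and the fit to quantum-mechanical energies is omitted.

```python
import numpy as np


def cosine_cutoff(r, rc):
    """Smooth cutoff: 0.5*(cos(pi*r/rc)+1) for r < rc, else 0."""
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)


def radial_descriptors(positions, etas, r_s=0.0, rc=6.0):
    """Radial symmetry functions, one row per atom and one column per eta.

    G[i, k] = sum_{j != i} exp(-etas[k] * (r_ij - r_s)**2) * f_c(r_ij)
    """
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)                    # (N, N) pair distances
    fc = cosine_cutoff(r, rc) * ~np.eye(n, dtype=bool)   # drop the i == j terms
    gauss = np.exp(-etas[None, None, :] * (r[:, :, None] - r_s) ** 2)
    return (gauss * fc[:, :, None]).sum(axis=1)          # (N, K)


def atomic_energies(G, params):
    """One-hidden-layer network mapping each descriptor row to an atomic energy."""
    W1, b1, w2, b2 = params
    hidden = np.tanh(G @ W1 + b1)   # (N, H)
    return hidden @ w2 + b2         # (N,)


def total_energy(positions, params, etas):
    """Total energy as a sum of atomic contributions (hypothetical toy model)."""
    return atomic_energies(radial_descriptors(positions, etas), params).sum()


rng = np.random.default_rng(0)
etas = np.array([0.5, 1.0, 2.0])
params = (rng.normal(size=(3, 8)), np.zeros(8), rng.normal(size=8), 0.0)
positions = rng.uniform(0.0, 5.0, size=(10, 3))
print(total_energy(positions, params, etas))
```

Because the descriptors depend only on interatomic distances and the energy is a sum over atoms, the result is invariant to translations, rotations and permutations of identical atoms, which is the main reason such descriptors are used instead of raw coordinates.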
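The Inverse Monte Carlo method is named above without detail. The following sketch shows its core linear-response step in the standard textbook form, assuming a CG Hamiltonian H = Σ_α V_α S_α, where V_α is a pair potential tabulated on distance bin α and S_α is the number of site pairs whose separation falls into that bin. The canonical averages respond to a potential change as Δ⟨S_α⟩ ≈ Σ_γ A_αγ ΔV_γ with A_αγ = -β(⟨S_α S_γ⟩ - ⟨S_α⟩⟨S_γ⟩), so one correction step solves this linear system for the target change Δ⟨S_α⟩ = S*_α - ⟨S_α⟩. The function name `imc_update` and its arguments are hypothetical; this is an illustration, not the PI's implementation, and the CG sampling that produces `S_samples` is left out.

```python
import numpy as np


def imc_update(S_samples, S_target, beta):
    """One Inverse Monte Carlo correction step for a tabulated pair potential.

    S_samples : (n_frames, n_bins) pair counts per distance bin, sampled with
                the current CG potential
    S_target  : (n_bins,) reference averages <S_alpha>* from atomistic data
    beta      : 1 / (k_B T)
    Returns dV, the first-order correction to the tabulated potential V.
    """
    mean_S = S_samples.mean(axis=0)
    # <S_a S_g> - <S_a><S_g>; bias=True divides by n_frames, as in a canonical average
    cov = np.cov(S_samples, rowvar=False, bias=True)
    A = -beta * cov                  # A[a, g] = d<S_a> / dV_g
    rhs = S_target - mean_S          # required change of the averages
    # lstsq also gives a minimum-norm answer if A is singular (e.g. empty bins)
    dV, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return dV
```

In an iterative scheme one would set V ← V + ΔV, resample the CG system with the new potential and repeat until ⟨S_α⟩ matches the reference. Practical implementations often damp or regularize the step, since the linear expansion is only valid for small changes.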
During the first stage of the project, as a proof of concept, we train the ANN on properties of a model system of particles interacting via the Lennard-Jones potential. As a realistic case study, we plan to use ML algorithms to parametrize a CG force field for lipid bilayer simulations and, at the next stage, for the interaction of lipid bilayers and peptides with inorganic nanoparticles. The PI's group has a large database of atomistic trajectories generated in MD simulations of lipid bilayers composed of different biologically relevant lipids with a variety of head groups and hydrocarbon tails, as well as of bilayers in contact with metal oxide surfaces and nanoparticles. Structural properties of these bilayers, represented as distributions of distances and/or angles between the CG sites of lipids, will be used as reference data for the fit. Additional atomistic simulation data for training, particularly for amino acids and polypeptides interacting with nanoparticles, will be generated within this project. The loss function is defined as the deviation of the CG distributions from the reference distributions obtained in atomistic MD simulations, and it is used to train the ANN that determines energies and forces in the CG system. The final trained CG model of lipids and nanoparticles will be validated by comparison with atomistic simulations that were not used in ANN training.
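Reference data of the kind described above (distributions of distances between CG sites) and a loss measuring how far a CG model deviates from them can be sketched as follows. This is a minimal illustration assuming identical sites in a cubic periodic box, the radial distribution function g(r) as the distance distribution, and a squared deviation integrated over r as the loss; the function names and the exact form of the loss are hypothetical, not taken from the project. The sketch only evaluates the loss; how gradients with respect to the ANN parameters are obtained (a raw histogram does not provide them directly) is not shown.

```python
import numpy as np


def radial_distribution(positions, box, r_max, n_bins):
    """g(r) for N identical sites in a cubic periodic box of side `box`.

    r_max should not exceed box / 2 so the minimum image counts each pair once.
    """
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box * np.round(diff / box)                       # minimum-image convention
    r = np.linalg.norm(diff, axis=-1)[np.triu_indices(n, k=1)]  # unique pairs only
    counts, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    # expected pair count per shell for an ideal gas at the same density
    ideal = 0.5 * n * (n - 1) / box ** 3 * shell_vol
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / ideal


def distribution_loss(g_model, g_ref, dr):
    """Squared deviation between two distributions, integrated over r."""
    return np.sum((g_model - g_ref) ** 2) * dr


rng = np.random.default_rng(1)
box = 10.0
positions = rng.uniform(0.0, box, size=(200, 3))
r, g = radial_distribution(positions, box, r_max=5.0, n_bins=50)
print(distribution_loss(g, np.ones_like(g), r[1] - r[0]))
```

In the Lennard-Jones proof of concept, `g_ref` would come from the reference trajectories and `g_model` from the CG simulation; the random positions used here form an ideal gas, for which g(r) ≈ 1, and serve only to exercise the code. Angle distributions could be histogrammed and compared in the same way.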