Multiscale molecular modeling with machine learning
Molecular simulations have become an essential component of research on the molecular and atomic structure of matter, with numerous applications in physics, chemistry, biology and materials science. Traditional molecular dynamics (MD) simulations rely on the numerical solution of Newton's equations of motion for all atoms (particles) in the simulated system, with forces computed either from quantum-mechanical calculations of the electronic structure or from empirical expressions called a force field. For many systems of interest these computations are very time-consuming. Data-driven approaches, including machine learning (ML) and artificial neural networks (ANN), have recently emerged as a novel way to formulate models for molecular simulations. Within this approach, the local arrangement of atoms is represented by a set of descriptors, which serve as input to an ANN trained to fit the quantum-mechanical energy surface. For large biomolecular and material systems (> 20 nm in size), atomistic simulations become computationally unfeasible even with empirical or ML force fields. For such systems, coarse-grained (CG) models, which unite atomic groups into single interaction centers and/or omit the solvent, are used. Force fields for a CG model can be derived from atomistic simulations using multiscale methodology, for example by the Inverse Monte Carlo method developed by the PI. The aim of this project is to develop ML methodology that provides force fields for coarse-grained simulations by training an ANN on the results of atomistic simulations. Instead of the energy surface used to train an ANN on ab initio simulations (which is not available for CG models derived from atomistic representations), canonical averages of local structural properties, such as radial distribution functions and distributions of bond lengths and angles, are used.
The loss function is defined in terms of the deviation of the CG distributions from reference distributions obtained in atomistic MD simulations, and it is used to train the ANN that determines energies and forces in the CG system.
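As an illustration of such a structure-based loss, the following minimal sketch computes a radial distribution function for a single configuration and the mean squared deviation between a CG distribution and a reference one. It is not the project's implementation: the function names (radial_distribution, distribution_loss), the cubic periodic box, the histogram binning and the mean-squared form of the deviation are all assumptions made for the example, and r_max is assumed not to exceed half the box length.

```python
import numpy as np

def radial_distribution(positions, box_length, r_max, n_bins):
    """Histogram estimate of g(r) for one configuration in a cubic periodic box.

    Hypothetical helper for illustration; assumes r_max <= box_length / 2.
    """
    n = len(positions)
    # Pairwise displacement vectors with the minimum-image convention.
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box_length * np.round(diff / box_length)
    # Keep each unordered pair once (upper triangle, no self-pairs).
    dist = np.linalg.norm(diff, axis=-1)[np.triu_indices(n, k=1)]
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    # Normalise by the pair count an ideal gas would give in each shell.
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    n_pairs = n * (n - 1) / 2.0
    ideal = n_pairs * shell_volume / box_length ** 3
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / ideal

def distribution_loss(g_cg, g_ref):
    """Mean squared deviation between a CG and a reference distribution."""
    g_cg = np.asarray(g_cg, dtype=float)
    g_ref = np.asarray(g_ref, dtype=float)
    return float(np.mean((g_cg - g_ref) ** 2))
```

In practice the distributions would be averaged over many configurations of a trajectory, and the same loss could be applied to bond-length and angle distributions. The sketch does not show how the gradient of the loss with respect to the network parameters is obtained, which is a separate step of the training.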
During the initial stage of the project, as a proof of concept, we trained an ANN on properties of a model system of particles interacting through the Lennard-Jones potential, as well as for a solvent-free CG model of methanol in aqueous solution. The primary aim of the present stage of the project is to investigate how the efficiency of ANN training and the quality of the ANN force field depend on the parameters of the network, such as the number of intermediate layers and the number of neurons in them. Furthermore, as a realistic case study, we plan to use ML algorithms to parameterize a CG force field for lipid bilayer simulations and, at the next stage, for the interaction of lipid bilayers and peptides with inorganic nanoparticles. The PI's group has a large and continuously growing database of atomistic trajectories generated in MD simulations of lipids in bilayers and in contact with metal oxide surfaces and nanoparticles, which will be used for ANN training. The final trained CG model of lipids and nanoparticles will be validated by comparison with atomistic simulations that were not used during ANN training.
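To make the network parameters mentioned above concrete, the sketch below builds a small feed-forward network whose number of intermediate layers and neurons per layer are set by a single list of layer sizes. It is only an illustration under stated assumptions: the names init_mlp, mlp_energy and count_parameters are hypothetical, the tanh activation and the weight initialisation are arbitrary choices, and the descriptors are taken to be a fixed-length vector per CG site.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create weights and biases for a fully connected network.

    layer_sizes = [n_descriptors, hidden_1, ..., hidden_k, 1]; the number
    and width of the hidden layers are the quantities one would vary.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Scaled Gaussian weights (an assumed initialisation), zero biases.
        w = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out))
        params.append((w, np.zeros(n_out)))
    return params

def mlp_energy(params, descriptors):
    """Map descriptor vectors of shape (n_sites, n_descriptors) to energies."""
    h = np.asarray(descriptors, dtype=float)
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)   # hidden layers with tanh activation
    w, b = params[-1]
    return (h @ w + b)[..., 0]   # linear output layer, one energy per site

def count_parameters(params):
    """Total number of trainable weights and biases."""
    return sum(w.size + b.size for w, b in params)
```

For example, init_mlp([4, 16, 16, 1]) gives two intermediate layers of 16 neurons each for four input descriptors, while init_mlp([4, 64, 1]) gives a single wider layer; comparing such variants at similar parameter counts is one way to organise the study of training efficiency and force-field quality.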