A pilot macromolecular 3D structure determination project - Year 4
Title: A pilot macromolecular 3D structure determination project - Year 4
DNr: SNIC 2016/1-204
Project Type: SNIC Medium Compute
Principal Investigator: Martin Moche <Martin.Moche@ki.se>
Affiliation: Karolinska Institutet
Duration: 2016-05-01 – 2017-05-01
Classification: 10203 10601
Homepage: http://ki.se/en/mbb/get-started-at-national-supercomputer-centre-nsc
Keywords:

Abstract

In March 2013 the Protein Science Facility (PSF) at Karolinska Institutet in Stockholm and the National Supercomputer Centre (NSC) in Linköping started a pilot project to evaluate macromolecular crystallography (MX) software running on NSC Triolith. The pilot has a homepage (http://ki.se/en/mbb/get-started-at-national-supercomputer-centre-nsc) that guides new members, shares software settings and describes an efficient supercomputer workflow. Supported by positive feedback from the Swedish MX community, the Swedish light source MAX IV decided to fund a pilot extension called PReSTO, which aims to support integrated structural biology calculations including macromolecular crystallography (MX), nuclear magnetic resonance (NMR) and cryo-electron microscopy (cryo-EM). Karolinska Institutet researchers recruited to NSC by the pilot recently published the first acknowledgement for running molecular dynamics software on NSC Triolith (1). The integrated structural biology workflow is supported by the ThinLinc software at NSC, which enables remote graphical applications such as COOT, VASCo and PyMol for model building and for calculation and visualization of protein surface properties and structures.
We first installed XDS, the parallel data processing software used for the majority of protein X-ray diffraction datasets collected today, together with its graphical user interface (GUI) scripting derivatives XDSGUI and XDSAPP. We also installed SHELX C/D/E (of which SHELXD is parallel software), the SHELX GUI hkl2map, and the two major software packages CCP4 and PHENIX. The PHENIX GUI includes Slurm scheduler support compatible with NSC Triolith, and we installed the modelling software Rosetta for use with the “phenix mr rosetta” and “phenix rosetta refinement” modules. We also found a way to make project-specific software installations, which enables frequent updates of PHENIX and DIALS, i.e. MX software in rapid development.
We also installed parallel MX software developed for supercomputer use, such as Arcimboldo_lite for high-resolution ab initio phasing and Shake and Bake for phasing with many heavy atoms. During 2015 we installed the leading parallel software packages for phasing (SHARP) and refinement (BUSTER) from GlobalPhasing, and discovered MRage, a parallel implementation of the leading molecular replacement software PHASER that is already available in the PHENIX GUI. MX software today typically uses a single compute node with at most 32 cores (16 on Triolith). However, when Rosetta is used for structure refinement or for tricky molecular replacements, and to make X-ray data processing as fast as today's X-ray data collection, we would benefit from software that can use several nodes, i.e. multiples of 16 cores. For 2016 we look forward to installing the second version of the CCP4 interface (ccp4i2), more PyMol plugins, parallel software for electrostatics and pKa predictions, the Crystallography and NMR System (CNS), and potentially the Cambridge Structural Database System (CSDS) for building more accurate models of ligand-protein complexes. We also need to establish close collaboration with MAX IV/Lunarc researchers and develop A) online HPC get-started documentation and B) training sessions for active and upcoming NSC users to collect MX community feedback.

References

1. J. S. Brock et al., A dynamic Asp-Arg interaction is essential for catalysis in microsomal prostaglandin E2 synthase. Proceedings of the National Academy of Sciences of the United States of America 113, 972-977 (2016).