A pilot macromolecular 3D structure determination project - Year 2
In March 2013, the Protein Science Facility (PSF) from SciLifeLab in Stockholm and the National Supercomputer Centre (NSC) in Linköping started a pilot project to evaluate protein crystallography software running on Triolith. Our initial ambition was to install XDS, the parallel data processing software used for all protein crystallography datasets collected today, and to install supercomputer-adapted software such as Arcimboldo and Shake and Bake that is not available to the Swedish structural biology community today. We also wanted to evaluate how to run remote graphics applications for model building, such as COOT, and for visualisation, such as PyMol and VASCo.
Initial testing of the installed software (http://docs.snic.se/wiki/Category:Structural_biology) was very favorable, and we concluded that:
1. XDS parallel data processing runs much faster on NSC Triolith than at PSF.
2. The ThinLinc software at NSC is suitable for running remote graphics applications such as COOT and PyMol.
3. NSC has installed software complementary to that at PSF, such as “Shake and Bake”, a supercomputer-adapted program for phasing with many heavy atoms, and “Phenix MR Rosetta”, a modern molecular replacement and autobuilding suite that contains an sbatch scheduler runnable on NSC Triolith.
The purpose of this pilot is to build momentum within the protein crystallography community for supporting the PReSTO application currently headed by NSC. PReSTO stands for rapid protein structure determination in an integrated e-Science environment for structure calculations and data storage. PReSTO is currently supported by SwedStruct, representing the protein crystallography community (swedstruct.mbb.ki.se), and by Global Phasing (www.globalphasing.com), whose founder Gérard Bricogne was awarded the Gregori Aminoff prize in 2009 for “development of revolutionary programme systems” by the Royal Swedish Academy of Sciences. If PReSTO is granted, the following could be introduced to the protein crystallography community:
1. Adding Arcimboldo, supercomputer software written for the Condor scheduler rather than the Slurm scheduler used at NSC. Arcimboldo can be described as “fragment based” molecular replacement and is suitable when a high-resolution dataset exists but no phases are available, for example because of crystallization reproducibility issues. Global Phasing is also highly interested in developing supercomputer-adapted versions of their leading software packages BUSTER and SHARP.
2. Developing dedicated Triolith nodes for interactive protein crystallography work, in which computing and manual model building alternate.
3. Supporting the entire protein crystallography workflow so that NSC users do not need to switch between computers when running data processing, phasing and refinement calculations, performing manual model building or creating publication pictures. This requires a complete software stack, for instance more PyMol plugins such as VASCo, and ideally a Triolith graphics card upgrade for an improved remote graphics experience.
Nine research groups and PIs have joined the current NSC pilot project, the majority of them from Karolinska Institutet, where the author of this application is active. Already today, the NSC protein crystallography setup is a true asset to everyone testing it. We therefore want to continue the pilot and hopefully involve our users in developing the current setup as far as our common knowledge, hardware, software and economy permit.