Protein Structure Prediction using contact predictions and other tools
Title: Protein Structure Prediction using contact predictions and other tools
DNr: SNIC 2014/8-12
Project Type: SNIC Large Compute
Principal Investigator: Arne Elofsson <arne@bioinfo.se>
Affiliation: Stockholms universitet
Duration: 2014-07-01 – 2015-07-01
Classification: 10203 10601 30199
Homepage: http://bioinfo.se/
Keywords:

Abstract

Here, we apply for resources to continue developing and using methods for protein structure prediction. During the last two years we have developed methods that significantly outperform earlier contact prediction methods. In particular, the second method, PconsC2, is in our view quite innovative: it uses a deep learning approach to significantly improve contact predictions. PconsC2 achieves positive predictive values (PPV) 50% higher than our earlier methods and almost 80% higher than any other method. We have also developed a novel method to fold proteins. The next step in this work is to apply these methods to real-world problems. We have started doing so, and for a number of examples the results appear successful. However, when we applied them to more complicated cases, such as protein complexes, large multi-domain proteins and domain-repeat proteins, a number of problems became obvious. In short, these boil down to (i) inaccurate multiple sequence alignments, (ii) inaccurate identification of orthologous proteins, and (iii) inefficient use of phylogenetic information. We have only just started examining ways to address these problems.

My group consists of 10-15 people (depending on the number of master's students), and all of them rely on heavy computing for at least part of their projects. As our activity logs show, in many months we have used up our allocation (on at least one of the systems) within a few weeks. This has created severe bottlenecks in our development. Fortunately, we have still had access to Ferlin, which we used as an emergency system. On the other hand, in some months the focus has been on writing code, analysing results and writing papers, and we have not used our full allocation. This is natural for a research group focused on methods development.
Given that, on average, about five students are developing methods at any time, and that for efficient turnaround each of them needs access to about 10 nodes around the clock, we request on the order of 500k core hours.
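The size of the request can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope estimate, not part of the original application: the student and node counts come from the text, while 16 cores per node and a 30-day month are assumptions that the application does not state. Under those assumptions, 5 students × 10 nodes × 16 cores × 24 h × 30 days gives about 576k core hours, which is of the same order as the 500k requested if that figure is read as a monthly allocation. Sustaining the same usage for a full year would need roughly 7M core hours.

```python
# Back-of-envelope estimate of the compute request.
# ASSUMPTIONS (not stated in the application): 16 cores per node and a
# 30-day month. The student and node counts are taken from the text.

STUDENTS = 5             # students developing methods at any time (from the text)
NODES_PER_STUDENT = 10   # nodes each student needs around the clock (from the text)
CORES_PER_NODE = 16      # hypothetical: depends on the target cluster
HOURS_PER_DAY = 24


def core_hours(days: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Core hours needed to keep every student's nodes busy for `days` days."""
    nodes = STUDENTS * NODES_PER_STUDENT
    return nodes * cores_per_node * HOURS_PER_DAY * days


if __name__ == "__main__":
    # Thousands separators make the result easy to compare with "500k".
    print(f"per 30-day month: {core_hours(days=30):,} core hours")
    print(f"per 365-day year: {core_hours(days=365):,} core hours")
```

Changing `cores_per_node` shows how strongly the estimate depends on the node size of the system the allocation is granted on.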