Protein Structure Prediction using contact predictions and other tools
Title: Protein Structure Prediction using contact predictions and other tools
DNr: SNIC 2014/8-12
Project Type: SNIC Large Compute
Principal Investigator: Arne Elofsson <arne@bioinfo.se>
Affiliation: Stockholms universitet
Duration: 2014-07-01 – 2015-07-01
Classification: 10203 10601 30199
Homepage: http://bioinfo.se/
Abstract
Here, we apply for resources to continue our development and
utilisation of methods for protein structure prediction. During the
last two years we have developed methods that significantly
outperform earlier contact prediction
methods. In particular, the development
of the second method, PconsC2, is in our view quite innovative. Here, we
use a deep learning approach to significantly improve contact
predictions. The resulting positive predictive values (PPV) are 50% higher
than those of our earlier methods and almost 80% higher than those of any
other method. We have also developed a novel method to fold proteins.
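
As an illustration of the PPV measure quoted above, the following is a minimal sketch of how the PPV of a ranked list of predicted contacts could be computed. The function name `contact_ppv`, the `(i, j, score)` tuple layout and the toy data are assumptions made for this example; they are not taken from PconsC2 or any other tool mentioned here.

```python
def contact_ppv(predicted, true_contacts, top_n):
    """Positive predictive value (TP / (TP + FP)) of the top_n predictions.

    predicted     -- iterable of (i, j, score) tuples (hypothetical layout)
    true_contacts -- set of (i, j) residue pairs with i < j
    top_n         -- number of highest-scoring predictions to evaluate
    """
    # Rank predictions by score, highest first, and keep the top_n.
    ranked = sorted(predicted, key=lambda p: p[2], reverse=True)[:top_n]
    if not ranked:
        return 0.0
    # A prediction is a true positive if the residue pair is a real contact.
    hits = sum(
        1 for i, j, _ in ranked
        if (min(i, j), max(i, j)) in true_contacts
    )
    return hits / len(ranked)


# Toy data, invented purely for illustration.
predicted = [(1, 10, 0.9), (2, 20, 0.8), (3, 30, 0.4), (4, 15, 0.7)]
true_contacts = {(1, 10), (4, 15), (5, 25)}

ppv = contact_ppv(predicted, true_contacts, top_n=3)
print(f"PPV of top 3 predictions: {ppv:.2f}")
```

In this toy case two of the three top-ranked pairs are real contacts, so the PPV is 2/3.
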
The next step in this work is to start using these methods on
real-world problems. We have started doing so, and for a number of
examples the results appear successful. However, when we applied
the methods to more complicated targets (protein complexes, large
multi-domain proteins and domain repeat proteins), a number of problems
became obvious. In short, these boil down to (i) inaccurate multiple
sequence alignments, (ii) inaccurate identification of orthologous
proteins and (iii) inefficient use of phylogenetic information. We are
just starting to examine possible ways to address these problems.
My group consists of 10-15 people (depending on the number of master
students), and all of them use heavy computing during at least
part of their projects. As our activity logs show, in many months we
have used up our allocated resources (on at least one
of the systems) within a few weeks. This has created severe
bottlenecks in our development. Luckily, we have still had access to
ferlin and have used it as an emergency system. On the other hand, there
are some months where the focus has been on writing code, analysing
results and writing papers, and we have not used all of our allocation. This
is a natural pattern for a research group that is focused on methods
development.
Given that on average perhaps 5 students are developing methods, and
for efficient turnaround they need to have access to about 10 nodes
each 24/7, we request in the order of 500k core hours.
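
The arithmetic behind this estimate can be sketched as follows. The proposal does not state the number of cores per node or the period the figure refers to, so the sketch assumes a 30-day month and tries 8 and 16 cores per node; both values, and the helper name `core_hours`, are illustrative assumptions only.

```python
def core_hours(users, nodes_per_user, cores_per_node, hours):
    """Core hours consumed when every user keeps all nodes busy for `hours`."""
    return users * nodes_per_user * cores_per_node * hours


HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month of 24/7 use

# 5 students with 10 nodes each, as stated in the text above.
# The cores-per-node values are assumptions, not taken from the proposal.
for cores_per_node in (8, 16):
    total = core_hours(5, 10, cores_per_node, HOURS_PER_MONTH)
    print(f"{cores_per_node} cores/node: {total:,} core hours per month")
```

Under these assumptions the monthly demand is 288,000 core hours with 8 cores per node and 576,000 with 16.
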