Modelling Variability in Gene Dosage
Title: Modelling Variability in Gene Dosage
DNr: Berzelius-2025-176
Project Type: LiU Berzelius
Principal Investigator: Philipp Rentzsch <rentzsch@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-05-23 – 2025-12-01
Classification: 10203
Homepage: https://www.tllab.org/research
Abstract
Genes are not always expressed at one uniform dosage level. Even in healthy individuals, gene expression varies over time and between individuals, and 90% of GWAS variants lie in the non-coding genome, suggesting a critical role for gene dosage in phenotype and disease. Many studies have shown that this variability in gene dosage is highly gene-specific, yet dosage variability at the level of individual genes remains poorly understood. Furthermore, while the relationship between a given gene's dosage and its effect on function is known to be non-linear, the shape and determinants of this relationship are poorly understood. The objective of this project is to understand these two phenomena at genome scale.
In the first part of the project, we aim to leverage existing functional genomics data to model non-linear dosage-response functions. To this end, we have developed bayesDREAM, a Bayesian approach to modelling Dosage Response Effects Across Modalities. Using Stochastic Variational Inference as implemented in Pyro, bayesDREAM corrects batch effects, uses a negative binomial model and groups information to estimate cis gene dosage, and then flexibly models the functional response. The functional responses that can be modelled are diverse (e.g. response gene expression, response isoform usage, or even cellular fitness or differentiation state).
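As a rough illustration of what a non-linear dosage-response function combined with a negative binomial count model can look like, the sketch below scores toy counts against a Hill-shaped curve. This is not the bayesDREAM implementation and does not use Pyro: the Hill form, the parameter names (`baseline`, `max_effect`, `half_point`, `steepness`), the dispersion value, and the toy data are all assumptions made for illustration only.

```python
import math


def hill_response(dosage, baseline, max_effect, half_point, steepness):
    """Expected response expression at a relative cis gene dosage.

    A Hill curve is only one possible non-linear shape (an assumption here).
    """
    scaled = dosage ** steepness
    return baseline + max_effect * scaled / (half_point ** steepness + scaled)


def neg_binomial_logpmf(count, mean, dispersion):
    """Log-probability of a count under a negative binomial distribution
    with the given mean and variance mean + mean**2 / dispersion."""
    r = dispersion
    return (
        math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
        + r * math.log(r / (r + mean))
        + count * math.log(mean / (r + mean))
    )


def log_likelihood(dosages, counts, curve_params, dispersion):
    """Sum of per-cell log-probabilities of the observed response counts."""
    return sum(
        neg_binomial_logpmf(k, hill_response(x, *curve_params), dispersion)
        for x, k in zip(dosages, counts)
    )


# Hypothetical toy data: relative dosage of the perturbed gene and
# observed counts of one response gene in four cells.
toy_dosages = [0.25, 0.5, 1.0, 2.0]
toy_counts = [2, 3, 6, 9]

# (baseline, max_effect, half_point, steepness)
curved = log_likelihood(toy_dosages, toy_counts, (1.0, 10.0, 1.0, 2.0), 10.0)
flat = log_likelihood(toy_dosages, toy_counts, (1.0, 0.0, 1.0, 2.0), 10.0)
print("dosage-dependent curve:", curved)
print("flat curve:", flat)
```

In a Bayesian treatment, the curve parameters and the dispersion would receive priors and be inferred jointly rather than fixed by hand as they are here.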
Thus far, we have developed the method on a small-scale scRNA-seq dataset in which 4 genes are perturbed at varying gene dosages and 92 putative response genes were profiled by targeted amplification. bayesDREAM shows good precision and accuracy on this dataset. We have further demonstrated that the method transfers to a similar scRNA-seq dataset in which the full transcriptome is profiled. However, fitting bayesDREAM to this larger dataset requires large-scale parallel computation. Future work will involve running bayesDREAM on a genome-wide scRNA-seq CRISPRi screen to estimate genome-wide gene-dosage response curves.
The past year has seen a breakthrough in the ability of deep learning models to predict epigenomic and transcriptomic data from sequence. This breakthrough makes it possible to generate hypotheses from in silico experiments. The second part of the project therefore evaluates how well existing deep learning models of gene expression prediction capture inter-individual gene expression variance.
The objective of this part is to introduce variants into sequences in silico and, using population-scale variant frequency information, to compare sequences by the overall expression variability these variants are predicted to exert. The predicted expression variabilities will then be compared to existing measures of variability based on haplotype expression and eQTLs. If the predicted variability agrees closely with these established metrics, the new method could be highly advantageous for short or lowly expressed genes, for which existing methods cannot measure variability precisely. A refined metric of gene expression variability would further facilitate a series of subsequent analyses.
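One simple way to combine in silico variant effects with population allele frequencies into a single variability score is sketched below. It assumes an additive model with independent variants in Hardy-Weinberg equilibrium, so that a variant with allele frequency p and predicted per-allele effect b contributes 2p(1-p)b² to the expression variance. These modelling assumptions, the function name, and the toy numbers are illustrative and are not taken from the project description; in practice the effects would come from a sequence-based model's predictions for the alternative versus the reference allele.

```python
def predicted_expression_variance(variants):
    """Sum per-variant variance contributions 2 * p * (1 - p) * effect**2.

    `variants` is an iterable of (allele_frequency, predicted_effect) pairs.
    Assumes additive effects, no linkage between variants, and
    Hardy-Weinberg equilibrium.
    """
    total = 0.0
    for allele_freq, effect in variants:
        if not 0.0 <= allele_freq <= 1.0:
            raise ValueError(f"allele frequency out of range: {allele_freq}")
        total += 2.0 * allele_freq * (1.0 - allele_freq) * effect ** 2
    return total


# Hypothetical variants near one gene: (allele frequency, predicted effect
# on expression per alternative allele, in arbitrary units).
toy_variants = [(0.5, 0.2), (0.1, -0.5), (0.01, 1.0)]
toy_variance = predicted_expression_variance(toy_variants)
print("predicted expression variance:", toy_variance)
```

Under these assumptions, common variants with moderate effects can contribute as much variance as rare variants with large effects, which is why the frequency weighting matters when comparing genes.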