Sparse and large-scale flexible matrix integration
NAISS Small Compute
Felix Held <email@example.com>
2023-02-23 – 2024-03-01
In recent years, the joint analysis of multiple data sources has become an increasingly important research topic. The problem arises in biology, in the integrated analysis of multi-omics data, and also in user recommendation systems, which are used in many online shops and streaming services.
Data integration typically appears in multi-view (one cohort, multiple data types) or grid (multiple cohorts, multiple data types) layouts. We propose a method that allows for flexible layouts: it does not require adherence to a grid layout and, in addition, supports integration between any pair of cohorts or data types.
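To make the difference between a grid and a flexible layout concrete, the following is a minimal sketch in Python. It assumes that each data matrix links two entities (cohorts or data types) and that only observed pairs are stored. All names (`dims`, `blocks`, `is_full_grid`, `shapes_consistent`) and the example dimensions are hypothetical and chosen for illustration; this is not the project's data structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical entities (cohorts and data types) with their dimensions.
dims = {"cohort_A": 30, "cohort_B": 20, "rna": 50, "protein": 40}
cohorts = ["cohort_A", "cohort_B"]
data_types = ["rna", "protein"]

# Flexible layout: a matrix may link any pair of entities,
# and only the observed pairs are stored.
blocks = {
    ("cohort_A", "rna"): rng.normal(size=(30, 50)),
    ("cohort_A", "protein"): rng.normal(size=(30, 40)),
    ("cohort_B", "rna"): rng.normal(size=(20, 50)),
    # A block linking two data types directly, which a grid cannot express.
    ("rna", "protein"): rng.normal(size=(50, 40)),
}


def is_full_grid(blocks, cohorts, data_types):
    """True if every (cohort, data type) combination is observed."""
    return all((c, t) in blocks for c in cohorts for t in data_types)


def shapes_consistent(blocks, dims):
    """True if every block matches the dimensions of its two entities."""
    return all(
        X.shape == (dims[row], dims[col]) for (row, col), X in blocks.items()
    )


grid = is_full_grid(blocks, cohorts, data_types)
consistent = shapes_consistent(blocks, dims)
```

In this example the block for `("cohort_B", "protein")` is missing, so the layout is not a full grid, yet all stored blocks are dimensionally consistent with the entities they connect.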
Available methods typically find a joint signal prevalent in all data sources and individual signals specific to each input. Recently, uncovering information shared between only a subset of data sources has been considered as well. However, this has mostly been explored in the multi-view setting. Our method extends existing frameworks to support partially shared signals in the flexible layouts described above.
Matrix factorization under a low-rank assumption on the input matrices has become a standard approach in the field and allows for a convenient representation of the estimated model. We pursue several strategies, based on optimization, geometric and probabilistic techniques, to uncover factors describing the signal in each data source and how it is shared among cohorts and data types. In some of our approaches, sparsity in the estimated factors is enforced to improve their interpretability.
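As a generic illustration of low-rank factorization with sparse factors, the sketch below uses a truncated SVD on two simulated views of one cohort, followed by soft-thresholding of the loadings. This is a textbook construction and not the method developed in this project; the simulated data, the rank, the threshold `lam` and the function names are assumptions made for the example.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink entries towards zero; entries with |x| <= lam become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def truncated_svd(X, rank):
    """Best rank-`rank` approximation factors of X (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]


# Simulated multi-view data (assumption): two data types measured on the
# same n samples, sharing the same low-rank sample factors U_true.
rng = np.random.default_rng(0)
n, p1, p2, rank = 40, 30, 20, 2
U_true = rng.normal(size=(n, rank))
X1 = U_true @ rng.normal(size=(rank, p1)) + 0.01 * rng.normal(size=(n, p1))
X2 = U_true @ rng.normal(size=(rank, p2)) + 0.01 * rng.normal(size=(n, p2))

# Factorize the column-wise concatenation to estimate the joint signal.
X = np.hstack([X1, X2])
U, s, Vt = truncated_svd(X, rank)
X_hat = (U * s) @ Vt
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Enforce sparsity in the loadings by soft-thresholding (lam is arbitrary).
V_sparse = soft_threshold(Vt, lam=0.05)
```

Because the simulated noise is small relative to the shared signal, the rank-2 reconstruction error `rel_err` is small. Soft-thresholding sets small loadings exactly to zero, which is one common way to obtain sparse, more interpretable factors.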