Data-Driven Biodiversity Modeling
Title: Data-Driven Biodiversity Modeling
DNr: Berzelius-2025-187
Project Type: LiU Berzelius
Principal Investigator: Tobias Andermann <tobias.andermann@ebc.uu.se>
Affiliation: Uppsala universitet
Duration: 2025-06-02 – 2026-01-01
Classification: 10502
Homepage: https://www.biodiversity.se
Keywords:

Abstract

Our research group, the Biodiversity Data Lab (www.biodiversity.se), is an interdisciplinary team with backgrounds in ecology, molecular biology, bioinformatics, data science, statistics, AI, remote sensing, and spatial modeling. This gives us full control over the entire data workflow, from collecting raw biodiversity data in the field, through processing and statistical analysis, to the final inference of spatial biodiversity data products with several potential applications. It places our group in the emerging interdisciplinary field of spatial biodiversity modeling, which is in high demand due to a) timely national and international policy legislation, b) momentum in the private sector towards accounting for biodiversity impact, and c) local-scale land-use prioritization conflicts between infrastructure development and nature conservation (Fig. 1). Our modeling approach focuses on interpolating (and occasionally extrapolating) biodiversity metrics from calibration points with available biodiversity data to areas without such data, allowing us to produce continuous spatial maps of various biodiversity metrics. For each project, we test and compare a range of model types, from linear regression to complex deep learning models. These models are informed by spatial data products, mostly sourced from remote sensing data, that describe the environmental conditions at any given site. We have already implemented several proof-of-concept models, which are currently in review. We mainly work with deep learning models, usually implemented with the PyTorch Python library. Because our input data are complex, including multi-channel image information for thousands to tens of thousands of training instances, our computational resource requirements are high. Access to powerful GPU resources would speed up model training significantly and enable more efficient model training and testing.
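
To illustrate the kind of workload described above, the following is a minimal sketch of a PyTorch regression model that maps multi-channel image patches around calibration points to a single biodiversity metric, then predicts that metric for sites without data. It is not the group's actual model: the architecture, the class name `PatchRegressor`, the `train` helper, the patch size, the channel count, and the synthetic data are all assumptions made purely for illustration. The script uses a GPU when one is available and falls back to the CPU otherwise.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
N_CHANNELS = 6   # e.g. stacked remote-sensing layers per site
PATCH_SIZE = 32  # pixels per side of the patch around each site
N_SITES = 256    # calibration points with a measured biodiversity metric


class PatchRegressor(nn.Module):
    """Small CNN mapping a multi-channel patch to one biodiversity metric."""

    def __init__(self, n_channels: int) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (batch, 32, 1, 1)
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Flatten pooled features and return one value per patch.
        return self.head(self.features(x).flatten(1)).squeeze(1)


def train(model, patches, targets, device, epochs=5, batch_size=64):
    """Train with mean squared error; returns the mean loss per epoch."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    history = []
    for _ in range(epochs):
        model.train()
        perm = torch.randperm(patches.shape[0])  # reshuffle every epoch
        total = 0.0
        for start in range(0, len(perm), batch_size):
            idx = perm[start:start + batch_size]
            x = patches[idx].to(device)
            y = targets[idx].to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(idx)
        history.append(total / len(perm))
    return history


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in for real calibration data (an assumption, not real data).
patches = torch.randn(N_SITES, N_CHANNELS, PATCH_SIZE, PATCH_SIZE)
targets = patches[:, 0].mean(dim=(1, 2))  # toy target: mean of channel 0

model = PatchRegressor(N_CHANNELS)
history = train(model, patches, targets, device)

# Predict the metric for sites that have predictors but no field data.
model.eval()
with torch.no_grad():
    new_patches = torch.randn(10, N_CHANNELS, PATCH_SIZE, PATCH_SIZE).to(device)
    predictions = model(new_patches).cpu()
```

Applying the trained model to a patch centred on every cell of a regular grid would yield a continuous map of the predicted metric. With tens of thousands of such patches, most of the run time is spent in the per-batch forward and backward passes, which is where GPU acceleration helps.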