Population Synthesis with Incomplete Information
Title: Population Synthesis with Incomplete Information
DNr: Berzelius-2024-154
Project Type: LiU Berzelius
Principal Investigator: Anders Karlström <anders.karlstrom@abe.kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2024-04-30 – 2024-11-01
Classification: 20105
Keywords:

Abstract

Goals: In this project we introduce a novel population synthesis model based on the Wasserstein generative adversarial network (WGAN) that can be trained on micro-samples containing missing information. This feature is especially useful when micro-samples have incomplete attribute data due to errors in data collection, data withheld because of privacy concerns, or errors introduced when merging multiple datasets. We compare models trained on complete data with models trained on data with different levels of missing information. The models are evaluated at the attribute level and at higher k-dimensional levels to assess their capability in generating sampling and structural zeros.

Importance of project: In applied transport and sustainable urban planning, a synthetic population is essential. The project makes a substantial contribution to the field by providing a solution for population synthesis using incomplete data. It will also open new opportunities for future investigation, emphasizing the capacity of deep generative models to enhance population synthesis, which is essential for agent-based models (ABMs) employed in transportation simulations and other fields.

Expected goal fulfilment at the end of the project period: The goal is to validate the proposed methodology, which allows the WGAN model to be trained on datasets that contain missing attributes. The training method will be validated using data from the Swedish national travel survey. We will compare models trained on complete data with models trained on data with different levels of missing information.

Software and methods to be used: The complete project is written in Python. We will use PyTorch to build and train the WGAN. Other major libraries to be used in the project are pandas, scikit-learn, Plotly (for visualization), and SDMetrics.
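
Illustrative sketch: the comparison across "different levels of missing information" requires training sets in which a controlled share of attribute values is hidden. The snippet below shows one possible way to derive such sets from a complete micro-sample. It is a minimal sketch, not the project's method: it assumes values go missing completely at random and independently per attribute, which the abstract does not specify, and the function names, attribute names, and missing rates are all hypothetical. It uses only the Python standard library to stay self-contained.

```python
import random


def mask_attributes(records, missing_rate, seed=0):
    """Return a copy of `records` in which each attribute value is replaced
    by None with probability `missing_rate`.

    Assumption: missingness is completely at random and independent per
    attribute; real survey data may follow a different mechanism.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    rng = random.Random(seed)  # seeded generator for reproducible masks
    return [
        {key: (None if rng.random() < missing_rate else value)
         for key, value in record.items()}
        for record in records
    ]


def observed_fraction(records):
    """Share of attribute values that are still observed (not None)."""
    total = sum(len(record) for record in records)
    observed = sum(value is not None
                   for record in records
                   for value in record.values())
    return observed / total if total else 0.0


# Hypothetical micro-sample; attribute names are made up for illustration.
sample = [
    {"age_group": "18-30", "sex": "F", "employment": "student"},
    {"age_group": "31-45", "sex": "M", "employment": "employed"},
    {"age_group": "46-65", "sex": "F", "employment": "employed"},
    {"age_group": "65+", "sex": "M", "employment": "retired"},
]

if __name__ == "__main__":
    # Hypothetical missing rates; the abstract does not list the levels used.
    for rate in (0.0, 0.1, 0.3, 0.5):
        masked = mask_attributes(sample, rate, seed=42)
        print(f"missing_rate={rate:.1f} "
              f"observed_fraction={observed_fraction(masked):.2f}")
```

Each masked copy could then serve as training data for one model, with the unmasked sample reserved for the complete-data baseline, so that the attribute-level and k-dimensional evaluations compare like with like.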