Multi-Objective Reinforcement Learning for Agent-Based Simulation
Title: Multi-Objective Reinforcement Learning for Agent-Based Simulation
DNr: Berzelius-2021-73
Project Type: LiU Berzelius
Principal Investigator: Johan Källström <johan.kallstrom@liu.se>
Affiliation: Linköpings universitet
Duration: 2021-12-01 – 2022-06-01
Classification: 10201
Homepage: https://www.researchgate.net/project/Adaptive-Air-Combat-Training-Systems-for-Competency-Based-Training
Keywords:

Abstract

Team training in complex domains often requires a substantial amount of resources, e.g., instructors, role-players and vehicles. For this reason, it may be difficult to realize efficient and effective training scenarios in a real-world setting. Instead, intelligent agents can be used to construct synthetic, simulation-based training environments. However, building behavior models for such agents is challenging, especially for the end-users of the training systems, who typically do not have expertise in artificial intelligence. In this PhD project, we study how machine learning can be used to simplify the process of constructing agents for simulation-based training. By constructing smarter synthetic agents, the dependency on human training providers can be reduced, and both the availability and the quality of training can be improved. The computational resources will be used for the development of new algorithms for multi-objective reinforcement learning (MORL) in mixed cooperative-competitive multi-agent settings. MORL allows synthetic agents to learn how to prioritize among multiple, possibly conflicting objectives. The components of the reward vector represent the objectives of the agent, and finding optimal policies results in a multi-objective optimization problem. The priorities among the objectives of the learning agent are defined by a utility function. An advantage of MORL compared to standard single-objective reinforcement learning algorithms is that complex non-linear utility functions can be used, which is necessary when using reinforcement learning to model the decision-making of humans. By using different combinations of reward components and utility functions, a diverse set of agents can be created, which can make training more varied and stimulating. The focus of the work will be on algorithms that incorporate temporal abstractions and agent modeling techniques, in order to enable agents to handle complex scenarios and interactions with humans.
A framework built in previous work will be used for training the agents. With the framework, it is possible to use multiple CPUs for generating data and multiple GPUs for training the decision-making policies of the agents (which are realized as deep neural networks).
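
The role of the utility function described in the abstract can be illustrated with a small sketch. It is not taken from the project's framework. The policy names, the two objectives, and the return values are hypothetical, and the utility is applied to the expected return vector of each policy, which is one of several possible optimality criteria in MORL. The sketch shows a balanced policy that is never selected by any linear weighting of the objectives but is selected by a simple non-linear utility.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Utility = Callable[[Vector], float]


def linear_utility(weights: Vector) -> Utility:
    """Return a utility that is a weighted sum of the objectives."""
    def utility(returns: Vector) -> float:
        return sum(w * r for w, r in zip(weights, returns))
    return utility


def min_utility(returns: Vector) -> float:
    """Non-linear utility: the agent is only as good as its worst objective."""
    return min(returns)


def best_policy(expected_returns: Dict[str, Vector], utility: Utility) -> str:
    """Pick the policy whose expected return vector has the highest utility."""
    return max(expected_returns, key=lambda name: utility(expected_returns[name]))


# Hypothetical expected return vectors, e.g. (mission success, safety).
# None of the three is dominated by another, so all are Pareto-optimal.
EXPECTED_RETURNS: Dict[str, Vector] = {
    "aggressive": (10.0, 0.0),
    "cautious": (0.0, 10.0),
    "balanced": (4.0, 4.0),
}

# Sweep linear weights (w, 1 - w) over a grid and record which policies win.
linear_choices = {
    best_policy(EXPECTED_RETURNS, linear_utility((w / 10, 1 - w / 10)))
    for w in range(11)
}

# The non-linear utility selects the policy that the linear sweep never finds.
nonlinear_choice = best_policy(EXPECTED_RETURNS, min_utility)

if __name__ == "__main__":
    print("Policies chosen by some linear weighting:", sorted(linear_choices))
    print("Policy chosen by the min utility:", nonlinear_choice)
```

In this toy setting the balanced policy lies in a concave region of the Pareto front, so a weighted sum always prefers one of the extreme policies. This is one reason why a single scalar reward can be too restrictive when modeling human-like trade-offs.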