ML Workloads Allocation for CloudRobotics WASP-NEST Continuation
Title: ML Workloads Allocation for CloudRobotics WASP-NEST Continuation
DNr: Berzelius-2025-50
Project Type: LiU Berzelius
Principal Investigator: Florian Pokorny <fpokorny@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-03-01 – 2025-09-01
Classification: 10207
Homepage: https://wasp-sweden.org/nest-project-cloud-robotics/
Keywords:

Abstract

This application requests continued use of project Berzelius-2024-286 (Cloud-Robotics-NEST, itself a continuation of project Berzelius-2024-55). The project is part of the WASP-NEST initiative “CloudRobotics-NEST: Intelligent Cloud Robotics for Real-Time Manipulation at Scale” (https://wasp-sweden.org/nest-project-cloud-robotics/), one of the large Network, Excellence Synergies and Teams research projects funded by WASP (https://wasp-sweden.org). As explained in more detail on the project website, the project addresses fundamental challenges in cloud robotics. A core part of this work is collecting large-scale real-robot datasets and developing deep neural networks (DNNs) for robotic manipulation, for which we seek GPU and storage resources in this joint application.
The project is coordinated by Assoc. Prof. Florian Pokorny (KTH) in collaboration with co-PIs Prof. Erik Elmroth, Assist. Prof. Monowar Bhuyan (Umea) and Prof. Martina Maggio (Lund). It also employs multiple PhD students (2 KTH, 1 Umea and 1 Lund), who will be the primary users of the requested GPU resources. Two of the current PhD students (KTH: Shutong Jin, Ruiyu Wang) work on dataset collection and DNNs for robotic manipulation, and one PhD student (Umea: Obaidullah Zaland) targets federated machine learning (ML) approaches for robotic manipulation in an edge-cloud setting. Two postdocs (Umea: Antonio Seo and Chanh Nguyen) work on resource allocation in the cloud setting. Yde Sinnema, a PhD student at Lund, studies response delay in robotic control. One research engineer, Axel Kaliff, builds the software infrastructure and conducts data analysis. Two project students at KTH, Ben Temming and Filip Larsson, will start working on establishing performance benchmarks and a sim-to-real pipeline on the CloudGripper system in January 2025.
For the DNN tasks, deep learning methods such as imitation learning and transformer networks rely heavily on training data, which the project can generate at scale from a parallel robotic system at KTH that currently has 32 robot arms. Approximately 2 TB of initial training data has been created, and initial model architectures have been created and tested. The project is now at a stage where additional GPU compute resources are required to compete with internationally leading research institutions in this research direction.