Title: Feature Representation Learning for Trajectory Clustering and Motion Estimation
DNr: Berzelius-2024-162
Project Type: LiU Berzelius
Principal Investigator: Yaroslava Lochman <lochman@chalmers.se>
Affiliation: Chalmers tekniska högskola
Duration: 2024-04-22 – 2024-11-01
Classification: 10207
Homepage: https://ylochman.github.io/trajectory-embedding


Background: Clustering multiple motions from observed point trajectories is a fundamental task in dynamic scene understanding. Most motion models require multiple tracks to estimate their parameters, so identifying clusters when several motions are observed simultaneously is very challenging. The problem is further aggravated for high-dimensional motion models.

Description: In this project we are developing a feature representation learning approach to the problem of simultaneous trajectory clustering and motion estimation. The starting point of our work was the finding that motion models are unlikely to intersect in the high-dimensional space we work in, i.e. sufficiently long trajectories identify the underlying motion uniquely in practice. We proposed to learn a direct mapping from trajectories to embedding vectors that represent the generating motion. This approach required weak supervision in the form of cluster assignments and was therefore limited to a small amount of data. We aim to relax the need for supervision in order to exploit the 250M+ additional trajectories available without cluster assignments. This will in turn require a larger model and more time to train it.

Goal: We aim to pre-train a general DNN-based model that can: (1) extract feature representations of individual trajectories, suitable for clustering and subsequent motion estimation; and/or (2) directly predict motion parameters and cluster assignments given a sequence of trajectories; and/or (3) detect and fix outlying points, fill in missing points, and predict next point locations in the tracks; and (4) detect out-of-distribution inputs so that a (likely) more reliable, but much slower, traditional subspace clustering approach can be applied instead.

Impact: The results of this project will be submitted to a CV/ML conference (e.g., NeurIPS, CVPR, 3DV). By meeting the aforementioned goals, we hope to push forward the limits of 3D dynamic scene understanding, which would in turn contribute to the development of vision-aided autonomous systems.

Software: The methods are implemented in PyTorch.
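The trajectory-to-embedding idea described above can be sketched in PyTorch as follows. This is a minimal illustrative sketch, not the project's actual model: the encoder architecture, trajectory length, embedding dimension, and the greedy cosine-similarity clustering rule are all assumptions made for the example.

```python
# Hypothetical sketch: a network mapping each trajectory to an embedding vector
# that represents its generating motion, followed by clustering in embedding space.
# All names, sizes, and architectural choices here are illustrative assumptions.
import torch
import torch.nn as nn


class TrajectoryEncoder(nn.Module):
    """Maps a trajectory of T 2-D points to a unit-norm embedding vector."""

    def __init__(self, num_points: int = 30, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                    # (B, T, 2) -> (B, 2T)
            nn.Linear(2 * num_points, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, tracks: torch.Tensor) -> torch.Tensor:
        z = self.net(tracks)
        # Unit-normalize so that cosine similarity can be used for clustering.
        return nn.functional.normalize(z, dim=-1)


def cluster_by_cosine(embeddings: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """Greedy clustering: assign each trajectory to the first cluster whose
    reference embedding has cosine similarity above `threshold`; otherwise
    start a new cluster. A stand-in for a proper clustering algorithm."""
    labels = torch.full((embeddings.shape[0],), -1, dtype=torch.long)
    refs: list[torch.Tensor] = []
    for i, z in enumerate(embeddings):
        for c, ref in enumerate(refs):
            if torch.dot(z, ref) > threshold:
                labels[i] = c
                break
        else:
            refs.append(z)
            labels[i] = len(refs) - 1
    return labels


if __name__ == "__main__":
    encoder = TrajectoryEncoder()
    tracks = torch.randn(8, 30, 2)       # batch of 8 trajectories, 30 frames each
    z = encoder(tracks)                  # (8, 64) unit-norm embeddings
    print(cluster_by_cosine(z).shape)    # one cluster label per trajectory
```

In a trained model, embeddings of trajectories generated by the same motion would lie close together on the unit sphere, so a similarity threshold (or a standard clustering method run on the embeddings) separates the motions.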