SAN: Scene Anchor Networks for Joint Action-Space Prediction
In this work, we present a novel multi-modal trajectory prediction architecture. We decompose the uncertainty of future trajectories along higher-level scene characteristics and lower-level motion characteristics, and model multi-modality along both dimensions separately. The scene uncertainty is captured in a joint manner, where diversity of scene modes is ensured by training multiple separate anchor networks which specialize to different scene realizations. At the same time, each network outputs multiple trajectories that cover smaller deviations given a scene mode, thus capturing motion modes. In addition, we train our architectures with an outlier-robust regression loss function, which offers a trade-off between the outlier-sensitive L2 and outlier-insensitive L1 losses. Our scene anchor model achieves improvements over the state of the art on the INTERACTION dataset, outperforming the StarNet architecture from our previous work.
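To illustrate the trade-off between L2 and L1 described above, here is a minimal sketch of one well-known outlier-robust regression loss, the Huber loss. The abstract does not name the loss actually used, so the choice of Huber, the function name `huber_loss`, and the threshold parameter `delta` are assumptions made for illustration only.

```python
def huber_loss(residuals, delta=1.0):
    """Mean Huber loss over a list of scalar residuals (prediction - target).

    Assumption: Huber is used here only as an example of a loss that sits
    between L2 and L1; it is not confirmed to be the loss used for SAN.
    """
    total = 0.0
    for r in residuals:
        a = abs(r)
        if a <= delta:
            # Small residuals: quadratic, L2-like behaviour.
            total += 0.5 * a * a
        else:
            # Large residuals: linear, L1-like behaviour, so outliers
            # contribute less than they would under a squared error.
            total += delta * (a - 0.5 * delta)
    return total / len(residuals)


# Hypothetical residuals (e.g. in metres) with one outlier.
example_residuals = [0.1, -0.2, 0.05, 3.0]
example_loss = huber_loss(example_residuals, delta=1.0)
```

In this sketch, `delta` sets the point at which the loss switches from quadratic to linear: a larger value behaves more like L2, while a smaller value behaves more like L1.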