Cross-Domain Multi-task Learning for Object Detection and Saliency Estimation

Abstract

Multi-task learning (MTL) is a learning paradigm that jointly optimizes multiple tasks with a single neural network to improve performance and generalization. In practice, MTL assumes the availability of a common dataset with ground truth labels for each of the downstream tasks. However, collecting such a common annotated dataset is laborious for complex computer vision tasks such as saliency estimation, which requires eye fixation points as ground truth. To this end, we propose a novel MTL framework that, in the absence of a common annotated dataset, jointly estimates two important downstream tasks in computer vision: object detection and saliency estimation. Unlike many state-of-the-art methods that rely on common annotated datasets for training, we use annotations from different datasets to jointly train the different tasks, a setting we call cross-domain MTL. We adapt the MUTAN framework to fuse features from different datasets and learn domain-invariant features that capture the relatedness of the tasks. We demonstrate improvements in the performance and generalizability of our MTL architecture. We also show that the proposed MTL network offers a 13% reduction in memory footprint due to parameter sharing between the related tasks.

Publication
Workshop on Continual Learning in Computer Vision, IEEE Conference on Computer Vision and Pattern Recognition, 2021.