As an important and challenging problem in computer vision and graphics, keypoint-based object tracking is typically formulated in a spatio-temporal statistical learning framework. However, most existing keypoint trackers are incapable of effectively modeling and balancing the following three aspects in a simultaneous manner: temporal model coherence across frames, spatial model consistency within frames, and discriminative feature construction. To address this issue, we propose a robust keypoint tracker based on spatio-temporal multi-task structured output optimization driven by discriminative metric learning. Consequently, temporal model coherence is characterized by multi-task structured keypoint model learning over several adjacent frames, while spatial model consistency is modeled by solving a geometric verification based structured learning problem. Discriminative feature construction is enabled by metric learning to ensure the intra-class compactness and inter-class separability. Finally, the above three modules are simultaneously optimized in a joint learning scheme. Experimental results have demonstrated the effectiveness of our tracker.
In this paper, we propose an end-to-end deep correspondence structure learning (DCSL) approach to address the cross-camera person-matching problem in the person re-identification task. The proposed DCSL approach captures the intrinsic structural information on persons by learning a semantics aware image representation based on convolutional neural networks, which adaptively learns discriminative features for person identification. Furthermore, the proposed DCSL approach seeks to adaptively learn a hierarchical data-driven feature matching function which outputs the matching correspondence results between the learned semantics-aware image representations for a person pair. Finally, we set up a unified end-to-end deep learning scheme to jointly optimize the processes of semantics-aware image representation learning and cross-person correspondence structure learning, leading to more reliable and robust person re-identification results in complicated scenarios. Experimental results on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art approaches.