DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection

Abstract

A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner. In this paper, we propose a multi-task deep saliency model based on a fully convolutional neural network (FCNN) with global input (whole raw images) and global output (whole saliency maps). In principle, the proposed saliency model takes a data-driven strategy for encoding the underlying saliency prior information, and sets up a multi-task learning scheme for exploring the intrinsic correlations between saliency detection and semantic image segmentation. Through collaborative feature learning from these two correlated tasks, the shared fully convolutional layers produce effective features for object perception. Moreover, the model captures the semantic information on salient objects across different levels using the fully convolutional layers, which exploit the feature-sharing properties of salient object detection and greatly reduce feature redundancy. Finally, we present a graph Laplacian regularized nonlinear regression model for saliency refinement. Experimental results demonstrate the effectiveness of our approach in comparison with state-of-the-art approaches.
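
To make the refinement step concrete, the sketch below solves a simple Laplacian-regularized least-squares problem over a superpixel graph. It is a minimal illustration in our own notation (y, W, and lam are assumptions), not the paper's exact nonlinear regression model.

import numpy as np

def laplacian_refine(y, W, lam=0.1):
    """Refine an initial node saliency vector y over a graph with affinity W.

    Solves  min_f ||f - y||^2 + lam * f^T L f  with  L = D - W,
    whose closed-form solution is  f = (I + lam * L)^{-1} y.
    """
    D = np.diag(W.sum(axis=1))                  # degree matrix
    L = D - W                                   # unnormalized graph Laplacian
    n = W.shape[0]
    f = np.linalg.solve(np.eye(n) + lam * L, y)
    return np.clip(f, 0.0, 1.0)                 # keep saliency in [0, 1]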

Fig. 1: Illustration of our approach for salient object detection. First, a fully convolutional neural network takes the whole image as input and predicts the saliency map by capturing the semantic information on salient objects across different levels. Second, a graph Laplacian regularized nonlinear regression scheme based on the superpixel graph is used to produce a fine-grained, boundary-preserving saliency map.
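
As a rough sketch of how such a superpixel graph can be built, the code below extracts SLIC superpixels and connects spatially adjacent ones with a color-similarity weight. The choice of SLIC, the affinity kernel, and the parameters n_segments and sigma are our assumptions for illustration; the features used in the paper may differ.

import numpy as np
from skimage.segmentation import slic

def superpixel_graph(image, coarse_map, n_segments=200, sigma=10.0):
    """Build superpixel nodes, a color-based affinity W, and node saliency y.

    image: float RGB array in [0, 1]; coarse_map: FCNN saliency map in [0, 1].
    """
    labels = slic(image, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    colors = np.array([image[labels == i].mean(axis=0) for i in range(n)])
    y = np.array([coarse_map[labels == i].mean() for i in range(n)])
    # connect superpixels that share a boundary (4-neighborhood)
    W = np.zeros((n, n))
    h = labels[:, :-1] != labels[:, 1:]
    v = labels[:-1, :] != labels[1:, :]
    pairs = set(zip(labels[:, :-1][h].ravel(), labels[:, 1:][h].ravel()))
    pairs |= set(zip(labels[:-1, :][v].ravel(), labels[1:, :][v].ravel()))
    for i, j in pairs:
        w = np.exp(-np.sum((colors[i] - colors[j]) ** 2) / sigma)
        W[i, j] = W[j, i] = w
    return labels, W, y

Combining the two sketches, refined = laplacian_refine(y, W)[labels] paints the refined node saliency back onto the pixel grid.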

Publication:

Xi Li, Liming Zhao*, Lina Wei, Ming-Hsuan Yang, Fei Wu, Yueting Zhuang, Haibin Ling, Jingdong Wang. "DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection". IEEE Transactions on Image Processing (TIP), vol. 25, no. 8, pp. 3919-3930, Aug. 2016. [paper] [arXiv] [code] [models] [result maps] [datasets]

(Researchers in China can download our results and datasets from Baidu disk with password: glkb)

*corresponding author: Liming Zhao (zhaoliming@zju.edu.cn)

@article{DeepSaliency,
  title   = {DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection},
  author  = {Xi Li and Liming Zhao and Lina Wei and Ming-Hsuan Yang and Fei Wu and Yueting Zhuang and Haibin Ling and Jingdong Wang},
  journal = {IEEE Transactions on Image Processing},
  year    = {2016},
  volume  = {25},
  number  = {8},
  pages   = {3919--3930},
  doi     = {10.1109/TIP.2016.2579306},
  issn    = {1057-7149},
  month   = {Aug},
}

Network:

Fig. 2: Architecture of the proposed fully convolutional neural network for training. The FCNN carries out the task of saliency detection in conjunction with the task of object class segmentation; the two tasks share a convolutional part with 15 layers. The segmentation task seeks the intrinsic object semantic information of the image, while the saliency task aims to find the salient objects. At test time, only the saliency-related part of the network is used for saliency detection.
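
A minimal PyTorch sketch of this shared-trunk, two-head layout is given below. The layer sizes and the class count num_classes are placeholders for illustration, not the paper's exact 15-layer configuration.

import torch
import torch.nn as nn

class MultiTaskFCNN(nn.Module):
    """Shared convolutional trunk with a saliency head and a segmentation head."""

    def __init__(self, num_classes=21):
        super().__init__()
        # shared trunk (stand-in for the 15 shared convolutional layers)
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.saliency_head = nn.Conv2d(128, 1, 1)        # 1-channel saliency map
        self.seg_head = nn.Conv2d(128, num_classes, 1)   # per-class logits
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)

    def forward(self, x):
        feat = self.trunk(x)                   # features shared by both tasks
        sal = torch.sigmoid(self.up(self.saliency_head(feat)))
        seg = self.up(self.seg_head(feat))     # logits for a cross-entropy loss
        return sal, seg

At test time only self.trunk and self.saliency_head need to be evaluated, mirroring the caption's note that the segmentation branch is dropped for saliency testing.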

Experiments:

Fig. 3: Precision-recall curves of different saliency detection methods on 8 benchmark datasets. Overall, the proposed approach performs well, achieving higher precision at a fixed recall.
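
For reference, a precision-recall curve for a single saliency map is commonly traced by sweeping a threshold over all 256 gray levels, as sketched below (a standard protocol in the saliency literature; per-dataset curves average these values over all images).

import numpy as np

def pr_curve(sal_map, gt):
    """sal_map in [0, 1]; gt is a binary ground-truth mask."""
    sal = (sal_map * 255).astype(np.uint8)
    gt = gt.astype(bool)
    precisions, recalls = [], []
    for t in range(256):                      # one PR point per threshold
        pred = sal >= t
        tp = np.logical_and(pred, gt).sum()
        precisions.append(tp / max(pred.sum(), 1))
        recalls.append(tp / max(gt.sum(), 1))
    return np.array(precisions), np.array(recalls)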

Fig. 4: Comparison of average F-measure using an adaptive threshold (aveF), maximum F-measure of the average precision-recall curve (maxF), AUC scores, and MAE scores (smaller is better). Our approach (OurLO) achieves the best performance under all these metrics. The results in the last three columns are quoted directly from the original papers. Since THUS is used for training, the saliency results of OurBR are left blank.
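
The two scalar metrics can be computed as sketched below, under conventions common in the saliency literature (our assumptions): the adaptive threshold is twice the mean saliency value, the F-measure uses beta^2 = 0.3, and MAE is the mean absolute pixel difference.

import numpy as np

def adaptive_f_measure(sal_map, gt, beta2=0.3):
    """F-measure with the adaptive threshold 2 * mean(sal_map), capped at 1."""
    pred = sal_map >= min(2.0 * sal_map.mean(), 1.0)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0

def mae(sal_map, gt):
    """Mean absolute error between the saliency map and the binary mask."""
    return np.abs(sal_map.astype(float) - gt.astype(float)).mean()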

Fig. 5: Qualitative comparison of different approaches on several challenging samples with ground truth (GT). Clearly, our approach obtains more visually plausible saliency detection results than the competing approaches.