
Object-based Video Co-segmentation


We present a video co-segmentation method that uses category-independent object proposals as its basic element and can extract multiple foreground objects in a video set. The use of object elements overcomes limitations of low-level feature representations in separating complex foregrounds and backgrounds. We formulate object-based co-segmentation as a co-selection graph in which regions with foreground-like characteristics are favored while also accounting for intra-video and inter-video foreground coherence. To handle multiple foreground objects, we expand the co-selection graph model into a proposed multi-state selection graph model (MSG) that optimizes the segmentations of different objects jointly. This extension into the MSG applies not only to our co-selection graph; it can also turn any standard graph model into a multi-state selection solution that can be optimized directly by existing energy minimization techniques. Our experiments show that our object-based multiple foreground video co-segmentation method (ObMiC) compares well with related techniques in both single and multiple foreground cases.
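The selection idea can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes a single video, one foreground object, and a chain-structured graph in which each frame is linked only to the next one, so the inter-video edges and the multi-state extension are left out. Because a chain admits an exact solution, dynamic programming is substituted here for a general energy minimization solver. The names `select_proposals`, `unary`, and `pairwise` are hypothetical.

```python
def select_proposals(unary, pairwise):
    """Choose one proposal per frame by minimizing a chain energy.

    unary[t][i]       -- hypothetical cost of proposal i in frame t
                         (lower means more foreground-like).
    pairwise(t, i, j) -- hypothetical incoherence cost of choosing proposal i
                         in frame t and proposal j in frame t + 1.
    Returns (path, total_energy), where path[t] is the chosen proposal index.
    """
    cost = list(unary[0])  # best energy of any path ending at each proposal
    back = []              # back-pointers, one list per transition
    for t in range(1, len(unary)):
        new_cost, pointers = [], []
        for j, u in enumerate(unary[t]):
            # Best predecessor in frame t - 1 for proposal j in frame t.
            best_i = min(range(len(cost)),
                         key=lambda i: cost[i] + pairwise(t - 1, i, j))
            new_cost.append(cost[best_i] + pairwise(t - 1, best_i, j) + u)
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)

    # Backtrack from the cheapest final state.
    j = min(range(len(cost)), key=lambda i: cost[i])
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total


# Toy example: 3 frames, 2 candidate proposals per frame (made-up costs).
unary = [[0.2, 0.9], [0.8, 0.3], [0.1, 0.7]]
# Switching proposals between consecutive frames costs 1.0, staying costs 0.0.
switch_penalty = lambda t, i, j: 0.0 if i == j else 1.0

path, total = select_proposals(unary, switch_penalty)
print(path, round(total, 3))
```

In this toy case the per-frame minima alone would pick proposals 0, 1, 0, but the coherence term makes the consistent track 0, 0, 0 cheaper overall, which is the trade-off between foreground-likeness and coherence that the co-selection graph encodes.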


Our experiments use two datasets: the MOViCS dataset and our Video Coseg dataset.

1. CVPR/TIP on Video Coseg dataset:
       Dog      Person   Monster  Skating  Avg.
Acc    1115     9321     3551     3274     4315
IOU    0.753    0.542    0.795    0.666    0.689
2. CVPR on MOViCS dataset (single object):
       Chicken  Giraffe  Lion     Tiger    Avg.
Acc    1567     2938     1598     21005    6726
IOU    0.872    0.668    0.828    0.714    0.771
3. TIP on MOViCS dataset (multi-object):
       Chicken/Turtle  Giraffe/Elephant  Lion/Zebra  Tiger   Avg.
Acc    2372            3396              6084        21005   8214
IOU    0.879           0.553             0.616       0.714   0.691
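
For readers reproducing the IOU rows, the sketch below shows one common way to compute intersection-over-union between a predicted mask and a ground-truth mask. It is an illustration rather than the released evaluation code: the masks are assumed to be equally sized nested lists of 0/1 values, the helper names `mask_iou` and `mean` are hypothetical, and the choice to score two empty masks as 1.0 is an assumption. The last lines check that the Avg. entry of the IOU row in the first table matches the plain mean of its four per-category values.

```python
def mask_iou(pred, truth):
    """Intersection-over-union of two equally sized binary masks."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0  # pixel is foreground in both masks
            union += 1 if (p or t) else 0   # pixel is foreground in either mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are foreground in at least one mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
score = mask_iou(pred, truth)

# Per-category IOU values from the Video Coseg table (Dog, Person, Monster, Skating).
video_coseg_iou = [0.753, 0.542, 0.795, 0.666]
average_iou = round(mean(video_coseg_iou), 3)
print(score, average_iou)
```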


[1] "Object-based Multiple Foreground Video Co-segmentation"
Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin,
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 3166-3173.

[2] "Object-based Multiple Foreground Video Co-segmentation via Multi-state Selection Graph"
Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin, Rabab K. Ward,
IEEE Transactions on Image Processing (TIP), vol. 24, no. 11, pp. 3415-3424, 2015.

Dataset and Code:

The code is available here: [Code]
Our dataset and ground truth (~5MB) contain 8 videos (2 videos in each group), with 2 objects in each video. Download: [OneDrive] [BaiduYun]
Another related video co-segmentation dataset: MOViCS (CVPR13) [Project Link].

Related Works:

[1] Huazhu Fu, Xiaochun Cao, Zhuowen Tu, "Cluster-based Co-saliency Detection", IEEE Transactions on Image Processing (TIP), vol. 22, no. 10, pp. 3766-3778, 2013. [PDF] [Code]
[2] Xiaochun Cao, Zhiqiang Tao, Bao Zhang, Huazhu Fu, Wei Feng, "Self-adaptively Weighted Co-saliency Detection via Rank Constraint", IEEE Transactions on Image Processing (TIP), vol. 23, no. 9, pp. 4175-4186, 2014. [PDF] [Code]
[3] Huazhu Fu, Dong Xu, Stephen Lin, Jiang Liu, "Object-based RGBD Image Co-segmentation with Mutex Constraint", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4428-4436. [PDF] [Project]