ISSN 0253-2778

CN 34-1054/N

Open Access · JUSTC Original Paper

A robust approximate algorithm for large-scale clustering of visual features

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2014.10.008
  • Received Date: 13 September 2013
  • Accepted Date: 16 May 2014
  • Rev Recd Date: 16 May 2014
  • Publish Date: 30 October 2014
  • The large-scale clustering of visual features is crucial for image recognition and retrieval. The state-of-the-art algorithm, approximate k-means, only approximately guarantees clustering performance by relying on a high-precision approximate search. An improved algorithm is proposed that requires no extra memory and almost no extra computation. By exploiting more of the information available in each iteration to update the partition, the robust approximate algorithm better guarantees convergence and clustering performance: the clustering loss is non-increasing and decreases rapidly. Theoretical proofs guarantee that the algorithm converges to the converged solution of the Lloyd algorithm, regardless of the precision of the approximate search. Experimental results show that the algorithm runs about 10 times faster than the approximate k-means algorithm. The clustering performance is also verified directly by comparing images in the clustering results of global features.
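The abstract's key idea — keep a point's previous cluster assignment whenever the approximate search fails to find a closer center, so the clustering loss never increases — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the approximate nearest-center search is simulated by probing a random subset of candidate centers, and the function name and parameters (`robust_approx_kmeans`, `probe`) are hypothetical.

```python
import numpy as np

def robust_approx_kmeans(X, k, iters=20, probe=3, seed=0):
    """K-means with an approximate assignment step (sketch).

    The robustness fix: a point's assignment is changed only if the
    approximate search finds a strictly closer center; otherwise the
    previous assignment is kept, so the loss is non-increasing even
    when the approximate search is imprecise.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    assign = rng.integers(0, k, size=len(X))  # arbitrary initial partition
    for _ in range(iters):
        for i, x in enumerate(X):
            # stand-in for an approximate nearest-center search:
            # probe only a few randomly chosen candidate centers
            cand = rng.choice(k, size=min(probe, k), replace=False)
            d_cand = ((centers[cand] - x) ** 2).sum(axis=1)
            d_old = ((centers[assign[i]] - x) ** 2).sum()
            if d_cand.min() < d_old:          # keep old assignment otherwise
                assign[i] = cand[np.argmin(d_cand)]
        for j in range(k):                    # standard Lloyd center update
            mask = assign == j
            if mask.any():
                centers[j] = X[mask].mean(axis=0)
    return centers, assign
```

Both steps can only lower the loss: the guarded assignment by construction, and the mean update because the mean minimizes the within-cluster squared error.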
  • [1] Nister D, Stewenius H. Scalable recognition with a vocabulary tree[C]// 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE Press, 2006, 2: 2161-2168.
    [2] Philbin J, Chum O, Isard M, et al. Object retrieval with large vocabularies and fast spatial matching[C]// IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, USA: IEEE Press, 2007: 1-8.
    [3] Tuytelaars T, Lampert C H, Blaschko M B, et al. Unsupervised object discovery: A comparison[J]. International Journal of Computer Vision, 2010, 88(2): 284-302.
    [4] Philbin J, Zisserman A. Object mining using a matching graph on very large image collections[C]// Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing. Bhubaneswar, India: IEEE Press, 2008: 738-745.
    [5] Li X W, Wu C C, Zach C, et al. Modeling and recognition of landmark image collections using iconic scene graphs[C]// Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer, 2008: 427-440.
    [6] Torralba A, Fergus R, Freeman W T. 80 million tiny images: A large data set for nonparametric object and scene recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(11): 1958-1970.
    [7] Lloyd S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2): 129-136.
    [8] Selim S Z, Ismail M A. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984, 6(1): 81-87.
    [9] Jain A K. Data clustering: 50 years beyond K-means[J]. Pattern Recognition Letters, 2010, 31(8): 651-666.
    [10] Judd D, McKinley P K, Jain A K. Large-scale parallel data clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(8): 871-876.
    [11] Kollios G, Gunopulos D, Koudas N, et al. Efficient biased sampling for approximate clustering and outlier detection in large data sets[J]. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(5): 1170-1187.
    [12] Silpa-Anan C, Hartley R. Optimised KD-trees for fast image descriptor matching[C]// IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE Press, 2008: 1-8.
    [13] Muja M, Lowe D G. Fast approximate nearest neighbors with automatic algorithm configuration[C]// Proceedings of the 4th International Conference on Computer Vision Theory and Applications. Lisboa, Portugal: IEEE Press, 2009: 331-340.
    [14] Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding[C]// Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. New Orleans, USA: SIAM, 2007: 1027-1035.

