# DisC diversity: result diversification based on dissimilarity and coverage

PVLDB, no. 1 (2013): 13-24

Abstract

Recently, result diversification has attracted a lot of attention as a means to improve the quality of results retrieved by user queries. In this paper, we propose a new, intuitive definition of diversity called DisC diversity. A DisC diverse subset of a query result contains objects such that each object in the result is represented by a...

Introduction

- Result diversification has attracted considerable attention as a means of enhancing the quality of query results presented to users (e.g., [26, 32]).
- Given a query result P, the authors select a representative subset S ⊆ P to present to the user such that (i) every object in P is similar to at least one object in S and (ii) no two objects in S are similar to each other.
- The first condition ensures that all objects in P are represented, or covered, by at least one object in the selected subset.
- The second condition ensures that the selected objects of P are dissimilar.
- The authors call the set S an r-Dissimilar and Covering subset, or r-DisC diverse subset.
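
The two conditions above can be checked directly. The following is a minimal sketch, not the authors' code: it assumes objects are points given as coordinate tuples, that distance is Euclidean, and that two objects count as similar when their distance is at most r. The function name `is_disc_diverse` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def is_disc_diverse(P, S, r):
    """Return True if S is an r-DisC diverse subset of P.

    Assumptions (not from the source): points are coordinate tuples,
    distance is Euclidean, and "similar" means distance <= r.
    """
    # S must be drawn from P.
    if not all(s in P for s in S):
        return False
    # Coverage: every object of P is within r of some selected object.
    covered = all(any(dist(p, s) <= r for s in S) for p in P)
    # Dissimilarity: no two distinct selected objects are within r.
    dissimilar = all(
        dist(S[i], S[j]) > r
        for i in range(len(S))
        for j in range(i + 1, len(S))
    )
    return covered and dissimilar
```

For example, with `P = [(0, 0), (1, 0), (5, 0)]` and `r = 1.5`, the subset `[(0, 0), (5, 0)]` passes both checks, `[(0, 0)]` fails coverage, and `[(0, 0), (1, 0), (5, 0)]` fails dissimilarity.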

Highlights

- Result diversification has attracted considerable attention as a means of enhancing the quality of query results presented to users (e.g., [26, 32]).
- We provide theoretical upper bounds for the size of the diverse subsets produced by our algorithms for computing DisC diverse subsets, as well as for their zooming counterparts.
- Rather than providing polynomial approximation bounds for DisC diversity, we focus on the efficient computation of non-minimum but small DisC diverse subsets.
- We proposed a novel, intuitive definition of diversity as the problem of selecting a minimum representative subset S of a result P, such that each object in P is represented by a similar object in S and the objects included in S are not similar to each other.
- Since locating minimum r-DisC diverse subsets is an NP-hard problem, we introduced heuristics for computing approximate solutions, including incremental ones for zooming, and provided corresponding theoretical bounds.
- We presented an efficient implementation based on spatial indexing.

Results

**Results from Graph Theory**

- The properties of independent and dominating subsets have been extensively studied.
- The Minimum Independent Dominating Set Problem has been shown to have some of the strongest negative approximation results: in the general case, it cannot be approximated in polynomial time within a factor of n^(1−ε) for any ε > 0 unless P = NP [17].
- Rather than providing polynomial approximation bounds for DisC diversity, the authors focus on the efficient computation of non-minimum but small DisC diverse subsets.
- Allowing the dominating set to be connected has an impact on the complexity of the problem and allows different algorithms to be designed.
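
To illustrate how a small (not necessarily minimum) independent dominating set can be computed, here is a greedy sketch. It is an illustration under stated assumptions, not necessarily the authors' algorithm: objects are coordinate tuples, two objects are neighbours when their Euclidean distance is at most r, and neighbours are found by brute force rather than with the spatial index the paper relies on. The name `greedy_disc` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def greedy_disc(P, r):
    """Greedily build an r-DisC diverse subset of the points in P.

    At each step, select the still-uncovered object that has the most
    still-uncovered neighbours, then mark it and its neighbours as covered.
    Brute-force O(n^2) neighbour computation; an assumption of this sketch.
    """
    n = len(P)
    neighbours = [
        [j for j in range(n) if j != i and dist(P[i], P[j]) <= r]
        for i in range(n)
    ]
    uncovered = set(range(n))
    selected = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(
            sorted(uncovered),
            key=lambda i: sum(1 for j in neighbours[i] if j in uncovered),
        )
        selected.append(best)
        uncovered.discard(best)
        uncovered.difference_update(neighbours[best])
    return [P[i] for i in selected]
```

Every selected object was uncovered when chosen, so it is not within r of any earlier selection (dissimilarity), and the loop only ends once nothing is uncovered (coverage). For `P = [(0, 0), (1, 0), (2, 0), (10, 0)]` and `r = 1.0`, the sketch picks `(1, 0)` first, because it covers two neighbours, and then `(10, 0)`.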

Conclusion

**SUMMARY AND FUTURE WORK**

- In this paper, the authors proposed a novel, intuitive definition of diversity as the problem of selecting a minimum representative subset S of a result P, such that each object in P is represented by a similar object in S and the objects included in S are not similar to each other.
- Similarity is modeled by a radius r around each object.
- The authors call such subsets r-DisC diverse subsets of P.
- Since locating minimum r-DisC diverse subsets is an NP-hard problem, the authors introduced heuristics for computing approximate solutions, including incremental ones for zooming, and provided corresponding theoretical bounds.
- The authors presented an efficient implementation based on spatial indexing.
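
The incremental idea behind zooming can be sketched as follows. This is a simple illustration, not the authors' heuristic: it assumes that zooming in means shrinking the radius to some `r_new` no larger than the radius the existing subset was computed for, that objects are coordinate tuples under Euclidean distance, and that similarity means distance at most the radius. The name `zoom_in` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def zoom_in(P, S, r_new):
    """Extend S into an r_new-DisC diverse subset of P without recomputing.

    Assumes S is already r-DisC diverse for some r >= r_new, so the objects
    in S remain pairwise dissimilar under the smaller radius. Only objects
    that lost their representative need to be handled.
    """
    result = list(S)
    for p in P:
        # p is uncovered if nothing selected so far lies within r_new of it;
        # adding it keeps the selected objects pairwise dissimilar.
        if all(dist(p, s) > r_new for s in result):
            result.append(p)
    return result
```

Because the previously selected objects are kept, the result for the smaller radius is a superset of the earlier one, so the user sees the old representatives plus new ones. For `P = [(0, 0), (1, 0), (2, 0), (10, 0)]`, `S = [(1, 0), (10, 0)]` (diverse for radius 1.0) and `r_new = 0.5`, the sketch adds `(0, 0)` and `(2, 0)`.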

- Table 1: Input parameters
- Table 2: Algorithms
- Table 3: Solution size. (a) Uniform (2D, 10000 objects)

Related work

- Other Diversity Definitions: Diversity has recently attracted a lot of attention as a means of enhancing user satisfaction [27, 4, 16, 6]. Diverse results have been defined in ...

[Figure residue: plots against radius, labeled "Jaccard Distance", with series Greedy-DisC (r) - Greedy-DisC (r'), Greedy-DisC (r) - Basic-Zoom-In (r'), and Greedy-DisC (r) - Greedy-Zoom-In (r'), and with series Greedy-DisC, Basic-Zoom-Out, and Greedy-Zoom-Out (a), (b), (c).]

Funding

- Drosou was supported by the ESF and Greek national funds through the NSRF - Research Funding Program: “Heraclitus II”.
- Pitoura was supported by the project “InterSocial” financed by the European Territorial Cooperation Operational Program “Greece - Italy” 2007-2013, co-funded by the ERDF and national funds of Greece and Italy.

References

- Acme digital cameras database. http://acme.com/digicams.
- Greek cities dataset. http://www.rtreeportal.org.
- R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, 2009.
- A. Angel and N. Koudas. Efficient diversity-aware search. In SIGMOD, 2011.
- R. Boim, T. Milo, and S. Novgorodov. Diversification and refinement in collaborative filtering recommender. In CIKM, 2011.
- A. Borodin, H. C. Lee, and Y. Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS, 2012.
- M. Chlebík and J. Chlebíková. Approximation hardness of dominating set problems in bounded degree graphs. Inf. Comput., 206(11), 2008.
- B. N. Clark, C. J. Colbourn, and D. S. Johnson. Unit disk graphs. Discrete Mathematics, 86(1-3), 1990.
- C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, 2008.
- M. Drosou and E. Pitoura. Search result diversification. SIGMOD Record, 39(1), 2010.
- M. Drosou and E. Pitoura. DisC diversity: Result diversification based on dissimilarity and coverage, Technical Report. University of Ioannina, 2012.
- M. Drosou and E. Pitoura. Dynamic diversification of continuous data. In EDBT, 2012.
- E. Erkut, Y. Ulkusal, and O. Yenicerioglu. A comparison of p-dispersion heuristics. Computers & OR, 21(10), 1994.
- P. Fraternali, D. Martinenghi, and M. Tagliasacchi. Top-k bounded diversification. In SIGMOD, 2012.
- M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
- S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In WWW, 2009.
- M. M. Halldórsson. Approximating the minimum maximal independence number. Inf. Process. Lett., 46(4), 1993.
- A. Jain, P. Sarda, and J. R. Haritsa. Providing diversity in k-nearest neighbor query results. In PAKDD, 2004.
- B. Liu and H. V. Jagadish. Using trees to depict a forest. PVLDB, 2(1), 2009.
- E. Minack, W. Siberski, and W. Nejdl. Incremental diversification for very large sets: a streaming-based approach. In SIGIR, 2011.
- D. Panigrahi, A. D. Sarma, G. Aggarwal, and A. Tomkins. Online selection of diverse results. In WSDM, 2012.
- Y. Tao, L. Ding, X. Lin, and J. Pei. Distance-based representative skyline. In ICDE, pages 892–903, 2009.
- M. T. Thai, F. Wang, D. Liu, S. Zhu, and D.-Z. Du. Connected dominating sets in wireless networks with different transmission ranges. IEEE Trans. Mob. Comput., 6(7), 2007.
- M. T. Thai, N. Zhang, R. Tiwari, and X. Xu. On approximation algorithms of k-connected m-dominating sets in disk graphs. Theor. Comput. Sci., 385(1-3), 2007.
- C. J. Traina, A. J. M. Traina, C. Faloutsos, and B. Seeger. Fast indexing and visualization of metric data sets using slim-trees. IEEE Trans. Knowl. Data Eng., 14(2), 2002.
- E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. Amer-Yahia. Efficient computation of diverse query results. In ICDE, 2008.
- M. R. Vieira, H. L. Razente, M. C. N. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina, and V. J. Tsotras. On query result diversification. In ICDE, 2011.
- K. Xing, W. Cheng, E. K. Park, and S. Rotenstreich. Distributed connected dominating set construction in geometric k-disk graphs. In ICDCS, 2008.
- C. Yu, L. V. S. Lakshmanan, and S. Amer-Yahia. It takes variety to make a world: diversification in recommender systems. In EDBT, 2009.
- P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search - The Metric Space Approach. Springer, 2006.
- M. Zhang and N. Hurley. Avoiding monotony: improving the diversity of recommendation lists. In RecSys, 2008.
- C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In WWW, 2005.
