An adaptive framework for searching XML documents (2007)

Abstract
The evolution of computing technology suggests that it has become more feasible to offer access to Web information in a ubiquitous way, through various kinds of interaction devices such as PCs, laptops, palmtops, and so on. As XML has become a de facto standard for exchanging Web data, an interesting and practical research problem is the development of models and techniques to satisfy various needs and preferences in searching XML data. In this thesis, we employ a list of simple XML-tagged keywords as a vehicle for searching XML fragments in a collection of XML documents. In order to deal with the diversified nature of XML documents as well as user preferences, we propose a novel Multi-Ranker Model (MRM), which is able to abstract a spectrum of important XML properties and adapt the features to different XML search needs.

The MRM is composed of three ranking levels. The lowest level consists of two categories of features: similarity and granularity. At the intermediate level, we define four tailored XML Rankers (XRs), which consist of different lower-level features and have different strengths in searching XML fragments. The XRs are trained via a learning mechanism called the Ranking Support Vector Machine in a voting Spy Naïve Bayes Framework (RSSF). The RSSF takes as input a set of labeled fragments and feature vectors and generates as output Adaptive Rankers (ARs) in the learning process. The ARs are defined over the XRs and generated at the top level of the MRM. We show empirically that the RSSF is able to improve the MRM significantly in the learning process and needs only a small set of training XML fragments. We demonstrate that the trained MRM is able to bring out the strengths of the XRs in order to adapt to different preferences and queries.

We also present the Adaptive Information Merging Approach (AIM) to merge the XML fragments returned from the ranked result list. We incorporate the users' feedback in order to further improve the coverage and specificity of the merged results, which are measured in terms of two formal notions, Information Completeness (IC) and Data Complexity (DC). IC represents source coverage and computes the "completeness" of the involved information sources. DC represents the "richness" of data and computes the complexity of the retrieved data items.
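The three-level structure of the MRM described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the thesis's implementation: the feature names, the linear weighted-sum scoring, and the hand-set weights are all hypothetical (in the thesis the ARs are produced by the RSSF learning process), and only two XRs are shown instead of four to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List

# Lowest level: a fragment is represented only by its feature values.
# The feature names used below are hypothetical placeholders.
Features = Dict[str, float]


@dataclass
class XMLRanker:
    """Intermediate level: a ranker built from a subset of low-level features."""
    name: str
    weights: Dict[str, float]  # assumed linear weights over features

    def score(self, features: Features) -> float:
        return sum(w * features.get(f, 0.0) for f, w in self.weights.items())


@dataclass
class AdaptiveRanker:
    """Top level: defined over the XML Rankers (here, a weighted sum)."""
    rankers: List[XMLRanker]
    ranker_weights: List[float]  # hand-set here; learned in the thesis

    def score(self, features: Features) -> float:
        return sum(w * xr.score(features)
                   for xr, w in zip(self.rankers, self.ranker_weights))

    def rank(self, fragments: Dict[str, Features]) -> List[str]:
        # Return fragment ids ordered from highest to lowest score.
        return sorted(fragments,
                      key=lambda fid: self.score(fragments[fid]),
                      reverse=True)


# Two illustrative fragments with made-up feature values.
fragments = {
    "frag_a": {"content_similarity": 0.9, "structure_similarity": 0.2,
               "fragment_size": 0.5},
    "frag_b": {"content_similarity": 0.4, "structure_similarity": 0.8,
               "fragment_size": 0.3},
}

content_xr = XMLRanker("content", {"content_similarity": 1.0})
structure_xr = XMLRanker("structure", {"structure_similarity": 0.7,
                                       "fragment_size": 0.3})

# Two adaptive rankers that weight the same XRs differently.
content_first = AdaptiveRanker([content_xr, structure_xr], [0.8, 0.2])
structure_first = AdaptiveRanker([content_xr, structure_xr], [0.2, 0.8])

print(content_first.rank(fragments))
print(structure_first.rank(fragments))
```

With these made-up numbers the two adaptive rankers order the same fragments differently, which is the sense in which a combination over fixed XRs can adapt to different search preferences.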
Keywords
XML document, Web data, adaptive framework, trained MRM, training XML fragment, tailored XML Rankers, XML data, simple XML, important XML property, different XML search need, XML fragment