Scalable XPath Evaluation on Large-Scale Continuously Evolving XML Repositories

BigData Congress (2014)

Continuously Evolving XML (CEXML) documents are important for representing constantly changing information in a number of emerging domains such as software configuration management and geographical information systems. A CEXML document consists of multiple versions of an XML document as it evolves over time. Evaluating XPath expressions in large CEXML repositories is inherently challenging because of the additional temporal dimension. This paper introduces an important class of XPath queries for CEXML documents called version-specific XPath expressions (VS-XPath). We present a scalable and efficient framework for VS-XPath evaluation on CEXML repositories. Our framework is a novel adaptation of the interval-based indexing scheme, and it incorporates several unique features. First, we significantly reduce the index computation and storage costs by selectively indexing interspersed subsets of versions of CEXML documents. Second, we present a set of algorithms that use the available indices to obtain first-cut solutions of XPath queries and then refine those solutions by taking into account the edits occurring between versions. Third, we propose a unique method to drastically prune the edits that need to be processed when evaluating an XPath expression, thereby providing significant performance gains. This paper also reports a detailed experimental study demonstrating the scalability and efficiency benefits of the proposed framework in terms of indexing costs, query latencies, and storage costs.
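The abstract does not give the algorithms themselves, so the following is only a minimal Python sketch of the general ideas it names, not the paper's method. It assumes a standard interval (start, end) labeling for ancestor/descendant tests, a toy repository in which only some versions are indexed, and single-node insert/delete edits. All function names, the data layout (`checkpoints`, `edits`), and the tag-based pruning rule are hypothetical illustrations.

```python
import bisect
import xml.etree.ElementTree as ET


def interval_index(root):
    """Label each element with a (start, end) interval from a depth-first walk.

    Element a is an ancestor of element d exactly when
    a.start < d.start and d.end < a.end.
    """
    index = []
    counter = 0

    def visit(elem):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child)
        end = counter
        counter += 1
        index.append((elem.tag, start, end))

    visit(root)
    return index


def descendant_pairs(index, anc_tag, desc_tag):
    """Evaluate //anc_tag//desc_tag using interval containment only."""
    ancs = [(s, e) for t, s, e in index if t == anc_tag]
    descs = [(s, e) for t, s, e in index if t == desc_tag]
    return [(a, d) for a in ancs for d in descs if a[0] < d[0] and d[1] < a[1]]


def nodes_with_tag(tag, target, checkpoints, edits):
    """Answer the toy query //tag for version `target` (hypothetical layout).

    checkpoints: {version: {node_id: tag}} for the selectively indexed versions.
    edits: {version: [(op, node_id, tag), ...]}, the edits producing that
           version from its predecessor; op is "insert" or "delete".
    """
    indexed = sorted(checkpoints)
    pos = bisect.bisect_right(indexed, target) - 1
    if pos < 0:
        raise ValueError("no indexed version at or before the target")
    base = indexed[pos]
    # First-cut solution from the nearest indexed version.
    result = {nid for nid, t in checkpoints[base].items() if t == tag}
    # Refine with the edits between the indexed version and the target.
    for version in range(base + 1, target + 1):
        for op, nid, edit_tag in edits.get(version, []):
            if edit_tag != tag:
                continue  # assumed pruning rule: skip edits on other tags
            if op == "insert":
                result.add(nid)
            elif op == "delete":
                result.discard(nid)
    return result


doc = ET.fromstring("<a><b><c/></b><c/><d><b/></d></a>")
index = interval_index(doc)
pairs = descendant_pairs(index, "b", "c")

checkpoints = {1: {1: "a", 2: "b"}, 4: {1: "a", 2: "b", 3: "b"}}
edits = {
    2: [("insert", 3, "b")],
    3: [("insert", 4, "c")],
    4: [("delete", 4, "c")],
    5: [("delete", 2, "b"), ("insert", 5, "b")],
}
v3 = nodes_with_tag("b", 3, checkpoints, edits)
v5 = nodes_with_tag("b", 5, checkpoints, edits)
```

In this toy setup only versions 1 and 4 carry an index, so a query against version 5 starts from version 4 and replays a single version's edits, while the edits on `c` nodes are never touched by a query on `b`. A real VS-XPath evaluator would have to handle structural predicates and subtree edits, which this sketch does not attempt.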
XML, database indexing, query processing, CEXML documents, CEXML repositories, VS-XPath evaluation, XPath queries, constantly-changing information representation, edit pruning, index computation cost reduction, interval-based indexing scheme, large-scale continuously evolving XML repositories, performance gains, query latencies, scalable XPath evaluation, storage cost reduction, temporal dimension, version specific XPath expression evaluation, Interval based indexing, Multi-version XML, XPath Expressions