Keyword Search in Heterogeneous Data Sources

HAL (Le Centre pour la Communication Scientifique Directe), 2020

Abstract
Data journalism is the field of investigative journalism based first and foremost on digital data. As more and more human activity leaves strong digital traces, data journalism is an increasingly important trend. Major journalism projects increasingly involve diverse data sources with heterogeneous data models and differing structures, or no structure at all; the Offshore Leaks is a prime example. Inspired by our collaboration with Le Monde, a leading French newspaper, we designed a novel content management architecture, together with an algorithm for exploiting such heterogeneous corpora through keyword search: given a set of search terms, find links between them within and across the different datasets, which we interconnect in a graph. Our work is related to keyword search in structured and unstructured data, but data heterogeneity makes the problem computationally harder. We analyze the performance of our algorithm on real-life datasets.
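To make the search problem concrete, the Python sketch below finds links between search terms in a graph. It is not the authors' algorithm, which the abstract does not describe; it swaps in a simple multi-source breadth-first heuristic. For each keyword it runs one BFS from every node whose label matches, picks the root node with the smallest summed hop distance to a match of every keyword, and returns the union of the shortest paths from that root as the connecting subgraph. The graph, the node names and labels, the helper names (`build_graph`, `matches`, `connect_keywords`), and the case-insensitive substring matcher are all hypothetical, and edges are assumed to be unweighted and undirected.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]


def build_graph(edge_list: List[Tuple[str, str]]) -> Graph:
    """Hypothetical helper: symmetric adjacency lists (undirected graph)."""
    graph: Graph = {}
    for a, b in edge_list:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def matches(label: str, keyword: str) -> bool:
    """Hypothetical matcher: case-insensitive substring test."""
    return keyword.lower() in label.lower()


def bfs_from_matches(graph: Graph, labels: Dict[str, str], keyword: str):
    """Multi-source BFS from every node whose label matches `keyword`.

    Returns (dist, parent): hop distance to the nearest match, and the
    neighbor one step closer to that match (None at a match itself).
    """
    dist: Dict[str, int] = {}
    parent: Dict[str, Optional[str]] = {}
    queue: deque = deque()
    for node, label in labels.items():
        if matches(label, keyword):
            dist[node] = 0
            parent[node] = None
            queue.append(node)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent


def connect_keywords(
    graph: Graph, labels: Dict[str, str], keywords: List[str]
) -> Optional[Tuple[str, Set[Tuple[str, str]]]]:
    """Return (root, edges) linking a match of every keyword, or None."""
    searches = [bfs_from_matches(graph, labels, k) for k in keywords]
    best_root: Optional[str] = None
    best_cost: Optional[int] = None
    # The root must be reachable from some match of every keyword;
    # ties go to the first node in `labels` order.
    for node in labels:
        if all(node in dist for dist, _ in searches):
            cost = sum(dist[node] for dist, _ in searches)
            if best_cost is None or cost < best_cost:
                best_root, best_cost = node, cost
    if best_root is None:
        return None
    # Union of the shortest paths from the root back to each keyword match.
    edges: Set[Tuple[str, str]] = set()
    for _, parent in searches:
        node = best_root
        while parent[node] is not None:
            nxt = parent[node]
            edges.add((min(node, nxt), max(node, nxt)))
            node = nxt
    return best_root, edges


# Toy heterogeneous graph: a table row, a JSON field, a text snippet, etc.
labels = {
    "row1": "Company: Acme Ltd",
    "json1": "director",
    "doc1": "article mentions Acme",
    "person": "Jane Doe",
    "island": "British Virgin Islands",
}
graph = build_graph(
    [("row1", "json1"), ("json1", "person"), ("doc1", "row1"), ("island", "doc1")]
)
result = connect_keywords(graph, labels, ["Acme", "Jane"])
print(result)
```

Here the table row for Acme Ltd is linked to Jane Doe through the JSON `director` node. The sketch handles a single answer only; a real system would need ranking of many answers and far better scalability, which is where the data heterogeneity the abstract mentions makes the problem harder.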
Keywords
heterogeneous data, search