Distributing Text Mining tasks with librAIry.

DocEng(2017)

引用 10|浏览35
暂无评分
摘要
We present librAIry, a novel architecture to store, process and analyze large collections of textual resources, integrating existing algorithms and tools into a common, distributed, high-performance workflow. Available text mining techniques can be incorporated as independent plug&play modules working in a collaborative manner into the framework. In the absence of a pre-defined flow, librAIry leverages on the aggregation of operations executed by different components in response to an emergent chain of events. Extensive use of Linked Data (LD) and Representational State Transfer (REST) principles are made to provide individually addressable resources from textual documents. We have described the architecture design and its implementation and tested its effectiveness in real-world scenarios such as collections of research papers, patents or ICT aids, with the objective of providing solutions for decision makers and experts in those domains. Major advantages of the framework and lessons-learned from these experiments are reported.
更多
查看译文
关键词
large-scale text analysis, NLP, scholarly data, text mining, data integration
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要