Incremental Structural Model For Extracting Relevant Tokens Of Entity

2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC)(2016)

Abstract
This paper describes a method for extracting the relevant tokens of entities from semi-structured administrative documents. The method corrects mislabeling by exploiting entity tokens that lie physically close to one another in the document. First, the entities are labeled. Second, each entity is modeled by a tokens structure graph in which the nodes represent the tokens and the arcs represent the distances between them. A clustering algorithm is then applied to incrementally concatenate the relevant tokens of each entity while ignoring the noisy parts. Results obtained on a dataset of real invoices are reported in the experimental section.
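The abstract does not specify the distance measure or the clustering rule, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. Each token is assumed to carry a hypothetical bounding box (`x`, `y`, `w`, `h`); arc weights are assumed to be Euclidean distances between box centers; and the clustering step is replaced by a simple greedy single-linkage growth that starts from the most central token and adds the nearest remaining token while it lies within a hypothetical threshold `max_gap`. Tokens that can never be reached are treated as noise, for instance a token mislabeled as part of the entity elsewhere on the page.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    # Hypothetical layout fields: bounding box of the token on the page.
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def distance(a: Token, b: Token) -> float:
    """Assumed arc weight: Euclidean distance between box centers."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)


def build_graph(tokens: list[Token]) -> dict[tuple[int, int], float]:
    """Complete graph: node i is tokens[i], arc (i, j) holds their distance."""
    n = len(tokens)
    return {(i, j): distance(tokens[i], tokens[j])
            for i in range(n) for j in range(n) if i != j}


def incremental_cluster(tokens: list[Token], max_gap: float):
    """Greedy single-linkage growth (a stand-in for the paper's clustering).

    Returns (kept tokens in reading order, tokens rejected as noise).
    """
    if not tokens:
        return [], []
    graph = build_graph(tokens)
    n = len(tokens)
    # Hypothetical seed choice: the token with the smallest total distance.
    seed = min(range(n),
               key=lambda i: sum(graph[(i, j)] for j in range(n) if j != i))
    cluster = {seed}
    while True:
        best = None  # (distance to cluster, token index)
        for j in range(n):
            if j in cluster:
                continue
            d = min(graph[(i, j)] for i in cluster)
            if d <= max_gap and (best is None or d < best[0]):
                best = (d, j)
        if best is None:  # no remaining token is close enough: stop growing
            break
        cluster.add(best[1])
    # Concatenate in reading order: top-to-bottom, then left-to-right.
    kept = sorted((tokens[i] for i in cluster), key=lambda t: (t.y, t.x))
    noise = [tokens[i] for i in range(n) if i not in cluster]
    return kept, noise


# Example: three adjacent tokens of a "total amount" entity plus one
# far-away token wrongly labeled with the same entity.
example = [
    Token("Total", 100, 500, 40, 12),
    Token("amount", 145, 500, 55, 12),
    Token("1250.00", 210, 500, 50, 12),
    Token("Page", 300, 900, 30, 12),
]
kept, noise = incremental_cluster(example, max_gap=80.0)
print(" ".join(t.text for t in kept))  # Total amount 1250.00
print([t.text for t in noise])         # ['Page']
```

Because a token joins the cluster only when it is within `max_gap` of some token already in it, an entity spanning several adjacent words grows one token at a time, while an isolated token far from the rest is left out. A practical system would tune the threshold to the document layout or use a different distance, such as separate horizontal and vertical gaps.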
Keywords
Extracting relevant tokens, mislabeling correction, tokens structure graph, clustering algorithm