A Multi-topic Meta-classification Scheme for Analyzing Lobbying Disclosure Data

Information Reuse and Integration (2015)

Abstract
The Lobbying Disclosure Act (LDA) provides, for the first time, data for empirical research on the behavior of interest groups and their influence on congressional policymaking, a question central to the functioning of American democracy. A main research challenge is to automatically identify the topic(s) of each filing, a short and sparse text classification task, in a large corpus of unorganized, semi-structured, and poorly connected lobbying filings, and thereby reveal the underlying purpose(s) of the lobbying activities. A common technique for alleviating data sparseness is to enrich the context of the data with external information. This paper instead proposes an interdisciplinary yet practical solution: a Multi-Topic Meta-Classification (MTMC) scheme built on a set of semantic attributes (General Issue, Specific Issue, and Bill Info.) and integrated with a domain-specific Policy Agenda (PA) coding/labeling procedure. First, multi-label base classifiers, transformed into multi-class classification problems, are learned from each of the three semantic sources. Second, to assess the reliability of each classification, one meta-classifier per attribute is trained on a dataset of meta-instances labeled in a cross-validation fashion. Third, the final prediction is made by fusing the reliable outputs of these ensembles of classifiers. Experiments demonstrate satisfactory classification performance under various evaluation measures on a real-world textual dataset that poses many challenges, including noisy data and semantic ambiguity.
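The three steps in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the token-count base classifier, the confidence-threshold meta-classifier, the union-based fusion rule, and all names (`train_base`, `predict_base`, `cross_val_meta_instances`, `train_meta`, `fuse`, `fit`, `classify`, and the attribute keys) are assumptions chosen for brevity. Each training example carries a single topic label for simplicity.

```python
from collections import Counter, defaultdict

# Hypothetical keys for the three semantic attributes named in the abstract.
ATTRIBUTES = ("general_issue", "specific_issue", "bill_info")

def train_base(examples):
    """Toy base classifier: per-label token counts from (tokens, label) pairs."""
    model = defaultdict(Counter)
    for tokens, label in examples:
        model[label].update(tokens)
    return model

def predict_base(model, tokens):
    """Return (label, confidence); confidence is the winner's share of the score."""
    scores = {label: sum(counts[t] for t in tokens) for label, counts in model.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no token overlap with any label
    best = max(sorted(scores), key=scores.get)  # ties broken alphabetically
    return best, scores[best] / total

def cross_val_meta_instances(examples, k=2):
    """Build (confidence, was_correct) meta-instances from k held-out folds."""
    meta = []
    for fold in range(k):
        train = [ex for i, ex in enumerate(examples) if i % k != fold]
        held = [ex for i, ex in enumerate(examples) if i % k == fold]
        model = train_base(train)
        for tokens, label in held:
            pred, conf = predict_base(model, tokens)
            meta.append((conf, pred == label))
    return meta

def train_meta(meta):
    """Toy meta-classifier: the confidence threshold that best separates
    correct from incorrect base predictions (smallest one on ties)."""
    candidates = sorted({conf for conf, _ in meta}) or [0.0]
    def hits(t):
        return sum((conf >= t) == ok for conf, ok in meta)
    return max(candidates, key=hits)

def fuse(outputs):
    """Assumed fusion rule: union of labels from outputs judged reliable."""
    return sorted({label for label, reliable in outputs if reliable})

def fit(filings, k=2):
    """filings: list of (attribute -> tokens dict, label). One model per attribute."""
    fitted = {}
    for attr in ATTRIBUTES:
        examples = [(f[attr], label) for f, label in filings]
        threshold = train_meta(cross_val_meta_instances(examples, k))
        fitted[attr] = (train_base(examples), threshold)
    return fitted

def classify(fitted, filing):
    """Predict per attribute, keep reliable outputs, and fuse them."""
    outputs = []
    for attr, (model, threshold) in fitted.items():
        label, conf = predict_base(model, filing[attr])
        outputs.append((label, label is not None and conf >= threshold))
    return fuse(outputs)
```

Calling `classify(fit(filings), filing)` returns a sorted list of topic labels, which may be empty when no attribute yields a reliable prediction. A real system would replace the token-count model with a trained text classifier and the threshold with a learned meta-classifier over richer meta-features.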
Keywords
machine learning applications, multi-class & multi-label classification, meta-classifier, information fusion