Comparative text analytics via topic modeling in banking

2017 IEEE Symposium Series on Computational Intelligence (SSCI) (2017)

Abstract
In this paper, we compare and evaluate multiple topic modeling approaches and their effectiveness in analyzing a large set of SEC filings by US public banks. More specifically, we apply four major topic modeling methods to a corpus of 8-K and 10-K filings, from the years 2005-2016, of 578 bank holding companies. These methods are Principal Component Analysis, Non-negative Matrix Factorization, Latent Dirichlet Allocation, and KATE, a novel k-competitive autoencoder for text documents. Separately for 8-K and 10-K filings, the usefulness and effectiveness of these methods are evaluated by comparing their performance on two classification tasks: (i) predicting which section each document corresponds to, where we consider each section within an 8-K or 10-K filing as an individual document, and (ii) detecting text from a bank's year of failure, a task for which we use bank failure data from the 2008 financial crisis. In addition, we qualitatively compare the topics discovered by the different methods. We conclude that topic modeling can be an effective tool in financial decision making and risk management.
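The evaluation idea in the abstract (learn a topic representation, then judge it by how well it supports section classification) can be illustrated with a minimal sketch. It covers only one of the four methods, Non-negative Matrix Factorization, implemented here with the standard multiplicative updates in plain Python. Everything specific is an assumption made for illustration: the six toy "sections" and their labels are invented, the number of topics is arbitrary, and the leave-one-out nearest-centroid classifier is a stand-in, since the abstract does not say which classifier the authors used.

```python
import random
from collections import Counter

# Invented toy stand-ins for filing sections (label, text); not real SEC data.
docs = [
    ("risk", "credit risk loan losses default exposure"),
    ("risk", "loan default risk capital exposure losses"),
    ("risk", "credit exposure losses risk capital"),
    ("governance", "board directors election officer appointment"),
    ("governance", "officer appointment board compensation directors"),
    ("governance", "directors election compensation board officer"),
]
labels = [label for label, _ in docs]
vocab = sorted({w for _, text in docs for w in text.split()})

def count_vector(text):
    counts = Counter(text.split())
    return [float(counts[w]) for w in vocab]

V = [count_vector(text) for _, text in docs]  # documents x terms

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V ~ W H with W, H >= 0 using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frobenius_error(V, W, H)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
        errors.append(frobenius_error(V, W, H))
    return W, H, errors

def loo_nearest_centroid_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed stand-in)."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        centroids = {}
        for label in set(labels):
            rows = [f for j, (f, l) in enumerate(zip(features, labels))
                    if j != i and l == label]
            centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        pred = min(centroids,
                   key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])))
        correct += pred == y
    return correct / len(labels)

W, H, errors = nmf(V, k=2)  # rows of W: per-document topic weights
for t, row in enumerate(H):  # rows of H: per-topic term weights
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(f"topic {t}:", [w for _, w in top])
accuracy = loo_nearest_centroid_accuracy(W, labels)
print("leave-one-out section accuracy:", accuracy)
```

The rows of `W` play the role of the document features that a downstream classifier consumes, and the top-weighted terms in each row of `H` are what a qualitative topic comparison would inspect. Note that the factorization is fit on all documents before the leave-one-out split, which is a simplification; a careful evaluation would fit the topic model on training documents only.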
Keywords
comparative text analytics, banking, multiple topic modeling approaches, SEC filings, US public banks, 10-K filing, Principal Component Analysis, Nonnegative Matrix Factorization, Latent Dirichlet Allocation, k-competitive autoencoder, text documents, individual document, bank failure data, bank holding companies