Factorized Recurrent Neural Architectures For Longer Range Dependence

International Conference on Artificial Intelligence and Statistics, Vol. 84 (2018)

Abstract
The ability to capture Long Range Dependence (LRD) in a stochastic process is of prime importance in the context of predictive models. A sequential model with a longer-term memory is better able to contextualize recent observations. In this article, we apply the theory of LRD stochastic processes to modern recurrent architectures, such as LSTMs and GRUs, and prove they do not provide LRD under assumptions sufficient for gradients to vanish. Motivated by an information-theoretic analysis, we provide a modified recurrent neural architecture that mitigates the issue of faulty memory through redundancy while keeping the compute time constant. Experimental results on a synthetic copy task, the YouTube-8M video classification task and a recommender system show that we enable better memorization and longer-term memory.
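The abstract only sketches the idea of a factorized recurrent architecture with redundant memory at constant compute. As an illustration of that general idea (not the authors' actual architecture), the snippet below is a minimal PyTorch sketch assuming the redundancy is realized as several small, independently updated GRU cells whose states are concatenated; the class name `RedundantFactorizedRNN` and the `num_factors` parameter are hypothetical.

```python
import torch
import torch.nn as nn


class RedundantFactorizedRNN(nn.Module):
    """Hypothetical sketch: k small GRU cells hold redundant pieces of the
    memory. Splitting one hidden state of size d into k cells of size d/k
    keeps per-step recurrent compute roughly constant (k * (d/k)^2 <= d^2)."""

    def __init__(self, input_size: int, hidden_size: int, num_factors: int = 4):
        super().__init__()
        assert hidden_size % num_factors == 0
        self.factor_size = hidden_size // num_factors
        self.cells = nn.ModuleList(
            nn.GRUCell(input_size, self.factor_size) for _ in range(num_factors)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, input_size)
        batch = x.size(1)
        states = [x.new_zeros(batch, self.factor_size) for _ in self.cells]
        outputs = []
        for t in range(x.size(0)):
            # Each factor reads the same input but updates its own state,
            # giving redundant, independently decaying memory traces.
            states = [cell(x[t], h) for cell, h in zip(self.cells, states)]
            outputs.append(torch.cat(states, dim=-1))
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)
```

As a usage sketch, `RedundantFactorizedRNN(input_size=32, hidden_size=128, num_factors=4)` applied to a tensor of shape `(seq_len, batch, 32)` returns per-step representations of size 128 built from four redundant 32-dimensional memories.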