Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks
arXiv: Artificial Intelligence, Volume abs/1711.02326, 2018.
A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, which arises because credit information must be propagated backwards through every single step of the forward computation. This makes BPTT both computationally impractical and biologically implausible. For this reason, full backpropagatio...