FAST: Factorizable Attention for Speeding up Transformers
CoRR (2024)
Abstract
Motivated by the factorization inherent in the original fast multipole method
and the improved fast Gauss transform, we introduce a factorizable form of
attention that operates efficiently in high dimensions. This approach reduces
the computational and memory complexity of the attention mechanism in
transformers from O(N^2) to O(N). In contrast to previous attempts, our
work presents a linearly scaling attention mechanism that maintains the full
representation of the attention matrix without resorting to sparsification
and incorporates the all-to-all relationships between tokens. We explore the
properties of our new attention metric and conduct tests in various standard
settings. Results indicate that our attention mechanism performs robustly and
holds significant promise for diverse applications where self-attention is used.
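To make the complexity claim concrete, the following is a minimal sketch of the generic factorized-attention idea, not the paper's actual FAST construction: if the attention weight between a query and a key can be written as an inner product of feature maps, phi(q_i) . phi(k_j), then keys and values can be aggregated once, and the N x N attention matrix is never formed, giving O(N) cost in the sequence length. The function name `factorized_attention` and the elementwise feature map used here are placeholder assumptions; the paper instead derives its factorization from the improved fast Gauss transform.

```python
import numpy as np

def factorized_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear-complexity attention under the factorization assumption
    A_ij proportional to phi(q_i) . phi(k_j).

    Q, K : (N, d)   query and key matrices
    V    : (N, d_v) value matrix
    phi  : elementwise positive feature map (a placeholder choice, not the
           paper's Gauss-transform-based factorization)
    """
    Qf = phi(Q)                  # (N, d)
    Kf = phi(K)                  # (N, d)
    KV = Kf.T @ V                # (d, d_v): aggregate keys/values, O(N d d_v)
    Z = Qf @ Kf.sum(axis=0)      # (N,): per-query normalization terms
    return (Qf @ KV) / Z[:, None]

# Usage: cost and memory grow linearly in the sequence length N.
N, d = 1024, 64
rng = np.random.default_rng(0)
out = factorized_attention(rng.normal(size=(N, d)),
                           rng.normal(size=(N, d)),
                           rng.normal(size=(N, d)))
print(out.shape)  # (1024, 64)
```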