The Role of Aggregation Functions on Transformers and ViTs Self-Attention for Classification

2023 36th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)

Abstract
Aggregation functions are mathematical operations that combine or summarize a set of values into a single representative value. They play a crucial role in the attention mechanisms of Transformer neural networks. However, the Transformer's default aggregation function, based on matrix multiplication, may have limitations in certain classification scenarios: it may struggle with the complexity of the information present in the input data, resulting in lower accuracy and efficiency. Considering this issue, the present work aims to replace the traditional matrix multiplication operation used in the classical attention mechanism with alternative, more general aggregation functions. To validate the new aggregation methods in the attention mechanism, we conducted experiments on two datasets, the recently proposed Google American Sign Language (ASL) Fingerspelling Recognition dataset and the well-known CIFAR-10, performing time series and image classification, respectively. The results shed light on the role of aggregation functions in classification with Transformers, demonstrating promising outcomes and potential for further improvement.
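The abstract does not specify which alternative aggregation functions were evaluated, so the following is only a minimal PyTorch sketch of the general idea it describes: factoring the final matrix-product step of scaled dot-product attention into a pluggable aggregation function, with the standard weighted sum and a hypothetical weighted maximum as two interchangeable choices. Names such as attention_with_aggregation and weighted_max are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def attention_with_aggregation(q, k, v, aggregate):
    # q, k, v: (batch, heads, seq_len, d_head)
    d = q.size(-1)
    # Attention weights, exactly as in standard scaled dot-product attention.
    w = F.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)  # (b, h, n, n)
    # Delegate to a pluggable aggregation function instead of
    # hard-coding the matrix product w @ v.
    return aggregate(w, v)

def weighted_sum(w, v):
    # Default Transformer aggregation: the matrix product, i.e. a
    # weighted arithmetic mean over the value vectors.
    return w @ v

def weighted_max(w, v):
    # Hypothetical alternative aggregation: element-wise maximum over
    # the weight-scaled value vectors (one of many possible choices;
    # the abstract does not name the specific functions used).
    scaled = w.unsqueeze(-1) * v.unsqueeze(-3)  # (b, h, n, n, d)
    return scaled.max(dim=-2).values            # (b, h, n, d)

# Usage: swapping the aggregation only requires passing a different callable.
q = k = v = torch.randn(1, 2, 5, 8)
out_sum = attention_with_aggregation(q, k, v, weighted_sum)
out_max = attention_with_aggregation(q, k, v, weighted_max)
```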