Token-Based Spatiotemporal Representation of the Events

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2024)

引用 0|浏览3
暂无评分
摘要
The event camera’s low power consumption and ability to capture microsecond brightness changes make it attractive for various computer vision tasks. Existing event representation methods typically convert events into frames, voxel grids, or spikes for deep neural networks (DNNs). However, these approaches often sacrifice temporal granularity or require specialized devices for processing. This work introduces a novel token-based event representation, where each event is considered a fundamental processing unit termed an event-token. This approach preserves the sequence’s intricate spatiotemporal attributes at the event level. Moreover, we propose a Three-way Attention mechanism in the Event Transformer Block (ETB) to collaboratively construct temporal and spatial correlations between events. We compare our proposed token-based event representation extensively with other prevalent methods for object classification and optical flow estimation. The experimental results showcase its competitive performance while demanding minimal computational resources on standard devices.
更多
查看译文
关键词
Token-based Event Representation,Spatiotemporal Characteristics,Three-way Attention,Transformer
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要