PointTransformer: Encoding Human Local Features for Small Target Detection

Computational Intelligence and Neuroscience (2022)

Abstract
Improving small target detection and occlusion handling is a key problem in object detection. During field operations in chemical plants, occlusion of construction workers and the long shooting distance of surveillance cameras often lead to missed detections. Most existing work applies multi-feature fusion strategies that extract features at different levels and aggregate them into global features; this leaves local features unused and makes it difficult to improve small target detection. To address this issue, this paper introduces Point Transformer, a transformer encoder, as the core backbone of the object detection framework. It first uses prior information about human skeletal points to obtain local features and then applies both self-attention and cross-attention mechanisms to reconstruct the local features corresponding to each key point. In addition, since the target to be detected is highly correlated with the positions of the human skeletal points, we propose a learnable positional encoding method that highlights the positional characteristics of each skeletal point and further boosts Point Transformer's performance. The proposed model is evaluated on a dataset of field operations in a chemical plant. The results are significantly better than those of classical algorithms, and the model also outperforms the state of the art by 12 mAP points on the small target detection task.