Scene Understanding for Autonomous Driving Using Visual Question Answering.

IJCNN (2023)

Abstract
This paper investigates the feasibility of the dot-products in self-attention mechanisms as an explainability technique for autonomous driving. A Visual Question Answering (VQA) framework is implemented with three types of questions pertaining to the presence or absence of road signs and traffic lights. Two models that differ in how they encode uni- and multimodal inputs are evaluated: a standard version and a modified version of the Learning Cross-Modality Encoder Representations from Transformers (LXMERT) framework. We present numerical results for the two model architectures on the question-answering task, with overall accuracies of 79.7% and 78.5%, and overall F1-scores of 0.749 and 0.693, respectively. Moreover, we show that these questions, despite neither containing nor requesting information about object positions, indirectly tune the model such that the self-attention dot-products provide scene understanding relevant to the questions in the form of a visual map. The choice of pooling for the model's output and the plotting parameters for the visual maps influence the reliability and accuracy of the visualization. Finally, an argument is made for the benefit of this approach to autonomous driving.
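As an illustration of the idea, the sketch below (not from the paper) shows how cross-modal attention weights could be read out of the HuggingFace port of LXMERT and aggregated into a per-region relevance score. The checkpoint name, the random placeholder region features, and the last-layer head/token mean-pooling are all assumptions made for illustration; the paper's exact pooling and plotting parameters are not given here, and a real pipeline would supply Faster R-CNN region features and boxes instead of random tensors.

```python
# Minimal sketch: reading cross-modal attention out of LXMERT as a
# per-region relevance map for a yes/no driving question.
# Assumptions: the HuggingFace "unc-nlp/lxmert-base-uncased" checkpoint,
# random placeholder RoI features, and last-layer mean pooling.
import torch
from transformers import LxmertTokenizer, LxmertModel

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")
model.eval()

question = "Is there a traffic light in the scene?"
inputs = tokenizer(question, return_tensors="pt")

# LXMERT expects Faster R-CNN region features (2048-dim) and normalized
# box coordinates (4-dim); random placeholders stand in for them here.
num_regions = 36
visual_feats = torch.randn(1, num_regions, 2048)
visual_pos = torch.rand(1, num_regions, 4)

with torch.no_grad():
    outputs = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        visual_feats=visual_feats,
        visual_pos=visual_pos,
        output_attentions=True,
    )

# cross_encoder_attentions holds one tensor per cross-modality layer,
# assumed here to follow the HuggingFace layout
# (batch, heads, question tokens, image regions). Averaging the last
# layer over heads and question tokens yields one score per region.
cross = outputs.cross_encoder_attentions[-1]               # (1, H, L, R)
region_scores = cross.mean(dim=1).mean(dim=1).squeeze(0)   # (R,)
print(region_scores.topk(5).indices)  # most attended region indices
```

In practice, these region scores would be overlaid on the corresponding bounding boxes to produce the kind of visual map the abstract describes, which is where the choice of pooling and plotting parameters comes into play.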
Keywords
Scene Understanding, Autonomous Driving, Visual Question Answering