Contrastive Learning with Video Transformer for Driver Distraction Detection through Multiview and Multimodal Video

2023 IEEE Region 10 Symposium (TENSYMP), 2023

Abstract
Distracted drivers are more likely to be involved in fatal accidents, so detecting actions that may lead to distraction should be prioritized to reduce road accidents. However, many different actions can cause a driver to divert attention from the road. Previous work on detecting distracted drivers relies on a predefined set of actions labeled as distractions; such datasets are known as ‘closed set’ because many distracting actions are never presented to the model. Departing from previous datasets and approaches, in this work we utilize contrastive learning to detect distraction through multiview and multimodal video. We use the Driver Anomaly Detection dataset, in which the model must identify normal and anomalous driving conditions in an ‘open set’ manner: the test set contains anomalous driving conditions unseen during training. We adopt a Video Transformer as the backbone of the model and validate that its performance surpasses convolutional backbones. Two views (front and top) of driving clips in two modalities (IR and depth) are used to train individual models, and the results of the different views and modalities are subsequently fused. Our method achieves 0.9892 AUC and 97.02% accuracy with Swin-Tiny when both views and modalities are considered.
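The abstract describes fusing results from individual models trained on two views (front, top) and two modalities (IR, depth). A minimal sketch of one plausible late-fusion scheme is shown below: each stream scores a clip by its similarity to a learned ‘normal driving’ template embedding, and the per-stream scores are averaged. All names, the cosine-similarity scoring, and the averaging rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between a clip embedding and a template vector."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fused_normality_score(embeddings, templates):
    """Average per-stream similarities to a 'normal driving' template.
    A clip would be flagged anomalous when the fused score falls below
    a threshold chosen on validation data (threshold not shown here)."""
    scores = [cosine_similarity(e, t) for e, t in zip(embeddings, templates)]
    return sum(scores) / len(scores)

# Toy usage: four streams (2 views x 2 modalities), 8-dim embeddings.
rng = np.random.default_rng(0)
templates = [rng.standard_normal(8) for _ in range(4)]
# A clip close to normal driving: small perturbation of each template.
clip = [t + 0.05 * rng.standard_normal(8) for t in templates]
score = fused_normality_score(clip, templates)
```

In an open-set setting, this style of fusion only requires each stream to model normal driving well; any sufficiently dissimilar clip, including unseen anomaly types, drops below the threshold.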
Keywords
driver distraction detection,multimodal video,multiview,video transformer