First-Person Video Domain Adaptation With Multi-Scene Cross-Site Datasets and Attention-Based Methods

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2023)

Abstract
Unsupervised Domain Adaptation (UDA) transfers knowledge from labeled source data to unlabeled target data of the same categories. However, UDA for first-person video action recognition is an under-explored problem, with a lack of benchmark datasets and limited consideration of first-person video characteristics. Existing benchmark datasets provide videos from a single activity scene, e.g., a kitchen, with similar global video statistics. Yet multiple activity scenes and differing global video statistics are essential for developing robust UDA networks for real-world applications. To this end, we first introduce two first-person video domain adaptation datasets: ADL-7 and GTEA_KITCHEN-6. To the best of our knowledge, they are the first to provide multi-scene and cross-site settings for the UDA problem in first-person video action recognition, promoting diversity. They add five domains to the original three from existing datasets, enriching data for this area, and they remain compatible with existing datasets, ensuring scalability. First-person videos pose unique challenges, e.g., actions tend to occur in hand-object interaction areas, so networks that attend to such areas can benefit common feature learning in UDA. Attention mechanisms endow networks with the ability to adaptively allocate resources to the important parts of the input and fade out the rest. Hence, we introduce channel-temporal attention modules that capture channel-wise and temporal-wise relationships and model the inter-dependencies relevant to this characteristic. Moreover, we propose a Channel-Temporal Attention Network (CTAN) to integrate these modules into existing architectures. CTAN outperforms baselines on the new datasets and on one existing dataset, EPIC-8.
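The abstract's core mechanism, attention that rescales features along the channel and temporal axes, can be illustrated with a minimal squeeze-style gating sketch. This is not the paper's CTAN implementation; the pooling-plus-sigmoid gating below is an assumption chosen for simplicity, and the function name `channel_temporal_attention` is hypothetical.

```python
import math

def sigmoid(v):
    """Standard logistic gate, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def channel_temporal_attention(x):
    """Illustrative sketch: x is a C x T feature map given as a list of
    C channel rows, each a list of T temporal values. Channels and time
    steps are reweighted by gates derived from average pooling, so that
    more informative channels/time steps keep larger activations."""
    C, T = len(x), len(x[0])
    # Channel attention: squeeze over time, then gate each channel.
    ch_gate = [sigmoid(sum(row) / T) for row in x]
    # Temporal attention: squeeze over channels, then gate each time step.
    t_gate = [sigmoid(sum(x[c][t] for c in range(C)) / C) for t in range(T)]
    # Rescale the input by both gates (residual connections omitted).
    return [[x[c][t] * ch_gate[c] * t_gate[t] for t in range(T)]
            for c in range(C)]
```

In a real network the gates would be produced by small learnable layers (as in squeeze-and-excitation blocks) rather than fixed pooling, but the rescaling pattern is the same: the module leaves the feature shape unchanged while amplifying channels and frames that matter, e.g., those covering hand-object interaction regions.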
Keywords
Action recognition, unsupervised domain adaptation, first-person vision, channel-temporal attention