Adversarially Masked Video Consistency for Unsupervised Domain Adaptation
arXiv (2024)
Abstract
We study the problem of unsupervised domain adaptation for egocentric videos.
We propose a transformer-based model to learn class-discriminative and
domain-invariant feature representations. It consists of two novel designs. The
first module is called Generative Adversarial Domain Alignment Network with the
aim of learning domain-invariant representations. It simultaneously learns a
mask generator and a domain-invariant encoder in an adversarial way. The
domain-invariant encoder is trained to minimize the distance between the source
and target domains. The mask generator, conversely, aims to produce
challenging masks by maximizing the domain distance. The second is a Masked
Consistency Learning module to learn class-discriminative representations. It
enforces the prediction consistency between the masked target videos and their
full forms. To better evaluate the effectiveness of domain adaptation methods,
we construct a more challenging benchmark for egocentric videos, U-Ego4D. Our
method achieves state-of-the-art performance on EPIC-Kitchens and the
proposed U-Ego4D benchmark.
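The min-max interplay described above, where an encoder minimizes a source-target distance while a mask generator maximizes it, can be illustrated with a dependency-free toy sketch. Everything here is an assumption for illustration, not the paper's implementation: the linear "encoder" `enc_w`, the soft per-feature mask `mask_logits`, the mean-feature distance proxy, the alternating analytic gradient steps, and the KL form of the consistency term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for pooled clip features from each domain (shapes assumed).
src = rng.normal(0.0, 1.0, size=(64, 16))   # source-domain features
tgt = rng.normal(0.5, 1.0, size=(64, 16))   # shifted target-domain features

enc_w = rng.normal(size=(16, 8)) * 0.1      # linear "encoder" (illustrative)
mask_logits = np.zeros(16)                  # logits of a soft feature mask

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def domain_distance(w, mask):
    """Distance between masked, encoded domain means (an MMD-style proxy)."""
    d = (src.mean(0) - tgt.mean(0)) * mask  # masked mean-feature gap
    z = d @ w                               # encoded gap
    return float(z @ z), d, z

lr = 0.5
start, _, _ = domain_distance(enc_w, sigmoid(mask_logits))
for _ in range(200):
    mask = sigmoid(mask_logits)
    # Encoder step: gradient *descent* on the distance (align the domains).
    _, d, z = domain_distance(enc_w, mask)
    enc_w -= lr * 2.0 * np.outer(d, z)
    # Generator step: gradient *ascent* on the distance (adversarial masks).
    _, d, z = domain_distance(enc_w, mask)
    grad_mask = 2.0 * (src.mean(0) - tgt.mean(0)) * (enc_w @ z)
    mask_logits += lr * grad_mask * mask * (1.0 - mask)  # sigmoid chain rule
end, _, _ = domain_distance(enc_w, sigmoid(mask_logits))

def consistency_loss(p_masked, p_full):
    """KL(p_full || p_masked): one plausible form of the consistency term."""
    return float(np.sum(p_full * (np.log(p_full) - np.log(p_masked))))

print(start, end)  # the encoder wins the game: the domain distance collapses
```

In this toy, the encoder has enough capacity to drive the distance toward zero despite the adversarial masks, mirroring the intended equilibrium; `consistency_loss` sketches the second module's idea of tying predictions on masked target videos to those on their full forms.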