AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
arXiv (2024)
Abstract
Facial Action Units (AUs) are a vital concept in the realm of affective
computing, and AU detection has long been an active research topic. Existing
methods suffer from overfitting because they train a large number of learnable
parameters on scarce AU-annotated datasets, or they rely heavily on
substantial additional relevant data. Parameter-Efficient Transfer Learning
(PETL) offers a promising paradigm for addressing these challenges, but
existing PETL methods are not designed around AU characteristics. We therefore
apply the PETL paradigm to AU detection, introducing AUFormer and proposing a
novel Mixture-of-Knowledge Expert (MoKE) collaboration mechanism. An
individual MoKE, specific to a certain AU and containing minimal learnable
parameters, first integrates personalized multi-scale and correlation
knowledge. The MoKE then collaborates with the other MoKEs in its expert group
to obtain aggregated information, which is injected into the frozen Vision
Transformer (ViT) to achieve parameter-efficient AU detection.
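As a rough illustration of this collaboration pattern, the PyTorch sketch
below wraps a frozen ViT block with one lightweight expert per AU. The
low-rank bottleneck inside `MoKE`, the `hidden` width, the mean aggregation,
and the names `MoKE`/`MoKEBlock` are illustrative assumptions; the paper's
actual experts integrate multi-scale and correlation knowledge.

```python
# Schematic sketch of per-AU experts injecting aggregated knowledge into a
# frozen ViT block. Internals are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class MoKE(nn.Module):
    """One lightweight expert per AU; only these parameters are trained."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.down = nn.Linear(dim, hidden)  # low-rank bottleneck (assumed)
        self.up = nn.Linear(hidden, dim)

    def forward(self, x):                   # x: (batch, tokens, dim)
        return self.up(torch.relu(self.down(x)))

class MoKEBlock(nn.Module):
    def __init__(self, vit_block, dim, num_aus):
        super().__init__()
        self.vit_block = vit_block          # pretrained ViT block, kept frozen
        for p in self.vit_block.parameters():
            p.requires_grad = False
        self.experts = nn.ModuleList(MoKE(dim) for _ in range(num_aus))

    def forward(self, x):
        # Each AU-specific expert extracts its own knowledge; the group
        # aggregates it (mean here, as an assumption) and injects the result
        # into the frozen token stream.
        expert_out = [expert(x) for expert in self.experts]
        aggregated = torch.stack(expert_out).mean(dim=0)
        return self.vit_block(x + aggregated)
```

Freezing the backbone and training only the experts is what keeps the
learnable-parameter count minimal, which is the core of the PETL paradigm.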
Additionally, we design a Margin-truncated Difficulty-aware Weighted
Asymmetric Loss (MDWA-Loss), which encourages the model to focus more on
activated AUs, differentiates the difficulty of unactivated AUs, and discards
potentially mislabeled samples.
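As a minimal sketch of a loss with these three properties, in the spirit of
the asymmetric-loss family, consider the following. The positive weight
`lambda_pos`, the focal exponent `gamma`, the `margin` value, the discard
rule, and the name `mdwa_like_loss` are all assumptions for illustration, not
the paper's exact MDWA-Loss formulation.

```python
# Illustrative asymmetric multi-label loss; hyperparameters and the margin
# mechanics are assumed, not taken from the paper.
import torch

def mdwa_like_loss(logits, targets, lambda_pos=2.0, gamma=2.0, margin=0.9):
    """logits, targets: (batch, num_aus); targets in {0, 1}."""
    eps = 1e-8
    p = torch.sigmoid(logits)
    # Activated AUs: up-weighted BCE term so the model focuses on positives.
    pos = lambda_pos * targets * torch.log(p.clamp(min=eps))
    # Unactivated AUs: a focal-style factor p**gamma grades difficulty, so
    # easy negatives (small p) contribute little and hard ones dominate.
    neg = (1 - targets) * p.pow(gamma) * torch.log((1 - p).clamp(min=eps))
    # Margin truncation (assumed rule): negatives the model scores above the
    # margin are treated as potential mislabels and their loss is discarded.
    keep = ((targets > 0) | (p < margin)).float()
    return -((pos + neg) * keep).mean()
```

The asymmetry between the positive and negative branches is the design point:
activated AUs are rare in AU datasets, so their term is amplified while the
abundant unactivated AUs are reweighted by difficulty.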
Extensive experiments from various perspectives, including within-domain,
cross-domain, data-efficiency, and micro-expression-domain evaluations,
demonstrate AUFormer's state-of-the-art performance and robust generalization
without reliance on additional relevant data. The code for AUFormer is
available at https://github.com/yuankaishen2001/AUFormer.