FaceXFormer: A Unified Transformer for Facial Analysis
arXiv (2024)
Abstract
In this work, we introduce FaceXformer, an end-to-end unified transformer
model for a comprehensive range of facial analysis tasks such as face parsing,
landmark detection, head pose estimation, attributes recognition, and
estimation of age, gender, race, and landmark visibility. Conventional methods
in face analysis have often relied on task-specific designs and preprocessing
techniques, which prevents their consolidation into a unified architecture. Unlike these
conventional methods, our FaceXformer leverages a transformer-based
encoder-decoder architecture where each task is treated as a learnable token,
enabling the integration of multiple tasks within a single framework. Moreover,
we propose a parameter-efficient decoder, FaceX, which jointly processes face
and task tokens, thereby learning generalized and robust face representations
across different tasks. To the best of our knowledge, this is the first work to
propose a single model capable of handling all these facial analysis tasks
using transformers. We conduct a comprehensive analysis of effective
backbones for unified face task processing and evaluate different task queries
and the synergy between them. We compare against state-of-the-art
specialized models and previous multi-task models in both intra-dataset and
cross-dataset evaluations across multiple benchmarks. Additionally, our model
effectively handles in-the-wild images, demonstrating its robustness and
generalizability across eight different tasks, all while maintaining
real-time performance of 37 FPS.
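The core idea of the abstract, treating each task as a learnable token that queries a shared set of face features inside an encoder-decoder, can be sketched as a single cross-attention step. This is a minimal illustration, not the authors' actual FaceX decoder; the dimensions, token counts, and variable names below are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(task_tokens, face_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: task tokens query face feature tokens."""
    Q = task_tokens @ Wq                      # (T, d) queries, one per task
    K = face_tokens @ Wk                      # (N, d) keys from face features
    V = face_tokens @ Wv                      # (N, d) values from face features
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (T, N) scaled dot-product
    return softmax(scores, axis=-1) @ V       # (T, d) one refined token per task

d = 64                 # assumed token dimension
num_tasks = 8          # e.g. parsing, landmarks, pose, attributes, age, ...
num_face_tokens = 196  # e.g. a flattened 14x14 backbone feature map

# In training these task tokens would be learnable parameters; here they are
# random placeholders, as are the projection matrices.
task_tokens = rng.standard_normal((num_tasks, d)) * 0.02
face_tokens = rng.standard_normal((num_face_tokens, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

refined = cross_attend(task_tokens, face_tokens, Wq, Wk, Wv)
print(refined.shape)  # one refined representation per task: (8, 64)
```

Each refined task token would then feed a lightweight task-specific head (e.g. a linear layer for attributes, a regression head for landmarks), which is what allows many heterogeneous tasks to share one backbone and one decoder.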