Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given Enough Time
CoRR (2024)
Abstract
Deep learning is closing the gap with humans on several object recognition
benchmarks. Here we investigate this gap in the context of challenging images
where objects are seen from unusual viewpoints. We find that humans excel at
recognizing objects in unusual poses, in contrast with state-of-the-art
pretrained networks (EfficientNet, SWAG, ViT, Swin, BEiT, ConvNeXt) which are
systematically brittle in this condition. Remarkably, as we limit image
exposure time, human performance degrades to the level of deep networks,
suggesting that additional mental processes (requiring additional time) take
place when humans identify objects in unusual poses. Finally, our analysis of
error patterns of humans vs. networks reveals that even time-limited humans are
dissimilar to feed-forward deep networks. We conclude that more work is needed
to bring computer vision systems to the level of robustness of the human visual
system. Understanding the nature of the mental processes taking place during
extra viewing time may be key to attaining such robustness.