Learning Multi-Level Density Maps for Crowd Counting

IEEE Transactions on Neural Networks and Learning Systems (2020)

Abstract
Crowd scenes often exhibit an imbalanced distribution of people. On the one hand, the apparent size of people varies widely because of camera perspective. People far from the camera look smaller and are likely to occlude each other, whereas people near the camera look larger and are relatively sparse. On the other hand, the number of people also varies greatly, both within a scene and across scenes. This article aims to develop a novel model that can accurately estimate the crowd count from a given scene with an imbalanced distribution of people. To this end, we propose an effective multi-level convolutional neural network (MLCNN) architecture that first adaptively learns multi-level density maps and then fuses them to predict the final output. The density map at each level focuses on people of certain sizes. As a result, the fusion of multi-level density maps is able to handle the large variation in people size. In addition, we introduce a new loss function named balanced loss (BL) to impose relatively balanced feedback during training, which helps further improve the performance of the proposed network. Furthermore, we introduce a new data set of 1111 images with a total of 49 061 head annotations. MLCNN is easy to train, requiring only one end-to-end training stage. Experimental results demonstrate that our MLCNN achieves state-of-the-art performance. In particular, our MLCNN reaches a mean absolute error (MAE) of 242.4 on the UCF_CC_50 data set, which is 37.2 lower than the second-best result.
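For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how an MAE figure such as the one quoted above is commonly computed in density-map-based crowd counting: each predicted density map is summed to give an estimated head count, and the absolute differences from the ground-truth counts are averaged over the test images. The function names and toy values are illustrative assumptions, not taken from the paper, and the sketch does not reproduce MLCNN, its fusion step, or the balanced loss.

```python
from typing import Sequence

# A density map is treated here as a 2-D grid of non-negative floats
# (an assumption for illustration; the paper's exact format is not given).
DensityMap = Sequence[Sequence[float]]


def count_from_density(density: DensityMap) -> float:
    """Estimate the head count as the sum of all density-map values."""
    return sum(sum(row) for row in density)


def mean_absolute_error(predicted: Sequence[float],
                        actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Toy data (hypothetical): two tiny 2x2 "density maps" and their true counts.
toy_maps = [
    [[0.5, 0.5], [1.0, 0.0]],      # sums to 2.0
    [[0.25, 0.25], [0.25, 0.25]],  # sums to 1.0
]
toy_truth = [3, 1]

predicted_counts = [count_from_density(m) for m in toy_maps]
mae = mean_absolute_error(predicted_counts, toy_truth)
print(predicted_counts, mae)
```

Under this definition, the reported margin implies that the second-best method had an MAE of about 242.4 + 37.2 = 279.6 on UCF_CC_50.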
Keywords
Balanced loss (BL), convolutional neural network (CNN), crowd counting, multi-level density maps