Fusing Lidar And Images For Pedestrian Detection Using Convolutional Neural Networks

2016 IEEE International Conference on Robotics and Automation (ICRA)(2016)

引用 163|浏览68
暂无评分
摘要
In this paper, we explore various aspects of fusing LIDAR and color imagery for pedestrian detection in the context of convolutional neural networks (CNNs), which have recently become state-of-art for many vision problems. We incorporate LIDAR by up-sampling the point cloud to a dense depth map and then extracting three features representing different aspects of the 3D scene. We then use those features as extra image channels. Specifically, we leverage recent work on HHA [9] (horizontal disparity, height above ground, and angle) representations, adapting the code to work on up-sampled LIDAR rather than Microsoft Kinect depth maps. We show, for the first time, that such a representation is applicable to up-sampled LIDAR data, despite its sparsity. Since CNNs learn a deep hierarchy of feature representations, we then explore the question: At what level of representation should we fuse this additional information with the original RGB image channels? We use the KITTI pedestrian detection dataset for our exploration. We first replicate the finding that region-CNNs (R-CNNs) [8] can outperform the original proposal mechanism using only RGB images, but only if fine-tuning is employed. Then, we show that: 1) using HHA features and RGB images performs better than RGB-only, even without any fine-tuning using large RGB web data, 2) fusing RGB and HHA achieves the strongest results if done late, but, under a parameter or computational budget, is best done at the early to middle layers of the hierarchical representation, which tend to represent mid-level features rather than low (e.g. edges) or high (e.g. object class decision) level features, 3) some of the less successful methods have the most parameters, indicating that increased classification accuracy is not simply a function of increased capacity in the neural network.
更多
查看译文
关键词
LIDAR image fusion,pedestrian detection,convolutional neural networks,CNN,depth map,feature extraction,3D scene,horizontal disparity,height above ground,Microsoft Kinect depth maps,up-sampled LIDAR data,feature representations,RGB image channels,KITTI pedestrian detection dataset,region-CNN,R-CNN,HHA features,RGB Web data,mid-level feature representation,object class decision
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要