Reconstruction and Scalable Detection and Tracking of 3D Objects

semanticscholar(2020)

Abstract
The task of detecting objects in images is essential for autonomous systems to categorize, comprehend and eventually navigate or manipulate their environment. Since many applications demand not only the detection of objects but also the estimation of their exact poses, 3D CAD models can prove helpful because they provide a means for feature extraction and hypothesis refinement. This work, therefore, explores two paths: firstly, we will look into methods to create richly textured and geometrically accurate models of real-life objects. Using these reconstructions as a basis, we will investigate how to improve 3D object detection and pose estimation, focusing especially on scalability, i.e. the problem of dealing with multiple objects simultaneously. A fundamental aspect of the thesis is the usage of RGB-D sensors to tackle the above-mentioned tasks. In contrast to standard color cameras, these sensors provide an additional depth channel that gives, for each pixel, the metric distance to the camera. This allows for correct depth perception and alleviates many typical problems such as scale estimation and occlusion reasoning. The part on reconstruction will start with a method that allows for recovering the full colored geometry of arbitrarily shaped objects. By tracking the camera movement via the object’s support surface, we can eventually fuse keyframes into a colored signed distance field after global pose optimization. By repositioning the object to expose unseen parts, we create multiple partial scans and propose a novel variational fusion scheme. The resulting reconstruction quality surpasses that of related methods and yields metrically accurate models. In a follow-up work, we focus on a more efficient method to achieve the fusion by means of octrees. These spatial look-up structures allow for memory-efficient storage but are not straightforward to use in optimization tasks.
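The remark that a metric depth channel alleviates scale estimation can be illustrated with the standard pinhole back-projection. The following is a minimal sketch and is not taken from the thesis; the function name `backproject` and the intrinsics `FX`, `FY`, `CX`, `CY` are hypothetical values chosen for illustration.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Hypothetical pinhole intrinsics (focal lengths and principal point, in pixels).
FX, FY, CX, CY = 525.0, 525.0, 320.0, 240.0

# Two pixels on the same image row, both observed at a depth of 2 m.
p_left = backproject(215, 240, 2.0, FX, FY, CX, CY)
p_right = backproject(425, 240, 2.0, FX, FY, CX, CY)

# Because depth is metric, the distance between the points is metric as well.
width_m = p_right[0] - p_left[0]
```

With a color camera alone, the same 210-pixel span could belong to a small nearby object or a large distant one; the depth value resolves that ambiguity.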
The part on detection will first focus on hashing of object-view templates to achieve scalability. While discriminative, most template approaches suffer from linear time complexity since each template has to be matched against the scene. With our learned hashing scheme, we decrease the computational complexity with only a small penalty on the detection accuracy. From there, we present our second work in that domain, which employs deep learning of local RGB-D patches to allow for robust voting of object instances. This method is scalable and performs well in cluttered scenes at reasonable speeds. Following up, we introduce a novel deeply learned detection scheme that predicts 2D bounding boxes together with scored 6D poses in a single shot. Our approach scales well to many objects and can run at 10 Hz. Lastly, we present a model tracker in RGB-D data by means of a direct energy minimization over contour and object-interior cues. Our method is robust to occlusion and scale changes and runs on a single CPU core for multiple objects in real time.
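The idea of replacing a linear scan over templates with a hash lookup can be sketched as follows. This is only an illustration under stated assumptions: it uses plain bit sampling on toy binary descriptors, not the learned hashing scheme described above, and all names (`TEMPLATES`, `BITS`, `build_table`, `query`) are hypothetical.

```python
from collections import defaultdict

def hash_key(descriptor, bit_positions):
    """Build a short key by sampling a few bits of a binary descriptor."""
    return tuple(descriptor[i] for i in bit_positions)

def build_table(templates, bit_positions):
    """Group template ids into buckets that share the same hash key."""
    table = defaultdict(list)
    for template_id, descriptor in templates.items():
        table[hash_key(descriptor, bit_positions)].append(template_id)
    return table

def hamming(a, b):
    """Count the positions at which two binary descriptors differ."""
    return sum(x != y for x, y in zip(a, b))

def query(table, templates, descriptor, bit_positions):
    """Verify only the templates in the matching bucket; None if it is empty."""
    candidates = table.get(hash_key(descriptor, bit_positions), [])
    if not candidates:
        return None
    return min(candidates, key=lambda t: hamming(templates[t], descriptor))

# Toy binary descriptors standing in for rendered object views (made-up data).
TEMPLATES = {
    "cup_view0": (1, 0, 1, 1, 0, 0, 1, 0),
    "cup_view1": (1, 0, 1, 0, 0, 1, 1, 0),
    "box_view0": (0, 1, 0, 1, 1, 0, 0, 1),
}
BITS = (0, 1, 2)  # sampled bit positions, fixed by hand for this sketch
TABLE = build_table(TEMPLATES, BITS)

scene = (1, 0, 1, 1, 0, 0, 0, 0)  # noisy observation of cup_view0
best = query(TABLE, TEMPLATES, scene, BITS)
```

Only the templates in the retrieved bucket are compared against the scene descriptor. The trade-off is visible here too: if noise flips one of the sampled bits, the correct template lands in a different bucket and is missed, which is the kind of accuracy penalty a hashing scheme has to keep small.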