Cross-Scale Cost Aggregation for Stereo Matching

    IEEE Trans. Circuits Syst. Video Techn., Volume 27, Issue 5, 2017, Pages 965-976.

    Keywords:
    kernel, visualization, computer vision, optimization, stereo vision

    Abstract:

    This paper proposes a generic framework that enables a multiscale interaction in the cost aggregation step of stereo matching algorithms. Inspired by the formulation of image filters, we first reformulate cost aggregation from a weighted least-squares (WLS) optimization perspective and show that different cost aggregation methods essentia...
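
    The WLS view mentioned in the abstract can be written out as a short worked equation. The notation below is an assumption introduced for illustration (the abstract is truncated and does not fix symbols): C(j, l) is the matching cost of pixel j at disparity l, N_i is the support neighbourhood of pixel i, and K(i, j) is a similarity kernel.

```latex
% Cost aggregation as a weighted least-squares problem (assumed notation).
\tilde{C}(i, l) = \arg\min_{z} \frac{1}{Z_i} \sum_{j \in N_i} K(i, j)\, \lVert z - C(j, l) \rVert^2,
\qquad Z_i = \sum_{j \in N_i} K(i, j)

% Setting the derivative with respect to z to zero gives a normalized weighted average:
\tilde{C}(i, l) = \frac{1}{Z_i} \sum_{j \in N_i} K(i, j)\, C(j, l)
```

    Under this reading, aggregation methods differ only in the choice of K and N_i; a constant kernel over a square window, for example, gives plain box averaging.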

    Introduction
    • Dense correspondence between two images is a key problem in computer vision [12].
    • Adding a constraint that the two images are a stereo pair of the same scene, the dense correspondence problem degenerates into the stereo matching problem [23].
    • A stereo matching algorithm generally takes four steps: cost computation, cost aggregation, disparity computation and disparity refinement [23].
    • A 3D cost volume is generated by computing matching costs for each pixel at all possible disparity levels.
    • Disparity for each pixel is computed with local or global optimization methods and refined by vari-
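
    The pipeline described in the bullets above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the absolute-difference cost, the box window, and the winner-takes-all selection are simple stand-ins, and the function names are hypothetical.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Cost computation: absolute difference for every pixel and disparity level."""
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    h, w = left.shape
    # Positions with no counterpart in the right image (x - d < 0) keep a large cost.
    vol = np.full((max_disp, h, w), 255.0)
    for d in range(max_disp):
        # A left pixel at column x is compared with the right pixel at column x - d.
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return vol

def box_aggregate(vol, radius):
    """Cost aggregation: average each disparity slice over a square window."""
    k = 2 * radius + 1
    padded = np.pad(vol, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    # Integral image per slice, with a leading zero row and column.
    integral = np.pad(padded, ((0, 0), (1, 0), (1, 0))).cumsum(axis=1).cumsum(axis=2)
    total = (integral[:, k:, k:] - integral[:, :-k, k:]
             - integral[:, k:, :-k] + integral[:, :-k, :-k])
    return total / (k * k)

def winner_takes_all(vol):
    """Disparity computation: pick the disparity with the lowest aggregated cost."""
    return np.argmin(vol, axis=0)
```

    Disparity refinement, the fourth step, is omitted here. Given two rectified grayscale images, `winner_takes_all(box_aggregate(cost_volume(left, right, max_disp), radius))` yields one integer disparity per pixel.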
    Highlights
    • Different from previous CTF methods, our method models the evolution of the cost volume in
    • We have proposed a cross-scale cost aggregation framework for stereo matching
    • We investigate the scale space behavior of various cost aggregation methods
    • Extensive experiments on three datasets validate the effectiveness of cross-scale cost aggregation
    Methods
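    • Methods compared (table columns; reported metric: Avg Err (%)): BOX, S+BOX, NL [33], S+NL, ST [16], S+ST, BF [36], S+BF, GF [21], S+GF.
    • Disparity results were refined with the same disparity refinement technique from [33]. (For ST [16], the authors list its original rank reported in the Middlebury benchmark [24], since the same results were not reproduced using the author’s C++ code.)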
    • As on the Middlebury dataset, the simple BOX method becomes very powerful when cross-scale cost aggregation is used.
    • For S+NL and S+ST, performance is almost the same as without cross-scale cost aggregation, and is even worse than that of S+BOX.
    • This may be due to the non-local property of tree-based cost aggregation methods.
    • Even when cross-scale cost aggregation is adopted, errors in textureless slanted planes are not fully addressed.
    • Disparity maps for all methods are presented in the supplementary material and support this analysis.
    Results
    • All local cost aggregation methods perform rather poorly (the error rate in non-occluded areas exceeds 20%) on four stereo pairs from the Middlebury 2006 dataset: Midd1, Midd2, Monopoly and Plastic.
    Conclusion
    • In this paper, the authors have proposed a cross-scale cost aggregation framework for stereo matching.
    • A new trend in stereo vision is to solve the correspondence problem in continuous plane parameter space rather than in discrete disparity label space [1, 13, 32].
    • These methods can handle slanted planes very well, and one probable future direction is to investigate their scale-space behavior.
    Tables
    • Table 1: Quantitative evaluation of cost aggregation methods on the Middlebury dataset. The prefix ‘S+’ denotes our cross-scale cost aggregation framework. For the rank part (columns 4 and 5), the disparity results were refined with the same disparity refinement technique [33].
    • Table 2: Quantitative comparison of cost aggregation methods on the KITTI dataset. Out-Noc: percentage of erroneous pixels in non-occluded areas; Out-All: percentage of erroneous pixels in total; Avg-Noc: average disparity error in non-occluded areas; Avg-All: average disparity error in total.
    • Table 3: Quantitative comparison of cost aggregation methods on the New Tsukuba dataset.
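
    The four KITTI-style metrics named in the Table 2 caption reduce to two quantities computed over different pixel masks. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the 3-pixel outlier threshold is assumed as the default rather than taken from the text above.

```python
import numpy as np

def disparity_errors(estimate, ground_truth, valid, threshold=3.0):
    """Return (outlier percentage, average absolute error) over the valid pixels.

    With `valid` marking non-occluded pixels this gives (Out-Noc, Avg-Noc);
    with `valid` marking every pixel that has ground truth it gives (Out-All, Avg-All).
    The default threshold of 3 pixels is an assumption.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    err = np.abs(estimate - ground_truth)[np.asarray(valid, dtype=bool)]
    out_pct = 100.0 * float(np.mean(err > threshold))
    return out_pct, float(err.mean())
```

    Keeping the mask as an argument lets the same function serve both the non-occluded and the all-pixel variants.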
    Related work
    • Recent surveys [9, 29] provide thorough comparisons and analyses of various cost aggregation methods. We refer readers to these surveys for an overview, and focus here on stereo matching methods that involve multi-scale information, which are closely related to our idea but differ from it substantially.

      Early researchers of stereo vision adopted the coarse-to-fine (CTF) strategy for stereo matching [15]. Disparity at a coarse resolution was assigned first, and the coarser disparity was then used to reduce the search space for calculating the finer disparity. This CTF (hierarchical) strategy has been widely used in global stereo methods such as dynamic programming [30], semi-global matching [25], and belief propagation [3, 34] to accelerate convergence and avoid unexpected local minima. Local methods also adopt the CTF strategy. Unlike in global stereo methods, the main purpose of adopting it in local stereo methods is to reduce the search space [35, 11, 10] or to take advantage of multi-scale image representations [26, 27]. There is, however, one exception among local CTF approaches. Min and Sohn [19] modeled cost aggregation by anisotropic diffusion and solved the proposed variational model efficiently with a multi-scale approach. The motivation of their model is to denoise the cost volume, which is very similar to ours, but our method enforces the inter-scale consistency of cost volumes by regularization.
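
    One way to read "inter-scale consistency by regularization" is as a quadratic penalty that ties the cost of the same pixel and disparity at adjacent scales. The sketch below assumes that form: for independently aggregated costs v_s at scales s = 0..S, it minimizes the sum of (z_s - v_s)^2 plus lam times the sum of (z_s - z_{s-1})^2. The objective, the parameter `lam`, and the function names are illustrative assumptions, not details confirmed by the text above.

```python
import numpy as np

def inter_scale_matrix(num_scales, lam):
    """Tridiagonal system matrix obtained by setting the gradient of the objective to zero."""
    a = np.zeros((num_scales, num_scales))
    for s in range(num_scales):
        neighbours = [t for t in (s - 1, s + 1) if 0 <= t < num_scales]
        # Each adjacent scale adds lam to the diagonal and -lam off the diagonal.
        a[s, s] = 1.0 + lam * len(neighbours)
        for t in neighbours:
            a[s, t] = -lam
    return a

def regularize_across_scales(per_scale_costs, lam):
    """Solve for the regularized costs; axis 0 of the input indexes the scale."""
    costs = np.asarray(per_scale_costs, dtype=np.float64)
    flat = costs.reshape(costs.shape[0], -1)
    solved = np.linalg.solve(inter_scale_matrix(costs.shape[0], lam), flat)
    return solved.reshape(costs.shape)
```

    With `lam = 0` the scales decouple and the input is returned unchanged; larger values pull the per-scale costs toward each other. Because the matrix depends only on the number of scales and `lam`, its inverse could be computed once and reused for every pixel and disparity.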
    Funding
    • Yang are supported by the National Basic Research Program of China (973) under Grant No. 2011CB302206, the NSFC under Grant Nos. 61272231 and 61210008, and Beijing Key Laboratory of Networked Multimedia.
    • Min is supported by the research grant for the Human Sixth Sense Programme at the Advanced Digital Sciences Center from Singapore's Agency for Science, Technology and Research (A*STAR).
    • Yan are supported by the Singapore National Research Foundation under its International Research Centre @Singapore Funding Initiative and administered by the IDM Programme Office.
    • Tian is supported by ARO grant W911NF-12-10057, Faculty Research Awards by NEC Laboratories of America, 2012 UTSA START-R Research Award and NSFC 61128007, respectively.
    Reference
    • [1] M. Bleyer, C. Rhemann, and C. Rother. PatchMatch stereo - stereo matching with slanted support windows. In BMVC, 2011.
    • [2] P. J. Burt. Fast filter transform for image processing. CGIP, 1981.
    • [3] P. Felzenszwalb and D. Huttenlocher. Efficient belief propagation for early vision. In CVPR, 2004.
    • [4] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012.
    • [5] A. Geiger, P. Lenz, and R. Urtasun. The KITTI Vision Benchmark Suite. http://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo, 2012.
    • [6] D. Hafner, O. Demetz, and J. Weickert. Why is the census transform good for robust optic flow computation? In SSVM, 2013.
    • [7] K. He, J. Sun, and X. Tang. Guided image filtering. In ECCV, 2010.
    • [8] H. Hirschmuller and D. Scharstein. Evaluation of cost functions for stereo matching. In CVPR, 2007.
    • [9] A. Hosni, M. Bleyer, and M. Gelautz. Secrets of adaptive support weight techniques for local stereo matching. CVIU, 2013.
    • [10] W. Hu, K. Zhang, L. Sun, and S. Yang. Comparisons reducing for local stereo matching using hierarchical structure. In ICME, 2013.
    • [11] Y.-H. Jen, E. Dunn, P. Fite-Georgel, and J.-M. Frahm. Adaptive scale selection for hierarchical stereo. In BMVC, 2011.
    • [12] C. Liu, J. Yuen, and A. Torralba. SIFT flow: dense correspondence across scenes and its applications. TPAMI, 2011.
    • [13] J. Lu, H. Yang, D. Min, and M. N. Do. Patch match filter: efficient edge-aware filtering meets randomized search for fast correspondence field estimation. In CVPR, 2013.
    • [14] H. A. Mallot, S. Gillner, and P. A. Arndt. Is correspondence search in human stereo vision a coarse-to-fine process? Biological Cybernetics, 1996.
    • [15] D. Marr and T. Poggio. A computational theory of human stereo vision. Proceedings of the Royal Society of London. Series B. Biological Sciences, 1979.
    • [16] X. Mei, X. Sun, W. Dong, H. Wang, and X. Zhang. Segment-tree based cost aggregation for stereo matching. In CVPR, 2013.
    • [17] M. D. Menz and R. D. Freeman. Stereoscopic depth processing in the visual cortex: a coarse-to-fine mechanism. Nature Neuroscience, 2003.
    • [18] P. Milanfar. A tour of modern image filtering: new insights and methods, both practical and theoretical. IEEE Signal Processing Magazine, 2013.
    • [19] D. Min and K. Sohn. Cost aggregation and occlusion handling with WLS in stereo matching. TIP, 2008.
    • [20] M. Peris, A. Maki, S. Martull, Y. Ohkawa, and K. Fukui. Towards a simulation driven stereo vision system. In ICPR, 2012.
    • [21] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz. Fast cost-volume filtering for visual correspondence and beyond. In CVPR, 2011.
    • [22] D. Scharstein and C. Pal. Learning conditional random fields for stereo. In CVPR, 2007.
    • [23] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV, 2002.
    • [24] D. Scharstein and R. Szeliski. Middlebury Stereo Vision Website. http://vision.middlebury.edu/stereo/, 2002.
    • [25] H. Simon and K. Reinhard. Evaluation of a new coarse-to-fine strategy for fast semi-global stereo matching. Advances in Image and Video Technology, 2012.
    • [26] M. Sizintsev. Hierarchical stereo with thin structures and transparency. In CCCRV, 2008.
    • [27] L. Tang, M. K. Garvin, K. Lee, W. L. M. Alward, Y. H. Kwon, and M. D. Abramoff. Robust multiscale stereo matching from fundus images with radiometric differences. TPAMI, 2011.
    • [28] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In ICCV, 1998.
    • [29] F. Tombari, S. Mattoccia, L. Di Stefano, and E. Addimanda. Classification and evaluation of cost aggregation methods for stereo correspondence. In CVPR, 2008.
    • [30] G. Van Meerbergen, M. Vergauwen, M. Pollefeys, and L. Van Gool. A hierarchical symmetric stereo algorithm using dynamic programming. IJCV, 2002.
    • [31] Z.-F. Wang and Z.-G. Zheng. A region based stereo matching algorithm using cooperative optimization. In CVPR, 2008.
    • [32] K. Yamaguchi, D. McAllester, and R. Urtasun. Robust monocular epipolar flow estimation. In CVPR, 2013.
    • [33] Q. Yang. A non-local cost aggregation method for stereo matching. In CVPR, 2012.
    • [34] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister. Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling. TPAMI, 2009.
    • [35] R. Yang and M. Pollefeys. Multi-resolution real-time stereo on commodity graphics hardware. In CVPR, 2003.
    • [36] K.-J. Yoon and I. S. Kweon. Adaptive support-weight approach for correspondence search. TPAMI, 2006.
    • [37] R. Zabih and J. Woodfill. Non-parametric local transforms for computing visual correspondence. In ECCV, 1994.