Transformer-based Models for Supervised Monocular Depth Estimation

2022 International Conference on Intelligent Controller and Computing for Smart Power (ICICCSP) (2022)

Abstract
Existing solutions for monocular depth estimation usually use convolutional networks as the backbone of their model architecture. This work presents an encoder-decoder network built on a transformer architecture that performs monocular depth estimation from a single RGB image. Environment perception and autonomous navigation systems, where depth estimation runs on edge devices, need lightweight and efficient models. It is shown that transformer-based architectures provide results comparable to the currently used convolutional networks with significantly fewer parameters. Unlike convolutional networks, transformers do not downsample the input progressively at each layer. Maintaining a similar resolution throughout the encoding process allows for global awareness at each stage. Two different decoder models are implemented on top of a transformer encoder, and their usability for depth estimation is evaluated. Compared with a comparable convolutional network on the KITTI outdoor dataset, the lighter transformer model performs better in terms of robustness and accuracy.
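The resolution argument in the abstract can be illustrated with a small arithmetic sketch. It is not the paper's model: the input size (352 x 1216, a crop often used with KITTI), the patch size of 16, the 12 transformer layers, and the five stride-2 convolutional stages are all assumptions chosen for illustration, and both function names are hypothetical.

```python
def cnn_feature_sizes(height, width, num_stages=5, stride=2):
    """Spatial size after each stage of a CNN encoder that
    downsamples by `stride` at every stage (assumed layout)."""
    sizes = []
    for _ in range(num_stages):
        height, width = height // stride, width // stride
        sizes.append((height, width))
    return sizes


def vit_feature_sizes(height, width, num_layers=12, patch_size=16):
    """Token grid after each layer of a plain ViT-style encoder.
    The image is split into patches once; every layer then keeps
    the same grid (assumed patch size and depth)."""
    grid = (height // patch_size, width // patch_size)
    return [grid for _ in range(num_layers)]


if __name__ == "__main__":
    h, w = 352, 1216  # assumed input size, not taken from the paper
    print("CNN stages:", cnn_feature_sizes(h, w))
    print("ViT layers:", sorted(set(vit_feature_sizes(h, w))))
```

Under these assumptions the convolutional encoder shrinks the feature map at every stage, while the transformer encoder keeps one fixed token grid from the first layer to the last, so every layer can attend over the whole image at the same resolution.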
Keywords
transformer-based architectures,convolutional networks,decoder models,transformer encoder,comparable convolution network,lighter transformer model,transformer-based models,supervised monocular depth estimation,model architecture,encoder-decoder network,transformer architecture,single RGB image,autonomous navigation systems,lightweight models,efficient models,KITTI outdoor dataset