The Visual Transformer paper collection gathers theory and application papers on Google's Transformer architecture in computer vision.
Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer
In addition to Multi-Object Tracking Accuracy and Identity F1 Score, we report the following additional CLEAR multi-object tracking metrics. MT: ground-truth tracks covered for at least 80% of their length.
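The MT figure quoted above is simple to compute once per-track coverage is known. Below is a minimal Python sketch using the usual CLEAR thresholds (MT at >= 80% coverage, ML at < 20%); the function name and example data are illustrative, not TrackFormer's actual evaluation code.

    def mt_pt_ml_counts(coverage_ratios):
        # coverage_ratios: one float in [0, 1] per ground-truth track, the
        # fraction of that track's frames the tracker covered.
        ratios = list(coverage_ratios)
        mt = sum(r >= 0.8 for r in ratios)  # mostly tracked
        ml = sum(r < 0.2 for r in ratios)   # mostly lost
        pt = len(ratios) - mt - ml          # partially tracked
        return mt, pt, ml

    print(mt_pt_ml_counts([0.95, 0.5, 0.1]))  # -> (1, 1, 1)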
Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu
We presented LinE segment TRansformers (LETR), a line segment detector built on a multi-scale encoder/decoder transformer structure.
We provide an analysis of open research directions and possible future work.
We propose VisualSparta, a simple yet effective text-to-image retrieval model that outperforms all existing retrieval models in both accuracy and retrieval latency.
ICLR 2021
Further scaling of Vision Transformer would likely lead to improved performance
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
ICLR 2021
Despite its interesting design and good performance, DETR has its own issues: it requires far more training epochs to converge than existing object detectors.
Our Feature Pyramid Transformer does not change the size of the feature pyramid, and it is generic and easy to use in a plug-and-play fashion with modern deep networks.
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang
The Transformer is becoming a hot topic in the computer vision area due to its competitive performance and tremendous potential compared with convolutional neural networks.
CVPR, pp. 5790-5799 (2020)
The proposed texture transformer consists of a learnable texture extractor, which learns a joint feature embedding for subsequent attention computation, and two attention-based modules, which transfer HR textures from the Ref image.
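The attention-based texture transfer described above can be pictured as a hard nearest-neighbour lookup in feature space. The sketch below is a rough illustration under assumed tensor shapes, not the paper's actual implementation: features of the LR input act as queries, features of the degraded reference act as keys, and HR reference textures act as values; the returned confidence can weight how strongly each transferred texture is fused.

    import torch
    import torch.nn.functional as F

    def transfer_textures(q, k, v):
        # q: (N, C) queries from the LR input
        # k: (M, C) keys from the degraded reference image
        # v: (M, D) HR texture values aligned with the keys
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        relevance = q @ k.t()              # (N, M) cosine similarities
        conf, idx = relevance.max(dim=-1)  # hard attention: best match per query
        return v[idx], conf                # copied HR textures and confidence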
Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li
Since our framework treats depth estimation as an auxiliary task for visual odometry, without special optimization, the improvement indicates that accurate camera pose estimation improves depth estimation in the proposed framework.
Transformer in Image Quality takes advantage of the inductive capability of the convolutional neural network architecture for deriving quality features, and of the Transformer encoder for producing an aggregated representation via the attention mechanism.
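This division of labor (convolutional stem for quality features, Transformer encoder for attention-based aggregation) is easy to prototype. The following is a minimal PyTorch sketch with made-up layer sizes; it only illustrates the hybrid pattern described above, not the paper's actual model.

    import torch
    import torch.nn as nn

    class HybridIQA(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            # CNN stem derives local quality features
            self.stem = nn.Sequential(
                nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(inplace=True),
                nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            # Transformer encoder aggregates them via self-attention
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.score = nn.Linear(dim, 1)

        def forward(self, img):                            # img: (B, 3, H, W)
            f = self.stem(img).flatten(2).transpose(1, 2)  # (B, N, dim) tokens
            f = self.encoder(f)
            return self.score(f.mean(dim=1))               # pooled quality score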
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr, Li Zhang
We can see that our SEgmentation TRansformer-PUP model is superior to fully convolutional network baselines and to FCN-plus-attention approaches such as Non-local and CCNet, and that its performance is on par with the best results reported so far.
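The "-PUP" suffix denotes a progressive-upsampling decoder head. A minimal sketch with assumed channel sizes follows (illustrative only, not the released SETR code): transformer patch tokens are reshaped into a 2D map and decoded by alternating convolutions with 2x bilinear upsampling, rather than one aggressive upsample.

    import torch
    import torch.nn as nn

    class PUPHead(nn.Module):
        def __init__(self, embed_dim=768, num_classes=19, steps=4):
            super().__init__()
            blocks, ch = [], embed_dim
            for _ in range(steps):  # four 2x steps recover a 16x patch stride
                blocks += [
                    nn.Conv2d(ch, 256, 3, padding=1),
                    nn.BatchNorm2d(256),
                    nn.ReLU(inplace=True),
                    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                ]
                ch = 256
            self.decode = nn.Sequential(*blocks)
            self.classify = nn.Conv2d(256, num_classes, 1)

        def forward(self, tokens, h, w):
            # tokens: (B, h*w, C) patch embeddings from the transformer encoder
            x = tokens.transpose(1, 2).reshape(tokens.size(0), -1, h, w)
            return self.classify(self.decode(x))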
Peize Sun, Yi Jiang, Rufeng Zhang, Enze Xie, Jinkun Cao, Xinting Hu, Tao Kong, Zehuan Yuan, Changhu Wang, Ping Luo
The learned object query detects objects in the current frame, while the object feature query from the previous frame associates objects in the current frame with the previous ones.
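One way to picture the two query sets, assuming a generic DETR-style decoder in PyTorch (names and shapes are illustrative, not the authors' code): the same image memory is decoded twice, once with learned detection queries and once with the previous frame's object features as track queries, and the two box sets are then associated.

    import torch
    import torch.nn as nn

    layer = nn.TransformerDecoderLayer(d_model=256, nhead=8)
    decoder = nn.TransformerDecoder(layer, num_layers=6)

    def detect_and_track(img_memory, learned_queries, prev_obj_features):
        # img_memory:        (S, B, 256) encoder features, current frame
        # learned_queries:   (Q, B, 256) detection queries
        # prev_obj_features: (T, B, 256) object features from the last frame
        det = decoder(learned_queries, img_memory)    # detection boxes
        trk = decoder(prev_obj_features, img_memory)  # tracking boxes
        return det, trk  # fed to box/class heads, then matched by box IoU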
Sen Yang, Zhibin Quan, Mu Nie, Wankou Yang
TransPose models match the state of the art on the COCO Keypoint Detection task, which has been dominated by deep fully convolutional architectures, and there appears to be further room to raise the upper limit of model performance by expanding the size of TransPose.
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
For Data-efficient image Transformers we have only tuned the data augmentation and regularization strategies that pre-existed for convnets, without introducing any significant architectural change beyond our novel distillation token.
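The distillation token behaves like a second class token. The sketch below is an assumption-laden illustration (generic encoder, made-up sizes), not DeiT's released code: one extra learned token passes through the transformer and gets its own head, trained against a convnet teacher's predictions, while the class head is trained against ground-truth labels.

    import torch
    import torch.nn as nn

    class TwoTokenHead(nn.Module):
        def __init__(self, encoder, dim=384, num_classes=1000):
            super().__init__()
            self.encoder = encoder  # any (B, N, dim) -> (B, N, dim) transformer
            self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.head = nn.Linear(dim, num_classes)       # supervised by labels
            self.head_dist = nn.Linear(dim, num_classes)  # supervised by teacher

        def forward(self, patch_tokens):  # patch_tokens: (B, N, dim)
            b = patch_tokens.size(0)
            tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                                self.dist_token.expand(b, -1, -1),
                                patch_tokens], dim=1)
            out = self.encoder(tokens)
            return self.head(out[:, 0]), self.head_dist(out[:, 1])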
Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, Gao Huang
This paper introduces Pointformer, a highly effective feature learning backbone for 3D point clouds that is permutation invariant to points in the input and learns local and global context-aware representations
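The permutation-invariance claim is easy to check empirically: self-attention over unordered tokens (with no positional numbering) is permutation-equivariant, and a symmetric pooling on top makes the representation order-independent. A toy check in PyTorch, not Pointformer's code:

    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
    proj = nn.Linear(3, 32)

    def encode(points):             # points: (B, N, 3) raw xyz coordinates
        x = proj(points)
        y, _ = attn(x, x, x)        # permutation-equivariant self-attention
        return y.max(dim=1).values  # symmetric pooling: order-independent

    pts = torch.randn(1, 128, 3)
    perm = torch.randperm(128)
    print(torch.allclose(encode(pts), encode(pts[:, perm]), atol=1e-5))  # True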
Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk
We introduced Vision Transformer-Faster R-CNN, a competitive object detection solution that uses a transformer backbone, suggesting that architectures substantially different from the well-studied CNN backbone can plausibly make progress on complex vision tasks.
Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions. This makes them expressive, but also computationally infeasible for long sequences, such as high-resolution images.
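The computational point is worth making concrete: self-attention cost grows quadratically with sequence length, so treating each pixel of even a modest image as a token is prohibitive. A back-of-the-envelope calculation (image size chosen only for illustration):

    seq_len = 256 * 256          # one token per pixel of a 256x256 image: 65,536 tokens
    attn_entries = seq_len ** 2  # attention weights per head, per layer
    print(f"{attn_entries:,}")                    # 4,294,967,296
    print(f"{attn_entries * 4 / 2**30:.0f} GiB")  # 16 GiB for one fp32 attention matrix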
Hila Chefer, Shir Gur, Lior Wolf
Self-attention techniques, and specifically Transformers, are dominating the field of text processing and are becoming increasingly popular in computer vision classification tasks. In order to visualize the parts of the image that led to a certain classification, existing methods either rely on the obtained attention maps or employ heuristic propagation along the attention graph.
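One of the attention-map heuristics alluded to above is "attention rollout", a common baseline this line of work compares against. A minimal sketch follows (this is the baseline, not the relevance-propagation method the paper itself proposes): per-layer attentions are averaged over heads, mixed with the identity to account for residual connections, and multiplied through the layers.

    import torch

    def attention_rollout(attentions):
        # attentions: list of (heads, N, N) attention matrices, one per layer
        n = attentions[0].size(-1)
        rollout = torch.eye(n)
        for attn in attentions:
            a = attn.mean(dim=0)                 # fuse heads
            a = a + torch.eye(n)                 # add residual connection
            a = a / a.sum(dim=-1, keepdim=True)  # re-normalize rows
            rollout = a @ rollout
        return rollout  # row 0 (the class token) scores every input patch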
Xinpeng Wang, Chandan Yeshwanth, Matthias Nießner
Our model can serve as a general framework for scene generation: a different task can be solved by changing the set of object properties or conditioning inputs
Authors
Yunhe Wang (2 papers)
Fahad Shahbaz Khan (1 paper)
Alvin Wan (1 paper)
Neil Houlsby (1 paper)
Hanwang Zhang (1 paper)
Patrick Esser (1 paper)
Matthijs Douze (1 paper)
Zhidong Deng (1 paper)