GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
CoRR (2024)
Abstract
Beyond the text detection and recognition tasks in image text spotting, video
text spotting presents an augmented challenge with the inclusion of tracking.
While advanced end-to-end trainable methods have shown commendable performance,
the pursuit of multi-task optimization may pose the risk of producing
sub-optimal outcomes for individual tasks. In this paper, we highlight a main
bottleneck in the state-of-the-art video text spotter: the limited recognition
capability. In response to this issue, we propose to efficiently turn an
off-the-shelf query-based image text spotter into a specialist on video and
present a simple baseline termed GoMatching, which focuses the training efforts
on tracking while maintaining strong recognition performance. To adapt the
image text spotter to video datasets, we add a rescoring head to rescore each
detected instance's confidence via efficient tuning, leading to a better
tracking candidate pool. Additionally, we design a long-short term matching
module, termed LST-Matcher, to enhance the spotter's tracking capability by
integrating both long- and short-term matching results via Transformer. Based
on the above simple designs, GoMatching achieves impressive performance on two
public benchmarks, e.g., setting a new record on the ICDAR15-video dataset, and
one novel test set with arbitrary-shaped text, while saving considerable
training budgets. The code will be released at
https://github.com/Hxyz-123/GoMatching.
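
The idea of combining short- and long-term matching can be illustrated with a minimal sketch. This is not the paper's LST-Matcher, which integrates the two matching results via a Transformer; it swaps in a greedy cosine-similarity association purely for illustration. The function names (`cosine`, `greedy_match`, `long_short_match`), the embedding format, and the 0.5 threshold are all assumptions introduced here, not taken from the GoMatching code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_match(dets, tracks, threshold):
    """Greedily pair detections with tracks, highest similarity first.

    dets:   dict mapping detection id -> embedding (hypothetical format)
    tracks: dict mapping track id -> embedding
    Returns a dict mapping detection id -> track id.
    """
    pairs = [(cosine(d, t), det_id, trk_id)
             for det_id, d in dets.items()
             for trk_id, t in tracks.items()]
    pairs.sort(key=lambda p: p[0], reverse=True)
    assigned, used = {}, set()
    for score, det_id, trk_id in pairs:
        if score < threshold:
            break  # every remaining pair is below the threshold
        if det_id in assigned or trk_id in used:
            continue
        assigned[det_id] = trk_id
        used.add(trk_id)
    return assigned


def long_short_match(dets, prev_tracks, history_tracks, threshold=0.5):
    """Short-term matching against the previous frame first, then
    long-term matching of the leftover detections against older history."""
    assigned = greedy_match(dets, prev_tracks, threshold)
    leftover = {i: e for i, e in dets.items() if i not in assigned}
    taken = set(assigned.values())
    remaining = {k: e for k, e in history_tracks.items() if k not in taken}
    assigned.update(greedy_match(leftover, remaining, threshold))
    return assigned


# Toy data: track "b" was absent from the previous frame but exists in history.
dets = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
prev_tracks = {"a": [1.0, 0.1]}
history_tracks = {"a": [1.0, 0.1], "b": [0.0, 1.0]}
result = long_short_match(dets, prev_tracks, history_tracks)
```

In this toy example, detection 0 is linked to track "a" by the short-term step, detection 1 recovers track "b" through the long-term step, and detection 2 stays unmatched, so a tracker built on this sketch would start a new trajectory for it.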