Validation of an improved vision-based web page parsing pipeline

ACM Transactions on the Web (2023)

Abstract
In this paper, we present a novel approach to the quantitative evaluation of a model that parses web pages as visual images, intended to benefit users with assistive needs (for example, users with cognitive or visual deficits, for whom it can enable decluttering or zooming and support more effective screen reader output). This segmentation-classification pipeline is tested in stages. We first discuss the validation of the segmentation algorithm, showing that our approach produces automated segmentations that are very similar to those produced by real users who designate edges and regions with a drawing interface. We also examine the properties of these ground truth segmentations produced under different conditions. We then describe our Hidden-Markov tree approach for classification and present results that provide important validation for this model. The analysis is set against effective choices for dataset and pruning options, measured with respect to manual ground truth labelling of regions. In all, we offer a detailed quantitative validation (focused on complex news pages) of a fully pipelined approach for interpreting web pages as visual images, an approach which enables important advances for users with assistive needs.
Keywords
web page, validation, vision-based