ip2text: A Reasoning-Aware Dataset for Text Generation of Devices on the Internet

DATABASE SYSTEMS FOR ADVANCED APPLICATIONS. DASFAA 2023 INTERNATIONAL WORKSHOPS, BDMS 2023, BDQM 2023, GDMA 2023, BUNDLERS 2023 (2023)

Abstract
Internet of Things (IoT) search engines are increasingly popular among users who want to explore devices on the Internet. Table-to-text generation for devices helps users understand the results that IoT search engines return. However, such generation is not yet available, and good text descriptions of devices are difficult to obtain because quality data for this task is lacking. In addition, the relevant content is spread across multiple attributes of a device, so it is hard to mine it well and directly. This paper therefore introduces ip2text, a challenging dataset for reasoning-aware table-to-text generation of devices on the Internet. The input data in ip2text are tables containing many attributes of devices collected from the Internet, and the output data are the corresponding descriptions. Producing descriptions of devices is costly and time-consuming, and it does not scale to Internet data. To tackle this problem, the paper designs an annotation method based on active learning that follows the characteristics of devices, and it studies the performance of typical existing state-of-the-art models for table-to-text generation on ip2text. The automatic evaluation shows that existing pre-trained baselines struggle to perform satisfactorily on ip2text, with almost all BLEU scores below 1. Further, the human evaluation shows that BART and T5 are prone to hallucination when reasoning, with the Hallucination measure exceeding 0.10. It is therefore not easy to achieve satisfactory performance on the reasoning-aware ip2text with existing mainstream seq2seq models, and both the models and the datasets for table-to-text generation of devices on the Internet urgently need continued improvement.
Keywords
Internet of Things, Table-to-text generation, Reasoning-Aware