NLM at TREC 2020 Health Misinformation and Deep Learning Tracks.

TREC (2020)

Abstract
This paper describes the participation of the National Library of Medicine in TREC 2020. Our main focus was the Health Misinformation track. We also participated in the Deep Learning track to both evaluate and enhance our deep re-ranking baselines for information retrieval. Our methods include a wide variety of approaches, ranging from conventional Information Retrieval (IR) models to neural re-ranking models, Natural Language Inference (NLI) models, Claim-Truth models, hyperlink-based scores such as PageRank and HITS, and ensemble methods.

1 Health Misinformation Track

With the fast pace of online content publication, misinformation about COVID-19 and the new coronavirus proved difficult to track and debunk at scale. The Health Misinformation track at TREC 2020 tackles this issue through an international challenge on the automatic recognition of misinformation from the web, using a crawl of news articles published between January and April 2020 as a reference dataset. The challenge relies on a set of 46 questions about COVID-19 and their reference yes/no answers. Two tasks are considered. The first, the Total Recall task, focuses on misinformation and requires participating systems to rank documents promulgating misinformation first. The second, the Ad-hoc task, tackles the retrieval of relevant, correct, and credible information first.

For our participation, we first parsed the target Common Crawl News collection and used a combination of the Optimaize language detector and an ASCII character ratio threshold to keep only documents written in English. We indexed the filtered documents at two different levels of granularity: (1) document-level indexing and (2) sentence-level indexing. We applied different conventional information retrieval models to retrieve either the top 10,000 or top 1,000 documents, as well as relevance-based T5 and BERT re-ranking models, and rank-based ensembles of the different approaches. Figure 1 presents an overview of our data pipeline, approaches, and workflow.

Fig. 1. Methods Overview

1 Common Crawl News: https://github.com/commoncrawl/news-crawl
2 https://github.com/optimaize/language-detector
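To make the filtering step above concrete, the following is a minimal sketch of the English-document filter. It uses the Python langdetect package as a stand-in for the Java Optimaize detector actually cited by the authors, and the 0.9 ASCII ratio threshold is an assumed value, not one reported in the paper.

```python
# English-document filter: keep a document only if a language detector labels
# it English AND its ASCII character ratio is above a threshold.
# `langdetect` stands in for the Java Optimaize detector used in the paper;
# the 0.9 threshold is an assumption, not the value the authors used.
from langdetect import detect

ASCII_RATIO_THRESHOLD = 0.9  # assumed threshold


def ascii_ratio(text: str) -> float:
    """Fraction of characters in the text that are plain ASCII."""
    return sum(c.isascii() for c in text) / len(text) if text else 0.0


def keep_document(text: str) -> bool:
    """True if the document passes both the ASCII-ratio and language checks."""
    if ascii_ratio(text) < ASCII_RATIO_THRESHOLD:
        return False
    try:
        return detect(text) == "en"
    except Exception:  # langdetect raises on empty or undetectable input
        return False


if __name__ == "__main__":
    docs = [
        "COVID-19 can spread before symptoms appear, health officials said.",
        "Ceci est un article de presse en français sur le nouveau coronavirus.",
    ]
    print([keep_document(d) for d in docs])  # expected: [True, False]
```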
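The rank-based ensembles are not detailed in this abstract. One common rank-based fusion method is reciprocal rank fusion (RRF), sketched below as an illustration rather than as the authors' exact method; the k = 60 constant is the conventional RRF default, not a value from the paper.

```python
# Rank-based ensemble via reciprocal rank fusion (RRF): each run contributes
# 1 / (k + rank) to a document's score. Shown as an illustration; the paper
# only says "rank-based ensembles" without naming the fusion formula.
from collections import defaultdict


def reciprocal_rank_fusion(runs, k=60):
    """Fuse ranked lists of doc ids; k = 60 is the conventional RRF constant."""
    scores = defaultdict(float)
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    bm25_run = ["d1", "d2", "d3", "d4"]   # hypothetical BM25 ranking
    t5_run = ["d3", "d1", "d5", "d2"]     # hypothetical T5 re-ranked list
    print(reciprocal_rank_fusion([bm25_run, t5_run]))
```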
1.1 Retrieval Performance

We analyzed retrieval performance as it is critical for both the Ad-hoc and Total Recall tasks. Table 1 presents a summary of both our backend retrieval approaches and some of our first runs. All submitted runs are described in more detail in section 1.2. We used the derived qrels for the useful (relevant) aspect to evaluate each of our approaches and computed NDCG@1000, NDCG@10, reciprocal rank (RR), and recall@1000 (cf. table 2). The sentence-level indexing approaches (BNU and TME) substantially under-performed the document-level indexing approaches. This is likely due in part to the very low overlap between the lists of documents retrieved by the document-based and sentence-based methods (cf. figure 2) and to the high correlation between the ratio of retrieved documents annotated by NIST assessors and the NDCG values (cf. table 2). To investigate this hypothesis, we performed a manual evaluation of one of the sentence-based approaches to analyze the error cases further. We pooled the top 20 documents per query from the BM25 (BNU) T5 sentence-based method on all 46 topics and used the specific set of sentences returned by the method for each document as textual evidence to assess its relevance for the topic.

Table 3 shows the number of annotated documents, the number of documents in common between our annotations and the official useful qrels from NIST, and the agreement between our annotations and the official qrels on the common documents. We present a summary of all annotation disagreement cases in table 4.
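The abstract does not state which agreement measure Table 3 reports. The sketch below computes simple percent agreement over the documents common to both label sets, with Cohen's kappa as a companion measure; the labels shown are hypothetical.

```python
# Agreement between our manual relevance labels and the official NIST qrels,
# computed only over the documents that appear in both sets. Percent agreement
# plus Cohen's kappa (binary labels assumed); the paper does not say which
# measure Table 3 reports, and the labels below are hypothetical.

def agreement(ours, qrels):
    common = sorted(set(ours) & set(qrels))
    n = len(common)
    p_o = sum(ours[d] == qrels[d] for d in common) / n      # observed agreement
    # Chance agreement for Cohen's kappa over binary labels {0, 1}.
    p_e = sum(
        (sum(ours[d] == lab for d in common) / n) * (sum(qrels[d] == lab for d in common) / n)
        for lab in (0, 1)
    )
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return n, p_o, kappa


if __name__ == "__main__":
    ours = {"d1": 1, "d2": 0, "d3": 1, "d4": 1}    # hypothetical manual labels
    qrels = {"d2": 0, "d3": 1, "d4": 0, "d5": 1}   # hypothetical official qrels
    print(agreement(ours, qrels))                  # (3, 0.667, 0.4) approximately
```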
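For reference, the retrieval metrics named in section 1.1 (NDCG@k, reciprocal rank, recall@1000) can be computed from a ranked run and binary "useful" qrels as sketched below. Official TREC scores would normally come from trec_eval; this only illustrates the formulas, and the run and qrels shown are hypothetical.

```python
# NDCG@k, reciprocal rank, and recall@k for a single topic, assuming binary
# "useful" judgments. Official scores would come from trec_eval; this only
# illustrates the formulas, and the run/qrels below are hypothetical.
import math


def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(run, qrels, k):
    gains = [qrels.get(d, 0) for d in run[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0


def reciprocal_rank(run, qrels):
    for rank, d in enumerate(run, start=1):
        if qrels.get(d, 0) > 0:
            return 1.0 / rank
    return 0.0


def recall_at_k(run, qrels, k):
    relevant = {d for d, g in qrels.items() if g > 0}
    return len(relevant & set(run[:k])) / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    run = ["d3", "d1", "d7", "d2"]           # hypothetical ranked run
    qrels = {"d1": 1, "d2": 1, "d5": 1}      # hypothetical useful qrels
    print(ndcg_at_k(run, qrels, 10), reciprocal_rank(run, qrels), recall_at_k(run, qrels, 1000))
```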