Judging the Veracity of Claims and Reliability of Sources with Fact-Finders

user-5ebe28d54c775eda72abcdf7(2014)

Abstract
The Information Age has made publishing, distributing and collecting information easier, resulting in the exponential growth of information available to us. Databases were once ledgers written by hand by a single person; today they can be vast stores of data agglomerated from a myriad of disparate sources. The mass media, formerly limited to newspapers and television programs held to strict journalistic standards, has expanded to include collaborative content such as blogs, wikis, and message boards. Documents covering nearly every topic abound on the Internet, but the authors are often anonymous and the accuracy uncertain. To cope with this new abundance, we employ information retrieval to suggest documents, and information extraction to tell us what they say, but how can we determine what we should actually believe? Not all information sources are equally trustworthy, and simply accepting the majority view often leads to errors: a Google search for “water runs downhill” returns 17.5K documents, while “water runs uphill” yields 116K. When we consider a collection of data with various authorship, we may view it as a set of information sources each making one or more claims. Sources often make claims that are contradictory (“Shakespeare was born on April 26th, 1564” and “Shakespeare was born on April 23rd, 1564”) and, even in the absence of contradiction, we have no guarantee that the sole presented claim is true. How, then, can we know which claims to believe, and which sources to trust? The typical approach is simple: take a vote and choose the claim made by the largest number of sources. However, this implicitly (and implausibly) assumes that all sources are equally trustworthy and, moreover, ignores the wealth of other claims being made by both these and other sources that could inform our belief in the particular claim at hand. 
For example, if we can ascertain that John’s other claims of birthdays for historic figures were correct, his claim about Shakespeare should (ceteris paribus) carry more weight. A diverse class of algorithms collectively known as fact-finders does just this, using the full network of sources and claims to jointly estimate both the trustworthiness of the sources and the believability of the claims. This is useful not just in judging the assertions made by authors in articles, but also in areas such as sensor networks and crowdsourcing. Crowdsourcing of information, where information is polled from a wider population, can be done via direct voting, the most famous example being reCaptcha (Von Ahn, Maurer, McMillen, Abraham, & Blum, 2008), which uses humans to solve difficult OCR problems as a Turing test and accepts the text for a candidate word image once it has accrued enough votes. Similarly, the ESP Game (Von Ahn & Dabbish, 2004) obtains image labelings, presenting the task as a game. In both cases, the annotators are presented with examples
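The contrast between simple voting and a fact-finder can be made concrete with a small sketch. The code below is an illustration only, not the method of the work summarized here: it assumes a basic iterative scheme in the style of Hubs and Authorities, in which a source's trust is the sum of the belief in its claims and a claim's belief is the sum of the trust of its sources. The data, the source names, and the function names (`majority_vote`, `sums_fact_finder`, `best_claim`) are all hypothetical. Corroboration by another source stands in for John's other claims having been verified as correct.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: source -> {question: asserted value}.
ASSERTIONS = {
    "john":   {"shakespeare": "April 23", "mozart": "January 27", "darwin": "February 12"},
    "mary":   {"shakespeare": "April 23", "mozart": "January 27", "darwin": "February 12"},
    "blog_a": {"shakespeare": "April 26"},
    "blog_b": {"shakespeare": "April 26"},
    "blog_c": {"shakespeare": "April 26"},
}


def majority_vote(assertions, question):
    """Baseline: pick the value asserted by the largest number of sources."""
    votes = Counter(
        claims[question] for claims in assertions.values() if question in claims
    )
    return votes.most_common(1)[0][0]


def sums_fact_finder(assertions, iterations=20):
    """Jointly estimate source trust and claim belief (assumed update rule).

    trust(s)  = sum of belief(c) over the claims c asserted by s
    belief(c) = sum of trust(s) over the sources s asserting c
    Both are rescaled by their maximum after every update.
    """
    # Map each (question, value) claim to the sources asserting it.
    sources_of = defaultdict(list)
    for source, claims in assertions.items():
        for claim in claims.items():
            sources_of[claim].append(source)

    belief = {claim: 1.0 for claim in sources_of}
    trust = {}
    for _ in range(iterations):
        trust = {
            s: sum(belief[c] for c in claims.items())
            for s, claims in assertions.items()
        }
        top = max(trust.values())
        trust = {s: t / top for s, t in trust.items()}

        belief = {c: sum(trust[s] for s in srcs) for c, srcs in sources_of.items()}
        top = max(belief.values())
        belief = {c: b / top for c, b in belief.items()}
    return trust, belief


def best_claim(belief, question):
    """Return the most believed value among the claims for one question."""
    candidates = {value: b for (q, value), b in belief.items() if q == question}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    trust, belief = sums_fact_finder(ASSERTIONS)
    print(majority_vote(ASSERTIONS, "shakespeare"))  # April 26 (3 votes to 2)
    print(best_claim(belief, "shakespeare"))         # April 23
```

On this toy data the vote favors the more frequently asserted date, while the iterative scheme favors the date asserted by the two sources whose other claims corroborate each other. This particular update rule rewards sources simply for making many claims, so it should be read as a starting point rather than a recommended design.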