The BBC World Service Archive prototype.

Journal of Web Semantics (2014)

Abstract
Most broadcasters have accumulated large audio and video archives stretching back over many decades. For example, the BBC World Service radio archive includes around 70,000 English-language programmes from over 45 years. This amounts to about three years of continuous audio and around 15 TB of data. The metadata around this archive is sparse and sometimes wrong, but the full audio content is available in digital form. We have built a system to process the existing audio and text and automatically annotate programmes within the archive with Linked Data web identifiers. The resulting interlinks are used to bootstrap search and navigation within this archive and expose it to users. Automated data will never be entirely accurate, so we built crowdsourcing mechanisms for users to correct and add data. The resulting crowdsourced data is then used to improve search and navigation within the archive, as well as to evaluate and improve our algorithms. As a result of this feedback cycle, the interlinks between our archive and the Semantic Web are continuously improving. This unique combination of Semantic Web technologies, automation and crowdsourcing has dramatically reduced the amount of time and effort required to publish this rich archive online. The BBC World Service archive prototype is available online at http://worldservice.prototyping.bbc.co.uk (last accessed March 2014).
Keywords
Crowdsourcing, Semantic Web, Automated tagging, Speaker identification, Interlinking, Archives