Semantic and Lexical Token Based Vectors Improve Precision of Recommendations for TV Programmes

Taner Cagali, Hadi Wazni, Saba Nazir, Mehrnoosh Sadrzadeh, Chris Newell

2023 IEEE International Symposium on Multimedia (ISM) (2023)

Abstract
Advances in the digitalisation of data have led to large archives of content in media companies. These archives include multimodal data and metadata associated with each media programme. Relating content across different mediums of data and metadata has thus become an emergent challenge, with applications to popular domains such as programme recommendation. In this paper, we worked with combinations of content similarity measures computed from the distances between different forms of textual data obtained from subtitle files and metadata obtained from the genres of programmes. The different forms of textual representations we considered were neural semantic and topic vectors, and a weighted Jaccard distance encoding lexical token rareness. The late fusion combination of these four distances provided the best recommendation results. For a weekly dataset of 145 TV programmes, it increased the precision of the genre-based recommendations by 5.76%. In a monthly dataset of 906 programmes, it achieved an increase of 1.5%. This combination was more efficient than one with audio and video files.
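The late fusion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that rareness is encoded as a smoothed inverse document frequency, that the genre distance is a plain Jaccard distance over genre sets, and that fusion is a weighted average with equal weights by default. All function names and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def rareness_weights(docs):
    """Assumed rareness weight per token: log(1 + N / document frequency)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(1.0 + n / count) for tok, count in df.items()}


def weighted_jaccard_distance(a, b, weights):
    """1 - (weight of shared tokens / weight of all tokens in either set).

    Tokens missing from `weights` contribute 0, i.e. they are ignored.
    """
    a, b = set(a), set(b)
    union = sum(weights.get(t, 0.0) for t in a | b)
    if union == 0:
        return 0.0
    inter = sum(weights.get(t, 0.0) for t in a & b)
    return 1.0 - inter / union


def jaccard_distance(a, b):
    """Unweighted Jaccard distance, used here for genre sets (an assumption)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def late_fusion(distances, fusion_weights=None):
    """Combine per-representation distances into one score (weighted average)."""
    if fusion_weights is None:
        fusion_weights = [1.0 / len(distances)] * len(distances)
    return sum(w * d for w, d in zip(fusion_weights, distances))


# Toy data: tokenised subtitles for three hypothetical programmes.
subtitles = [
    ["match", "goal", "league"],
    ["goal", "league", "transfer"],
    ["recipe", "oven", "flour"],
]
weights = rareness_weights(subtitles)

# Distances between programmes 0 and 1 under four representations.
fused = late_fusion([
    cosine_distance([0.9, 0.1], [0.8, 0.2]),    # neural semantic vectors (made up)
    cosine_distance([0.7, 0.3], [0.6, 0.4]),    # topic vectors (made up)
    weighted_jaccard_distance(subtitles[0], subtitles[1], weights),
    jaccard_distance({"sport"}, {"sport", "news"}),  # genre metadata (made up)
])
print(fused)
```

Each distance is computed independently and only the final scores are combined, which is what distinguishes late fusion from concatenating the representations before comparison. Ranking candidate programmes by the fused distance then gives the recommendation list.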
Keywords
Neural Embeddings, Semantic Vectors, Topic Models, Jaccard Distance, Rareness, Content, Genre, Cosine Similarity, Hybrid Recommender Models