The HAIRCUT System at TREC-9

TREC (2000)

Abstract
...the token stream. The text was lowercased, punctuation was removed, and diacritical marks were retained. Tokens containing digits were preserved; however, only the first two digits of a sequence of digits were retained (e.g., 1920 became 19##). The result is a stream of blank-separated words. When using n-grams, we construct indexing terms from the same sequence of words. These n-grams may span word boundaries; an attempt is made to discover sentence boundaries so that n-grams spanning sentence...
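The preprocessing steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not HAIRCUT's actual code. The function names `normalize` and `char_ngrams` are hypothetical. Other assumptions: "punctuation" means any character in a Unicode `P*` category, each sentence is padded with one blank on either side, n is a free parameter, and sentence boundaries are taken as already known (passed in as a list of sentences). The sketch also assumes these are character n-grams over the blank-separated word stream, since they are said to span word boundaries.

```python
import re
import unicodedata

# Maximal runs of digits, e.g. "1920" in "in 1920".
_DIGIT_RUN = re.compile(r"\d+")


def _mask_digits(match: re.Match) -> str:
    """Keep the first two digits of a run and replace the rest with '#'."""
    run = match.group()
    return run[:2] + "#" * max(0, len(run) - 2)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, keep diacritics, mask long digit runs.

    Assumption: punctuation = Unicode category P*; it is deleted outright.
    Diacritics are untouched, so 'é' stays 'é'.
    """
    text = text.lower()
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    text = _DIGIT_RUN.sub(_mask_digits, text)
    # Collapse all whitespace into single blanks: a blank-separated word stream.
    return " ".join(text.split())


def char_ngrams(sentences: list[str], n: int) -> list[str]:
    """Character n-grams that may cross word boundaries but not sentences.

    Assumption: sentence boundaries are already detected and each sentence
    is padded with a single blank at both ends.
    """
    grams: list[str] = []
    for sentence in sentences:
        words = normalize(sentence)
        if not words:
            continue  # nothing left after normalization
        padded = " " + words + " "
        # Slide a window of width n over the whole sentence.
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


if __name__ == "__main__":
    print(normalize("In 1920, Café-owners said: hello!"))
    # → in 19## caféowners said hello
    print(char_ngrams(["ab cd"], 3))
    # → [' ab', 'ab ', 'b c', ' cd', 'cd ']
```

Because each sentence is windowed separately, an n-gram such as `b c` appears when "ab" and "cd" share a sentence but not when they are passed as two sentences. This mirrors the abstract's goal of not letting n-grams cross sentence boundaries.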
Keywords
indexation