De-DSI: Decentralised Differentiable Search Index
Proceedings of the 4th Workshop on Machine Learning and Systems (2024)
Abstract
This study introduces De-DSI, a novel framework that fuses large language
models (LLMs) with genuine decentralization for information retrieval,
particularly employing the differentiable search index (DSI) concept in a
decentralized setting. Focused on efficiently connecting novel user queries
with document identifiers without direct document access, De-DSI operates
solely on query-docid pairs. To enhance scalability, an ensemble of DSI models
is introduced, where the dataset is partitioned into smaller shards for
individual model training. This approach not only maintains accuracy by
reducing the amount of data each model needs to handle but also facilitates
scalability by aggregating outcomes from multiple models. This aggregation uses
a beam search to identify top docids and applies a softmax function for score
normalization, selecting documents with the highest scores for retrieval. The
decentralized implementation demonstrates that retrieval success is comparable
to centralized methods, with the added benefit that the computational load can
be distributed across the network. This setup also
allows for the retrieval of multimedia items through magnet links, eliminating
the need for platforms or intermediaries.
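
The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of raw beam scores as plain floats, and the rule of keeping the maximum normalized score when several models return the same docid are all assumptions made for illustration. Each shard model is assumed to have already run its own beam search and returned a list of (docid, score) pairs.

```python
import math
from typing import Dict, List, Tuple

# One beam per shard model: (docid, raw beam score), e.g. a sequence log-probability.
# The beam search itself is assumed to have happened inside each model.
Beam = List[Tuple[str, float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one model's beam scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(beams: List[Beam], top_k: int = 3) -> List[Tuple[str, float]]:
    """Normalize each model's beam with softmax, then rank docids globally.

    Assumption: if several models return the same docid, the highest
    normalized score is kept. Other merge rules (e.g. summing) are possible.
    """
    best: Dict[str, float] = {}
    for beam in beams:
        if not beam:
            continue  # a shard model may return nothing for a query
        probs = softmax([score for _, score in beam])
        for (docid, _), p in zip(beam, probs):
            if p > best.get(docid, 0.0):
                best[docid] = p
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical beams from two shard models for the same query.
example_beams = [
    [("docid_a", 0.0), ("docid_b", 0.0)],
    [("docid_c", 2.0), ("docid_d", 0.0)],
]
example_result = aggregate(example_beams, top_k=3)
```

In this sketch, normalizing per model puts the scores of independently trained shard models on a common 0 to 1 scale before they are compared, which is the role the abstract assigns to the softmax step.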