Low-Bandwidth Matrix Multiplication: Faster Algorithms and More General Forms of Sparsity
arXiv (2024)
Abstract
In prior work, Gupta et al. (SPAA 2022) presented a distributed algorithm for
multiplying sparse n × n matrices, using n computers. They assumed
that the input matrices are uniformly sparse – there are at most d non-zeros
in each row and column – and the task is to compute a uniformly sparse part of
the product matrix. Initially each computer knows one row of each input matrix,
and eventually each computer needs to know one row of the product matrix. In
each communication round each computer can send and receive one O(log n)-bit
message. Their algorithm solves this task in O(d^1.907) rounds, while the
trivial bound is O(d^2). We improve on the prior work in two dimensions:
First, we show that we can solve the same task faster, in only O(d^1.832)
rounds. Second, we explore what happens when matrices are not uniformly sparse.
We consider the following alternative notions of sparsity: row-sparse matrices
(at most d non-zeros per row), column-sparse matrices, matrices with bounded
degeneracy (we can recursively delete a row or column with at most d
non-zeros), average-sparse matrices (at most dn non-zeros in total), and
general matrices. We show that we can still compute X = AB in O(d^1.832)
rounds even if one of the three matrices (A, B, or X) is average-sparse
instead of uniformly sparse. We present algorithms that handle a much broader
range of sparsity in O(d^2 + log n) rounds, and present conditional hardness
results that put limits on further improvements and generalizations.
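
The sparsity notions listed in the abstract can be made concrete with a small checker. The following is only an illustrative sketch, not code from the paper: the function names are hypothetical, matrices are plain lists of lists with at least one row, and the degeneracy test is a simple greedy reading of "recursively delete a row or column with at most d non-zeros".

```python
from typing import List

Matrix = List[List[float]]


def row_counts(m: Matrix) -> List[int]:
    """Number of non-zeros in each row."""
    return [sum(1 for v in row if v != 0) for row in m]


def col_counts(m: Matrix) -> List[int]:
    """Number of non-zeros in each column."""
    return [sum(1 for row in m if row[j] != 0) for j in range(len(m[0]))]


def is_row_sparse(m: Matrix, d: int) -> bool:
    return max(row_counts(m)) <= d


def is_column_sparse(m: Matrix, d: int) -> bool:
    return max(col_counts(m)) <= d


def is_uniformly_sparse(m: Matrix, d: int) -> bool:
    # At most d non-zeros in every row and in every column.
    return is_row_sparse(m, d) and is_column_sparse(m, d)


def is_average_sparse(m: Matrix, d: int) -> bool:
    # At most d * n non-zeros in total, where n is the number of rows.
    return sum(row_counts(m)) <= d * len(m)


def has_bounded_degeneracy(m: Matrix, d: int) -> bool:
    """Greedy check (illustrative assumption): repeatedly delete any row or
    column that has at most d remaining non-zeros; succeed if every non-zero
    is eventually removed. Deleting a line never increases other counts, so
    the order of deletions does not affect the answer."""
    n_rows, n_cols = len(m), len(m[0])
    rows, cols = row_counts(m), col_counts(m)
    row_alive = [True] * n_rows
    col_alive = [True] * n_cols
    remaining = sum(rows)
    while remaining > 0:
        progress = False
        for i in range(n_rows):
            if row_alive[i] and rows[i] <= d:
                row_alive[i] = False
                progress = True
                for j in range(n_cols):
                    if col_alive[j] and m[i][j] != 0:
                        cols[j] -= 1
                        remaining -= 1
                rows[i] = 0
        for j in range(n_cols):
            if col_alive[j] and cols[j] <= d:
                col_alive[j] = False
                progress = True
                for i in range(n_rows):
                    if row_alive[i] and m[i][j] != 0:
                        rows[i] -= 1
                        remaining -= 1
                cols[j] = 0
        if not progress:
            return False  # every remaining row and column has more than d non-zeros
    return True


# An "arrow" matrix: one dense row and one dense column.
arrow = [[1, 1, 1, 1],
         [1, 0, 0, 0],
         [1, 0, 0, 0],
         [1, 0, 0, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
dense = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

With d = 1, the arrow matrix is neither row-sparse nor column-sparse, yet it passes the degeneracy check, which illustrates why bounded degeneracy is a strictly more general notion than uniform sparsity.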