Fast Inference Through The Reuse Of Attention Maps In Diffusion Models
CoRR(2023)
摘要
Text-to-image diffusion models have demonstrated unprecedented abilities at
flexible and realistic image synthesis. However, the iterative process required
to produce a single image is costly and incurs a high latency, prompting
researchers to further investigate its efficiency. Typically, improvements in
latency have been achieved in two ways: (1) training smaller models through
knowledge distillation (KD); and (2) adopting techniques from ODE-theory to
facilitate larger step sizes. In contrast, we propose a training-free approach
that does not alter the step-size of the sampler. Specifically, we find the
repeated calculation of attention maps to be both costly and redundant;
therefore, we propose a structured reuse of attention maps during sampling. Our
initial reuse policy is motivated by rudimentary ODE-theory, which suggests
that reuse is most suitable late in the sampling procedure. After noting a
number of limitations in this theoretical approach, we empirically search for a
better policy. Unlike methods that rely on KD, our reuse policies can easily be
adapted to a variety of setups in a plug-and-play manner. Furthermore, when
applied to Stable Diffusion-1.5, our reuse policies reduce latency with minimal
repercussions on sample quality.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要