PIXAR: Auto-Regressive Language Modeling in Pixel Space
CoRR(2024)
摘要
Recent works showed the possibility of building open-vocabulary large
language models (LLMs) that directly operate on pixel representations and are
implemented as encoder-decoder models that reconstruct masked image patches of
rendered text. However, these pixel-based LLMs are limited to autoencoding
tasks and cannot generate new text as images. As such, they cannot be used for
open-answer or generative language tasks. In this work, we overcome this
limitation and introduce PIXAR, the first pixel-based autoregressive LLM that
does not rely on a pre-defined vocabulary for both input and output text.
Consisting of only a decoder, PIXAR can answer free-form generative tasks while
keeping the text representation learning performance on par with previous
encoder-decoder models. Furthermore, we highlight the challenges to
autoregressively generate non-blurred text as images and link this to the usual
maximum likelihood objective. We propose a simple adversarial pretraining that
significantly improves the readability and performance of PIXAR making it
comparable to GPT2 on short text generation tasks. This paves the way to
building open-vocabulary LLMs that are usable for free-form generative tasks
and questions the necessity of the usual symbolic input representation – text
as tokens – for these challenging tasks.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要