mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM

MM '23: Proceedings of the 31st ACM International Conference on Multimedia (2023)

Inspired by recent developments in large language models (LLMs), we propose mPLUG-Octopus, a versatile conversational assistant designed to provide users with coherent, engaging, and helpful interactions in both text-only and multimodal scenarios. Unlike traditional pipeline-based chat systems, mPLUG-Octopus offers a diverse range of creative capabilities, including open-domain QA, multi-turn chat, and multimodal creation, all built with a unified multimodal LLM without relying on any external API. With its modularized end-to-end multimodal LLM technology, mPLUG-Octopus supports engaging, open-domain conversation. It exhibits a wide range of unimodal and multimodal elemental capabilities, enabling it to communicate with users on open-domain topics and engage in multi-turn conversations. It also assists users with various content creation and application tasks. Our conversational assistant can also be deployed on smart hardware to drive advanced AIGC applications.
